How to Fix 'vector search returning irrelevant results' in LangGraph (Python)

By Cyprian Aarons
Updated 2026-04-22
Tags: vector-search-returning-irrelevant-results, langgraph, python

What this error usually means

If your LangGraph app is returning irrelevant vector search results, the retrieval layer is working but the ranking is bad. In practice, this usually shows up when your graph state, embedding pipeline, or vector store config is inconsistent.

The failure pattern is common in RAG workflows built with StateGraph, ToolNode, and a retriever-backed node. You won’t always get a hard exception like ValueError: embedding dimension mismatch; sometimes you just get “valid” results that are semantically wrong.

The Most Common Cause

The #1 cause is embedding mismatch: you indexed documents with one embedding model and queried with another, or you changed chunking/tokenization after indexing.

This happens a lot when people swap embedding models mid-project without rebuilding the vector index.

Broken pattern                                                        | Fixed pattern
Index with text-embedding-ada-002, query with text-embedding-3-large | Use the same embedding model for both indexing and querying
Reuse old FAISS/Chroma index after changing chunking                  | Rebuild the index after changing chunking or embeddings

Broken code

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# The DB was built months ago with this model...
index_embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
# ...but queries now run through a different one
query_embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=query_embeddings,  # wrong if DB was built with another model
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
docs = retriever.invoke("How do I reset my policy number?")

Right code

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# Rebuild the collection if you changed the model or chunking.
# `chunks` is the list of Document objects from your text splitter.
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db",
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
docs = retriever.invoke("How do I reset my policy number?")

If you are using LangGraph, keep the retriever node isolated so you can verify it independently before wiring it into the graph:

from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    query: str
    docs: list

def retrieve(state: State):
    return {"docs": retriever.invoke(state["query"])}

graph = StateGraph(State)
graph.add_node("retrieve", retrieve)
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", END)
app = graph.compile()
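Once compiled, a direct invocation should return the same documents as calling the retriever by itself; if they differ, the graph wiring (not the vector store) is the problem:

result = app.invoke({"query": "How do I reset my policy number?"})
print([d.metadata for d in result["docs"]])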

Other Possible Causes

1) Bad chunking strategy

If chunks are too large, each vector averages over several topics and retrieval gets noisy. If chunks are too small, the answer gets fragmented across many chunks and no single hit carries enough context.

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,
    chunk_overlap=0,
)  # often too coarse for semantic retrieval

chunks = splitter.split_documents(docs)

Better:

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100,
)
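Changing splitter parameters invalidates the stored vectors, so re-split and rebuild before querying again. Delete or version the old persist directory first so stale vectors don't mix in:

chunks = splitter.split_documents(docs)
vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_db_v2")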

2) Wrong distance metric for your embeddings

Some stores default to cosine similarity; others use L2. If your embeddings aren’t normalized and your store expects cosine-like behavior, ranking can drift.

# Example: make sure your vector store is configured consistently
from langchain_community.vectorstores import FAISS

vectorstore = FAISS.from_documents(docs, embeddings)  # defaults to L2 unless you pass distance_strategy

For stores that support it, set the metric explicitly. With Chroma, for example, you can pin the HNSW space when the collection is created:

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_metadata={"hnsw:space": "cosine"},  # force cosine distance
)

3) Query rewriting in the graph is mangling intent

Many LangGraph pipelines include an LLM node that rewrites the user query before retrieval. If that node over-summarizes or changes entities, retrieval degrades.

def rewrite_query(state):
    # bad: loses important terms like policy IDs or product names
    rewritten = llm.invoke(f"Summarize this question: {state['query']}")
    return {"query": rewritten.content}

Use a constrained rewrite prompt:

def rewrite_query(state):
    prompt = (
        "Rewrite the query for retrieval without removing entities, IDs, dates, or product names:\n"
        f"{state['query']}"
    )
    rewritten = llm.invoke(prompt)
    return {"query": rewritten.content}

4) Retriever settings are too aggressive or too loose

A bad k, MMR setting, or score threshold can return junk.

retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 20}  # too many low-signal docs can pollute downstream generation
)

Try tighter settings:

retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.75, "k": 5}
)
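If near-duplicate chunks crowd out relevant ones, MMR is another option; this sketch uses LangChain's standard MMR parameters:

retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 25, "lambda_mult": 0.5},  # fetch 25 candidates, keep 5 diverse ones
)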

How to Debug It

  1. Test retrieval outside LangGraph first.
    Call retriever.invoke("your query") directly. If results are bad here, the issue is not LangGraph orchestration.

  2. Check embedding parity.
    Confirm the same embedding class and model were used for indexing and querying. If you see mixed models like OpenAIEmbeddings(model="text-embedding-ada-002") and OpenAIEmbeddings(model="text-embedding-3-large"), rebuild the index. (A quick dimension-parity check is sketched after this list.)

  3. Print raw chunks before indexing.
    Inspect what actually got stored.

    for i, doc in enumerate(chunks[:5]):
        print(i, doc.page_content[:300], doc.metadata)
    
  4. Inspect scores from the retriever/vector store.
    If your backend supports it, use score-returning methods. Note that for Chroma and FAISS the returned score is a distance, so lower means more similar.

    results = vectorstore.similarity_search_with_score("reset policy number", k=5)
    for doc, score in results:
        print(score, doc.page_content[:120])
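
For step 2, a minimal dimension-parity sketch; it assumes you can still instantiate the model the index was built with, and note that matching dimensions are necessary but not sufficient (different models can share a dimension):

index_dim = len(OpenAIEmbeddings(model="text-embedding-ada-002").embed_query("probe"))
query_dim = len(OpenAIEmbeddings(model="text-embedding-3-large").embed_query("probe"))
print(index_dim, query_dim)  # 1536 vs. 3072 here, so the index must be rebuilt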
    

If scores look plausible but answers are still irrelevant inside LangGraph, check whether a downstream node is rewriting state["query"] or dropping retrieved documents from state.
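One quick way to check, using the compiled graph from above: stream the run with stream_mode="updates" and print what each node writes back to state, so you can see exactly where the query or the docs change:

for step in app.stream({"query": "How do I reset my policy number?"}, stream_mode="updates"):
    for node, update in step.items():
        print(node, {k: str(v)[:100] for k, v in update.items()})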

Prevention

  • Use one embedding model per collection and version your indexes when anything changes.
  • Keep chunking stable across builds; if you change splitter parameters, rebuild the vector store.
  • Add a small retrieval regression test suite with known queries and expected top-k documents (see the test sketch after this list).
  • In LangGraph nodes, log both the final query sent to retrieval and the retrieved document IDs.
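
For the regression tests, here is a minimal pytest-style sketch; it assumes each indexed document carries a doc_id metadata field (hypothetical, name it however you like):

EXPECTED_TOP_K = {
    "How do I reset my policy number?": "policy-reset-guide",  # hypothetical doc_id
}

def test_known_queries_hit_expected_docs():
    for query, expected_id in EXPECTED_TOP_K.items():
        docs = retriever.invoke(query)
        ids = [d.metadata.get("doc_id") for d in docs]
        assert expected_id in ids, f"{query!r} did not surface {expected_id}; got {ids}"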

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

