How to Fix 'vector search returning irrelevant results in production' in LangGraph (Python)
When vector search starts returning irrelevant results in production, the retrieval layer is usually working, but the data you’re searching is not what you think it is. In LangGraph Python apps, this typically shows up after you wire a retriever into a graph node and suddenly see low-similarity chunks, wrong tenants, or stale embeddings coming back.
The root cause is usually not LangGraph itself. It’s almost always bad chunking, embedding mismatch, metadata filtering mistakes, or a state bug in the graph that sends the wrong query into retrieval.
The Most Common Cause
The #1 cause is an embedding/query mismatch produced by inconsistent preprocessing.
You indexed one text format and queried another. In production this often happens when one code path strips punctuation, lowercases aggressively, truncates context, or uses a different embedding model than the one used at ingest time.
Here’s the broken pattern:
```python
# WRONG: ingest and query use different preprocessing and even different models
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

ingest_embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
query_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")  # mismatch

vectorstore = Chroma(
    collection_name="docs",
    embedding_function=ingest_embeddings,
    persist_directory="./chroma",
)

def retrieve(query: str):
    normalized_query = query.lower().strip()[:200]  # changes meaning
    # queries are embedded with a different model than the one used at ingest
    query_vector = query_embeddings.embed_query(normalized_query)
    return vectorstore.similarity_search_by_vector(query_vector, k=5)
```
And here’s the fixed pattern:
```python
# RIGHT: same embedding model, same text normalization rules, same chunking assumptions
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

vectorstore = Chroma(
    collection_name="docs",
    embedding_function=embeddings,
    persist_directory="./chroma",
)

def normalize(text: str) -> str:
    # keep normalization minimal and identical at ingest and query time
    return text.strip()

def retrieve(query: str):
    retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
    return retriever.invoke(normalize(query))
```
If your app uses LangGraph, this bug often hides inside a node:
```python
from typing import TypedDict
from langgraph.graph import StateGraph

class State(TypedDict):
    question: str
    docs: list

def retrieve_node(state: State):
    # If state["question"] was rewritten by another node, retrieval quality drops fast.
    # `retriever` here is the one built above, shared with the ingest pipeline.
    docs = retriever.invoke(state["question"])
    return {"docs": docs}
```
The failure mode usually isn’t a hard exception. You’ll see poor results even though the pipeline is “healthy”.
Other Possible Causes
| Cause | What it looks like | Fix |
|---|---|---|
| Wrong chunk size / overlap | Retrieved chunks are too small or split mid-thought | Rechunk with sane defaults |
| Metadata filter too broad or too strict | Search returns unrelated tenant/document set | Tighten filter conditions |
| Stale index | New docs aren’t reflected in results | Rebuild or upsert embeddings correctly |
| Wrong retriever config | k, MMR, score threshold are mis-tuned | Tune search params per corpus |
1) Bad chunking strategy
If chunks are too large, embeddings blur topics. If they’re too small, you lose context.
```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# BAD: huge chunks with no overlap blur topics and split mid-thought
RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)

# BETTER: smaller chunks with overlap preserve local context
RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=120)
```
For legal, claims, or policy docs, chunk by structure first if possible. Headings beat blind character splitting.
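To make "chunk by structure first" concrete, here is a minimal pure-Python sketch that splits on markdown headings and only falls back to character windows for oversized sections. The helper names are illustrative, not a library API:

```python
import re

def split_by_headings(text: str) -> list[str]:
    """Split a markdown document into sections at heading boundaries."""
    # Each section starts at a line beginning with one or more '#' characters.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    return [s.strip() for s in sections if s.strip()]

def chunk_document(text: str, max_chars: int = 800) -> list[str]:
    """Chunk by structure first; only split sections that are still too long."""
    chunks = []
    for section in split_by_headings(text):
        if len(section) <= max_chars:
            chunks.append(section)
        else:
            # fall back to naive character windows for oversized sections
            chunks.extend(
                section[i : i + max_chars] for i in range(0, len(section), max_chars)
            )
    return chunks
```

In LangChain, `MarkdownHeaderTextSplitter` provides this heading-aware splitting out of the box, and you can chain it with `RecursiveCharacterTextSplitter` for long sections.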
2) Metadata filter mistakes
A common production issue is retrieving across tenants because the filter key doesn’t match what was stored.
```python
# BAD: filter key doesn't exist in indexed metadata
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 5, "filter": {"tenant_id": "acme"}}
)

# GOOD: use the exact metadata field names from ingestion
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 5, "filter": {"org_id": "acme"}}
)
```
If you’re using Pinecone or Weaviate through LangChain wrappers, the same rule applies: filter fields must match indexed metadata exactly.
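A cheap guardrail is to sample one stored document's metadata at startup and validate every filter against it before serving queries. A sketch, with illustrative names:

```python
def validate_filter_keys(filter_: dict, sample_metadata: dict) -> None:
    """Fail loudly when a filter references a key that was never indexed."""
    missing = set(filter_) - set(sample_metadata)
    if missing:
        raise KeyError(
            f"Filter keys {sorted(missing)} not present in indexed metadata; "
            f"available keys: {sorted(sample_metadata)}"
        )
```

Run this once per deployment against a sampled document; a typo like `tenant_id` vs `org_id` then fails at startup instead of silently returning cross-tenant results.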
3) Stale embeddings after document updates
If you update raw documents but don’t re-embed them, retrieval gets old answers.
```python
# BAD: document content changed but vector index was never refreshed
db.add_texts(["new policy text"])  # old vectors stay behind, duplicates pile up

# GOOD: upsert with stable IDs so changed content replaces the old vectors
db.upsert(
    ids=["policy-123"],
    texts=["new policy text"],
)
```
In production systems, make ID management explicit. Otherwise you end up with duplicate vectors for the same source document.
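One way to make ID management explicit is to derive IDs deterministically from the source path and chunk index, so re-ingesting a changed document always targets the same vectors. A sketch, not a library API:

```python
import hashlib

def chunk_id(source: str, chunk_index: int) -> str:
    """Deterministic vector ID: same source + position always maps to the same ID."""
    raw = f"{source}:{chunk_index}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:24]
```

Pass these as the `ids` argument when upserting; an updated document then overwrites its own vectors instead of creating duplicates.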
4) Retriever configuration is too loose
Sometimes the problem is not bad vectors; it’s bad ranking settings.
```python
# BAD: too many low-signal results bubble up
retriever = vectorstore.as_retriever(search_kwargs={"k": 20})

# BETTER: start smaller and add score thresholds if supported
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
```
If your backend supports MMR or similarity score thresholds, use them for noisy corpora. Plain top-k can be weak when many chunks are semantically similar.
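To see why MMR helps with redundant corpora, here is a minimal pure-Python sketch of maximal marginal relevance, which scores each candidate by relevance to the query minus its similarity to documents already selected. This is illustrative, not LangChain's implementation:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr(query_vec, doc_vecs, k: int, lambda_mult: float = 0.5) -> list[int]:
    """Return indices of k docs balancing relevance against redundancy."""
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best, best_score = None, float("-inf")
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # penalty: similarity to the closest already-selected doc
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected
```

In LangChain retrievers this corresponds to `vectorstore.as_retriever(search_type="mmr", ...)`; `lambda_mult` near 1 behaves like plain top-k, lower values penalize near-duplicates harder.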
How to Debug It
1. **Inspect raw retrieved documents.** Print `page_content`, `metadata`, and similarity scores. If scores look random or metadata points to the wrong tenant/source, the issue is upstream of LangGraph.
2. **Compare ingest-time and query-time embeddings.** Verify model name, dimension count, and normalization rules. A mismatch here often produces silent garbage results rather than exceptions like `ValueError: Expected embedding dimension X got Y`.
3. **Test retrieval outside LangGraph.** Call the retriever directly before wiring it into `StateGraph`. If direct retrieval works but graph retrieval fails, your bug is in state mutation or routing logic.
4. **Trace state between nodes.** Log `state["question"]` before retrieval. A rewrite node may be turning a precise user query into something generic like "explain it", which destroys recall.
Example trace point:
```python
def retrieve_node(state):
    print("QUERY:", state["question"])
    docs = retriever.invoke(state["question"])
    print("TOP DOC:", docs[0].metadata if docs else None)
    return {"docs": docs}
```
If you see irrelevant results only after an LLM rewrite step, stop rewriting queries blindly. Preserve the original question and pass both versions through the graph.
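One way to keep both versions is to carry them as separate state keys and fall back to the original when the rewrite looks degenerate. A minimal sketch without the full LangGraph wiring; the length heuristic is an assumption you should tune:

```python
from typing import TypedDict

class State(TypedDict):
    question: str            # original user query, never overwritten
    rewritten_question: str  # output of the rewrite node

def pick_retrieval_query(state: State) -> str:
    """Prefer the rewrite, but fall back when it is suspiciously short/generic."""
    rewritten = state.get("rewritten_question", "").strip()
    # Heuristic: very short rewrites ("explain it") have lost the entities
    # that vector search needs, so retrieve with the original question instead.
    if len(rewritten.split()) < 4:
        return state["question"]
    return rewritten
```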
Prevention
- Use one embedding model per collection and lock it in config.
- Store stable document IDs and rebuild/upsert indexes on content changes.
- Add retrieval tests with known questions and expected source documents before shipping to production.
If you want a practical guardrail, write an integration test that fails when top-1 retrieval does not come from an expected doc ID. That catches most “irrelevant results” regressions before users do.
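A minimal sketch of such a guardrail, assuming a retriever with `.invoke()` and documents carrying a `doc_id` metadata key (the queries, IDs, and helper names below are illustrative):

```python
# Hypothetical golden set: each query should retrieve a known source document.
GOLDEN_QUERIES = {
    "What is the claim filing deadline?": "policy-123",
    "How do I cancel my subscription?": "faq-007",
}

def check_top1(retriever, golden: dict) -> list:
    """Return failure messages for queries whose top-1 doc misses; empty = pass."""
    failures = []
    for query, expected_id in golden.items():
        docs = retriever.invoke(query)
        top_id = docs[0].metadata.get("doc_id") if docs else None
        if top_id != expected_id:
            failures.append(f"{query!r}: expected {expected_id}, got {top_id}")
    return failures
```

Wire this into CI with `assert not check_top1(retriever, GOLDEN_QUERIES)` so a chunking or embedding change that degrades retrieval fails the build.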
By Cyprian Aarons, AI Consultant at Topiax.