How to Fix 'vector search returning irrelevant results when scaling' in LangGraph (Python)
When vector search starts returning irrelevant results as your LangGraph app scales, it usually means retrieval quality degraded as your data grew, not that LangGraph itself is broken. The common pattern is simple: the same query that worked in a small test set starts pulling weak matches once your index grows, your chunking changes, or your embedding or index settings drift.
In practice, this shows up as StateGraph nodes answering with off-topic context, Retriever outputs that look semantically close but are factually wrong, and downstream LLM responses that miss the user’s intent.
The Most Common Cause
The #1 cause is bad chunking or inconsistent embedding setup between ingestion and query time. In LangGraph, this often hides behind a clean graph flow because `retrieve_docs -> generate_answer` still runs fine, but the retriever is feeding garbage into the chain.
Here’s the broken pattern I see most often: documents are chunked too aggressively, embeddings are created with one model during ingestion and another during querying, or metadata filters are missing.
| Broken pattern | Fixed pattern |
|---|---|
| Chunk size too small, overlap missing, embedding model mismatch | Stable chunking, consistent embedding model, explicit retriever config |
```python
# BROKEN
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Chunks are too small and have no overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=0)
chunks = splitter.split_documents(docs)

# Ingestion uses one embedding model
ingest_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
db = Chroma.from_documents(chunks, ingest_embeddings, persist_directory="./chroma_db")

# Query path accidentally reopens the store with a different model
query_db = Chroma(
    persist_directory="./chroma_db",
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-large"),
)
retriever = query_db.as_retriever(search_kwargs={"k": 5})
# LangGraph node gets low-quality context
```
```python
# FIXED
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# One embedding model, shared by the ingestion and query paths
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Larger chunks with overlap keep enough context in each vector
splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=120,
)
chunks = splitter.split_documents(docs)

db = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_db")
retriever = db.as_retriever(search_kwargs={"k": 8})
```
Within the graph, keep retrieval in a dedicated node and inspect what it returns before generation:
```python
def retrieve(state):
    query = state["question"]
    docs = retriever.invoke(query)
    return {"context_docs": docs}
```
If docs are already irrelevant here, the problem is upstream of the LLM.
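For context, here is a minimal sketch of how that node can sit inside a graph. The state schema, the prompt in generate, and the llm handle are illustrative assumptions, not a prescribed setup:

```python
# Minimal sketch of a retrieve -> generate graph. RAGState and the
# generate prompt are illustrative; `llm` is assumed to be any chat model.
from typing import TypedDict

from langchain_core.documents import Document
from langgraph.graph import StateGraph, START, END

class RAGState(TypedDict):
    question: str
    context_docs: list[Document]
    answer: str

def generate(state: RAGState):
    context = "\n\n".join(d.page_content for d in state["context_docs"])
    response = llm.invoke(
        f"Answer using only this context:\n{context}\n\nQuestion: {state['question']}"
    )
    return {"answer": response.content}

graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)
app = graph.compile()
```

With retrieval isolated like this, you can log state between the two nodes instead of guessing from the final answer.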
Other Possible Causes
1. Your index is getting stale
If you re-ingest documents without rebuilding or upserting correctly, you can end up with partial data and old vectors still being queried.
```python
# BAD: reusing an old persistent store without verifying freshness
db = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
```
Fix it by versioning collections or rebuilding on schema changes.
```python
db = Chroma(
    collection_name="policies_v3",
    persist_directory="./chroma_db",
    embedding_function=embeddings,
)
```
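To sanity-check what a persisted store actually contains after re-ingestion, you can query the chromadb client directly. A small sketch, assuming the versioned collection name from the example above:

```python
import chromadb

# Open the same persistent store and check the versioned collection
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("policies_v3")

# A stale or partial re-ingest usually shows up as an unexpected count
print(collection.count())
```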
2. k is too high for noisy data
A larger k does not always improve retrieval. If your corpus has many near-duplicates or weak chunks, increasing k can flood the prompt with irrelevant context.
```python
retriever = db.as_retriever(search_kwargs={"k": 20})  # often too broad
```
Try a smaller top-k first:
```python
retriever = db.as_retriever(search_kwargs={"k": 5})
```
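If near-duplicates are the specific problem, maximal marginal relevance (MMR) search is another option worth testing; LangChain retrievers support it via search_type. A sketch with illustrative parameters:

```python
# Sketch: MMR trades pure similarity for diversity, which helps when the
# corpus contains many near-duplicate chunks. fetch_k candidates are
# retrieved first, then k diverse results are selected from them.
retriever = db.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20},
)
```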
3. You need metadata filtering
If your corpus mixes products, regions, or document types, pure semantic search will drift at scale. Filter first, then retrieve.
```python
retriever = db.as_retriever(
    search_kwargs={
        "k": 5,
        "filter": {"doc_type": "claims_policy", "region": "us"},
    }
)
```
Without filters, a query about “claims escalation” may pull HR docs because they share generic language.
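Filters only work if the metadata exists on the chunks at ingestion time. A sketch of tagging documents before indexing; the field values here are illustrative:

```python
# Attach the fields you plan to filter on before building the index
for chunk in chunks:
    chunk.metadata.update({"doc_type": "claims_policy", "region": "us"})

db = Chroma.from_documents(chunks, embeddings)
```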
4. Your graph node is mutating the query
I’ve seen LangGraph nodes rewrite queries too aggressively before retrieval. That makes retrieval look random because the vector store is searching for an LLM-generated paraphrase instead of the user’s intent.
```python
def rewrite_query(state):
    # Too aggressive: changes meaning
    rewritten = llm.invoke(f"Rewrite this better: {state['question']}")
    return {"question": rewritten.content}
```
Keep rewrites conservative:
```python
def rewrite_query(state):
    return {"question": state["question"].strip()}
```
How to Debug It
1. Print the raw retrieved chunks (see the sketch after this list)
- Don’t inspect only the final answer.
- Log page_content, metadata, and similarity scores if your store supports them.
- If the top chunks are wrong, your issue is retrieval quality.

2. Test retrieval outside LangGraph
- Call the retriever directly with one known-good query.
- Compare the output of a plain Python script with the output of LangGraph node execution.
- If plain retrieval works but graph output fails, your node logic is mutating state.

3. Check embeddings consistency
- Verify ingestion and query use the same embedding model.
- Check dimension compatibility if you swapped providers.
- A mismatch often produces silent quality degradation rather than a hard error like ValueError.

4. Inspect chunk boundaries
- Print a few sample chunks.
- Look for fragments split across sentences or sections.
- If chunks contain too little context, semantic similarity becomes noisy at scale.
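Here is a small sketch covering the first and third checks; the test query is illustrative, and similarity_search_with_score is Chroma's score-exposing call (lower scores mean closer matches with the default distance metric):

```python
# A known-good test query (illustrative)
query = "how do I escalate a denied claim?"

# Check 1: print raw chunks with scores, straight from the store
for doc, score in db.similarity_search_with_score(query, k=5):
    print(f"score={score:.4f} metadata={doc.metadata}")
    print(doc.page_content[:200], "\n---")

# Check 3: confirm the query-time model produces vectors with the same
# dimension the index was built with
print(len(embeddings.embed_query(query)))
```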
Prevention
- Use one embedding model per collection and version it explicitly.
- Tune chunk size and overlap on real documents before production rollout.
- Add retrieval tests in CI (see the sketch after this list):
  - a known query
  - expected top document IDs
  - fail if relevance drops after reindexing
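A sketch of such a test in pytest style; the query, the chunk_id metadata key, and the expected IDs are illustrative and would come from your own corpus:

```python
# Illustrative IDs of the chunks that should answer this query
EXPECTED_TOP_IDS = {"claims_policy_v3_chunk_012", "claims_policy_v3_chunk_013"}

def test_claims_escalation_retrieval():
    docs = retriever.invoke("how do I escalate a denied claim?")
    retrieved_ids = {d.metadata.get("chunk_id") for d in docs[:5]}

    # Fail if reindexing pushed the known-good chunks out of the top results
    assert EXPECTED_TOP_IDS & retrieved_ids, f"relevance regression: {retrieved_ids}"
```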
If you want stable LangGraph retrieval under scale, treat vector search as an indexed data pipeline problem first and an agent problem second. Most “irrelevant results” bugs come from ingestion drift, bad chunking, or unfiltered corpora — not from StateGraph itself.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.