How to Fix 'vector search returning irrelevant results in production' in LangGraph (Python)

By Cyprian Aarons · Updated 2026-04-22

When vector search starts returning irrelevant results in production, the retrieval layer is usually working, but the data you’re searching is not what you think it is. In LangGraph Python apps, this typically shows up after you wire a retriever into a graph node and suddenly see low-similarity chunks, wrong tenants, or stale embeddings coming back.

The root cause is usually not LangGraph itself. It’s almost always bad chunking, embedding mismatch, metadata filtering mistakes, or a state bug in the graph that sends the wrong query into retrieval.

The Most Common Cause

The #1 cause is an embedding mismatch between ingest time and query time, usually introduced by inconsistent preprocessing.

You indexed one text format and queried another. In production this often happens when one code path strips punctuation, lowercases aggressively, truncates context, or uses a different embedding model than the one used at ingest time.

Here’s the broken pattern:

# WRONG: ingest and query use different preprocessing and even different models
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# The persisted collection was built at ingest time with text-embedding-3-large,
# but the query service opens it with text-embedding-3-small.
ingest_embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
query_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")  # mismatch

vectorstore = Chroma(
    collection_name="docs",
    embedding_function=query_embeddings,  # queries now land in a different vector space
    persist_directory="./chroma"
)

def retrieve(query: str):
    normalized_query = query.lower().strip()[:200]  # aggressive lowercasing and truncation change meaning
    retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
    return retriever.invoke(normalized_query)

And here’s the fixed pattern:

# RIGHT: same embedding model, same text normalization rules, same chunking assumptions
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

vectorstore = Chroma(
    collection_name="docs",
    embedding_function=embeddings,
    persist_directory="./chroma"
)

def normalize(text: str) -> str:
    return text.strip()

def retrieve(query: str):
    retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
    return retriever.invoke(normalize(query))
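
Lock the consistency in by routing ingest through the same embeddings object and the same normalize helper as the query path. A minimal ingest sketch under those assumptions (the ingest function and doc_id-based chunk IDs are illustrative, not a LangChain convention):

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=120)

def ingest(doc_id: str, raw_text: str):
    # Same normalize() and same vectorstore (and therefore same embedding model) as the query path
    chunks = splitter.split_text(normalize(raw_text))
    ids = [f"{doc_id}-{i}" for i in range(len(chunks))]
    vectorstore.add_texts(texts=chunks, ids=ids)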

If your app uses LangGraph, this bug often hides inside a node:

from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    docs: list

def retrieve_node(state: State):
    # If state["question"] was rewritten by another node, retrieval quality drops fast
    docs = retriever.invoke(state["question"])  # retriever built from the vectorstore above
    return {"docs": docs}

graph = StateGraph(State)
graph.add_node("retrieve", retrieve_node)
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", END)
app = graph.compile()

The failure mode usually isn’t a hard exception. You’ll see poor results even though the pipeline is “healthy”.

Other Possible Causes

Cause | What it looks like | Fix
Wrong chunk size / overlap | Retrieved chunks are too small or split mid-thought | Rechunk with sane defaults
Metadata filter too broad or too strict | Search returns unrelated tenant/document set | Tighten filter conditions
Stale index | New docs aren’t reflected in results | Rebuild or upsert embeddings correctly
Wrong retriever config | k, MMR, score threshold are mis-tuned | Tune search params per corpus

1) Bad chunking strategy

If chunks are too large, embeddings blur topics. If they’re too small, you lose context.

from langchain_text_splitters import RecursiveCharacterTextSplitter

# BAD: large chunks with no overlap blur topics and cut sentences at chunk boundaries
RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)

# BETTER: smaller chunks with modest overlap keep each embedding focused
RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=120)

For legal, claims, or policy docs, chunk by structure first if possible. Headings beat blind character splitting.
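
For markdown-style sources, one way to do that is to split on headings first and only fall back to character splitting for oversized sections. A sketch using LangChain's MarkdownHeaderTextSplitter (the header-to-key mapping and the markdown_text variable are assumptions):

from langchain_text_splitters import MarkdownHeaderTextSplitter, RecursiveCharacterTextSplitter

header_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "section"), ("##", "subsection")]
)
char_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=120)

sections = header_splitter.split_text(markdown_text)  # Documents carrying their heading path in metadata
chunks = char_splitter.split_documents(sections)      # split further only where a section is still too long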

2) Metadata filter mistakes

A common production issue is retrieving across tenants because the filter key doesn’t match what was stored.

# BAD: filter key doesn't exist in indexed metadata
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 5, "filter": {"tenant_id": "acme"}}
)

# GOOD: use exact metadata field names from ingestion
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 5, "filter": {"org_id": "acme"}}
)

If you’re using Pinecone or Weaviate through LangChain wrappers, the same rule applies: filter fields must match indexed metadata exactly.
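
A simple way to keep the two sides aligned is to attach the tenant key at ingest and reuse exactly the same key in the filter. A sketch, assuming the Chroma store from earlier and a hypothetical org_id field:

# Ingest: store the tenant under one agreed-upon key
vectorstore.add_texts(
    texts=["refund policy text"],
    metadatas=[{"org_id": "acme", "source": "policies/refunds.md"}],
    ids=["acme-refund-policy-0"],
)

# Query: filter on exactly that key
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 5, "filter": {"org_id": "acme"}}
)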

3) Stale embeddings after document updates

If you update raw documents but don’t re-embed them, retrieval gets old answers.

# BAD: document content changed but vector index was never refreshed
db.add_texts(["new policy text"])  # no stable ID, so the old vector stays and a near-duplicate is added

# GOOD: re-embed under a stable ID so updated content replaces the old vector
db.delete(ids=["policy-123"])
db.add_texts(texts=["new policy text"], ids=["policy-123"])

In production systems, make ID management explicit. Otherwise you end up with duplicate vectors for the same source document.
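
One low-effort way to make IDs explicit is to derive them deterministically from the source path and chunk position, so re-ingesting a changed document overwrites its old vectors instead of adding new ones. A sketch (the ID scheme is an assumption, not a library convention):

import hashlib

def chunk_id(source_path: str, chunk_index: int) -> str:
    # Same source + same position always yields the same ID across ingest runs
    digest = hashlib.sha256(source_path.encode("utf-8")).hexdigest()[:12]
    return f"{digest}-{chunk_index}"

# e.g. chunk_id("policies/refunds.md", 0) is stable on every re-ingest of that file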

4) Retriever configuration is too loose

Sometimes the problem is not bad vectors; it’s bad ranking settings.

# BAD: too many low-signal results bubble up
retriever = vectorstore.as_retriever(search_kwargs={"k": 20})

# BETTER: start smaller and add score thresholds if supported
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

If your backend supports MMR or similarity score thresholds, use them for noisy corpora. Plain top-k can be weak when many chunks are semantically similar.
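
Both options are set on as_retriever in LangChain. A sketch of each (the fetch_k and score_threshold values are starting-point assumptions, not recommendations for every corpus):

# MMR: trade a little raw similarity for diversity among the returned chunks
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 25},
)

# Score threshold: drop low-similarity chunks instead of padding out to k
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 5, "score_threshold": 0.5},
)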

How to Debug It

  1. Inspect raw retrieved documents

    • Print page_content, metadata, and similarity scores (see the score-check sketch after this list).
    • If scores look random or metadata points to the wrong tenant/source, the issue is upstream of LangGraph.
  2. Compare ingest-time and query-time embeddings

    • Verify model name, dimension count, normalization rules.
    • A mismatch here usually produces silent garbage rather than an exception; you typically only see an error such as ValueError: Expected embedding dimension X got Y when the two models return vectors of different sizes.
  3. Test retrieval outside LangGraph

    • Call the retriever directly before wiring it into StateGraph.
    • If direct retrieval works but graph retrieval fails, your bug is in state mutation or routing logic.
  4. Trace state between nodes

    • Log state["question"] before retrieval.
    • A rewrite node may be turning a precise user query into something generic like “explain it”, which destroys recall.
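
For step 1, most LangChain vector stores can return scores alongside documents. A minimal check, assuming the Chroma store from earlier:

docs_and_scores = vectorstore.similarity_search_with_score("how do refunds work?", k=5)
for doc, score in docs_and_scores:
    # Chroma reports a distance here, so lower means more similar; also eyeball tenant/source metadata
    print(round(score, 4), doc.metadata, doc.page_content[:80])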

Example trace point for step 4:

def retrieve_node(state):
    print("QUERY:", state["question"])
    docs = retriever.invoke(state["question"])
    print("TOP DOC:", docs[0].metadata if docs else None)
    return {"docs": docs}

If you see irrelevant results only after an LLM rewrite step, stop rewriting queries blindly. Preserve the original question and pass both versions through the graph.
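
One way to do that is to keep the original question in its own state field and retrieve on it, treating the rewrite as supplementary. A sketch extending the State from earlier (the rewritten_question field name is an assumption):

class State(TypedDict):
    question: str             # original user question, never overwritten
    rewritten_question: str   # output of the rewrite node
    docs: list

def retrieve_node(state: State):
    # Retrieve on the user's own wording; merge in results for the rewrite only if it helps
    docs = retriever.invoke(state["question"])
    return {"docs": docs}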

Prevention

  • Use one embedding model per collection and lock it in config.
  • Store stable document IDs and rebuild/upsert indexes on content changes.
  • Add retrieval tests with known questions and expected source documents before shipping to production.

If you want a practical guardrail, write an integration test that fails when top-1 retrieval does not come from an expected doc ID. That catches most “irrelevant results” regressions before users do.
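
A minimal pytest sketch of that guardrail, assuming a hand-picked question set and that each chunk's source document ID was stored in metadata at ingest time (the doc_id key and the example pairs are placeholders):

import pytest

EXPECTED = [
    ("how do refunds work?", "acme-refund-policy"),
    ("what is the claims deadline?", "acme-claims-policy"),
]

@pytest.mark.parametrize("question,expected_id", EXPECTED)
def test_top1_retrieval_hits_expected_doc(question, expected_id):
    docs = vectorstore.similarity_search(question, k=1)
    assert docs, f"no documents retrieved for {question!r}"
    assert docs[0].metadata.get("doc_id") == expected_id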


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

