How to Fix 'vector search returning irrelevant results during development' in LangGraph (Python)

By Cyprian Aarons · Updated 2026-04-22

When vector search starts returning irrelevant results in a LangGraph Python app, the problem is usually not LangGraph itself. It’s almost always one of three things: bad chunking, embedding mismatch, or retrieval logic that is too loose for the data you indexed.

This shows up during development when your graph runs end-to-end, but the retrieved context is clearly wrong. You’ll see things like StateGraph nodes completing successfully while your retriever returns semantically distant chunks, or your LLM answers from the wrong document because the top-k hits are noisy.

The Most Common Cause

The #1 cause is indexing and querying with inconsistent embeddings or inconsistent text preprocessing.

A common broken pattern is:

  • chunking one way at ingestion
  • querying with a different embedding model
  • or storing raw text but querying cleaned text, or vice versa

Broken vs fixed pattern

Broken                                          Fixed
Different embedding model for index and query   Same embedding model everywhere
Large unstructured chunks                       Chunked with stable boundaries
No metadata filters                             Filter by source/type/tenant when needed

# WRONG: the index was built with one embedding model, queries use another

from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# An earlier ingestion script built ./chroma_db with text-embedding-3-large.
# The app now opens the same collection with a different model:
query_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")  # mismatch

vectorstore = Chroma(
    collection_name="docs",
    embedding_function=query_embeddings,
    persist_directory="./chroma_db",
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Query vectors live in a different embedding space than the indexed vectors,
# so the nearest neighbors are effectively random
results = retriever.invoke("How do I reset my password?")

# RIGHT: use the same embedding model and consistent preprocessing

from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=120,
)

# raw_documents: your loaded Document objects (e.g., from a document loader)
docs = splitter.split_documents(raw_documents)

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=embeddings,
    collection_name="docs",
    persist_directory="./chroma_db",
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
results = retriever.invoke("How do I reset my password?")

If your LangGraph node wraps this retriever, the graph will still run cleanly. You won’t get an exception or a validation error; you’ll just get bad context flowing through the state.
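For reference, here is a minimal sketch of such a node, assuming the retriever built above and illustrative state keys ("question", "retrieved_docs"):

# Minimal retrieval node; state keys are illustrative, not required names
from typing import TypedDict

from langchain_core.documents import Document
from langgraph.graph import StateGraph, START, END

class RAGState(TypedDict, total=False):
    question: str
    retrieved_docs: list[Document]

def retrieve_node(state: RAGState) -> dict:
    return {"retrieved_docs": retriever.invoke(state["question"])}

graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve_node)
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", END)
app = graph.compile()

# Completes without errors even when the retrieved context is wrong
result = app.invoke({"question": "How do I reset my password?"})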

Other Possible Causes

1. Bad chunk size and overlap

If chunks are too large, each embedding averages several topics together and becomes blurry. If chunks are too small, they lose the surrounding context a query needs to match, so retrieval looks random.

# Too large: one chunk may contain multiple topics
splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=0)

# Better for most docs
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=120)

2. Missing metadata filters

If you mix multiple sources in one collection, retrieval can pull irrelevant documents from the wrong tenant, product, or environment.

# Without filters: any document can match
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# With filters: restrict by source or tenant
retriever = vectorstore.as_retriever(
    search_kwargs={
        "k": 5,
        "filter": {"tenant_id": "acme-bank", "doc_type": "policy"},
    }
)
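Filters only work if the metadata was attached at ingestion time. A sketch reusing the docs and embeddings from the fixed example; the tenant_id/doc_type keys are illustrative:

# Attach filterable metadata before indexing; keys mirror the filter above
for doc in docs:
    doc.metadata.update({"tenant_id": "acme-bank", "doc_type": "policy"})

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=embeddings,
    collection_name="docs",
    persist_directory="./chroma_db",
)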

3. Wrong distance strategy or scoring assumptions

Some stores return cosine similarity; others use Euclidean distance or inner product. If you assume “higher score is better” when it isn’t, you’ll rank garbage above relevant matches.

# Example: inspect actual scores instead of assuming ordering behavior
docs_and_scores = vectorstore.similarity_search_with_score("chargeback policy", k=5)

for doc, score in docs_and_scores:
    print(score, doc.metadata)

Chroma, for example, returns a distance from similarity_search_with_score, so lower is better; other stores return a similarity, where higher is better. Verify how your backend’s scores are interpreted before applying custom reranking in a LangGraph node.
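Once you know the score direction, a threshold check is straightforward. A sketch assuming the store returns a distance (lower is better); flip the comparison for similarity scores:

# Keep only hits closer than a threshold; the value is illustrative and
# should be tuned against known-good queries for your data
DISTANCE_THRESHOLD = 0.5

kept = [
    (doc, score)
    for doc, score in docs_and_scores
    if score <= DISTANCE_THRESHOLD  # use >= if your store returns similarity
]
print(f"kept {len(kept)} of {len(docs_and_scores)} hits")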

4. Stale index after code changes

A very common dev-time issue is rebuilding your app but not rebuilding the vector store. You change parsing logic, but old embeddings remain on disk.

# During development: clear and rebuild when preprocessing changes
import shutil
shutil.rmtree("./chroma_db", ignore_errors=True)

If you changed:

  • chunking strategy
  • document cleaning
  • embedding model
  • metadata schema

then rebuild the index.
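One way to make rebuilds automatic is to fingerprint the preprocessing configuration and wipe the store when it changes. A sketch; the .fingerprint marker file is an illustrative convention, not a Chroma feature:

import hashlib
import json
import shutil
from pathlib import Path

# Anything that affects the index belongs in the fingerprint
CONFIG = {
    "embedding_model": "text-embedding-3-small",
    "chunk_size": 800,
    "chunk_overlap": 120,
}
fingerprint = hashlib.sha256(json.dumps(CONFIG, sort_keys=True).encode()).hexdigest()
marker = Path("./chroma_db/.fingerprint")

if not marker.exists() or marker.read_text() != fingerprint:
    shutil.rmtree("./chroma_db", ignore_errors=True)
    # ... rebuild with Chroma.from_documents(...) as in the fixed example ...
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text(fingerprint)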

How to Debug It

  1. Print raw retrieved chunks before they enter the graph

    • In your retrieval node, log top-k docs and metadata.
    • If the chunks are obviously wrong before generation starts, the bug is in retrieval/indexing.
  2. Compare query text to indexed text

    • Check whether your query is normalized differently from your documents.
    • Look for stripped punctuation, lowercase transforms, OCR noise, or HTML artifacts.
  3. Test retrieval outside LangGraph

    • Call retriever.invoke() directly in a script (see the standalone sketch after this list).
    • If it fails outside the graph too, this is not a StateGraph problem; it’s a vector store problem.
  4. Inspect scores and nearest neighbors

    • Use similarity_search_with_score() and compare scores across known-good queries.
    • If unrelated docs score similarly to relevant ones, fix embeddings/chunking first.
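Step 3 needs only a few lines. A standalone sketch that opens the persisted store with no graph involved, assuming the collection from the fixed example:

# Run retrieval with no LangGraph in the loop
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma(
    collection_name="docs",
    embedding_function=embeddings,
    persist_directory="./chroma_db",
)

for doc in vectorstore.as_retriever(search_kwargs={"k": 5}).invoke(
    "How do I reset my password?"
):
    print(doc.metadata.get("source"), "|", doc.page_content[:120])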

A simple debug node helps:

def debug_retrieval_node(state):
    query = state["question"]
    docs = retriever.invoke(query)

    print(f"QUERY: {query}")
    for i, doc in enumerate(docs):
        print(f"\nHIT {i+1}")
        print(doc.page_content[:300])
        print(doc.metadata)

    return {"retrieved_docs": docs}

If this prints irrelevant content consistently, LangGraph is just passing through bad retrieval results correctly.

Prevention

  • Use one embedding model per collection and pin its version.
  • Rebuild vector stores whenever chunking or preprocessing changes.
  • Add metadata filters early if you have multiple document sources.
  • Write a small retrieval regression test with known queries and expected top-k documents.

A good test catches this before it reaches your graph:

def test_password_reset_retrieval():
    docs = retriever.invoke("How do I reset my password?")
    assert any("password reset" in d.page_content.lower() for d in docs[:3])

If that fails, don’t touch LangGraph; fix ingestion, embeddings, or filters first.
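As the corpus grows, the same pattern extends to a table of known queries. A pytest-style sketch; the query/keyword pairs are illustrative:

import pytest

# Known query -> keyword expected somewhere in the top-3 chunks
KNOWN_QUERIES = [
    ("How do I reset my password?", "password reset"),
    ("What is the chargeback policy?", "chargeback"),
]

@pytest.mark.parametrize("query,expected", KNOWN_QUERIES)
def test_retrieval_regression(query, expected):
    docs = retriever.invoke(query)
    assert any(expected in d.page_content.lower() for d in docs[:3])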

