How to Fix 'vector search returning irrelevant results in production' in LangChain (Python)

By Cyprian Aarons · Updated 2026-04-22

If your LangChain vector search is returning irrelevant results in production, the retrieval layer is usually working exactly as configured — just not as you expected. The problem typically shows up after a demo works fine, then real user traffic exposes bad chunking, mismatched embeddings, stale indexes, or broken query normalization.

This is not a “LangChain bug” most of the time. It’s usually a retrieval pipeline mismatch between how you indexed documents and how you query them.

The Most Common Cause

The #1 cause is embedding mismatch: you indexed documents with one embedding model and queried with another, or you re-created the vector store with different settings. In LangChain, this often looks like using OpenAIEmbeddings during ingestion and a different embedding class or model at query time.

The symptom is not always an exception. You often get valid-looking results from similarity_search(), but they are semantically wrong.

Broken vs fixed pattern

  • Different embedding model for indexing and querying → Use the same embedding model instance/config for both
  • Rebuilding the vector store with new defaults → Persist and reload the same index
  • No versioning on embeddings → Store the embedding model name alongside the index

# BROKEN: indexed with one embedding model, queried with another
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = ["Policy covers water damage if caused by burst pipes."]
splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=0)
chunks = splitter.create_documents(docs)

# Same dimensionality (1536), so nothing raises, but the vector spaces differ
index_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
query_embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

# The ingestion job builds and persists the index with one model...
FAISS.from_documents(chunks, index_embeddings).save_local("./faiss_index")

# ...and the query service reloads it with the other
vectorstore = FAISS.load_local(
    "./faiss_index", query_embeddings, allow_dangerous_deserialization=True
)

# This returns "relevant" docs that are actually off-target
results = vectorstore.similarity_search("Does this policy cover flood damage?", k=3)

# FIXED: same embedding config used consistently
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

docs = ["Policy covers water damage if caused by burst pipes."]
splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=0)
chunks = splitter.create_documents(docs)

vectorstore = FAISS.from_documents(chunks, embeddings)

results = vectorstore.similarity_search("Does this policy cover flood damage?", k=3)

If you use persistence, keep the exact same embedding model and dimensionality:

vectorstore.save_local("./faiss_index")

# later...
vectorstore = FAISS.load_local(
    "./faiss_index",
    embeddings,
    allow_dangerous_deserialization=True,
)
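
A lightweight safeguard (my own convention, not a LangChain feature) is to write a small manifest next to the index recording which embedding model produced the vectors, then check it before serving queries. A sketch:

import json
from pathlib import Path

from langchain_openai import OpenAIEmbeddings

INDEX_DIR = Path("./faiss_index")
MODEL_NAME = "text-embedding-3-small"
embeddings = OpenAIEmbeddings(model=MODEL_NAME)

# Ingestion side: record which model produced the stored vectors
manifest = {
    "embedding_model": MODEL_NAME,
    "dimensions": len(embeddings.embed_query("dimension probe")),
}
(INDEX_DIR / "manifest.json").write_text(json.dumps(manifest))

# Query side: refuse to search with a model the index was not built with
stored = json.loads((INDEX_DIR / "manifest.json").read_text())
if stored["embedding_model"] != MODEL_NAME:
    raise RuntimeError(
        f"Index built with {stored['embedding_model']}, queried with {MODEL_NAME}"
    )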

Other Possible Causes

1) Bad chunking strategy

If chunks are too large, retrieval becomes noisy. If they’re too small, you lose context and get partial matches.

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Noisy: 2,000-character chunks blur clauses, definitions, and exceptions together
splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,
    chunk_overlap=0,
)

# Better for many enterprise docs:
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100,
)

For insurance or banking docs, I usually want chunks that preserve clause-level meaning. A 2,000-character blob often mixes exclusions, definitions, and exceptions into one embedding.
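
If your documents have predictable clause or section boundaries, you can also steer the splitter with an explicit separators list so chunks break on those boundaries first. A sketch, reusing the docs list from the earlier examples; the separator order is an assumption about how your documents are formatted:

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Prefer paragraph and sentence boundaries so clauses stay intact
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100,
    separators=["\n\n", "\n", ". ", " "],
)
chunks = splitter.create_documents(docs)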

2) Wrong similarity metric or retriever settings

Some stores default to cosine-like behavior; others need explicit normalization or metric selection. If your embeddings aren’t normalized but your store assumes cosine similarity semantics, ranking degrades.
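
With FAISS specifically, one option is to L2-normalize vectors at index and query time so plain distance search ranks the way cosine similarity would. A sketch, assuming the langchain_community FAISS wrapper's normalize_L2 flag and the chunks from the earlier examples:

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Unit-length vectors make L2 distance rank the same way cosine similarity would
vectorstore = FAISS.from_documents(chunks, embeddings, normalize_L2=True)

Retriever settings matter too: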

# Example: tune retriever behavior instead of taking defaults blindly
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20},
)

If your top result is always generic text like “terms and conditions,” MMR can help by diversifying results. If recall is poor, increase fetch_k.

3) Stale or partially rebuilt index

This happens when ingestion jobs fail halfway through and production queries hit an old index snapshot. LangChain won’t raise anything obvious; it will just retrieve outdated content.

# Bad: overwriting without versioning
vectorstore.save_local("./index")

# Better: versioned index path
vectorstore.save_local("./indexes/policies_2024_11_18")

If your source docs changed but the vector store did not, you’ll get irrelevant results that look “close enough” to be dangerous.
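
One way to make staleness visible is to stamp every chunk with an ingestion timestamp and dataset label before indexing, then check for it in production. A sketch continuing from the ingestion code above; the metadata keys are my own naming, not a LangChain convention:

from datetime import datetime, timezone

ingested_at = datetime.now(timezone.utc).isoformat()
for chunk in chunks:
    chunk.metadata["ingested_at"] = ingested_at
    chunk.metadata["source_version"] = "policies_2024_11_18"  # your dataset label

vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local("./indexes/policies_2024_11_18")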

4) Query preprocessing is inconsistent

If you normalize documents during ingestion but not queries — or vice versa — retrieval quality drops. This includes lowercasing, stripping punctuation, expanding acronyms, and domain-specific rewrites.

def normalize(text: str) -> str:
    return text.lower().strip()

# Run the same function over chunk text at ingestion and over queries here;
# normalizing only one side is exactly the inconsistency that hurts retrieval
query = normalize("Does this cover Flood Damage?")
results = vectorstore.similarity_search(query, k=5)

For regulated domains, I’d keep preprocessing minimal unless it’s applied consistently on both sides.

How to Debug It

  1. Check the embedding model identity

    • Print the exact model name used at ingestion and query time.
    • Confirm dimensionality matches if your store exposes it.
    • If these differ, stop there.
  2. Inspect raw retrieved chunks

    • Don’t debug through an LLM chain first.
    • Call similarity_search() directly and print content + metadata.
    • If the chunks are wrong before generation starts, the issue is retrieval.
  3. Compare top-k behavior

    • Try k=1, k=5, k=10.
    • If relevant chunks appear only at higher k, ranking is weak.
    • Tune chunk size, overlap, and retriever settings like fetch_k.
  4. Verify index freshness

    • Check whether your current production index contains the latest document versions.
    • Add a document hash or ingestion timestamp in metadata.
    • Query for that metadata to confirm rebuilds are actually happening (see the sketch after this list).
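
A minimal debugging sketch that walks this checklist, assuming the embeddings, vectorstore, and metadata fields from the earlier examples:

# 1) Embedding model identity and dimensionality
print("query model:", embeddings.model)
print("dimensions:", len(embeddings.embed_query("dimension probe")))

# 2) Raw retrieval, before any LLM chain gets involved
for doc, score in vectorstore.similarity_search_with_score(
    "Does this policy cover flood damage?", k=5
):
    print(round(score, 4), doc.metadata, doc.page_content[:120])

# 3) Compare top-k behavior: does the right chunk only show up at higher k?
for k in (1, 5, 10):
    hits = vectorstore.similarity_search("Does this policy cover flood damage?", k=k)
    print(k, [d.page_content[:40] for d in hits])

# 4) Index freshness via the metadata stamped at ingestion
hit = vectorstore.similarity_search("burst pipes", k=1)[0]
print("ingested_at:", hit.metadata.get("ingested_at"))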

Prevention

  • Use one shared embedding config object for both ingestion and querying.
  • Version every vector index by schema + embedding model + source dataset hash.
  • Log retrieved chunk IDs and scores in production so bad retrievals are visible before users complain (see the sketch below).
  • Test retrieval directly with known queries before wiring it into RetrievalQA, ConversationalRetrievalChain, or custom LangChain agents.
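
For the logging point, a minimal sketch using similarity_search_with_score; the logger setup and metadata field names are illustrative, not a fixed convention:

import logging

logger = logging.getLogger("retrieval")

def retrieve_with_logging(vectorstore, query: str, k: int = 5):
    # Log scores and whatever ID field your pipeline sets, so bad retrievals
    # show up in dashboards instead of user complaints
    results = vectorstore.similarity_search_with_score(query, k=k)
    for doc, score in results:
        logger.info(
            "retrieved chunk source=%s score=%.4f",
            doc.metadata.get("source"),
            score,
        )
    return [doc for doc, _ in results]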

When LangChain returns irrelevant results in production, treat it like an indexing incident first and an LLM incident second. In most cases, fixing embeddings consistency and chunking gets you most of the way back to stable retrieval.



By Cyprian Aarons, AI Consultant at Topiax.
