How to Fix 'vector search returning irrelevant results' in LangChain (Python)

By Cyprian Aarons · Updated 2026-04-22

If your LangChain vector search is returning irrelevant results, the retriever is usually working exactly as configured — just not as you intended. The failure typically shows up when embeddings, chunking, metadata filters, or query strategy don’t match the way the index was built.

This is not a “LangChain bug” in most cases. It’s usually a data pipeline mismatch: wrong embedding model, bad chunking, or querying a store with settings that don’t match the indexed content.

The Most Common Cause

The #1 cause is embedding mismatch: you indexed documents with one embedding model and queried with another, or you changed models after the index was built.

That produces results that look valid technically but are semantically garbage. You won’t always get an exception; you’ll just get low-quality nearest neighbors from similarity_search() or as_retriever().

Broken vs fixed pattern

Broken: Build the index with one embedding model, query with another
Fixed: Use the same embedding class and config for indexing and querying

Broken: Recreate a VectorStore from existing persisted data without verifying embedding compatibility
Fixed: Persist the embedding model name/version alongside the index

Broken: Assume similarity_search() will “just work” across model changes
Fixed: Rebuild the index when embeddings change

# BROKEN: indexed with one embedding model, queried with another

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Index creation
index_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_texts(
    ["PCI compliance requires access logging", "Claims workflows need audit trails"],
    embedding=index_embeddings,
    collection_name="docs",
)

# Later: query time uses a different model
query_embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vectorstore._embedding_function = query_embeddings  # don't do this

results = vectorstore.similarity_search("How do we log access?", k=3)
# low-quality neighbors, no exception raised

# FIXED: use the same embedding model for both indexing and querying

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Chroma.from_texts(
    ["PCI compliance requires access logging", "Claims workflows need audit trails"],
    embedding=embeddings,
    collection_name="docs",
)

results = vectorstore.similarity_search("How do we log access?", k=3)

If you persist the store, persist the embedding configuration too. If you change models, rebuild the index.
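
One lightweight way to enforce that is to write the embedding config next to the persisted index and check it on load. Here is a minimal sketch, assuming a Chroma store persisted to ./chroma_db; the embedding_meta.json sidecar file is a made-up convention, not a LangChain feature.

import json
from pathlib import Path

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

INDEX_DIR = Path("./chroma_db")
META_FILE = INDEX_DIR / "embedding_meta.json"  # hypothetical sidecar file
MODEL_NAME = "text-embedding-3-small"

# Build time: persist the embedding config alongside the index
embeddings = OpenAIEmbeddings(model=MODEL_NAME)
vectorstore = Chroma.from_texts(
    ["PCI compliance requires access logging"],
    embedding=embeddings,
    persist_directory=str(INDEX_DIR),
)
META_FILE.write_text(json.dumps({"provider": "openai", "model": MODEL_NAME}))

# Load time: refuse to query with a mismatched model
meta = json.loads(META_FILE.read_text())
if meta["model"] != MODEL_NAME:
    raise RuntimeError(f"Index built with {meta['model']}; rebuild before querying.")
vectorstore = Chroma(
    persist_directory=str(INDEX_DIR),
    embedding_function=OpenAIEmbeddings(model=meta["model"]),
)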

Other Possible Causes

1) Bad chunking strategy

If chunks are too large, retrieval gets noisy. If they’re too small, semantic context disappears.

# BAD: huge chunks, weak boundaries
# (note: TextSplitter is abstract; use a concrete splitter)
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=0)

# BETTER: smaller chunks with overlap so ideas aren't severed mid-sentence
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=120)

For policy docs or claims docs, start around 500–1000 characters per chunk and tune from there.
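
Before indexing anything at scale, look at what the splitter actually produces. A quick sketch using the settings above; "claims_policy.txt" is a placeholder for your own corpus:

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=120)

# "claims_policy.txt" is a stand-in for your own document
text = open("claims_policy.txt").read()
chunks = splitter.split_text(text)

# Look for broken sentences, duplicated headers, or mixed topics
for i, chunk in enumerate(chunks[:5]):
    print(f"--- chunk {i} ({len(chunk)} chars) ---")
    print(chunk[:200])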

2) Wrong distance metric for your data

Some stores default to cosine similarity; others use L2 (Euclidean) distance or inner product (LangChain's FAISS wrapper defaults to L2, for example). If your vectors were normalized for cosine but the store ranks by a different metric, scores stop meaning what you think they mean and ranking degrades.

from langchain_community.vectorstores import FAISS

# texts and embeddings as defined in the earlier examples
vectorstore = FAISS.from_texts(texts, embeddings)

# Check how your store ranks vectors
docs = vectorstore.similarity_search("fraud detection controls", k=5)

For FAISS-based setups, verify whether your index uses normalized vectors if you expect cosine-like behavior.
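
If you're not sure which metric your index uses, sanity-check ranking by computing both metrics yourself on raw embeddings. A minimal sketch with numpy; it isn't store-specific, it just shows how the two metrics can disagree:

import numpy as np
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

q = np.array(embeddings.embed_query("How do we log access?"))
d = np.array(embeddings.embed_query("PCI compliance requires access logging"))

l2 = np.linalg.norm(q - d)                             # smaller = closer
cos = q @ d / (np.linalg.norm(q) * np.linalg.norm(d))  # larger = closer

print(f"L2 distance: {l2:.4f}, cosine similarity: {cos:.4f}")
# If the store's scores don't track the metric you expected,
# check its distance/normalization configuration before touching chunking.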

3) Query rewriting is distorting intent

If you use an LLM chain to rewrite queries before retrieval, it can broaden or change meaning. That often happens with ConversationalRetrievalChain or custom retrievers.

# The retriever config itself is usually fine; the distortion happens in
# the question-condensing step that runs before retrieval
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

If results are off only in chat mode but fine with direct search, inspect the rewritten question. Log both the original user input and the final retriever query.
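
The cheapest way to capture the final query is to wrap the retriever in a logging function. A minimal sketch; logged_retrieve is a made-up helper, and retriever is the one defined above:

def logged_retrieve(query: str):
    # Compare this against the raw user input to spot rewriting drift
    print(f"final retriever query: {query!r}")
    return retriever.invoke(query)

docs = logged_retrieve("How do we log access?")

If you'd rather not wrap anything, set_debug(True) from langchain.globals also prints intermediate chain inputs, including rewritten questions.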

4) Metadata filters are excluding the right documents

An over-strict filter shrinks the candidate pool, so the top-k slots get padded with whatever survives the filter, relevant or not.

retriever = vectorstore.as_retriever(
    search_kwargs={
        "k": 5,
        "filter": {"department": "claims", "region": "EU"}
    }
)

Check whether your metadata keys actually exist on every document. A typo like "departmant" won’t throw a helpful LangChain error; it just returns poor results or nothing useful.
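
Before blaming the retriever, audit what actually got stored. A sketch assuming the Chroma store from earlier (Chroma's get() returns stored metadata; other stores have their own inspection APIs):

# Pull stored metadata and look for missing or misspelled keys
stored = vectorstore.get(include=["metadatas"])

required = {"department", "region"}
for i, meta in enumerate(stored["metadatas"]):
    missing = required - set((meta or {}).keys())
    if missing:
        print(f"doc {i} is missing metadata keys: {missing}")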

How to Debug It

  1. Test raw similarity search without chains

    • Call vectorstore.similarity_search("your query", k=5) directly.
    • If this is bad, the issue is in embeddings/chunking/store config.
    • If this is good but your chain is bad, inspect retriever logic or query rewriting.
  2. Print the top returned documents and scores

    • Use methods like similarity_search_with_score() where supported (see the sketch after this list).
    • You want to see whether relevant docs are close but ranked lower than junk.
    • If scores are flat or nonsensical, suspect embeddings or normalization.
  3. Verify embedding consistency

    • Confirm the same class and model were used at ingest and query time.
    • Log model names explicitly:
      print(embeddings.model)
      
    • If you changed providers or versions, rebuild the index.
  4. Inspect chunk boundaries

    • Print a few stored chunks before indexing.
    • Look for broken sentences, duplicated headers, or chunks that mix unrelated topics.
    • Bad chunking often looks fine at ingestion time and terrible at retrieval time.
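
Here is a minimal sketch of steps 1 and 2 together, assuming the Chroma store from the fixed example above:

# Step 1: bypass chains and query the store directly
query = "How do we log access?"
results = vectorstore.similarity_search_with_score(query, k=5)

# Step 2: check whether relevant docs are close but outranked by junk.
# Whether lower or higher means "closer" depends on the store;
# Chroma returns distances, so lower is better.
for doc, score in results:
    print(f"{score:.4f}  {doc.page_content[:80]}")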

Prevention

  • Keep embedding config versioned with the index.
    • Store provider name, model name, dimensions, normalization behavior, and build date.
  • Add a retrieval smoke test in CI.
    • Query known prompts and assert top-k contains expected document IDs (a sketch follows this list).
  • Tune chunking before scaling up ingestion.
    • Start with a small corpus and inspect actual retrieved chunks before indexing millions of tokens.
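
The smoke test can be a plain pytest that loads the index and asserts a known document comes back. A sketch; the query/ID pairs and the doc_id metadata key are placeholders for your own corpus:

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Load the persisted index with the same model that built it (see above)
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
)

# Placeholder (query, expected doc_id) pairs; replace with your own
SMOKE_CASES = [
    ("How do we log access?", "pci-access-logging"),
    ("What do claims workflows require?", "claims-audit-trails"),
]

def test_retrieval_smoke():
    for query, expected_id in SMOKE_CASES:
        docs = vectorstore.similarity_search(query, k=5)
        top_ids = [d.metadata.get("doc_id") for d in docs]
        assert expected_id in top_ids, f"{expected_id!r} not in top-5: {top_ids}"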

If you want stable retrieval in LangChain Python, treat vector search like infrastructure code. Lock down embeddings, chunking, metadata schema, and store configuration together — otherwise you’ll keep getting “irrelevant results” that are really just inconsistent inputs.

