How to Fix 'vector search returning irrelevant results in production' in LlamaIndex (Python)
When “vector search returning irrelevant results in production” shows up in a LlamaIndex app, it usually means your retriever is working technically but failing semantically. You’re getting nearest neighbors, just not the ones your users expect.
This typically happens after a data refresh, a model swap, or when the index was built with one chunking strategy and queried with another. In practice, it’s almost always a mismatch between how you indexed content and how you retrieve it.
The Most Common Cause
The #1 cause is bad chunking or inconsistent embedding setup during indexing vs querying.
In LlamaIndex, this usually shows up as VectorStoreIndex returning documents that are “close” in embedding space but irrelevant because chunks are too large, too small, or built with a different embedding model than the one used at query time.
Here’s the broken pattern:
```python
# BROKEN: inconsistent embeddings + poor chunking
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

docs = SimpleDirectoryReader("./data").load_data()

# Too large for most retrieval tasks
splitter = SentenceSplitter(chunk_size=2048, chunk_overlap=0)
nodes = splitter.get_nodes_from_documents(docs)  # dead code: never used below

# Index built with one embedding model...
index = VectorStoreIndex.from_documents(
    docs,
    transformations=[splitter],
    embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
)

# ...but query-time config changed later in production
retriever = index.as_retriever(similarity_top_k=3)
results = retriever.retrieve("What is our refund policy?")
```
And here’s the fixed pattern:
```python
# FIXED: consistent embedding model + tighter chunking
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

docs = SimpleDirectoryReader("./data").load_data()

embed_model = OpenAIEmbedding(model="text-embedding-3-small")
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)

index = VectorStoreIndex.from_documents(
    docs,
    transformations=[splitter],
    embed_model=embed_model,
)

retriever = index.as_retriever(similarity_top_k=8)
results = retriever.retrieve("What is our refund policy?")
```
What changed:
- Chunk size dropped from 2048 to 512
- Overlap was added to preserve context across chunk boundaries
- The same embedding model is used consistently at index and query time
If you’re using StorageContext and persisting the index, this gets worse when you rebuild nodes with one config and load them later with another. That’s how you end up with VectorStoreIndex behaving “correctly” but producing junk results.
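One way to avoid that drift is to pass the embedding model explicitly on both sides of the persist boundary. Here’s a minimal sketch, assuming the index above was persisted to `./storage`:

```python
# Sketch: reload a persisted index with the embedding model pinned explicitly,
# so query-time embeddings match the vectors that were written to disk.
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(model="text-embedding-3-small")

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context, embed_model=embed_model)

retriever = index.as_retriever(similarity_top_k=8)
```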
Other Possible Causes
| Cause | Symptom | Typical fix |
|---|---|---|
| Wrong embedding model | Results are semantically off even for simple queries | Rebuild the index with one embedding model and keep it fixed |
| Bad metadata filtering | Relevant nodes exist but never show up | Inspect filters passed to MetadataFilters |
| Low similarity_top_k | Only one or two weak matches returned | Increase similarity_top_k to 5–10 |
| Stale persisted index | New docs aren’t reflected in retrieval | Rebuild or re-ingest after content changes |
1) Wrong embedding model
This happens when indexing uses one model and querying assumes another. You’ll often see no exception at all, just bad ranking.
```python
# BAD: mixed embeddings across environments
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# index was originally built with "text-embedding-ada-002"
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")
```
Fix:
```python
# GOOD: pin one embedding model everywhere, and rebuild the index with it
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")
```
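To catch this drift before it hits ranking quality, one option is to record which model built the index and assert on it at load time. The `model.txt` marker below is a convention invented for this sketch, not a LlamaIndex feature:

```python
# Sketch: a hypothetical marker file that records which embedding model
# built the persisted index, checked before serving queries.
from pathlib import Path

EMBED_MODEL_NAME = "text-embedding-3-large"
marker = Path("./storage/model.txt")

# ingestion side: write the marker right after persisting the index
marker.write_text(EMBED_MODEL_NAME)

# query side: refuse to serve if the configured model doesn't match
assert marker.read_text() == EMBED_MODEL_NAME, (
    "index was built with a different embedding model; rebuild it"
)
```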
2) Metadata filters are too aggressive
If you use MetadataFilters, relevant nodes may be excluded before ranking even starts.
```python
from llama_index.core.vector_stores.types import MetadataFilters, ExactMatchFilter

filters = MetadataFilters(filters=[
    ExactMatchFilter(key="department", value="legal")
])
retriever = index.as_retriever(filters=filters)
```
If your source docs are tagged inconsistently (`Legal`, `legal-team`, `compliance`), retrieval will look broken.
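A cheap guard is to normalize tags at ingestion, before anything is embedded or filtered. The `DEPARTMENT_ALIASES` map below is a hypothetical example for this sketch:

```python
# Sketch: collapse inconsistent department tags to one canonical value before
# indexing, so ExactMatchFilter(key="department", value="legal") behaves.
DEPARTMENT_ALIASES = {
    "Legal": "legal",
    "legal-team": "legal",
    "compliance": "legal",
}

for doc in docs:
    raw = doc.metadata.get("department", "")
    doc.metadata["department"] = DEPARTMENT_ALIASES.get(raw, raw.lower())
```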
3) Top-k is too low
A `similarity_top_k=1` setup can return the single closest node even when it’s only vaguely related.

```python
retriever = index.as_retriever(similarity_top_k=1)
```
Try:
```python
retriever = index.as_retriever(similarity_top_k=8)
```
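If a larger top-k worries you because it can admit weak matches, pair it with a score cutoff instead of shrinking k. `SimilarityPostprocessor` ships with LlamaIndex core; the 0.7 threshold below is only a starting point to tune against your own data:

```python
# Sketch: retrieve a wide candidate set, then drop low-scoring nodes.
from llama_index.core.postprocessor import SimilarityPostprocessor

retriever = index.as_retriever(similarity_top_k=8)
nodes = retriever.retrieve("What is our refund policy?")

# keep only nodes whose similarity score clears the cutoff
nodes = SimilarityPostprocessor(similarity_cutoff=0.7).postprocess_nodes(nodes)
```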
4) Stale persisted storage
If you persist an old index and then update source files without re-ingesting, production keeps serving stale vectors.
```python
index.storage_context.persist(persist_dir="./storage")
# source docs changed later, but persist dir was never rebuilt
```
Rebuild ingestion on content changes:
```python
docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
    docs,
    transformations=[splitter],  # same splitter as the original build
    embed_model=embed_model,     # same embedding model as the original build
)
index.storage_context.persist(persist_dir="./storage")
```
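To make that rebuild automatic, one approach is to fingerprint the source directory and re-ingest only when it changes. `content_fingerprint` and the marker file are hypothetical helpers, and the splitter and embedding model mirror the fixed pattern above:

```python
# Sketch: rebuild the persisted index only when source content changes.
import hashlib
from pathlib import Path

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

def content_fingerprint(data_dir: str) -> str:
    """Hash every file under data_dir so any content change is detected."""
    h = hashlib.sha256()
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            h.update(path.read_bytes())
    return h.hexdigest()

marker = Path("./storage/.source_fingerprint")  # hypothetical marker file
current = content_fingerprint("./data")

if not marker.exists() or marker.read_text() != current:
    docs = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(
        docs,
        transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=64)],
        embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
    )
    index.storage_context.persist(persist_dir="./storage")
    marker.write_text(current)
```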
How to Debug It
- Inspect the top retrieved nodes (see the sketch after this list)
  - Print node text, score, and metadata.
  - If scores are high but the text is irrelevant, your chunking or embeddings are off.
  - If nodes look relevant but get filtered out later, check your metadata filters.
- Compare indexing and query configuration
  - Verify the same `embed_model`, splitter, and vector store are used in both paths.
  - In LlamaIndex apps this often means checking `Settings.embed_model`, ingestion pipelines, and persisted storage separately.
- Test with a known exact-match query
  - Ask something that should map to one document only.
  - If `"refund policy"` can’t retrieve a doc containing `"refund policy"`, your chunks are likely too large or stale.
- Temporarily disable filters and rerankers
  - Remove `MetadataFilters`, rerankers, and hybrid logic.
  - If retrieval improves immediately, reintroduce components one by one until the failure returns.
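For the first step, a quick inspection loop over the raw retriever output is enough. Note that scores can be `None` with some vector stores:

```python
# Sketch: dump what the retriever returns before filters or rerankers run.
nodes = index.as_retriever(similarity_top_k=8).retrieve(
    "What is our refund policy?"
)
for n in nodes:
    print("score:", n.score)            # may be None for some vector stores
    print("metadata:", n.node.metadata)
    print(n.node.get_content()[:200])   # first 200 characters of the chunk
    print("---")
```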
Prevention
- Pin your embedding model and chunking config in code, not in someone’s notebook.
- Add an ingestion test that queries for known phrases and asserts the expected doc IDs appear in the top-k results (a sketch follows below).
- Persist versioned indexes so you can roll back when a bad re-ingestion ships to production.
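The ingestion test can be a small pytest-style check. This sketch stands in for doc IDs with the `file_name` metadata that `SimpleDirectoryReader` attaches by default; `EXPECTED` and the file names are placeholders for your own known-answer pairs:

```python
# Sketch: a retrieval smoke test run after every re-ingestion.
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.embeddings.openai import OpenAIEmbedding

EXPECTED = {
    "What is our refund policy?": "refund-policy.md",  # hypothetical pair
}

def test_known_queries_hit_expected_docs():
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(
        storage_context,
        # pin the same model used at ingestion, per the advice above
        embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
    )
    retriever = index.as_retriever(similarity_top_k=8)
    for query, file_name in EXPECTED.items():
        found = {n.node.metadata.get("file_name") for n in retriever.retrieve(query)}
        assert file_name in found, f"{query!r} did not retrieve {file_name}"
```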
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.