How to Fix 'vector search returning irrelevant results during development' in LlamaIndex (Python)
When vector search returns irrelevant results during development in LlamaIndex, it usually means your retrieval pipeline is technically working but semantically broken. The index is returning *something*, just not the chunks you expected, which typically points to bad chunking, mismatched embedding models, stale indexes, or querying the wrong store.
This is common during local development because people rebuild data, tweak loaders, or swap models without re-indexing. LlamaIndex won’t always throw a hard error like ValueError or KeyError; more often you’ll see “working” retrieval with garbage results from VectorStoreIndex.as_query_engine() or RetrieverQueryEngine.
The Most Common Cause
The #1 cause is indexing and querying with different embedding models or stale embeddings.
A classic failure mode is: you build the index with one embedding model, change the model later, and keep querying the old persisted vector store. LlamaIndex will happily retrieve vectors that no longer match your current semantic space.
Broken vs fixed pattern
| Broken | Fixed |
|---|---|
| Build index with one embed model, query with another | Use the same embed model for indexing and querying |
| Reuse persisted storage after changing chunking/model | Rebuild the index after config changes |
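One way to catch this class of bug early is to record the build config next to the persisted index and refuse to load on a mismatch. This is a minimal sketch, not a built-in LlamaIndex feature; `write_index_meta` and `check_index_meta` are hypothetical helper names.

```python
import json
from pathlib import Path


def write_index_meta(persist_dir: str, embed_model: str, chunk_size: int) -> None:
    """Record the config an index was built with, next to the persisted files."""
    Path(persist_dir).mkdir(parents=True, exist_ok=True)
    meta = {"embed_model": embed_model, "chunk_size": chunk_size}
    (Path(persist_dir) / "index_meta.json").write_text(json.dumps(meta))


def check_index_meta(persist_dir: str, embed_model: str, chunk_size: int) -> None:
    """Refuse to query a persisted index built under a different config."""
    meta_path = Path(persist_dir) / "index_meta.json"
    if not meta_path.exists():
        raise RuntimeError(f"No index_meta.json in {persist_dir}; rebuild the index")
    meta = json.loads(meta_path.read_text())
    expected = {"embed_model": embed_model, "chunk_size": chunk_size}
    if meta != expected:
        raise RuntimeError(f"Index built with {meta}, app expects {expected}; rebuild")
```

Call `write_index_meta` right after you persist, and `check_index_meta` before you load, so a changed model fails loudly instead of returning garbage.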
```python
# BROKEN
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# Index built yesterday with a different embedding model
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)

# Later in the same project someone changes this:
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")

query_engine = index.as_query_engine()
response = query_engine.query("What is the refund policy?")
print(response)
# Symptom: returns irrelevant chunks even though the query "works"
```
```python
# FIXED
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# One embed model, set once, used for both indexing and querying
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.embed_model = embed_model

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)

query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("What is the refund policy?")
print(response)
```
If you persist storage, this gets worse:
```python
# BROKEN: stale persisted index
from llama_index.core import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
# embeddings changed since last persist, but storage wasn't rebuilt
```
Fix it by deleting the old persist directory and rebuilding after any embedding/model/chunking change.
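The delete-and-rebuild step can be scripted so it never gets skipped. A minimal sketch; `reset_persist_dir` is a hypothetical helper name, and the rebuild afterward is the same `VectorStoreIndex.from_documents` flow shown above.

```python
import shutil
from pathlib import Path


def reset_persist_dir(persist_dir: str) -> None:
    """Delete a stale persisted index so the next run rebuilds from scratch."""
    path = Path(persist_dir)
    if path.exists():
        shutil.rmtree(path)

# After clearing, rebuild and persist with the current config, e.g.:
#   index = VectorStoreIndex.from_documents(docs)
#   index.storage_context.persist(persist_dir="./storage")
```

Running this at the top of your ingestion script in dev guarantees every config change produces a fresh index.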
Other Possible Causes
1) Bad chunking strategy
If chunks are too large, retrieval gets noisy. If they’re too small, context gets fragmented and similarity becomes weak.
```python
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=2048, chunk_overlap=20)  # often too large for dev docs
```
Try something more controlled:
```python
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=80)
```
If your docs are policy-heavy or structured, smaller chunks usually win.
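To see what a splitter actually produced, a few summary stats on chunk lengths are enough. A small sketch; `chunk_stats` is a hypothetical helper that takes the chunk texts, e.g. from `splitter.get_nodes_from_documents(docs)`.

```python
def chunk_stats(chunk_texts):
    """Sanity stats on chunk sizes (in characters) after splitting."""
    lengths = sorted(len(t) for t in chunk_texts)
    return {
        "count": len(lengths),
        "min": lengths[0],
        "median": lengths[len(lengths) // 2],
        "max": lengths[-1],
    }
```

Usage: `chunk_stats([n.get_content() for n in splitter.get_nodes_from_documents(docs)])`. If the max is several times the median, your splitter is producing uneven blobs and retrieval scores will be noisy.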
2) Querying with low top-k
Sometimes the right chunk is in the index but not in the top results.
```python
query_engine = index.as_query_engine(similarity_top_k=1)
```
Increase it:
```python
query_engine = index.as_query_engine(similarity_top_k=5)
```
For debugging retrieval quality directly:
```python
retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve("How do I reset my password?")
for node in nodes:
    print(node.score, node.node.get_content())
```
3) Wrong document loader or bad source text
If your loader strips tables, headers, or key fields, embeddings will be weak from the start.
```python
from llama_index.core import SimpleDirectoryReader

docs = SimpleDirectoryReader("./pdfs").load_data()
```
That may be fine for plain text files but weak for PDFs with layout-heavy content. Use a loader that preserves structure better if your source data depends on it.
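A quick way to catch a bad loader is to inspect the extracted text before indexing. The sketch below operates on the `.text` of each loaded `Document`; `find_thin_docs` is a hypothetical helper name, and the 200-character threshold is an arbitrary starting point.

```python
def find_thin_docs(texts, min_chars=200):
    """Return indices of documents whose extracted text is suspiciously short."""
    return [i for i, text in enumerate(texts) if len(text.strip()) < min_chars]
```

Usage: `find_thin_docs([d.text for d in docs])`, plus an eyeball check like `print(docs[0].text[:500])`. A PDF that extracted to a near-empty string will embed to a near-meaningless vector.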
4) Mixing indexes or vector stores accidentally
You may think you’re querying one dataset but actually hit another VectorStoreIndex.
```python
# Two indexes created in different runs/directories
index_a = VectorStoreIndex.from_documents(docs_a)
index_b = VectorStoreIndex.from_documents(docs_b)

# Intended to query docs_a, actually queries docs_b
query_engine = index_b.as_query_engine()
```
Make sure your app wiring points to the correct StorageContext, persist directory, and collection name if using an external vector DB like Chroma or Pinecone.
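One low-tech guard is a single registry mapping dataset names to persist directories (or collection names), so a typo fails loudly instead of silently querying the wrong store. A sketch under assumed names; `PERSIST_DIRS` and `persist_dir_for` are illustrative, not LlamaIndex APIs.

```python
# One place in the codebase that knows which dataset lives where
PERSIST_DIRS = {
    "policies": "./storage/policies",
    "faqs": "./storage/faqs",
}


def persist_dir_for(dataset: str) -> str:
    """Resolve a dataset name to its persist directory, failing loudly on typos."""
    if dataset not in PERSIST_DIRS:
        raise KeyError(f"Unknown dataset {dataset!r}; known: {sorted(PERSIST_DIRS)}")
    return PERSIST_DIRS[dataset]
```

Then every `StorageContext.from_defaults(persist_dir=persist_dir_for("faqs"))` call goes through the registry, and "queried the wrong index" becomes an exception instead of a silent wrong answer.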
How to Debug It
- **Inspect retrieved nodes directly.** Don’t start with answer quality. Start with what got retrieved.

  ```python
  retriever = index.as_retriever(similarity_top_k=5)
  results = retriever.retrieve("your test query")
  for r in results:
      print(r.score)
      print(r.node.get_content()[:500])
  ```

  If these chunks are irrelevant, your problem is retrieval setup, not synthesis.

- **Verify embedding model consistency.** Print both indexing-time and query-time embedding config.
  - Same model name?
  - Same dimensions?
  - Same provider version?

  If you changed models since persisting data, rebuild everything.

- **Check chunk size and overlap.** If chunks are huge blobs of unrelated text, similarity scores get muddy.
  - Try `chunk_size=256` to `512`
  - Try `chunk_overlap=50` to `100`
  - Re-index after changing parser settings

- **Clear storage and rebuild.** If you use persistence:
  - delete `./storage`
  - delete vector DB test collections
  - re-run ingestion from scratch

  Stale indices are one of the most common causes of “it worked yesterday”.
Prevention
- Keep embedding config in one place using `Settings.embed_model`, and never change it without rebuilding indexes.
- Treat chunking as part of schema design. If you change `SentenceSplitter` settings, re-index.
- In dev environments, use a clean persist directory per experiment so stale vectors don’t survive config changes.
- Add a retrieval smoke test that prints the top-3 retrieved chunks for a known query before you trust answer generation.
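That smoke test can be a ten-line function. The sketch below duck-types against LlamaIndex retrievers (objects whose `retrieve()` returns hits with `.score` and `.node.get_content()`); `retrieval_smoke_test` and the `must_contain` keyword check are names and conventions chosen here, not part of the library.

```python
def retrieval_smoke_test(retriever, query, must_contain, top_k=3):
    """Print the top retrieved chunks and require one to mention a known keyword."""
    nodes = retriever.retrieve(query)[:top_k]
    for n in nodes:
        print(f"{n.score:.3f}  {n.node.get_content()[:120]!r}")
    hit = any(must_contain.lower() in n.node.get_content().lower() for n in nodes)
    if not hit:
        raise AssertionError(
            f"No top-{top_k} chunk mentioned {must_contain!r} for query {query!r}"
        )
    return nodes
```

Run it against a handful of known query/keyword pairs after every ingestion run; if it fails, you know the problem is retrieval, before any LLM synthesis muddies the picture.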
If you’re still seeing irrelevant results after fixing embeddings and chunking, stop looking at the LLM layer. The problem is almost always in ingestion or retrieval configuration long before RetrieverQueryEngine gets involved.
By Cyprian Aarons, AI Consultant at Topiax.