How to Fix 'vector search returning irrelevant results in production' in LlamaIndex (Python)
When “vector search returning irrelevant results in production” shows up in a LlamaIndex app, it usually means your retriever is working technically but failing semantically. You’re getting nearest neighbors, just not the ones your users expect.
This typically happens after a data refresh, a model swap, or when the index was built with one chunking strategy and queried with another. In practice, it’s almost always a mismatch between how you indexed content and how you retrieve it.
The Most Common Cause
The #1 cause is bad chunking or inconsistent embedding setup during indexing vs querying.
In LlamaIndex, this usually shows up as VectorStoreIndex returning documents that are “close” in embedding space but irrelevant because chunks are too large, too small, or built with a different embedding model than the one used at query time.
Here’s the broken pattern:
```python
# BROKEN: inconsistent embeddings + poor chunking
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

docs = SimpleDirectoryReader("./data").load_data()

# Too large for most retrieval tasks
splitter = SentenceSplitter(chunk_size=2048, chunk_overlap=0)
nodes = splitter.get_nodes_from_documents(docs)  # dead code: never used below

# Index built with one embedding model...
index = VectorStoreIndex.from_documents(
    docs,
    transformations=[splitter],
    embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
)

# ...but query-time config changed later in production
retriever = index.as_retriever(similarity_top_k=3)
results = retriever.retrieve("What is our refund policy?")
```
And here’s the fixed pattern:
```python
# FIXED: consistent embedding model + tighter chunking
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

docs = SimpleDirectoryReader("./data").load_data()

embed_model = OpenAIEmbedding(model="text-embedding-3-small")
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)

index = VectorStoreIndex.from_documents(
    docs,
    transformations=[splitter],
    embed_model=embed_model,
)

retriever = index.as_retriever(similarity_top_k=8)
results = retriever.retrieve("What is our refund policy?")
```
What changed:
- Chunk size dropped from 2048 to 512
- Overlap was added to preserve context across chunk boundaries
- The same embedding model is used consistently at index and query time
If you’re using StorageContext and persisting the index, this gets worse when you rebuild nodes with one config and load them later with another. That’s how you end up with VectorStoreIndex behaving “correctly” but producing junk results.
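One way to avoid that drift is to pass the embedding model explicitly on both sides of the persist boundary. Here’s a minimal sketch, assuming the index above was persisted to `./storage`:

```python
# Sketch: reload a persisted index with the embedding model pinned explicitly,
# so query-time embeddings match the vectors that were written to disk.
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(model="text-embedding-3-small")

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context, embed_model=embed_model)

retriever = index.as_retriever(similarity_top_k=8)
```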
Other Possible Causes
| Cause | Symptom | Typical fix |
|---|---|---|
| Wrong embedding model | Results are semantically off even for simple queries | Rebuild the index with one embedding model and keep it fixed |
| Bad metadata filtering | Relevant nodes exist but never show up | Inspect filters passed to MetadataFilters |
| Low similarity_top_k | Only one or two weak matches returned | Increase similarity_top_k to 5–10 |
| Stale persisted index | New docs aren’t reflected in retrieval | Rebuild or re-ingest after content changes |
1) Wrong embedding model
This happens when indexing uses one model and querying assumes another. You’ll often see no exception at all, just bad ranking.
```python
# BAD: mixed embeddings across environments
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# index was originally built with "text-embedding-ada-002"
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")
```
Fix:
```python
# GOOD: pin one embedding model everywhere, and rebuild the index with it
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")
```
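To catch this drift before it hits ranking quality, one option is to record which model built the index and assert on it at load time. The `model.txt` marker below is a convention invented for this sketch, not a LlamaIndex feature:

```python
# Sketch: a hypothetical marker file that records which embedding model
# built the persisted index, checked before serving queries.
from pathlib import Path

EMBED_MODEL_NAME = "text-embedding-3-large"
marker = Path("./storage/model.txt")

# ingestion side: write the marker right after persisting the index
marker.write_text(EMBED_MODEL_NAME)

# query side: refuse to serve if the configured model doesn't match
assert marker.read_text() == EMBED_MODEL_NAME, (
    "index was built with a different embedding model; rebuild it"
)
```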
2) Metadata filters are too aggressive
If you use MetadataFilters, relevant nodes may be excluded before ranking even starts.
```python
from llama_index.core.vector_stores.types import MetadataFilters, ExactMatchFilter

filters = MetadataFilters(filters=[
    ExactMatchFilter(key="department", value="legal")
])
retriever = index.as_retriever(filters=filters)
```
If your source docs are tagged inconsistently (`Legal`, `legal-team`, `compliance`), retrieval will look broken.
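A cheap guard is to normalize tags at ingestion, before anything is embedded or filtered. The `DEPARTMENT_ALIASES` map below is a hypothetical example for this sketch:

```python
# Sketch: collapse inconsistent department tags to one canonical value before
# indexing, so ExactMatchFilter(key="department", value="legal") behaves.
DEPARTMENT_ALIASES = {
    "Legal": "legal",
    "legal-team": "legal",
    "compliance": "legal",
}

for doc in docs:
    raw = doc.metadata.get("department", "")
    doc.metadata["department"] = DEPARTMENT_ALIASES.get(raw, raw.lower())
```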
3) Top-k is too low
A `similarity_top_k=1` setup can return the single closest node even when it’s only vaguely related.

```python
retriever = index.as_retriever(similarity_top_k=1)
```
Try:
```python
retriever = index.as_retriever(similarity_top_k=8)
```
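If a larger top-k worries you because it can admit weak matches, pair it with a score cutoff instead of shrinking k. `SimilarityPostprocessor` ships with LlamaIndex core; the 0.7 threshold below is only a starting point to tune against your own data:

```python
# Sketch: retrieve a wide candidate set, then drop low-scoring nodes.
from llama_index.core.postprocessor import SimilarityPostprocessor

retriever = index.as_retriever(similarity_top_k=8)
nodes = retriever.retrieve("What is our refund policy?")

# keep only nodes whose similarity score clears the cutoff
nodes = SimilarityPostprocessor(similarity_cutoff=0.7).postprocess_nodes(nodes)
```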
4) Stale persisted storage
If you persist an old index and then update source files without re-ingesting, production keeps serving stale vectors.
```python
index.storage_context.persist(persist_dir="./storage")
# source docs changed later, but persist dir was never rebuilt
```
Rebuild ingestion on content changes:
```python
docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
    docs,
    transformations=[splitter],  # same splitter as the original build
    embed_model=embed_model,     # same embedding model as the original build
)
index.storage_context.persist(persist_dir="./storage")
```
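To make that rebuild automatic, one approach is to fingerprint the source directory and re-ingest only when it changes. `content_fingerprint` and the marker file are hypothetical helpers, and the splitter and embedding model mirror the fixed pattern above:

```python
# Sketch: rebuild the persisted index only when source content changes.
import hashlib
from pathlib import Path

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

def content_fingerprint(data_dir: str) -> str:
    """Hash every file under data_dir so any content change is detected."""
    h = hashlib.sha256()
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            h.update(path.read_bytes())
    return h.hexdigest()

marker = Path("./storage/.source_fingerprint")  # hypothetical marker file
current = content_fingerprint("./data")

if not marker.exists() or marker.read_text() != current:
    docs = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(
        docs,
        transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=64)],
        embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
    )
    index.storage_context.persist(persist_dir="./storage")
    marker.write_text(current)
```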
How to Debug It
- Inspect the top retrieved nodes (see the sketch after this list)
  - Print node text, score, and metadata.
  - If scores are high but the text is irrelevant, your chunking or embeddings are off.
  - If nodes look relevant but get filtered out later, check your metadata filters.
- Compare indexing and query configuration
  - Verify the same `embed_model`, splitter, and vector store are used in both paths.
  - In LlamaIndex apps this often means checking `Settings.embed_model`, ingestion pipelines, and persisted storage separately.
- Test with a known exact-match query
  - Ask something that should map to one document only.
  - If `"refund policy"` can’t retrieve a doc containing `"refund policy"`, your chunks are likely too large or stale.
- Temporarily disable filters and rerankers
  - Remove `MetadataFilters`, rerankers, and hybrid logic.
  - If retrieval improves immediately, reintroduce components one by one until the failure returns.
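For the first step, a quick inspection loop over the raw retriever output is enough. Note that scores can be `None` with some vector stores:

```python
# Sketch: dump what the retriever returns before filters or rerankers run.
nodes = index.as_retriever(similarity_top_k=8).retrieve(
    "What is our refund policy?"
)
for n in nodes:
    print("score:", n.score)            # may be None for some vector stores
    print("metadata:", n.node.metadata)
    print(n.node.get_content()[:200])   # first 200 characters of the chunk
    print("---")
```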
Prevention
- Pin your embedding model and chunking config in code, not in someone’s notebook.
- Add an ingestion test that queries for known phrases and asserts the expected doc IDs appear in the top-k results (a sketch follows below).
- Persist versioned indexes so you can roll back when a bad re-ingestion ships to production.
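The ingestion test can be a small pytest-style check. This sketch stands in for doc IDs with the `file_name` metadata that `SimpleDirectoryReader` attaches by default; `EXPECTED` and the file names are placeholders for your own known-answer pairs:

```python
# Sketch: a retrieval smoke test run after every re-ingestion.
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.embeddings.openai import OpenAIEmbedding

EXPECTED = {
    "What is our refund policy?": "refund-policy.md",  # hypothetical pair
}

def test_known_queries_hit_expected_docs():
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(
        storage_context,
        # pin the same model used at ingestion, per the advice above
        embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
    )
    retriever = index.as_retriever(similarity_top_k=8)
    for query, file_name in EXPECTED.items():
        found = {n.node.metadata.get("file_name") for n in retriever.retrieve(query)}
        assert file_name in found, f"{query!r} did not retrieve {file_name}"
```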
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.