How to Fix 'vector search returning irrelevant results during development' in LlamaIndex (Python)

By Cyprian Aarons · Updated 2026-04-22

When vector search returns irrelevant results during development in LlamaIndex, it usually means your retrieval pipeline is technically working but semantically broken. The index returns something, just not the chunks you expected, which typically points to bad chunking, mismatched embedding models, stale indexes, or querying the wrong store.

This is common during local development because people rebuild data, tweak loaders, or swap models without re-indexing. LlamaIndex won’t always throw a hard error like ValueError or KeyError; more often you’ll see “working” retrieval with garbage results from VectorStoreIndex.as_query_engine() or RetrieverQueryEngine.

The Most Common Cause

The #1 cause is indexing and querying with different embedding models or stale embeddings.

A classic failure mode is: you build the index with one embedding model, change the model later, and keep querying the old persisted vector store. LlamaIndex will happily retrieve vectors that no longer match your current semantic space.

Broken vs fixed pattern

  • Broken: build the index with one embed model, query with another. Fixed: use the same embed model for indexing and querying.
  • Broken: reuse persisted storage after changing chunking or the model. Fixed: rebuild the index after config changes.

# BROKEN
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# Index was originally built with this embedding model
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)

# Later in the same project someone changes this:
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")

query_engine = index.as_query_engine()
response = query_engine.query("What is the refund policy?")
print(response)
# Symptom: returns irrelevant chunks even though the query "works"
# FIXED
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.embed_model = embed_model

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)

query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("What is the refund policy?")
print(response)

If you persist storage, this gets worse:

# BROKEN: stale persisted index
from llama_index.core import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)

# embeddings changed since last persist, but storage wasn't rebuilt

Fix it by deleting the old persist directory and rebuilding after any embedding/model/chunking change.
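
A minimal rebuild flow, assuming your source docs live in ./data and you persist to ./storage (adjust paths and the model name to your project), might look like this:

# Wipe the stale persisted index and rebuild from scratch
import shutil

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding

shutil.rmtree("./storage", ignore_errors=True)  # drop stale vectors

Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)
index.storage_context.persist(persist_dir="./storage")  # fresh persist with the current config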

Other Possible Causes

1) Bad chunking strategy

If chunks are too large, retrieval gets noisy. If they’re too small, context gets fragmented and similarity becomes weak.

from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=2048, chunk_overlap=20)  # often too large for dev docs

Try something more controlled:

splitter = SentenceSplitter(chunk_size=512, chunk_overlap=80)

If your docs are policy-heavy or structured, smaller chunks usually win.
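
Whatever sizes you pick, the splitter has to be applied at ingestion time to take effect. One way, assuming the standard from_documents path, is to pass it as a transformation and re-index:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=512, chunk_overlap=80)

docs = SimpleDirectoryReader("./data").load_data()
# Chunking happens during indexing, so rebuild the index whenever the splitter changes
index = VectorStoreIndex.from_documents(docs, transformations=[splitter])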

2) Querying with low top-k

Sometimes the right chunk is in the index but not in the top results.

query_engine = index.as_query_engine(similarity_top_k=1)

Increase it:

query_engine = index.as_query_engine(similarity_top_k=5)

For debugging retrieval quality directly:

retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve("How do I reset my password?")
for node in nodes:
    print(node.score, node.node.get_content())

3) Wrong document loader or bad source text

If your loader strips tables, headers, or key fields, embeddings will be weak from the start.

from llama_index.core import SimpleDirectoryReader

docs = SimpleDirectoryReader("./pdfs").load_data()

That may be fine for plain text files but weak for PDFs with layout-heavy content. Use a loader that preserves structure better if your source data depends on it.
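
A quick sanity check is to print what the loader actually extracted before you embed anything (the directory below is a placeholder):

docs = SimpleDirectoryReader("./pdfs").load_data()

# If key sections, tables, or headers are missing here, fix the loader before touching retrieval
for doc in docs[:3]:
    print(doc.metadata.get("file_name"), len(doc.text))
    print(doc.text[:300])
    print("---")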

4) Mixing indexes or vector stores accidentally

You may think you’re querying one dataset but actually hit another VectorStoreIndex.

# Two indexes created in different runs/directories
index_a = VectorStoreIndex.from_documents(docs_a)
index_b = VectorStoreIndex.from_documents(docs_b)

query_engine = index_b.as_query_engine()

Make sure your app wiring points to the correct StorageContext, persist directory, and collection name if using an external vector DB like Chroma or Pinecone.
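
Wiring the collection explicitly makes it obvious which dataset you are hitting. A sketch with Chroma (the path and collection name are placeholders, and it assumes the llama-index-vector-stores-chroma package is installed):

import chromadb

from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

client = chromadb.PersistentClient(path="./chroma_db")        # hypothetical local dev path
collection = client.get_or_create_collection("policies_dev")  # hypothetical collection name

vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# docs_a from the snippet above; every run now targets the same named collection
index_a = VectorStoreIndex.from_documents(docs_a, storage_context=storage_context)
query_engine = index_a.as_query_engine()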

How to Debug It

  1. Inspect retrieved nodes directly. Don't start with answer quality; start with what got retrieved.

    retriever = index.as_retriever(similarity_top_k=5)
    results = retriever.retrieve("your test query")
    for r in results:
        print(r.score)
        print(r.node.get_content()[:500])
    

    If these chunks are irrelevant, your problem is retrieval setup, not synthesis.

  2. Verify embedding model consistency. Print both the indexing-time and query-time embedding config.

    • Same model name?
    • Same dimensions?
    • Same provider version?

    If you changed models since persisting data, rebuild everything.
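
    One low-tech check, assuming your embedding class exposes a model_name attribute (the built-in ones generally do), is to print the active config at both stages:

    from llama_index.core import Settings

    # Run this where you build the index AND where you query it, then compare the output
    print(type(Settings.embed_model).__name__,
          getattr(Settings.embed_model, "model_name", "unknown"))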

  3. Check chunk size and overlap. If chunks are huge blobs of unrelated text, similarity scores get muddy.

    • Try chunk_size=256 to 512
    • Try chunk_overlap=50 to 100
    • Re-index after changing parser settings

  4. Clear storage and rebuild. If you use persistence:

    • delete ./storage
    • delete vector DB test collections
    • re-run ingestion from scratch

    Stale indices are one of the most common causes of “it worked yesterday”.
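
    Deleting ./storage is covered in the rebuild snippet above. If you also use an external dev store such as Chroma, dropping the test collection might look like this (path and collection name are placeholders):

    import chromadb

    client = chromadb.PersistentClient(path="./chroma_db")  # only if you run Chroma locally
    try:
        client.delete_collection("policies_dev")            # hypothetical dev collection name
    except Exception:
        pass  # nothing to delete on a fresh environment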

Prevention

  • Keep embedding config in one place using Settings.embed_model, and never change it without rebuilding indexes.
  • Treat chunking as part of schema design. If you change SentenceSplitter, re-index.
  • In dev environments, use a clean persist directory per experiment so stale vectors don’t survive config changes.
  • Add a retrieval smoke test that prints the top-3 retrieved chunks for a known query before you trust answer generation; a sketch follows below.
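
A minimal version of that smoke test, assuming a known query and a keyword you expect the right chunk to contain (both placeholders here), could be:

def retrieval_smoke_test(index, query="What is the refund policy?", expect="refund"):
    """Print the top-3 retrieved chunks and warn if the expected keyword never appears."""
    retriever = index.as_retriever(similarity_top_k=3)
    nodes = retriever.retrieve(query)
    for n in nodes:
        print(n.score, n.node.get_content()[:200].replace("\n", " "))
    if not any(expect.lower() in n.node.get_content().lower() for n in nodes):
        print(f"WARNING: none of the top-3 chunks mention '{expect}'")

retrieval_smoke_test(index)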

If you’re still seeing irrelevant results after fixing embeddings and chunking, stop looking at the LLM layer. The problem is almost always in ingestion or retrieval configuration long before RetrieverQueryEngine gets involved.

