# How to Fix 'vector search returning irrelevant results during development' in CrewAI (Python)
If your CrewAI agent is returning irrelevant vector search results during development, the retrieval layer is usually working — just not with the data you think it is. In practice, this shows up when your vector store has stale embeddings, your chunks are too large, or your query text doesn’t match the embedding space you built.
The symptom is usually obvious: the agent answers confidently, but cites the wrong document, pulls unrelated chunks, or ignores the most relevant file entirely. In CrewAI projects, this often happens around RagTool, Embedchain, Chroma, or custom retrievers wired into an Agent.
## The Most Common Cause
The #1 cause is embedding mismatch: you indexed documents with one embedding model, then queried with another, or you changed chunking/settings without rebuilding the store. The vector DB still returns nearest neighbors, but “nearest” is now meaningless.
Here’s the broken pattern versus the fixed pattern:
| Broken pattern | Right pattern |
|---|---|
| Reuse an old persisted store after changing embedding model | Rebuild the store whenever embeddings change |
| Index with one model, query with another | Use the same embedding model for both indexing and retrieval |
| Change chunk size but keep old vectors | Clear persisted data and re-embed |
```python
# BROKEN: stale persisted vectors + changed embedding model
from crewai import Agent, Task, Crew
from crewai_tools import RagTool

rag_tool = RagTool(
    knowledge_source="docs/",
    vector_db_path="./chroma_db",  # existing DB from a previous run
    embedder="openai/text-embedding-3-large",  # changed from older model
)

agent = Agent(
    role="Support Analyst",
    goal="Answer from docs",
    tools=[rag_tool],
)

task = Task(
    description="What does our refund policy say?",
    expected_output="A precise answer with citations",
    agent=agent,
)
```
```python
# FIXED: rebuild index with a consistent embedding setup
import shutil

from crewai import Agent, Task, Crew
from crewai_tools import RagTool

shutil.rmtree("./chroma_db", ignore_errors=True)  # clear stale index before re-indexing

rag_tool = RagTool(
    knowledge_source="docs/",
    vector_db_path="./chroma_db",
    embedder="openai/text-embedding-3-large",
)

agent = Agent(
    role="Support Analyst",
    goal="Answer from docs",
    tools=[rag_tool],
)

task = Task(
    description="What does our refund policy say?",
    expected_output="A precise answer with citations",
    agent=agent,
)
```
If you’re using Chroma directly under CrewAI, the same rule applies:
```python
# BROKEN: old vectors are still there
collection = chroma_client.get_collection("policy_docs")

# FIXED: drop the collection and re-embed from scratch
chroma_client.delete_collection("policy_docs")
collection = chroma_client.create_collection("policy_docs")
```
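One cheap sanity check is to compare vector dimensionality before querying. This is a sketch — `assert_dims_match` is a hypothetical helper, and the sizes below are the published output dimensions of OpenAI’s `text-embedding-ada-002` (1536) and `text-embedding-3-large` (3072). A dimension mismatch proves the index and the query come from different models; note that two *different* models with the *same* dimension won’t be caught this way, which is why rebuilding is the only safe fix.

```python
def assert_dims_match(stored_vector, query_vector):
    # A mismatch means the index and the query live in different embedding
    # spaces, so nearest-neighbour distances are meaningless.
    if len(stored_vector) != len(query_vector):
        raise ValueError(
            f"Embedding mismatch: index holds {len(stored_vector)}-dim vectors "
            f"but the query produced {len(query_vector)} dims -- rebuild the index."
        )

# Fake vectors standing in for two different OpenAI models:
stored = [0.0] * 1536   # text-embedding-ada-002 output size
query = [0.0] * 3072    # text-embedding-3-large output size
try:
    assert_dims_match(stored, query)
except ValueError as exc:
    print(exc)
```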
## Other Possible Causes
### 1) Your chunks are too large or too small
If chunks are huge, retrieval gets noisy. If they’re tiny, you lose context and similarity scores become weak.
```python
# Too large: one chunk contains multiple topics
chunk_size = 4000
chunk_overlap = 0

# Better starting point for policy/docs-style content
chunk_size = 800
chunk_overlap = 120
```
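To make the overlap concrete, here is a minimal character-based splitter — a sketch only; real splitters (such as those bundled with Embedchain) break on sentence and paragraph boundaries rather than raw character offsets, but the windowing math is the same:

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 120) -> list[str]:
    # Slide a window of chunk_size characters, stepping forward by
    # chunk_size - chunk_overlap so consecutive chunks share context.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = chunk_text("x" * 2000)
print([len(c) for c in chunks])  # [800, 800, 640]
```

The 120-character overlap means each chunk carries the tail of the previous one, so a sentence that straddles a boundary is still retrievable from at least one chunk.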
### 2) Your query is vague
Vector search works best when the query matches document language. “Tell me about it” will retrieve junk; “refund eligibility for annual subscriptions” usually won’t.
```python
# BROKEN query
query = "What about that policy?"

# FIXED query
query = "What is the refund eligibility for annual subscription cancellations?"
```
In CrewAI terms, this often means your Task.description is too generic.
```python
task = Task(
    description="Find relevant info in the docs.",
    expected_output="Answer",
    agent=agent,
)
```
Use a concrete prompt instead:
```python
task = Task(
    description="Find the refund policy section covering annual subscription cancellations and quote the exact eligibility rules.",
    expected_output="Answer with source references",
    agent=agent,
)
```
### 3) You indexed raw files without cleaning noise
PDF headers, footers, nav menus, and OCR garbage poison embeddings. The retriever then matches repeated boilerplate instead of actual content.
```python
# BAD: indexing raw extracted text with repeated headers/footers intact
documents = load_pdf_text("policy.pdf")

# BETTER: strip boilerplate before embedding
documents = clean_document_text(load_pdf_text("policy.pdf"))
```
If you see phrases like “Page 1 of 42” or repeated menu items in retrieved chunks, this is likely your problem.
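A `clean_document_text` implementation is left to the reader above; one possible sketch (assuming per-page text, with a repetition threshold chosen arbitrarily) drops "Page N of M" artifacts and any line that repeats across most pages, since near-identical lines on every page are almost always headers, footers, or nav menus:

```python
import re
from collections import Counter

def clean_document_text(pages: list[str]) -> str:
    # Strip "Page N of M" artifacts first.
    pages = [re.sub(r"Page \d+ of \d+", "", page) for page in pages]
    # Count how often each non-empty line recurs across the document.
    counts = Counter(ln.strip() for page in pages for ln in page.splitlines() if ln.strip())
    # A line appearing on at least half the pages is treated as boilerplate.
    threshold = max(2, len(pages) // 2)
    kept_pages = [
        "\n".join(ln for ln in page.splitlines() if counts[ln.strip()] < threshold)
        for page in pages
    ]
    return "\n".join(kept_pages)

pages = [f"ACME Corp | Internal\nPage {i} of 3\nUnique clause {i}" for i in range(1, 4)]
print(clean_document_text(pages))  # header and page markers are gone
```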
### 4) You are using metadata filters incorrectly
Bad filters can silently narrow results to irrelevant documents or exclude the right ones entirely.
```python
# BROKEN: filter doesn't match stored metadata keys/values
results = collection.query(
    query_texts=["refund policy"],
    where={"dept": "claims"},  # stored key may actually be "department"
)

# FIXED: match the exact metadata schema used at ingestion time
results = collection.query(
    query_texts=["refund policy"],
    where={"department": "claims"},
)
```
With CrewAI tools wrapped around a vector DB, this often appears as empty context followed by generic LLM hallucination.
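One quick way to catch this class of bug — a sketch; `validate_filter_keys` is a hypothetical helper — is to pull a small sample of stored metadata (for example via Chroma's `collection.get(limit=20)["metadatas"]`) and check that every filter key actually exists in it:

```python
def validate_filter_keys(where: dict, sample_metadatas: list[dict]) -> list[str]:
    # Return filter keys that never appear in the stored metadata sample --
    # each one will silently narrow results to nothing (or the wrong docs).
    stored_keys = {key for md in sample_metadatas if md for key in md}
    return [key for key in where if key not in stored_keys]

# Sample shaped like what collection.get(limit=...)["metadatas"] might return:
sample = [{"department": "claims", "source": "policy.pdf"}]
print(validate_filter_keys({"dept": "claims"}, sample))        # ['dept']
print(validate_filter_keys({"department": "claims"}, sample))  # []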
## How to Debug It
### 1) Print the top retrieved chunks
Don’t trust the agent’s final answer. Inspect what your retriever actually returned before generation.
```python
results = collection.query(query_texts=["refund eligibility"], n_results=5)
print(results["documents"])
print(results["metadatas"])
print(results["distances"])
```
### 2) Verify embeddings are from the same model
Check what was used during ingestion and what’s used at query time. If they differ, rebuild everything.
### 3) Test with an exact phrase from a known document
Copy a sentence from a source file and search for it. If that still returns irrelevant results, your index or chunking is broken.
### 4) Delete persistence and re-index
This removes stale vectors from previous experiments. If results improve immediately after clearing storage, you found the issue.
## Prevention
- Keep embedding model, chunking strategy, and persistence lifecycle versioned together.
- Add a smoke test that queries known phrases and asserts top-1 retrieval hits the expected document.
- Log retrieved chunks in development so you can see bad context before it reaches the Agent.
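The smoke test can be as small as a table of known phrases mapped to the document each should retrieve. This is a sketch: `retrieve` is a hypothetical wrapper you would implement over your vector store, assumed to return hits carrying a `source` field — adapt the accessor to whatever your store actually returns.

```python
def retrieval_smoke_test(retrieve, cases: dict[str, str]) -> list[str]:
    # retrieve(query, n) is assumed to return a list of {"source": ...} dicts.
    failures = []
    for query, expected_source in cases.items():
        hits = retrieve(query, 1)
        got = hits[0]["source"] if hits else None
        if got != expected_source:
            failures.append(f"{query!r}: expected {expected_source}, got {got}")
    return failures

# Toy retriever standing in for a real vector store, just to show the shape:
def fake_retrieve(query, n):
    return [{"source": "refund_policy.md" if "refund" in query else "faq.md"}]

print(retrieval_smoke_test(fake_retrieve, {
    "refund eligibility for annual subscriptions": "refund_policy.md",
}))  # [] -- empty list means every known phrase hit its expected document
```

Run this in CI after every re-index; a non-empty failure list catches stale vectors or broken chunking before the agent ever sees bad context.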
If you’re building CrewAI systems for regulated environments like banking or insurance, treat retrieval as part of your test surface. Most “irrelevant results” bugs are not LLM bugs — they’re indexing bugs hiding behind a polished answer.
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.