How to Fix 'vector search returning irrelevant results when scaling' in CrewAI (Python)
When CrewAI starts returning irrelevant vector search results as your corpus grows, the diagnosis is usually simple: your retrieval layer is no longer matching the right chunks to the right queries. The problem tends to surface after you move from a few documents to hundreds or thousands, or after you switch embedding models, chunking rules, or vector stores without retuning retrieval.
In practice, the issue is almost never CrewAI itself. It’s usually bad chunking, stale embeddings, weak metadata filtering, or a retriever configured for small datasets.
The Most Common Cause
The #1 cause is chunking that was fine for a small dataset but breaks down at scale. If your chunks are too large, too small, or split without overlap, similarity search starts pulling in semantically noisy matches.
Here’s the broken pattern I see most often with CrewAI + RAGTool + a vector store:
| Broken pattern | Fixed pattern |
|---|---|
| Large chunks with no overlap | Smaller chunks with overlap |
| Embeddings generated once, then content changes | Re-embed after every document update |
| Generic retrieval settings | Tuned k, metadata filters, and chunk size |
```python
# BROKEN
from crewai import Agent, Task, Crew
from crewai_tools import RAGTool
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,  # too large: each chunk mixes several topics
    chunk_overlap=0,  # no overlap: answers split across boundaries get lost
)
docs = splitter.split_text(open("policy.txt").read())

rag = RAGTool(
    knowledge_sources=docs,
    # default retrieval settings
)

agent = Agent(
    role="Insurance Analyst",
    goal="Answer policy questions",
    tools=[rag],
)

task = Task(
    description="What does the cancellation clause say?",
    agent=agent,
)
```
```python
# FIXED
from crewai import Agent, Task, Crew
from crewai_tools import RAGTool
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,     # small enough to stay on one topic
    chunk_overlap=120,  # overlap keeps boundary sentences intact
)
docs = splitter.split_text(open("policy.txt").read())

rag = RAGTool(
    knowledge_sources=docs,
    # tune retrieval if your tool/vector store supports it
    top_k=5,
)

agent = Agent(
    role="Insurance Analyst",
    goal="Answer policy questions using only retrieved context",
    tools=[rag],
)

task = Task(
    description="What does the cancellation clause say?",
    agent=agent,
)
```
Why this fails at scale:
- A 2,000-character chunk often contains multiple topics.
- Similarity search grabs the “closest” topic inside the blob, not the exact answer.
- With more documents, false positives increase because everything looks vaguely similar.
If you’re using RecursiveCharacterTextSplitter, start around 600–1000 characters and add overlap. Then measure retrieval quality before changing anything else.
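To see why overlap matters, here is a minimal pure-Python sketch (no LangChain required); `chunk_text` is a hypothetical helper written for illustration, not a CrewAI or LangChain API:

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 120) -> list[str]:
    """Split text into fixed-size chunks where each chunk repeats the last
    `overlap` characters of the previous one, so a passage cut at a
    boundary still appears whole in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "The policyholder may cancel within 14 days of purchase. " * 50
chunks = chunk_text(text, chunk_size=800, overlap=120)

# Adjacent chunks share their boundary region, so nothing falls in a gap.
assert chunks[0][-120:] == chunks[1][:120]
```

Any sentence shorter than the overlap that straddles a chunk boundary survives intact in the next chunk, which is exactly what keeps a split clause retrievable.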
Other Possible Causes
1. Stale embeddings after document updates
If you update source files but don’t re-index them, CrewAI will retrieve old vectors.
```python
# BROKEN: source changed but index was not rebuilt
vector_store.add_documents(new_docs)  # old docs still dominate retrieval

# FIXED: rebuild or upsert consistently
vector_store.delete_collection()
vector_store.add_documents(all_current_docs)
```
If you’re using Chroma, Pinecone, Weaviate, or Qdrant, make sure updates are idempotent and versioned.
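One way to make updates idempotent is to key every record by a stable document id and skip re-embedding when the content hash is unchanged. Here is a pure-Python sketch with a dict standing in for the real vector store; `HashedIndex` and its methods are illustrative names, not a Chroma/Pinecone/Weaviate/Qdrant API:

```python
import hashlib

class HashedIndex:
    """Toy stand-in for a vector store: records are keyed by a stable
    doc id and re-embedded only when the content hash changes."""
    def __init__(self):
        self.records = {}     # doc_id -> {"hash": ..., "text": ...}
        self.embed_calls = 0  # counts how often embedding would run

    def upsert(self, doc_id: str, text: str) -> None:
        digest = hashlib.sha256(text.encode()).hexdigest()
        if self.records.get(doc_id, {}).get("hash") == digest:
            return  # unchanged content: idempotent no-op
        self.embed_calls += 1  # a real embed() call would happen here
        self.records[doc_id] = {"hash": digest, "text": text}

index = HashedIndex()
index.upsert("policy_17", "Cancellation within 14 days.")
index.upsert("policy_17", "Cancellation within 14 days.")  # skipped
index.upsert("policy_17", "Cancellation within 30 days.")  # re-embedded
assert index.embed_calls == 2
```

The same pattern works with any real store: compute the hash before upserting, and the old vector can never silently outlive its source document.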
2. Wrong embedding model for the domain
A general-purpose embedding model can be weak on insurance policy language, banking product terms, or internal jargon.
```python
from langchain_openai import OpenAIEmbeddings

# BROKEN: generic embeddings on domain-specific text
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# FIXED: use a stronger model and keep it consistent
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
```
Don’t switch models without rebuilding the index. Vector spaces are not interchangeable.
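A cheap guard against mixed vector spaces is to store the embedding model's name alongside the index and fail fast on a mismatch. This is an illustrative pattern, not a built-in feature of any particular vector store:

```python
class EmbeddingIndexGuard:
    """Record which embedding model built the index and refuse queries
    embedded with a different one; mixed spaces degrade results silently."""
    def __init__(self, model_name: str):
        self.model_name = model_name

    def check_query_model(self, query_model: str) -> None:
        if query_model != self.model_name:
            raise ValueError(
                f"index built with {self.model_name!r}, "
                f"queried with {query_model!r}: rebuild the index first"
            )

guard = EmbeddingIndexGuard("text-embedding-3-small")
guard.check_query_model("text-embedding-3-small")  # consistent: no error
try:
    guard.check_query_model("text-embedding-3-large")
except ValueError as e:
    print(e)  # loud failure instead of quietly wrong rankings
```

Persist the model name with the collection metadata and run the check at query time; a loud error beats a silent relevance drop.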
3. Missing metadata filters
If you mix product docs, legal docs, and support articles in one index without filters, retrieval gets noisy fast.
```python
# BROKEN: searching across everything
results = retriever.get_relevant_documents("premium refund terms")

# FIXED: filter by document type / tenant / product line
results = retriever.get_relevant_documents(
    "premium refund terms",
    filter={"doc_type": "policy", "product": "life_insurance"},
)
```
In multi-tenant systems this is mandatory. Without filters, one customer’s content can pollute another customer’s results.
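The effect of a metadata filter is easy to demonstrate in plain Python. Real stores apply the filter inside the query rather than after it, but the logic is the same; all names below are illustrative:

```python
docs = [
    {"text": "Life cover refund terms...",
     "meta": {"doc_type": "policy", "product": "life_insurance", "tenant_id": "acme"}},
    {"text": "How to reset your portal password",
     "meta": {"doc_type": "support", "product": "portal", "tenant_id": "acme"}},
    {"text": "Auto premium refund terms...",
     "meta": {"doc_type": "policy", "product": "auto", "tenant_id": "globex"}},
]

def filter_docs(docs, **criteria):
    """Keep only documents whose metadata matches every criterion;
    similarity search then runs over this reduced candidate set."""
    return [d for d in docs
            if all(d["meta"].get(k) == v for k, v in criteria.items())]

candidates = filter_docs(docs, doc_type="policy", tenant_id="acme")
assert len(candidates) == 1  # globex's policy doc is excluded
```

Note how the tenant filter removes the other customer's policy document before similarity is even computed, which is what prevents cross-tenant pollution.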
4. Retriever settings tuned too aggressively
Too high a k floods the agent with marginal context; too low a k can miss the answer entirely.
```python
# BROKEN: too many noisy chunks returned
retriever.search_kwargs = {"k": 20}

# FIXED: start small and test the recall/precision tradeoff
retriever.search_kwargs = {"k": 4}
```
If your vector store supports MMR (max marginal relevance), it can help reduce duplicate-looking chunks:
```python
retriever.search_type = "mmr"
retriever.search_kwargs = {"k": 4, "fetch_k": 20}
```
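If your store lacks built-in MMR, the algorithm itself is small: greedily pick the chunk that best balances relevance to the query against similarity to chunks already selected. A pure-Python sketch for intuition, not the LangChain implementation:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query, candidates, k=2, lam=0.5):
    """Greedy max marginal relevance: lam weights relevance to the query,
    (1 - lam) penalizes similarity to already-selected chunks."""
    selected, remaining = [], list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Query spans two topics; candidates 0 and 1 are near-duplicates on topic A,
# candidate 2 covers topic B.
query = [1.0, 1.0, 0.0]
candidates = [[1.0, 0.0, 0.0], [0.99, 0.01, 0.0], [0.0, 1.0, 0.0]]
result = mmr(query, candidates, k=2)
assert sorted(result) == [1, 2]  # one chunk per topic, duplicate skipped
```

Plain top-k similarity would happily return both near-duplicates; MMR trades a little relevance for coverage, which is usually what the agent needs.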
How to Debug It
- Inspect what actually got retrieved. Print raw chunks before they reach the agent. If the top result is vaguely related instead of directly relevant, this is a retrieval problem, not an LLM problem.

```python
docs = retriever.get_relevant_documents("cancellation clause")
for d in docs:
    print(d.metadata)
    print(d.page_content[:300])
    print("---")
```
- Check whether embeddings were rebuilt. If documents changed recently and results got worse immediately after scale-up, verify indexing timestamps and document hashes.
- Test with one known query per document. Build a tiny evaluation set:
  - query: “What is the cancellation period?”
  - expected doc id: `policy_17`
If retrieval fails on known examples, tune chunking and filters before touching prompts.
- Compare results across vector stores or models. If FAISS works but Pinecone doesn't (or vice versa), look at normalization settings, metadata handling, and distance metrics. Also check whether your store uses cosine similarity while your embeddings expect dot product behavior.
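The cosine-vs-dot-product mismatch disappears once vectors are L2-normalized, which is why many stores normalize on ingest. A quick pure-Python check:

```python
import math

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [3.0, 4.0], [1.0, 2.0]

# Raw dot product is scaled by vector length; cosine is not.
assert abs(dot(a, b) - 11.0) < 1e-9

# After L2 normalization, dot product equals cosine similarity,
# so the two ranking schemes agree.
na, nb = l2_normalize(a), l2_normalize(b)
assert abs(dot(na, nb) - cosine(a, b)) < 1e-9
```

If your store ranks by raw inner product over unnormalized vectors, longer documents can outrank more relevant ones; normalizing at ingest makes the metric choice moot.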
Prevention
- Use chunk sizes that match the content type:
  - policies/contracts: smaller chunks with overlap
  - FAQs/short answers: slightly larger chunks are fine
- Rebuild or upsert embeddings on every content change; stale vectors are one of the fastest ways to get irrelevant matches.
- Add metadata from day one: `tenant_id`, `doc_type`, `product`, `version`.
- Keep a small retrieval test suite and run it before deploying any CrewAI change that touches ingestion or embeddings.
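A retrieval test suite can be tiny. In this minimal sketch, `fake_retriever` stands in for your real vector store and `recall_at_k` is an illustrative helper, not a library function; the point is the shape, not the names:

```python
def recall_at_k(retriever, eval_set, k=4):
    """Fraction of eval queries whose expected doc id appears in the
    top-k retrieved ids. `retriever` is any callable query -> ranked ids."""
    hits = 0
    for case in eval_set:
        top = retriever(case["query"])[:k]
        hits += case["expected_doc_id"] in top
    return hits / len(eval_set)

# Stub standing in for the real vector store during illustration.
def fake_retriever(query):
    if "cancellation" in query.lower():
        return ["policy_17", "policy_03", "faq_2"]
    return ["faq_2"]

eval_set = [
    {"query": "What is the cancellation period?", "expected_doc_id": "policy_17"},
    {"query": "How do I reset my password?", "expected_doc_id": "faq_2"},
]
assert recall_at_k(fake_retriever, eval_set, k=4) == 1.0
```

Wire the real retriever into the same harness and gate deploys on a threshold (for example, recall@4 ≥ 0.9); a failing gate catches chunking and indexing regressions before users do.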
If you want this to stay stable in production, treat retrieval like code: version it, test it, and monitor it. The moment your corpus grows past a few hundred documents without evaluation gates, irrelevant results stop being occasional noise and become your default failure mode.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit