How to Fix 'embedding dimension mismatch when scaling' in CrewAI (Python)
What the error means
embedding dimension mismatch when scaling usually means your vector store was built with embeddings of one size, then you tried to insert or query vectors of a different size. In CrewAI, this often shows up when you change embedding models, swap providers, or reuse an existing Chroma/FAISS/Pinecone index after changing the embedding configuration.
The failure typically appears during memory setup, knowledge ingestion, or tool-backed retrieval. You’ll see a stack trace that includes CrewAI classes like Crew, Knowledge, or EmbedchainConfig, or a vector DB error such as chromadb.errors.InvalidDimensionException.
The Most Common Cause
The #1 cause is mixing embedding models across runs.
A common pattern is:
- build the collection with one model
- later switch to another model with a different output dimension
- keep the same persisted vector store path or collection name
That breaks because vector databases require a fixed embedding width per collection.
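The constraint is easy to see with a toy stand-in for a collection (a hypothetical `FixedDimStore`, not a real vector DB client): the first insert locks the width, and every later vector must match it.

```python
class FixedDimStore:
    """Toy stand-in for a vector collection that locks its dimension on first insert."""

    def __init__(self):
        self.dim = None      # fixed by the first vector written
        self.vectors = []

    def add(self, vector):
        if self.dim is None:
            self.dim = len(vector)  # collection width is now locked
        elif len(vector) != self.dim:
            raise ValueError(
                f"Embedding dimension mismatch: expected {self.dim}, got {len(vector)}"
            )
        self.vectors.append(vector)

store = FixedDimStore()
store.add([0.1] * 1536)      # store built with a 1536-dim model
try:
    store.add([0.1] * 3072)  # later run with a 3072-dim model
except ValueError as e:
    print(e)  # Embedding dimension mismatch: expected 1536, got 3072
```

Real vector databases behave the same way; they just raise their own exception type instead of `ValueError`.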
Broken vs fixed
| Broken pattern | Fixed pattern |
|---|---|
| Reuses old persisted index after changing embedding model | Rebuilds the index or uses a fresh collection path |
| Embedding model changes from 1536-dim to 3072-dim | Keeps embedding model consistent for that store |
```python
# BROKEN: old Chroma store created with one embedding model,
# then reused after switching models.
from crewai import Agent, Task, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# First run used OpenAI text-embedding-3-small (1536 dims).
# Later you switched to text-embedding-3-large (3072 dims)
# without rebuilding the persisted store.
knowledge = StringKnowledgeSource(
    content="Company policy: reimburse travel within 7 days."
)

agent = Agent(
    role="Policy Assistant",
    goal="Answer policy questions",
    backstory="You know internal policy.",
    knowledge_sources=[knowledge],
)

crew = Crew(
    agents=[agent],
    tasks=[Task(description="Summarize policy", expected_output="A short summary", agent=agent)],
)
result = crew.kickoff()  # fails: new 3072-dim vectors hit the old 1536-dim store
```
```python
# FIXED: pin one embedder for the crew and keep it stable for the
# lifetime of the persisted store. If you change models, delete and
# rebuild the store first.
from crewai import Agent, Task, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

knowledge = StringKnowledgeSource(
    content="Company policy: reimburse travel within 7 days."
)

agent = Agent(
    role="Policy Assistant",
    goal="Answer policy questions",
    backstory="You know internal policy.",
    knowledge_sources=[knowledge],
)

crew = Crew(
    agents=[agent],
    tasks=[Task(description="Summarize policy", expected_output="A short summary", agent=agent)],
    # One explicit embedder config, used for both ingestion and queries.
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}},
)
result = crew.kickoff()

# If you changed embedding providers/models:
# 1) delete the old vector DB directory / collection
# 2) recreate it from scratch
# 3) rerun ingestion with the new model only
```
If you are using a persisted Chroma directory, this is the first thing to check:
```python
from chromadb import PersistentClient

client = PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("company_docs")
```
If ./chroma_db already contains vectors from another embedding model, recreate it.
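One lightweight safeguard (a sketch, not a CrewAI feature) is to record the embedder model and dimension in a sidecar file next to the persisted directory, and refuse to reuse the store under a different config:

```python
import json
import tempfile
from pathlib import Path

def check_store_config(store_path: str, model: str, dim: int) -> None:
    """Record the embedder config next to the store on first use;
    refuse to reuse the store if the config has changed since."""
    marker = Path(store_path) / "embedder.json"
    current = {"model": model, "dim": dim}
    if marker.exists():
        recorded = json.loads(marker.read_text())
        if recorded != current:
            raise RuntimeError(
                f"Store at {store_path} was built with {recorded}, "
                f"but is now configured with {current}. "
                "Rebuild the store or use a new path."
            )
    else:
        marker.parent.mkdir(parents=True, exist_ok=True)
        marker.write_text(json.dumps(current))

# Demo in a temp directory so nothing persists:
with tempfile.TemporaryDirectory() as store_dir:
    check_store_config(store_dir, "text-embedding-3-small", 1536)  # first run: records config
    check_store_config(store_dir, "text-embedding-3-small", 1536)  # same config: passes
```

Run the check once at startup, before any ingestion or query code touches the store.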
Other Possible Causes
1) You changed providers but kept the same collection
OpenAI embeddings and local embeddings like sentence-transformers do not share dimensions. A local model might produce 384 or 768 dimensions, while an OpenAI model produces 1536 or 3072.
```python
# BROKEN: same collection reused across different providers.
embedder_config = {
    "provider": "openai",
    "model": "text-embedding-3-large",  # 3072 dims
}

# Later, in another environment:
embedder_config = {
    "provider": "huggingface",
    "model": "all-MiniLM-L6-v2",  # 384 dims
}
```
Fix: use a separate collection per provider/model pair.
```python
collection_name = "company_docs_openai_3072"  # not just "company_docs"
```
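The naming rule can be made automatic with a small helper (a sketch; the naming scheme is just a convention, not a CrewAI requirement):

```python
def collection_name_for(base: str, provider: str, model: str, dim: int) -> str:
    """Derive a collection name that encodes the embedder, so different
    provider/model pairs can never collide in one collection."""
    slug = model.lower().replace("/", "-").replace(".", "-")
    return f"{base}_{provider}_{slug}_{dim}"

print(collection_name_for("company_docs", "openai", "text-embedding-3-large", 3072))
# company_docs_openai_text-embedding-3-large_3072
```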
2) Your app loads stale cached embeddings
Some CrewAI setups cache knowledge or memory on disk. If your source documents stay the same but embeddings were generated by an older model, queries can fail when new vectors are compared against old ones.
```python
# Example cleanup approach: wipe the persisted stores, then re-ingest.
import shutil

shutil.rmtree("./chroma_db", ignore_errors=True)
shutil.rmtree("./.crew", ignore_errors=True)
```
Then re-ingest everything with one embedding configuration.
3) One agent uses a different embedder than another
This happens when different agents or tools initialize their own retrieval layer independently. One agent writes vectors with one dimension; another tries to read them using another embedder.
```python
# BROKEN: each agent initializes its own retrieval layer, so their
# embedder configs can silently drift apart.
research_agent = Agent(..., knowledge_sources=[research_knowledge])
support_agent = Agent(..., knowledge_sources=[support_knowledge])
```
Fix: centralize embedder configuration and reuse it everywhere.
```python
EMBEDDER_MODEL = "text-embedding-3-small"
# Use one config for all knowledge sources and memory stores in the app.
```
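One way to centralize it (a sketch assuming CrewAI's embedder dict shape, `{"provider": ..., "config": {...}}`):

```python
# embedders.py: single source of truth for the whole app.
EMBEDDER_CONFIG = {
    "provider": "openai",
    "config": {"model": "text-embedding-3-small"},  # 1536 dims
}

def make_embedder_config() -> dict:
    """Return a fresh copy so no caller can mutate the shared config in place."""
    return {
        "provider": EMBEDDER_CONFIG["provider"],
        "config": dict(EMBEDDER_CONFIG["config"]),
    }
```

Every crew and knowledge setup then imports from this one module, so the ingestion and query paths cannot drift apart.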
4) The underlying vector DB schema already exists
With Pinecone, Qdrant, Weaviate, and Chroma, collections/indexes often lock in dimensionality at creation time. If you try to upsert vectors of another size into an existing index, you get dimension errors.
```python
# Example: a Pinecone index is created with one fixed dimension.
# You cannot upsert a different-sized vector into it.
```
Fix:
- delete and recreate the index/collection
- make sure the index dimension matches your embedder output exactly
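As a reference point, here are output widths for some common models (worth double-checking against your provider's docs before creating an index), plus a lookup you can use at index-creation time:

```python
# Default output dimensions of some widely used embedding models.
KNOWN_DIMS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
}

def index_dim_for(model: str) -> int:
    """Look up the dimension to create an index with, or fail loudly."""
    try:
        return KNOWN_DIMS[model]
    except KeyError:
        raise ValueError(
            f"Unknown model {model!r}: embed a sample text and measure len() instead"
        )

print(index_dim_for("text-embedding-3-large"))  # 3072
```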
How to Debug It
1) Print your embedding model name everywhere
- Log the exact provider and model used during ingestion and at query time.
- Look for drift between runs.

2) Check vector dimensions directly
- Generate one sample embedding and inspect its length.
- Compare it with what your DB expects.

```python
embedding = embedder.embed_query("test")
print(len(embedding))
```

3) Inspect the persisted store
- If you use Chroma locally, delete the directory and rerun.
- If the error disappears, you had stale vectors from another dimension.

4) Verify all agents share one retrieval config
- Search your codebase for multiple embedder initializations.
- Make sure every Agent, knowledge source, and memory component uses the same configuration.
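The last check, that every agent shares one embedder, can be automated with a small probe (a sketch; the lambdas below are fakes standing in for whatever embedding functions your components actually use):

```python
def assert_same_dims(embed_fns, sample: str = "dim probe") -> int:
    """Probe every embedder with the same text and fail if their widths differ."""
    dims = [len(fn(sample)) for fn in embed_fns]
    if len(set(dims)) != 1:
        raise RuntimeError(f"Embedders disagree on dimension: {dims}")
    return dims[0]

# Fake embedders standing in for two agents' retrieval layers:
agent_a_embed = lambda text: [0.0] * 1536
agent_b_embed = lambda text: [0.0] * 1536
print(assert_same_dims([agent_a_embed, agent_b_embed]))  # 1536
```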
A real traceback often looks like this:
```
chromadb.errors.InvalidDimensionException: Embedding dimension mismatch when scaling.
Expected dimension: 1536
Got dimension: 3072

During handling of the above exception, another exception occurred:
ValueError: Failed to add documents to Knowledge store in CrewAI
```
That message tells you exactly where to look: stored vectors and current embeddings do not match.
Prevention
- Use one embedding model per vector store. If you change models, create a new collection name or wipe the old store.
- Pin your embedder config in code and environment variables so ingestion and querying always use the same settings.
- Add a startup check that validates embedding length before writing to your DB:
```python
expected_dim = 1536
actual_dim = len(embedder.embed_query("dimension check"))
assert actual_dim == expected_dim, f"Expected {expected_dim} dims, got {actual_dim}"
```
If you’re building CrewAI systems for production, treat embeddings like schema migrations. Changing them without rebuilding storage is how this error gets into your logs.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.