How to Fix 'embedding dimension mismatch when scaling' in CrewAI (Python)

By Cyprian Aarons · Updated 2026-04-22

What the error means

embedding dimension mismatch when scaling usually means your vector store was built with embeddings of one size, then you tried to insert or query vectors of a different size. In CrewAI, this often shows up when you change embedding models, swap providers, or reuse an existing Chroma/FAISS/Pinecone index after changing the embedding configuration.

The failure typically appears during memory setup, knowledge ingestion, or tool-backed retrieval. You’ll see a stack trace that includes CrewAI classes such as Crew, Knowledge, or EmbedchainConfig, or a vector DB exception such as chromadb.errors.InvalidDimensionException.

The Most Common Cause

The #1 cause is mixing embedding models across runs.

A common pattern is:

  • build the collection with one model
  • later switch to another model with a different output dimension
  • keep the same persisted vector store path or collection name

That breaks because vector databases require a fixed embedding width per collection.
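To see why, here is a minimal, dependency-free sketch of how a vector store locks in its width: the first insert fixes the dimension, and any later vector of a different length is rejected. The TinyStore class is hypothetical, purely for illustration; real stores (Chroma, Pinecone, Qdrant) enforce the same rule, with Chroma raising InvalidDimensionException.

```python
# Hypothetical in-memory store illustrating why collections have a
# fixed embedding width. Not a real vector DB client.

class TinyStore:
    def __init__(self):
        self.dim = None      # locked in by the first vector added
        self.vectors = {}

    def add(self, doc_id, vector):
        if self.dim is None:
            self.dim = len(vector)          # first insert fixes the width
        elif len(vector) != self.dim:
            raise ValueError(
                f"Embedding dimension mismatch: expected {self.dim}, "
                f"got {len(vector)}"
            )
        self.vectors[doc_id] = vector

store = TinyStore()
store.add("a", [0.1] * 1536)      # e.g. text-embedding-3-small
try:
    store.add("b", [0.1] * 3072)  # e.g. text-embedding-3-large
except ValueError as e:
    print(e)  # Embedding dimension mismatch: expected 1536, got 3072
```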

Broken vs fixed

Broken pattern → fixed pattern:

  • Reuses the old persisted index after changing the embedding model → rebuild the index or use a fresh collection path.
  • The embedding model changes from 1536-dim to 3072-dim → keep the embedding model consistent for that store.

# BROKEN: old Chroma store created with one embedding model,
# then reused after switching models.

from crewai import Agent, Task, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# First run may have used OpenAI text-embedding-3-small (1536 dims)
# Later you switched to text-embedding-3-large (3072 dims)

knowledge = StringKnowledgeSource(
    content="Company policy: reimburse travel within 7 days."
)

agent = Agent(
    role="Policy Assistant",
    goal="Answer policy questions",
    backstory="You know internal policy.",
    knowledge_sources=[knowledge],
)

crew = Crew(agents=[agent], tasks=[Task(description="Summarize policy", agent=agent)])

result = crew.kickoff()

# FIXED: keep the same embedding model for the same persisted store,
# or delete/rebuild the store when changing models.

from crewai import Agent, Task, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

knowledge = StringKnowledgeSource(
    content="Company policy: reimburse travel within 7 days."
)

agent = Agent(
    role="Policy Assistant",
    goal="Answer policy questions",
    backstory="You know internal policy.",
    knowledge_sources=[knowledge],
)

crew = Crew(agents=[agent], tasks=[Task(description="Summarize policy", agent=agent)])

result = crew.kickoff()

# If you changed embedding providers/models:
# 1) delete the old vector DB directory / collection
# 2) recreate it from scratch
# 3) rerun ingestion with the new model only

If you are using a persisted Chroma directory, this is the first thing to check:

from chromadb import PersistentClient

client = PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("company_docs")

If ./chroma_db already contains vectors from another embedding model, recreate it.

Other Possible Causes

1) You changed providers but kept the same collection

OpenAI embeddings and local embeddings like sentence-transformers do not share dimensions. A local model might produce 384 or 768 dimensions, while an OpenAI model produces 1536 or 3072.

# BROKEN
embedder_config = {
    "provider": "openai",
    "model": "text-embedding-3-large",
}

# Later in another environment:
embedder_config = {
    "provider": "huggingface",
    "model": "all-MiniLM-L6-v2",
}

Fix: use a separate collection per provider/model pair.

collection_name = "company_docs_openai_3072"
# not just "company_docs"

2) Your app loads stale cached embeddings

Some CrewAI setups cache knowledge or memory on disk. If your source documents stay the same but embeddings were generated by an older model, queries can fail when new vectors are compared against old ones.

# Example cleanup approach
import shutil

shutil.rmtree("./chroma_db", ignore_errors=True)
shutil.rmtree("./.crew", ignore_errors=True)

Then re-ingest everything with one embedding configuration.

3) One agent uses a different embedder than another

This happens when different agents or tools initialize their own retrieval layer independently. One agent writes vectors with one dimension; another tries to read them using another embedder.

# BROKEN: inconsistent embedder setup across components

research_agent = Agent(..., knowledge_sources=[research_knowledge])
support_agent = Agent(..., knowledge_sources=[support_knowledge])

Fix: centralize embedder configuration and reuse it everywhere.

EMBEDDER_MODEL = "text-embedding-3-small"

# Use one config for all knowledge sources and memory stores in the app.
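A minimal sketch of centralizing that config: one module owns the embedder settings and everything else imports from it. The dict shape follows CrewAI's embedder configuration convention; adjust the provider and model to whatever you actually use.

```python
# embedder_config.py -- single source of truth for the whole app.
# The provider/model values here are examples, not requirements.

EMBEDDER_CONFIG = {
    "provider": "openai",
    "config": {"model": "text-embedding-3-small"},
}

def get_embedder_config():
    """Return a copy so callers cannot mutate the shared config."""
    return {
        "provider": EMBEDDER_CONFIG["provider"],
        "config": dict(EMBEDDER_CONFIG["config"]),
    }

# Everywhere else in the app:
#   from embedder_config import get_embedder_config
#   crew = Crew(agents=[...], tasks=[...], embedder=get_embedder_config())
```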

4) The underlying vector DB schema already exists

With Pinecone, Qdrant, Weaviate, and Chroma, collections/indexes often lock in dimensionality at creation time. If you try to upsert vectors of another size into an existing index, you get dimension errors.

# Example: Pinecone index created for one dimension only.
# You cannot upsert a different-sized vector into it.

Fix:

  • delete and recreate the index/collection
  • make sure index dimension matches your embedder output exactly
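A cheap way to enforce that last point is a pre-flight check that compares every outgoing vector against the index's declared dimension before any upsert. The helper below is a generic sketch, not tied to any particular DB client; you would pass in the dimension you used when creating the index.

```python
def check_dimension(vectors, expected_dim):
    """Fail fast before upserting into a fixed-dimension index."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(
                f"vector {i} has dimension {len(vec)}, "
                f"index expects {expected_dim}"
            )

# Example: an index created for 1536-dim embeddings
check_dimension([[0.0] * 1536], expected_dim=1536)  # passes silently
```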

How to Debug It

  1. Print your embedding model name everywhere

    • Log the exact provider and model used during ingestion and query time.
    • Look for drift between runs.

  2. Check vector dimensions directly

    • Generate one sample embedding and inspect its length.
    • Compare it with what your DB expects.

# embedder here is whatever embedding client your app configures
embedding = embedder.embed_query("test")
print(len(embedding))

  3. Inspect the persisted store

    • If you use Chroma locally, delete the directory and rerun.
    • If the error disappears, you had stale vectors from another dimension.

  4. Verify all agents share one retrieval config

    • Search your codebase for multiple embedder initializations.
    • Make sure every Agent, knowledge source, and memory component uses the same configuration.

A real traceback often looks like this:

chromadb.errors.InvalidDimensionException: Embedding dimension mismatch when scaling.
Expected dimension: 1536
Got dimension: 3072

During handling of the above exception, another exception occurred:
ValueError: Failed to add documents to Knowledge store in CrewAI

That message tells you exactly where to look: stored vectors and current embeddings do not match.

Prevention

  • Use one embedding model per vector store. If you change models, create a new collection name or wipe the old store.
  • Pin your embedder config in code and environment variables so ingestion and querying always use the same settings.
  • Add a startup check that validates embedding length before writing to your DB:

expected_dim = 1536
actual_dim = len(embedder.embed_query("dimension check"))
assert actual_dim == expected_dim
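That assertion can be wrapped into a reusable guard with an actionable error message. In this sketch, `embed_fn` is a stand-in for whichever embedding client you actually use (any callable that maps a string to a list of floats).

```python
def verify_embedding_dim(embed_fn, expected_dim):
    """Run once at startup, before any ingestion or query.

    embed_fn stands in for your real embedding client: any callable
    that maps a string to a list of floats.
    """
    actual = len(embed_fn("dimension check"))
    if actual != expected_dim:
        raise RuntimeError(
            f"Embedder produces {actual}-dim vectors but the store "
            f"expects {expected_dim}. Rebuild the store or fix the config."
        )

# Works with any embedder; here a fake one for illustration:
verify_embedding_dim(lambda text: [0.0] * 1536, expected_dim=1536)
```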

If you’re building CrewAI systems for production, treat embeddings like schema migrations. Changing them without rebuilding storage is how this error gets into your logs.



By Cyprian Aarons, AI Consultant at Topiax.
