How to Fix 'embedding dimension mismatch during development' in CrewAI (Python)
What the error means
embedding dimension mismatch during development usually means your vector store is expecting embeddings of one size, but CrewAI is sending vectors of a different size. This shows up when you change embedding providers, switch models, or reuse an old persisted index after changing configuration.
In CrewAI projects, this often happens when OpenAIEmbeddings, HuggingFaceEmbeddings, or another embedding class is swapped without rebuilding the store. The symptom is usually a runtime failure from the underlying vector DB, not from CrewAI itself.
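To see why the stores clash, compare the default output sizes of a few common models. A quick sketch — `MODEL_DIMS` and `compatible` are my own helpers for illustration, not a library API:

```python
# Default output dimensions for common embedding models.
# (OpenAI's text-embedding-3-* models can be shortened via the `dimensions`
# parameter; the values below are the defaults.)
MODEL_DIMS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "sentence-transformers/all-MiniLM-L6-v2": 384,
}

def compatible(index_model: str, query_model: str) -> bool:
    """Two models can share a vector store only if their dimensions match."""
    return MODEL_DIMS[index_model] == MODEL_DIMS[query_model]

print(compatible("text-embedding-ada-002", "text-embedding-3-small"))  # True (both 1536)
print(compatible("text-embedding-ada-002", "text-embedding-3-large"))  # False (1536 vs 3072)
```

Note that matching sizes only avoids the crash: ada-002 vectors queried with text-embedding-3-small will pass the dimension check yet return poor results, because the vector spaces differ. Same model for indexing and querying is the real rule.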
The Most Common Cause
The #1 cause is mixing embeddings from different models in the same vector store.
A common pattern is: you built your Chroma/FAISS/Pinecone index with one embedding model, then later changed the model in your CrewAI app and kept using the same persisted data directory or namespace.
Broken vs fixed
| Broken pattern | Fixed pattern |
|---|---|
| Reusing an old index built with text-embedding-ada-002 while querying with text-embedding-3-small | Rebuild the index or keep the same embedding model for both indexing and querying |
| Persisted store stays on disk across code changes | Delete/recreate the store when changing embedding dimensions |
```python
# BROKEN: existing vector store was built with a different embedding dimension
from crewai import Agent, Task, Crew
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
db = Chroma(
    persist_directory="./chroma_db",
    collection_name="support_docs",
    embedding_function=embeddings,
)

# Later, retrieval fails with errors like:
# "InvalidDimensionException: Embedding dimension 1536 does not match collection dimensionality 3072"
# or:
# "embedding dimension mismatch"
```
```python
# FIXED: rebuild the collection when changing embedding model
import shutil

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Remove the stale index built with the old embedding dimension
shutil.rmtree("./chroma_db", ignore_errors=True)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
db = Chroma.from_texts(
    texts=["policy renewal", "claims handling", "fraud detection"],
    embedding=embeddings,
    persist_directory="./chroma_db",
    collection_name="support_docs",
)
```
If you use CrewAI tools that wrap retrieval, make sure every component uses the same embedding class and model. The mismatch usually enters through a RAG tool or a custom retriever wired into an agent, not through CrewAI itself.
Other Possible Causes
1) Mixing local and cloud embeddings
You may index documents with HuggingFaceEmbeddings locally and query with OpenAI embeddings later. Those models almost never produce the same vector size.
```python
# BROKEN: documents were indexed with one provider, queries use another
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_openai import OpenAIEmbeddings

index_embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")  # 384 dimensions
query_embeddings = OpenAIEmbeddings(model="text-embedding-3-large")  # 3072 dimensions
```
Fix: use one embedding provider per index.
```python
# FIXED: one embedding model for both indexing and querying
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
```
2) Persisted vector store from an older run
If you changed your embedding model but kept persist_directory, old vectors remain on disk.
```
# CONFIG SNIPPET
CHROMA_DIR=./chroma_db   # old vectors still there after the model change
```
Fix options:
- delete the directory before rebuilding
- version your collection names by embedding model:

```python
collection_name = "support_docs_v1_minilm"
```
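That versioning can be automated so a model change can never silently reuse an old index. A sketch — `collection_name_for` is a hypothetical helper, not a CrewAI or Chroma API:

```python
def collection_name_for(base: str, model_name: str, dim: int) -> str:
    """Derive a collection name from the embedding model and its dimension,
    so switching models automatically points at a fresh collection."""
    slug = model_name.split("/")[-1].replace("-", "_").lower()
    return f"{base}__{slug}__{dim}d"

print(collection_name_for("support_docs", "sentence-transformers/all-MiniLM-L6-v2", 384))
# support_docs__all_minilm_l6_v2__384d
```

Old collections stay on disk untouched, which also gives you a rollback path while you reindex.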
3) Wrong embedding class passed into a CrewAI tool
A custom tool might accept an embeddings object, but you accidentally pass a text generation client instead.
```python
# BROKEN
from openai import OpenAI

client = OpenAI()  # not an embeddings class
retriever = MyVectorTool(embeddings=client)
```
Fix: pass an embeddings implementation, not a chat/completions client.
```python
# FIXED
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
retriever = MyVectorTool(embeddings=embeddings)
```
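A cheap guard against this mistake is a duck-type check inside your custom tool. A sketch under the assumption of LangChain-style embeddings — `validate_embedder` and the stub class are mine, not part of CrewAI or LangChain:

```python
def validate_embedder(obj) -> None:
    """Reject objects that lack the embeddings interface (embed_query /
    embed_documents), e.g. a chat/completions client passed by accident."""
    for method in ("embed_query", "embed_documents"):
        if not callable(getattr(obj, method, None)):
            raise TypeError(
                f"{type(obj).__name__} has no {method}(); "
                "pass an embeddings implementation, not a chat client"
            )

# Stub standing in for a real embeddings object
class FakeEmbeddings:
    def embed_query(self, text):
        return [0.0] * 384

    def embed_documents(self, texts):
        return [[0.0] * 384 for _ in texts]

validate_embedder(FakeEmbeddings())  # passes silently
```

Calling this in your tool's constructor turns a confusing runtime failure deep in the vector DB into an immediate, readable TypeError.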
4) Changing model dimensions without reindexing
Some models let you choose output dimensions. If you changed that setting mid-project, your store breaks immediately.
```python
# BROKEN: same model, but the output dimension changed mid-project
embeddings_v1 = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=3072)
embeddings_v2 = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=1536)
```
Fix: reindex everything whenever dimensions change.
How to Debug It
- Print the embedding dimension at runtime. Check what your current embedder returns before it hits the vector DB:

```python
vec = embeddings.embed_query("test")
print(len(vec))
```

- Inspect the stored collection dimension. For Chroma/Pinecone/FAISS, confirm what dimension the index was created with. If it differs from `len(vec)`, that's your bug.
- Search for multiple embedding classes. Grep your codebase for:
  - OpenAIEmbeddings
  - HuggingFaceEmbeddings
  - OllamaEmbeddings
  - custom wrappers passed into CrewAI tools

  You want one source of truth for indexing and querying.
- Delete and rebuild as a test. If removing the persisted store fixes it, you've confirmed stale vectors are involved. This is especially common when you see errors like:
  - InvalidDimensionException
  - "Collection dimensionality does not match"
  - "embedding dimension mismatch"
Prevention
- Use one embedding model per vector index, and pin it in config.
- Version your collections by model name and dimension.
- Rebuild indexes whenever you change providers, models, or output dimensions.
- In development, add a startup check that compares `len(embeddings.embed_query("test"))` against stored index metadata before agents run.
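That startup check can be a few lines. A minimal sketch — `assert_embedding_dim` and the stub embedder are illustrative names, and you would pass your real embeddings object plus the dimension your index was built with:

```python
def assert_embedding_dim(embeddings, expected_dim: int) -> int:
    """Fail fast before any agent runs if the embedder's output size
    doesn't match what the persisted index was built with."""
    actual = len(embeddings.embed_query("dimension probe"))
    if actual != expected_dim:
        raise ValueError(
            f"embedding dimension mismatch: embedder returns {actual}, "
            f"index expects {expected_dim} -- rebuild the index or pin the old model"
        )
    return actual

# Stub standing in for a real embeddings object
class StubEmbeddings:
    def embed_query(self, text):
        return [0.0] * 1536

assert_embedding_dim(StubEmbeddings(), 1536)  # passes silently
```

Run it once at application startup, before any Crew kicks off, so a bad model swap surfaces in seconds instead of mid-run.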
If you’re building CrewAI agents for RAG workflows, treat embeddings like schema. Change them without migration discipline and your retrieval layer will break exactly like a database schema mismatch.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.