How to Fix 'embedding dimension mismatch during development' in CrewAI (Python)

By Cyprian Aarons · Updated 2026-04-22

What the error means

An embedding dimension mismatch during development usually means your vector store expects embeddings of one size, but CrewAI is sending vectors of a different size. It shows up when you change embedding providers, switch models, or reuse an old persisted index after changing configuration.

In CrewAI projects, this often happens when OpenAIEmbeddings, HuggingFaceEmbeddings, or another embedding class is swapped without rebuilding the store. The symptom is usually a runtime failure from the underlying vector DB, not from CrewAI itself.
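To see why the store has to reject mismatched vectors rather than silently cope, consider what happens when two differently sized vectors meet in a similarity computation. A minimal NumPy illustration (the sizes match the OpenAI models discussed below):

```python
import numpy as np

stored = np.zeros(3072)  # vector saved when the index was built (e.g. text-embedding-3-large)
query = np.zeros(1536)   # vector from a smaller model (e.g. text-embedding-3-small)

try:
    similarity = stored @ query  # dot product, the core of most similarity searches
except ValueError as exc:
    # the shapes are not aligned, so the store has no choice but to refuse
    print(f"mismatch: {exc}")
```

The vector DB raises its own dimension error for exactly this reason: there is no meaningful similarity between a 1536-dimensional query and a 3072-dimensional stored vector.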

The Most Common Cause

The #1 cause is mixing embeddings from different models in the same vector store.

A common pattern is: you built your Chroma/FAISS/Pinecone index with one embedding model, then later changed the model in your CrewAI app and kept using the same persisted data directory or namespace.

Broken vs fixed

  • Broken: reusing an old index built with text-embedding-ada-002 while querying with text-embedding-3-small → Fixed: rebuild the index, or keep the same embedding model for both indexing and querying
  • Broken: the persisted store stays on disk across code changes → Fixed: delete and recreate the store whenever the embedding dimension changes

# BROKEN: existing vector store was built with a different embedding dimension
from crewai import Agent, Task, Crew
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

db = Chroma(
    persist_directory="./chroma_db",
    collection_name="support_docs",
    embedding_function=embeddings,
)

# Later, retrieval fails with errors like:
# "InvalidDimensionException: Embedding dimension 1536 does not match collection dimensionality 3072"
# or:
# "embedding dimension mismatch"

# FIXED: rebuild the collection when changing embedding model
import shutil
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

shutil.rmtree("./chroma_db", ignore_errors=True)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

db = Chroma.from_texts(
    texts=["policy renewal", "claims handling", "fraud detection"],
    embedding=embeddings,
    persist_directory="./chroma_db",
    collection_name="support_docs",
)

If you use CrewAI tools that wrap retrieval, make sure every component uses the same embedding class and model. The mismatch usually enters through a RAG-style tool (for example RagTool or one of the search tools from crewai_tools) or a custom retriever wired into an agent. A pure web-search tool like SerperDevTool does not embed anything, so it is rarely the culprit.
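One way to enforce a single source of truth is a small guard object that every component fetches its embedder from. The class below is an illustrative sketch, not a CrewAI or LangChain API:

```python
class EmbedderRegistry:
    """Single source of truth for the embedding object used by one index.

    Sketch only: register the embedder once at startup, then have every
    indexing job and retrieval tool call get() instead of constructing
    its own OpenAIEmbeddings / HuggingFaceEmbeddings instance.
    """

    def __init__(self):
        self._embedder = None

    def register(self, embedder):
        # Re-registering the same object is fine; a different one is a bug.
        if self._embedder is not None and self._embedder is not embedder:
            raise RuntimeError("A different embedder is already registered for this index")
        self._embedder = embedder

    def get(self):
        if self._embedder is None:
            raise RuntimeError("No embedder registered; call register() at startup")
        return self._embedder
```

Registering a second, different instance fails fast at startup instead of surfacing later as a dimension error deep inside the vector DB.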

Other Possible Causes

1) Mixing local and cloud embeddings

You may index documents with HuggingFaceEmbeddings locally and later query with OpenAI embeddings. Those models almost never produce the same vector size: all-MiniLM-L6-v2 outputs 384-dimensional vectors, while text-embedding-3-large defaults to 3072.

# BROKEN
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_openai import OpenAIEmbeddings

index_embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
query_embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

Fix: use one embedding provider per index.

# FIXED
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

2) Persisted vector store from an older run

If you changed your embedding model but kept persist_directory, old vectors remain on disk.

# CONFIG SNIPPET
CHROMA_DIR=./chroma_db   # old vectors still there after model change

Fix options:

  • delete the directory before rebuilding
  • version your collection names by embedding model
collection_name = "support_docs_v1_minilm"
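A tiny helper can make that versioning automatic; the naming scheme below is just one suggested convention:

```python
def collection_name_for(base: str, model_name: str, dimension: int) -> str:
    """Derive a collection name that encodes the embedding model and size.

    Illustrative convention: changing the model or dimension yields a
    fresh collection instead of colliding with old vectors.
    """
    slug = model_name.replace("/", "-")  # Hugging Face model ids contain slashes
    return f"{base}__{slug}__{dimension}"

# collection_name_for("support_docs", "text-embedding-3-small", 1536)
# -> "support_docs__text-embedding-3-small__1536"
```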

3) Wrong embedding class passed into a CrewAI tool

A custom tool might accept an embeddings object, but you accidentally pass a text generation client instead.

# BROKEN
from openai import OpenAI

client = OpenAI()  # not an embeddings class
retriever = MyVectorTool(embeddings=client)

Fix: pass an embeddings implementation, not a chat/completions client.

# FIXED
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
retriever = MyVectorTool(embeddings=embeddings)
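If you write custom tools, a cheap duck-type check at construction time catches this early. LangChain embedding classes expose embed_query and embed_documents; a chat/completions client exposes neither. The helper name is illustrative:

```python
def validate_embedder(obj) -> None:
    """Fail fast if obj is not an embeddings implementation.

    Checks for the LangChain-style embedding interface instead of a
    specific class, so local and cloud embedders both pass.
    """
    for method in ("embed_query", "embed_documents"):
        if not callable(getattr(obj, method, None)):
            raise TypeError(
                f"{type(obj).__name__} has no {method}(); "
                "pass an embeddings object, not a chat client"
            )
```

Calling this at the top of a custom tool's __init__ turns a confusing runtime dimension error into an obvious TypeError at wiring time.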

4) Changing model dimensions without reindexing

Some models let you choose output dimensions. If you changed that setting mid-project, your store breaks immediately.

# BROKEN
embeddings_v1 = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=3072)
embeddings_v2 = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=1536)

Fix: reindex everything whenever dimensions change.
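One hedge against this is recording the model and dimension next to the persisted index and wiping the index on mismatch. The sidecar-file layout below is an illustrative convention, not part of Chroma or CrewAI:

```python
import json
import os
import shutil

def ensure_index_compatible(persist_dir: str, model: str, dimension: int) -> None:
    """Delete a persisted index whose recorded model/dimension differ.

    Sketch only: stores a small JSON sidecar next to the index so the
    next startup can detect a model or dimension change and rebuild.
    """
    meta_path = os.path.join(persist_dir, "embedding_meta.json")
    expected = {"model": model, "dimension": dimension}

    if os.path.exists(meta_path):
        with open(meta_path) as f:
            if json.load(f) != expected:
                shutil.rmtree(persist_dir)  # stale vectors: force a rebuild

    os.makedirs(persist_dir, exist_ok=True)
    with open(meta_path, "w") as f:
        json.dump(expected, f)
```

Call this before constructing the vector store; a changed dimensions= setting then triggers a clean rebuild instead of a runtime mismatch.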

How to Debug It

  1. Print the embedding dimension at runtime

    Check what your current embedder returns before it hits the vector DB.

    vec = embeddings.embed_query("test")
    print(len(vec))
    
  2. Inspect the stored collection dimension

    For Chroma/Pinecone/FAISS, confirm what dimension the index was created with. If it differs from len(vec), that’s your bug.

  3. Search for multiple embedding classes

    Grep your codebase for:

    • OpenAIEmbeddings
    • HuggingFaceEmbeddings
    • OllamaEmbeddings
    • custom wrappers passed into CrewAI tools

    You want one source of truth for indexing and querying.

  4. Delete and rebuild as a test

    If removing the persisted store fixes it, you’ve confirmed stale vectors are involved.

    This is especially common when you see errors like:

    • InvalidDimensionException
    • Collection dimensionality does not match
    • Embedding dimension mismatch

Prevention

  • Use one embedding model per vector index, and pin it in config.
  • Version your collections by model name and dimension.
  • Rebuild indexes whenever you change providers, models, or output dimensions.
  • In development, add a startup check that compares len(embed_query("test")) against stored index metadata before agents run.
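That startup check is only a few lines. embed_query is the LangChain-style method; the stored dimension would come from your own index metadata, however you record it:

```python
def assert_dimensions_match(embedder, stored_dimension: int) -> None:
    """Compare the live embedder's output size against the index's dimension.

    embedder: anything with a LangChain-style embed_query(text) -> list[float].
    stored_dimension: the size the index was built with, read from your
    own metadata (this sketch assumes you track it somewhere).
    """
    runtime_dimension = len(embedder.embed_query("dimension probe"))
    if runtime_dimension != stored_dimension:
        raise RuntimeError(
            f"Embedding dimension mismatch: model returns {runtime_dimension}, "
            f"index was built with {stored_dimension}; rebuild the index"
        )
```

Running this before kicking off a crew turns a mid-run retrieval failure into an immediate, readable startup error.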

If you’re building CrewAI agents for RAG workflows, treat embeddings like schema. Change them without migration discipline and your retrieval layer will break exactly like a database schema mismatch.


By Cyprian Aarons, AI Consultant at Topiax.
