How to Fix 'embedding dimension mismatch' in CrewAI (Python)

By Cyprian Aarons · Updated 2026-04-22
Tags: embedding-dimension-mismatch, crewai, python

What this error means

embedding dimension mismatch usually means you stored vectors with one embedding model, then tried to query them with a different model that produces a different vector size. In CrewAI projects, this typically shows up when your agent tools, memory, or vector store are wired to mismatched embedding providers.

The stack trace often comes from the vector DB layer, not CrewAI itself. You’ll see something like ValueError: Embedding dimension mismatch or a Chroma/Pinecone/Qdrant error complaining that the query vector length does not match the collection schema.

The Most Common Cause

The #1 cause is changing embedding models between indexing and retrieval.

Example: you indexed documents with text-embedding-3-small (1536 dimensions) and later queried with text-embedding-3-large (3072 dimensions), or you switched from OpenAI embeddings to sentence-transformers. Different models frequently produce different vector sizes, and the store rejects any vector that does not match the size it was created with.
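The failure mode is easy to reproduce without any vector database at all. The sketch below is a toy in-memory store (not CrewAI code) that mimics what Chroma, Qdrant, and Pinecone do internally: the first insert locks in the collection's dimension, and any later vector of a different length is rejected:

```python
# Toy in-memory vector store that enforces a fixed dimension,
# mimicking what real vector DBs do at the collection level.
class ToyVectorStore:
    def __init__(self):
        self.dim = None      # fixed by the first insert
        self.vectors = []

    def add(self, vec):
        if self.dim is None:
            self.dim = len(vec)  # collection schema is now locked
        elif len(vec) != self.dim:
            raise ValueError(
                f"Embedding dimension mismatch: got {len(vec)}, expected {self.dim}"
            )
        self.vectors.append(vec)

    def query(self, vec):
        if len(vec) != self.dim:
            raise ValueError(
                f"Embedding dimension mismatch: got {len(vec)}, expected {self.dim}"
            )
        return self.vectors  # a real store would rank by similarity

store = ToyVectorStore()
store.add([0.1] * 1536)        # indexed with text-embedding-3-small
try:
    store.query([0.2] * 3072)  # queried with text-embedding-3-large
except ValueError as e:
    print(e)  # Embedding dimension mismatch: got 3072, expected 1536
```

This is why the error surfaces at query time even though the real mistake happened at ingestion time.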

Broken vs fixed pattern

| Broken | Fixed |
| --- | --- |
| Use one embedding model to ingest, another to query | Use the same embedding model everywhere |
| Create a collection once, then swap models later | Rebuild the collection when changing models |
| Let defaults drift across tools | Pin embeddings in one shared config |

# BROKEN: ingestion and retrieval use different embedding models

from crewai import Agent, Task, Crew
from crewai_tools import PDFSearchTool

# Ingested earlier with text-embedding-3-small (1536 dims)
search_tool = PDFSearchTool(
    pdf="policy.pdf",
    config={
        "embedder": {
            "provider": "openai",
            "config": {"model": "text-embedding-3-large"},  # 3072 dims
        }
    },
)

agent = Agent(
    role="Policy Analyst",
    goal="Answer questions from the policy document",
    tools=[search_tool],
    verbose=True,
)

task = Task(
    description="Find the cancellation clause.",
    expected_output="A short answer with citation.",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task])
crew.kickoff()

# FIXED: use one embedding model consistently

from crewai import Agent, Task, Crew
from crewai_tools import PDFSearchTool

shared_embedding_config = {
    "embedder": {
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},  # 1536 dims
    }
}

search_tool = PDFSearchTool(
    pdf="policy.pdf",
    config=shared_embedding_config,
)

agent = Agent(
    role="Policy Analyst",
    goal="Answer questions from the policy document",
    tools=[search_tool],
    verbose=True,
)

task = Task(
    description="Find the cancellation clause.",
    expected_output="A short answer with citation.",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task])
crew.kickoff()

If you already created the vector store with a different model, you must reindex it. Changing only the query-side embedding config is not enough.

Other Possible Causes

1) Your vector store collection already exists with old dimensions

This is common with Chroma, Qdrant, Pinecone, or Weaviate. The collection schema was created using one embedding size, and now your app is trying to insert/query another.

# Example: existing Chroma collection was built with 1536-dim vectors
# Now you're trying to insert 3072-dim vectors.

collection_name = "crew_docs"

Fix:

  • Delete and recreate the collection
  • Or create a new collection name for the new embedding model

# safer: version your collections by embedding model
collection_name = "crew_docs_openai_1536"
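The versioning can be made automatic by deriving the collection name from the model and its dimension. The helper below is a sketch; the model-to-dimension mapping is an assumption you should keep in your own project config:

```python
# Known dimensions for the models this project uses (assumed mapping --
# keep this table in your own config and extend it as models change).
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "all-MiniLM-L6-v2": 384,
}

def collection_name(base: str, model: str) -> str:
    """Version a collection name by embedding model and dimension."""
    dim = EMBEDDING_DIMS[model]  # KeyError on unknown models, on purpose
    slug = model.replace("/", "_").replace(".", "_")
    return f"{base}_{slug}_{dim}"

print(collection_name("crew_docs", "text-embedding-3-small"))
# crew_docs_text-embedding-3-small_1536
```

With names like this, switching models can never silently reuse an incompatible collection; it simply points at a new, empty one.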

2) You mixed providers in different parts of the app

One tool uses OpenAI embeddings, another uses Hugging Face or local OllamaEmbeddings. CrewAI won’t normalize dimensions for you.

# BAD: mixed providers feeding the same index
from langchain_openai import OpenAIEmbeddings
from langchain_huggingface import HuggingFaceEmbeddings

retriever_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")   # 1536 dims
memory_embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")  # 384 dims

Fix:

  • Use one provider per index/collection
  • If you need multiple providers, isolate them into separate stores
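If you genuinely need two providers, the safest structure is one embedder per store, declared side by side so the pairing cannot drift. A minimal sketch, with illustrative store and model names:

```python
# One embedder per collection, declared together so the pairing is
# explicit. Store names and model choices here are illustrative.
STORES = {
    "policy_docs": {
        "provider": "openai",
        "model": "text-embedding-3-small",
        "dim": 1536,
    },
    "claims_notes": {
        "provider": "huggingface",
        "model": "all-MiniLM-L6-v2",
        "dim": 384,
    },
}

def embedder_for(store_name: str) -> dict:
    """Look up the one-and-only embedding config for a store."""
    return STORES[store_name]

print(embedder_for("policy_docs")["model"])  # text-embedding-3-small
```

Every tool, memory backend, and retriever then asks `embedder_for(...)` instead of constructing its own embedding client.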

3) Your chunking/indexing pipeline changed but the store was not rebuilt

You may have changed preprocessing code and assumed only the content changed. But if the pipeline now writes into an old index, stale vectors from the previous configuration stay in place alongside the new ones.

# old index still contains prior vectors
vector_store.add_documents(new_docs)
# but underlying collection has incompatible schema/dimensions

Fix:

  • Clear the store before re-ingestion
  • Rebuild from source data after any embedding config change
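The rebuild can live in one helper that always clears before re-ingesting, so a changed embedding config can never leave stale vectors behind. This is a sketch against a generic store interface; the `clear`/`add` calls stand in for your actual vector DB client:

```python
# Generic rebuild: drop everything, then re-embed from source.
# `store` and `embed` are stand-ins for your vector DB client and
# embedding function -- adapt the method names to your stack.
def rebuild_index(store, embed, docs):
    store.clear()  # never append onto an old schema
    for doc in docs:
        store.add(embed(doc))

class FakeStore:
    def __init__(self):
        self.vectors = []
    def clear(self):
        self.vectors = []
    def add(self, vec):
        self.vectors.append(vec)

store = FakeStore()
store.add([0.0] * 1536)  # stale 1536-dim vectors from an earlier run
rebuild_index(store, lambda d: [0.5] * 3072, ["doc one", "doc two"])
print(len(store.vectors), len(store.vectors[0]))  # 2 3072
```

The key design choice is that clearing is not optional: it happens inside the same function as ingestion, so a partial rebuild is impossible to trigger by accident.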

4) A tool defaults to a different embedding model than your main app

Some CrewAI tools create their own internal retriever settings. If you pass no explicit embedding config, they may fall back to defaults that differ from your app-level config.

# risky: implicit defaults inside tool internals
tool = SomeRAGTool(pdf_path="claims.pdf")

Fix:

  • Pass explicit embeddings into every tool that builds or queries a vector index
  • Don’t rely on defaults in production code

How to Debug It

  1. Print the embedding model and dimension at every boundary

    • Log which model ingests data
    • Log which model queries data
    • Check whether both sides match exactly
  2. Inspect the stored collection metadata

    • For Chroma/Qdrant/Pinecone/Weaviate, verify the existing vector size
    • If the store was created earlier, assume it may be stale until proven otherwise
  3. Reproduce with a clean empty index

    • Create a brand-new collection name
    • Run ingestion and retrieval in one process
    • If it works cleanly, your old store is the problem
  4. Check your stack trace for the failing layer

    • If it mentions Chroma, Pinecone, QdrantClient, or WeaviateClient, it’s almost always storage-side mismatch
    • If it fails inside a CrewAI tool wrapper like PDFSearchTool, inspect that tool’s embedding config first
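Step 1 above can be automated: embed one probe string with the ingest-side and query-side embedders and compare lengths before touching the real index. In this sketch, `ingest_embed` and `query_embed` are whatever embedding callables your app actually uses; the lambdas are stand-ins:

```python
# Compare ingest-side and query-side embedders on a probe string.
# Pass your real embedding callables; the lambdas below are stand-ins.
def check_embedding_boundary(ingest_embed, query_embed, probe="dimension probe"):
    ingest_dim = len(ingest_embed(probe))
    query_dim = len(query_embed(probe))
    print(f"ingest dim={ingest_dim}, query dim={query_dim}")
    if ingest_dim != query_dim:
        raise ValueError(
            f"Embedding dimension mismatch: ingest={ingest_dim}, query={query_dim}"
        )
    return ingest_dim

# Matching models pass and return the shared dimension:
check_embedding_boundary(lambda t: [0.0] * 1536, lambda t: [0.0] * 1536)
```

Running this once at startup (or in a test) turns a confusing runtime retrieval failure into an immediate, readable error.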

Prevention

  • Pin one embedding model per project and centralize it in config.
  • Version your vector collections by embedding model and rebuild on any change.
  • Add startup checks that compare expected dimension vs stored collection dimension before serving traffic.
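The third bullet can be a few lines that run before the app serves traffic. A sketch, assuming you read `stored_dim` from your vector DB's collection metadata at startup (how you fetch it is stack-specific):

```python
def assert_store_dimension(expected_dim: int, stored_dim: int, collection: str):
    """Fail fast at startup instead of failing per-query at runtime."""
    if stored_dim != expected_dim:
        raise RuntimeError(
            f"Collection '{collection}' holds {stored_dim}-dim vectors "
            f"but the app is configured for {expected_dim}-dim embeddings. "
            "Reindex the collection or fix the embedding config."
        )

# e.g. stored_dim read from Chroma/Qdrant collection metadata at startup
assert_store_dimension(1536, 1536, "crew_docs_openai_1536")
```

A mismatch then blocks deployment with a message naming the collection, rather than surfacing as sporadic tool failures inside agent runs.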

If you’re building CrewAI agents for regulated workflows like claims search or policy QA, treat embeddings like schema. Once they drift, retrieval breaks in ways that look random but are completely deterministic.


By Cyprian Aarons, AI Consultant at Topiax.