How to Fix 'embedding dimension mismatch when scaling' in AutoGen (Python)

By Cyprian Aarons · Updated 2026-04-22

Overview

embedding dimension mismatch when scaling usually means your vector store has embeddings of one size, but the code trying to insert or query them is producing a different size. In AutoGen, this shows up when you wire an agent, memory, or retrieval layer to an embedding model that does not match the index already on disk.

You typically hit it after switching models, reusing an old Chroma/FAISS index, or mixing embeddings from different providers in the same collection. The failure often surfaces deep inside retrieval code, so the stack trace looks like AutoGen broke it when the real issue is your embedding pipeline.

The Most Common Cause

The #1 cause is reusing a persisted vector store built with one embedding model and then querying it with another model that outputs a different vector length.

For example, you may have built the index with text-embedding-ada-002 style 1536-dimension vectors, then later switched to a 3072-dimension model like text-embedding-3-large. AutoGen’s RetrieveUserProxyAgent, AssistantAgent, or memory-backed retrieval will eventually hit something like:

  • ValueError: Embedding dimension mismatch
  • InvalidDimensionException: Embedding dimension 1536 does not match collection dimensionality 3072
  • ValueError: operands could not be broadcast together

Broken vs fixed pattern

Broken                                        | Fixed
Reuse old collection with new embedding model | Rebuild the collection with the same embedding model
Mix providers across runs                     | Pin one embedding config per collection
Let AutoGen infer embeddings implicitly       | Pass an explicit embedding function/config
# BROKEN
import os

from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from chromadb import PersistentClient
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

# This collection was created earlier with a different embedding model.
client = PersistentClient(path="./chroma_db")
collection = client.get_collection("support_docs")

embedding_fn = OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-large",  # 3072 dims
)

assistant = RetrieveAssistantAgent(
    name="support_bot",
    llm_config={"config_list": [{"model": "gpt-4o-mini"}]},
    retrieve_config={
        "vector_db": collection,
        "embedding_function": embedding_fn,
    },
)
# FIXED
import os

from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from chromadb import PersistentClient
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

client = PersistentClient(path="./chroma_db")

embedding_fn = OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",  # 1536 dims; keep this stable for this collection
)

# Recreate the collection if it was built with another dimension.
try:
    client.delete_collection("support_docs")
except Exception:
    pass

collection = client.get_or_create_collection(
    name="support_docs",
    embedding_function=embedding_fn,
)

assistant = RetrieveAssistantAgent(
    name="support_bot",
    llm_config={"config_list": [{"model": "gpt-4o-mini"}]},
    retrieve_config={
        "vector_db": collection,
        "embedding_function": embedding_fn,
    },
)

If you changed models intentionally, do not try to “scale” the old vectors. Rebuild the index from source documents using the new embedding function.
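A rebuild can be as simple as re-embedding every source document into the fresh collection. Here is a minimal sketch, assuming your documents are dicts with `id` and `text` keys and that `collection` exposes a Chroma-style `add()` (the helper names are illustrative, not part of AutoGen):

```python
def batched(items, size):
    """Yield fixed-size chunks so large corpora are embedded in batches."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def rebuild_collection(collection, docs, embed_fn, batch_size=64):
    """Re-embed every source doc with ONE embedding function.

    docs:      [{"id": ..., "text": ...}, ...]
    embed_fn:  list[str] -> list[vector], the same function the queries will use
    collection: any object with a Chroma-style add(ids=, documents=, embeddings=)
    """
    for batch in batched(docs, batch_size):
        ids = [d["id"] for d in batch]
        texts = [d["text"] for d in batch]
        collection.add(ids=ids, documents=texts, embeddings=embed_fn(texts))
```

Because both ingestion and querying go through the same `embed_fn`, the collection can never end up mixed-dimensional.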

Other Possible Causes

1) You are mixing collections and models across environments

A dev machine may use one .env, while staging uses another. If the persisted DB path is shared, you can end up loading an index built with a different provider.

# config drift example
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "text-embedding-3-small")
DB_PATH = os.getenv("CHROMA_PATH", "./chroma_db")

If CHROMA_PATH points to old data, delete or migrate that store before querying it with a new model.
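One cheap guard against this drift is to record the embedding model in the collection's metadata at creation time and refuse to load it under a different config. A sketch (the `embedding_model` metadata key is a convention of this setup, not an AutoGen or Chroma standard):

```python
def check_collection_model(stored_metadata, current_model):
    """Fail fast if a persisted collection was built with a different model."""
    stored = (stored_metadata or {}).get("embedding_model")
    if stored is not None and stored != current_model:
        raise RuntimeError(
            f"Collection was built with {stored!r}, but this process is "
            f"configured for {current_model!r}. Rebuild or migrate the "
            "store before querying it."
        )
```

With Chroma, for example, you could pass `metadata={"embedding_model": EMBEDDING_MODEL}` when creating the collection, then run this check against `collection.metadata` on every startup.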

2) Your chunking pipeline changed and rebuilt part of the index incorrectly

Sometimes only some documents were reindexed after a chunking change. That gives you a mixed-dimensional store if your ingestion code switched models mid-run.

# bad: first batch used one embedder, second batch used another
if batch_id < 10:
    embedder = old_embedder
else:
    embedder = new_embedder

Keep ingestion deterministic: one corpus, one embedder version.
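One way to make a mid-run model switch impossible is to resolve the embedder once and verify every batch against a single expected dimension. A sketch, assuming `embed` is any callable from text to a vector (a stand-in for your real embedding client):

```python
def embed_corpus(embed, batches, expected_dim):
    """Embed every batch with the SAME function and reject wrong-size vectors."""
    out = []
    for batch in batches:
        vectors = [embed(text) for text in batch]
        for v in vectors:
            if len(v) != expected_dim:
                raise ValueError(
                    f"Got a {len(v)}-dim vector, expected {expected_dim}; "
                    "the embedder changed mid-ingestion."
                )
        out.extend(vectors)
    return out
```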

3) FAISS index dimensionality does not match your current embeddings

FAISS indexes are strict about vector size. If you created an IndexFlatL2(1536) and now feed it 3072-dim vectors, insertion fails immediately.

import faiss
import numpy as np

dim = 1536
index = faiss.IndexFlatL2(dim)

# later...
vectors = np.array(new_embeddings)  # shape: (n, 3072)
index.add(vectors)  # boom: dimension mismatch

Recreate the FAISS index with the correct dimension derived from the current embedder.
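Rather than hard-coding 1536, you can probe the embedder once and size the index from the result. A sketch (`embed_query` here stands in for whatever embedding call your pipeline actually uses):

```python
def index_dim_for(embed_query):
    """Discover the embedder's output dimension by embedding a tiny probe string."""
    return len(embed_query("ping"))

# Then size FAISS from the live embedder instead of a constant, e.g.:
# index = faiss.IndexFlatL2(index_dim_for(embedder.embed_query))
```

Now switching models automatically produces an index of the right size, and the only remaining job is re-ingesting the documents.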

4) You are using cached embeddings from disk or Redis

AutoGen setups often cache document embeddings for speed. If that cache outlives an embedding model change, you’ll query stale vectors.

cache_key = f"embeddings:{doc_id}"
# old cache contains 1536-dim arrays
# new code expects 3072-dim arrays

Version your cache keys by model name and dimension.
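A versioned key can be as simple as baking the model name and dimension into the key itself, so a cache written by one model can never be read back by another (the key layout below is an assumption of this setup):

```python
def versioned_cache_key(doc_id, model_name, dim):
    """Bake model and dimension into the key so stale entries never collide."""
    return f"embeddings:{model_name}:{dim}:{doc_id}"
```

After a model change, old entries simply become unreachable and expire on their own; no manual cache flush is needed.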

How to Debug It

  1. Print the actual embedding shape before insert/query

    emb = embedder.embed_query("test")
    print(len(emb))
    

    If this number differs from your stored index dimension, that is the bug.

  2. Inspect the vector store schema

    • Chroma: check collection metadata and recreate if needed.
    • FAISS: inspect index.d.
    • SQLite-backed stores: look for stored dimension columns.
  3. Search your logs for mixed model names. Look for both old and new models in ingestion logs:

    • text-embedding-ada-002
    • text-embedding-3-small
    • text-embedding-3-large
  4. Force a clean rebuild. Delete the persisted DB directory and re-ingest once. If the error disappears after the rebuild, you had stale vectors or mixed dimensions.

Prevention

  • Pin one embedding model per vector store and encode that choice in config.
  • Version persisted indexes by {provider}-{model}-{dimension} so stale data cannot be reused accidentally.
  • Add a startup assertion that checks len(embed_query("ping")) == index_dimension before serving traffic.
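The startup assertion from the last bullet is only a few lines. A sketch, assuming `embed_query` and the stored index dimension are both available at boot:

```python
def assert_embedding_dim(embed_query, index_dimension):
    """Run once at startup; crash loudly instead of failing mid-retrieval."""
    actual = len(embed_query("ping"))
    if actual != index_dimension:
        raise SystemExit(
            f"Embedder returns {actual}-dim vectors but the index stores "
            f"{index_dimension}-dim vectors. Rebuild the index before serving."
        )
```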

If you are building AutoGen retrieval agents in production, treat embedding dimensions like database schema. Change them deliberately, migrate them explicitly, and never assume an old index can accept new vectors.



By Cyprian Aarons, AI Consultant at Topiax.
