How to Fix 'embedding dimension mismatch' in LangChain (Python)

By Cyprian Aarons · Updated 2026-04-22

Tags: embedding-dimension-mismatch, langchain, python

What the error means

embedding dimension mismatch means the vector you generated for a query or document has a different length than the vectors already stored in your vector database. In LangChain, this usually shows up when you change embedding models, mix providers, or reuse an existing index created with a different embedding dimension.

The failure often appears during add_texts(), similarity_search(), or when building a retriever over an existing store like Chroma, FAISS, Pinecone, or Qdrant.

The Most Common Cause

The #1 cause is simple: you created the vector store with one embedding model, then queried it with another.

For example, OpenAI text-embedding-3-small returns 1536 dimensions by default, while text-embedding-3-large returns 3072 unless you explicitly set dimensions. If your index was built with one and queried with the other, LangChain will pass the mismatch down to the backend and you’ll get errors like:

  • ValueError: Embedding dimension mismatch
  • InvalidDimensionException
  • Vector dimension 1536 does not match collection dimension 3072

Broken vs fixed pattern

Broken code:

```python
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Index was built earlier with text-embedding-3-small (1536 dims)
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

db = Chroma(
    collection_name="docs",
    persist_directory="./chroma_db",
    embedding_function=embeddings,
)

# Fails because stored vectors are 1536-dim, query vectors are 3072-dim
results = db.similarity_search("What is AML?")
```

Fixed code:

```python
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Use the same model that created the index
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

db = Chroma(
    collection_name="docs",
    persist_directory="./chroma_db",
    embedding_function=embeddings,
)

results = db.similarity_search("What is AML?")
```


If you intentionally want to switch models, rebuild the index from scratch. Don’t reuse old persisted vectors.

```python
# Rebuild after changing embedding model
db = Chroma.from_texts(
    texts,
    embedding=OpenAIEmbeddings(model="text-embedding-3-large"),
    collection_name="docs_v2",
    persist_directory="./chroma_db_v2",
)
```

Other Possible Causes

1) You changed providers without rebuilding the index

This happens when your ingestion pipeline used one provider and your app runtime uses another.

```python
# Ingestion
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-small-en-v1.5")
# 384 dims

# Runtime
from langchain_openai import OpenAIEmbeddings

query_embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
# 3072 dims
```

Fix: keep ingestion and query embeddings identical.
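One way to keep them identical is to define the embedding config in a single shared module. This is a sketch under our own naming (`embeddings_config`, `make_embeddings`, and the constants are ours, not LangChain's):

```python
# embeddings_config.py -- hypothetical shared module.
# Both the ingestion job and the query service import from here,
# so the model name cannot drift between the two paths.
EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_DIM = 1536  # must match the vector store's collection size

def make_embeddings():
    # Imported lazily so this config module stays import-safe
    # even in environments without langchain_openai installed.
    from langchain_openai import OpenAIEmbeddings
    return OpenAIEmbeddings(model=EMBEDDING_MODEL)
```

Ingestion and runtime both call `make_embeddings()`; changing the model becomes a one-line change in one file, which is exactly when you schedule a rebuild.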

2) Your persisted vector store is stale

You upgraded code, but reused an old local DB or remote collection.

```python
from langchain_chroma import Chroma

db = Chroma(
    collection_name="customer_docs",
    persist_directory="./chroma_db",  # old vectors still here
    embedding_function=embeddings,
)
```

Fix: delete and recreate the store if the embedding model changed.

```bash
rm -rf ./chroma_db
```
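If you prefer to do this from your ingestion script, a minimal helper (the function name is ours) can reset the directory before rebuilding:

```python
import shutil
from pathlib import Path

def reset_store(persist_directory: str) -> None:
    """Delete a stale local Chroma directory so the next run re-embeds
    from scratch. Only do this when the embedding model changed;
    otherwise keep the persisted vectors as a cache."""
    path = Path(persist_directory)
    if path.exists():
        shutil.rmtree(path)
```

Call `reset_store("./chroma_db")` at the top of the rebuild job, then recreate the store with the new embeddings.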

3) You passed raw embeddings from two different sources into one store

This shows up when you manually call embed_documents() from one model and embed_query() from another.

```python
doc_vecs = embedder_a.embed_documents(["policy text"])
query_vec = embedder_b.embed_query("policy question")
```

Fix: use one embedding object for both paths.
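A minimal sketch of why this works, using a stand-in embedder class (not a real provider wrapper): when document and query vectors come from the same object, their dimensions agree by construction.

```python
# FakeEmbedder is a stand-in for illustration only; real code would use
# one LangChain embedding wrapper (e.g. OpenAIEmbeddings) in both places.
class FakeEmbedder:
    def __init__(self, dim: int):
        self.dim = dim

    def embed_documents(self, texts):
        return [[0.0] * self.dim for _ in texts]

    def embed_query(self, text):
        return [0.0] * self.dim

embedder = FakeEmbedder(dim=1536)  # ONE object for both paths
doc_vecs = embedder.embed_documents(["policy text"])
query_vec = embedder.embed_query("policy question")
assert len(doc_vecs[0]) == len(query_vec)  # always holds now
```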

4) Your backend has a fixed index dimension

Some stores enforce dimension at collection creation time. Qdrant and Pinecone are common examples.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # adjust to your deployment

client.create_collection(
    collection_name="docs",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE),
)
```

If your embeddings return 3072 later, inserts will fail. Fix the collection size or recreate it with the correct dimension.
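To fail fast with a readable message instead of waiting for the backend to reject the insert, a small pre-insert guard helps. This is a sketch (`check_dims` is our own helper name); `collection_size` is whatever the collection was created with, e.g. `VectorParams.size` for Qdrant or the index dimension for Pinecone:

```python
def check_dims(vectors, collection_size: int) -> None:
    """Raise a clear error before the vector DB rejects the insert."""
    for i, vec in enumerate(vectors):
        if len(vec) != collection_size:
            raise ValueError(
                f"vector {i} has dim {len(vec)}, "
                f"but the collection expects {collection_size}"
            )
```

Run it on the output of `embed_documents()` right before every upsert; the error then points at your embedding config instead of at the backend.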

How to Debug It

  1. Print the actual embedding length

    vec = embeddings.embed_query("test")
    print(len(vec))
    

    Compare that number to what your vector DB expects.

  2. Check what model built the index. Look at ingestion logs, deployment history, or seed scripts. If you see bge-small, all-MiniLM, or text-embedding-3-small, note the dimension each produces.

  3. Inspect the vector store schema. For Qdrant/Pinecone/Chroma, confirm the collection size or stored vector shape in the metadata. Typical mismatch errors look like:

    • ValueError: expected dim 1536, got 3072
    • Collection dimensionality mismatch
    • Vector size must be equal to ...
  4. Verify ingestion and query use the same class. Make sure both sides use the same LangChain embedding wrapper:

    • OpenAIEmbeddings
    • HuggingFaceBgeEmbeddings
    • AzureOpenAIEmbeddings
    • CohereEmbeddings
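For step 2, a small reference table of typical default dimensions for the models mentioned in this guide is handy. These are the published defaults; verify against your own deployment, since some APIs (e.g. OpenAI's text-embedding-3 family) let you override the dimension per request:

```python
# Typical default output dimensions (published defaults; verify locally).
TYPICAL_DIMS = {
    "BAAI/bge-small-en-v1.5": 384,
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}
```

Compare `len(embeddings.embed_query("test"))` against this table and against the collection's configured size.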

Prevention

  • Keep embedding config in one place.

    • Same model name
    • Same provider
    • Same normalization settings if applicable
  • Version your vector stores by embedding model.

    • Example: docs_v1_1536, docs_v2_3072
    • Rebuild on any embedding change
  • Add a startup check.

    expected_dim = len(embeddings.embed_query("ping"))
    assert expected_dim == INDEX_DIMENSION
    
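The versioning bullet above can be sketched as a tiny naming helper (the function and the `base__model__dim` scheme are our convention, not a LangChain feature):

```python
def collection_name(base: str, model: str, dim: int) -> str:
    """Version a collection by embedding model and dimension, so swapping
    models can never silently reuse old vectors."""
    slug = model.replace("/", "-").replace(".", "-")  # make the name store-safe
    return f"{base}__{slug}__{dim}"
```

For example, `collection_name("docs", "text-embedding-3-small", 1536)` yields `docs__text-embedding-3-small__1536`; a model change produces a new name, which forces a rebuild.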

If you treat embeddings like schema — because they are — this error stops being mysterious. It becomes a normal migration issue: change schema, rebuild data, move on.

