How to Fix 'embedding dimension mismatch' in LangChain (Python)
What the error means
embedding dimension mismatch means the vector you generated for a query or document has a different length than the vectors already stored in your vector database. In LangChain, this usually shows up when you change embedding models, mix providers, or reuse an existing index created with a different embedding dimension.
The failure often appears during add_texts(), similarity_search(), or when building a retriever over an existing store like Chroma, FAISS, Pinecone, or Qdrant.
The Most Common Cause
The #1 cause is simple: you created the vector store with one embedding model, then queried it with another.
For example, OpenAI's text-embedding-3-small returns 1536-dimensional vectors by default, while text-embedding-3-large returns 3072 unless you explicitly set the dimensions parameter. If your index was built with one and queried with the other, LangChain will pass the mismatch down to the backend and you'll get errors like:
- ValueError: Embedding dimension mismatch
- InvalidDimensionException
- Vector dimension 1536 does not match collection dimension 3072
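Stripped of any framework, the error is just vector math refusing to run: a dot product between vectors of different lengths is undefined, so every backend rejects the operation. A minimal stdlib-only illustration (no LangChain involved; the lengths mirror the two OpenAI models above):

```python
# Pure-Python illustration: similarity math is undefined for unequal lengths,
# which is why every vector backend rejects the insert or query outright.
def dot(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        raise ValueError(f"Embedding dimension mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))

stored_vector = [0.1] * 1536  # indexed with text-embedding-3-small
query_vector = [0.1] * 3072   # queried with text-embedding-3-large
# dot(stored_vector, query_vector) raises ValueError
```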
Broken vs fixed pattern

Broken code:

```python
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Index was built earlier with text-embedding-3-small (1536 dims)
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

db = Chroma(
    collection_name="docs",
    persist_directory="./chroma_db",
    embedding_function=embeddings,
)

# Fails because stored vectors are 1536-dim, query vectors are 3072-dim
results = db.similarity_search("What is AML?")
```

Fixed code:

```python
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Use the same model that created the index
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

db = Chroma(
    collection_name="docs",
    persist_directory="./chroma_db",
    embedding_function=embeddings,
)

results = db.similarity_search("What is AML?")
```
If you intentionally want to switch models, rebuild the index from scratch. Don’t reuse old persisted vectors.
```python
# Rebuild after changing embedding model
db = Chroma.from_texts(
    texts,
    embedding=OpenAIEmbeddings(model="text-embedding-3-large"),
    collection_name="docs_v2",
    persist_directory="./chroma_db_v2",
)
```
Other Possible Causes
1) You changed providers without rebuilding the index
This happens when your ingestion pipeline used one provider and your app runtime uses another.
```python
# Ingestion
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-small-en-v1.5")  # 384 dims

# Runtime
from langchain_openai import OpenAIEmbeddings

query_embeddings = OpenAIEmbeddings(model="text-embedding-3-large")  # 3072 dims
```
Fix: keep ingestion and query embeddings identical.
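One way to enforce this is to make both pipelines import the model name from a single module, so they cannot drift apart. A minimal sketch (the module name, constants, and lookup table are my own convention, not a LangChain API; the dimensions are the defaults cited above):

```python
# embedding_config.py -- single source of truth, imported by both the
# ingestion job and the query-time app.
EMBEDDING_MODEL = "text-embedding-3-small"

# Default output dimensions for a few common models.
KNOWN_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "BAAI/bge-small-en-v1.5": 384,
}

def expected_dim(model: str = EMBEDDING_MODEL) -> int:
    """Look up the dimension the index should have for a given model."""
    return KNOWN_DIMS[model]
```

Both sides then build their embedder from `EMBEDDING_MODEL`, and a dimension change becomes a deliberate, single-file edit.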
2) Your persisted vector store is stale
You upgraded code, but reused an old local DB or remote collection.
```python
from langchain_chroma import Chroma

db = Chroma(
    collection_name="customer_docs",
    persist_directory="./chroma_db",  # old vectors still here
    embedding_function=embeddings,
)
```
Fix: delete and recreate the store if the embedding model changed.
```bash
rm -rf ./chroma_db
```
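If you would rather do this from Python, for example inside a re-ingestion script, the same cleanup can be sketched with the standard library (`reset_store` is a hypothetical helper name, not a LangChain function):

```python
import shutil
from pathlib import Path

def reset_store(persist_directory: str) -> None:
    """Delete a stale local vector store directory before re-ingesting
    with a new embedding model. Safe to call if the directory is absent."""
    path = Path(persist_directory)
    if path.exists():
        shutil.rmtree(path)
```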
3) You passed raw embeddings from two different sources into one store
This shows up when you manually call embed_documents() from one model and embed_query() from another.
```python
doc_vecs = embedder_a.embed_documents(["policy text"])
query_vec = embedder_b.embed_query("policy question")
```
Fix: use one embedding object for both paths.
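The invariant is simple: whatever object produced the document vectors must also produce the query vector. A stdlib-only sketch of the correct pattern with a stand-in embedder (`StubEmbedder` is illustrative, not a LangChain class):

```python
class StubEmbedder:
    """Stand-in for a LangChain Embeddings object; both methods share one model."""

    def __init__(self, dim: int):
        self.dim = dim

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [[0.0] * self.dim for _ in texts]

    def embed_query(self, text: str) -> list[float]:
        return [0.0] * self.dim

embedder = StubEmbedder(dim=384)  # one object for both paths
doc_vecs = embedder.embed_documents(["policy text"])
query_vec = embedder.embed_query("policy question")
```

Because a single object backs both calls, the document and query dimensions cannot diverge.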
4) Your backend has a fixed index dimension
Some stores enforce dimension at collection creation time. Qdrant and Pinecone are common examples.
```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="docs",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE),
)
```
If your embeddings return 3072 later, inserts will fail. Fix the collection size or recreate it with the correct dimension.
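A cheap defense is to validate vector lengths against the collection's fixed size before upserting, so the failure surfaces in your code with a readable message instead of a backend-specific exception. A stdlib-only sketch (`validate_before_upsert` is a hypothetical helper, not part of any client library):

```python
def validate_before_upsert(vectors: list[list[float]], collection_dim: int) -> None:
    """Raise early, with offending positions, instead of letting the
    backend reject the whole batch."""
    bad = [i for i, v in enumerate(vectors) if len(v) != collection_dim]
    if bad:
        raise ValueError(
            f"{len(bad)} vector(s) at positions {bad[:5]} do not match "
            f"collection dimension {collection_dim}"
        )
```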
How to Debug It

- Print the actual embedding length:

  ```python
  vec = embeddings.embed_query("test")
  print(len(vec))
  ```

  Compare that number to what your vector DB expects.

- Check what model built the index. Look at ingestion logs, deployment history, or seed scripts. If you see bge-small, all-MiniLM, or text-embedding-3-small, note the dimension each produces.

- Inspect the vector store schema. For Qdrant, Pinecone, or Chroma, confirm the collection size or stored vector shape. Typical mismatch errors look like:
  - ValueError: expected dim 1536, got 3072
  - Collection dimensionality mismatch
  - Vector size must be equal to ...

- Verify ingestion and query use the same class. Make sure both sides use the same LangChain embedding wrapper:
  - OpenAIEmbeddings
  - HuggingFaceBgeEmbeddings
  - AzureOpenAIEmbeddings
  - CohereEmbeddings
Prevention

- Keep embedding config in one place:
  - Same model name
  - Same provider
  - Same normalization settings, if applicable
- Version your vector stores by embedding model:
  - Example: docs_v1_1536, docs_v2_3072
  - Rebuild on any embedding change
- Add a startup check:

  ```python
  expected_dim = len(embeddings.embed_query("ping"))
  assert expected_dim == INDEX_DIMENSION
  ```
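The versioning convention above can be automated so a model change can never silently point at old vectors. A sketch (the naming scheme and helper name are my own convention, not a LangChain feature):

```python
def versioned_collection_name(base: str, model: str, dim: int) -> str:
    """Encode the embedding model and dimension into the collection name,
    e.g. docs__text-embedding-3-large__3072, so a changed model always
    resolves to a fresh, empty collection."""
    slug = model.replace("/", "-").replace(".", "-").replace(":", "-")
    return f"{base}__{slug}__{dim}"
```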
If you treat embeddings like schema — because they are — this error stops being mysterious. It becomes a normal migration issue: change schema, rebuild data, move on.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.