How to Fix 'embedding dimension mismatch when scaling' in LlamaIndex (Python)
If you see ValueError: embedding dimension mismatch when scaling, it usually means LlamaIndex is trying to compare or store vectors that were created with different embedding models. This shows up most often when you switch models, reuse an old vector index, or mix embeddings from different providers in the same storage.
The key detail: your index, vector store, and query-time embedder must all agree on vector size. If one side produces 1536-dimensional vectors and the other expects 3072, LlamaIndex will fail during insert, retrieval, or scaling operations.
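To see the constraint concretely, this quick check (a sketch; it assumes OPENAI_API_KEY is set in your environment) shows the two OpenAI models producing different vector sizes:

```python
from llama_index.embeddings.openai import OpenAIEmbedding

small = OpenAIEmbedding(model="text-embedding-3-small")
large = OpenAIEmbedding(model="text-embedding-3-large")

# Same text, different vector sizes: these two cannot share one index
print(len(small.get_text_embedding("test")))  # 1536
print(len(large.get_text_embedding("test")))  # 3072
```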
The Most Common Cause
The #1 cause is reusing a persisted index after changing the embedding model.
A common pattern is:
- build the index with one model
- later load the same persisted storage
- query or insert with a different model
That gives you a mismatch between stored vectors and newly generated vectors.
| Broken pattern | Fixed pattern |
|---|---|
| Build with one embedder, load with another | Use the same embedder for build + query, or rebuild the index |
| Persist old vectors and change models later | Clear storage before switching embedding dimensions |
```python
# BROKEN: index was built with text-embedding-3-small (1536 dims),
# then queried later with text-embedding-3-large (3072 dims)
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.embeddings.openai import OpenAIEmbedding

storage_context = StorageContext.from_defaults(persist_dir="./storage")

# Query-time embedder changed
embed_model = OpenAIEmbedding(model="text-embedding-3-large")

index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine(embed_model=embed_model)
# Fails: 3072-dim query vectors vs 1536-dim stored vectors
response = query_engine.query("What is our claims process?")
```
```python
# FIXED: use the same embedding model that was used to build the index,
# or rebuild the index if you want to switch models
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.embeddings.openai import OpenAIEmbedding

# Same model the index was built with (1536 dims)
embed_model = OpenAIEmbedding(model="text-embedding-3-small")

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)

query_engine = index.as_query_engine(embed_model=embed_model)
response = query_engine.query("What is our claims process?")
```
If you want to change models, do not keep the old persisted data around. Delete the storage directory and rebuild:
```python
import shutil

# Remove stale vectors before rebuilding with the new model
shutil.rmtree("./storage", ignore_errors=True)
```
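The rebuild itself might look like this (a minimal sketch assuming your source documents live in ./data, an assumed path):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding

# Re-embed everything with the new model so all stored vectors share one dimension
embed_model = OpenAIEmbedding(model="text-embedding-3-large")  # 3072 dims
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
index.storage_context.persist(persist_dir="./storage")
```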
Other Possible Causes
1) Mixing embedding providers in one pipeline
This happens when ingestion uses one provider and querying uses another.
```python
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

ingest_embed_model = OpenAIEmbedding(model="text-embedding-3-small")            # 1536 dims
query_embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")  # 384 dims
```
These produce different dimensions (1536 vs. 384 here). Keep ingestion and query on the same embedding model unless you know exactly what your vector store supports.
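One way to prevent this drift is to set the embedder once for the whole process via LlamaIndex's global Settings object, so ingestion and querying cannot silently diverge:

```python
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# Every index build and query in this process now uses the same embedder
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
```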
2) Vector store schema fixed to an older dimension
Some stores enforce a fixed vector size at collection creation time. If your first insert created a 1536-dim collection, later inserts of 3072-dim vectors will fail.
```python
# Example: existing collection already created for 1536-dim vectors
vector_store_config = {
    "collection_name": "support_docs"
}
```
Fix:
- drop and recreate the collection, or
- create a new collection name per embedding model (see the Qdrant sketch below)
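For example, with a Qdrant backend you can create one collection per model and dimension. This is a sketch using the qdrant-client package directly; the URL and collection name are illustrative:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")

# One collection per embedding model and dimension; never shared across models
client.create_collection(
    collection_name="support_docs_openai_1536",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
```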
3) Cached nodes built from stale embeddings
If you persist nodes or chunks and later re-run only part of the pipeline, some nodes may still carry old embeddings.
```python
# Stale cache example
index.storage_context.persist(persist_dir="./storage")
# later: documents changed but cached embeddings were reused
```
Fix:
- clear cached embeddings (a blunt reset is sketched below)
- re-run ingestion end-to-end after changing chunking or model settings
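The bluntest reliable reset is to wipe both the persisted index and any ingestion cache before re-running the pipeline. A sketch, where ./pipeline_cache is an assumed cache location (adjust to wherever you persist yours):

```python
import shutil

# Wipe the persisted index and any ingestion cache, then re-ingest from scratch
shutil.rmtree("./storage", ignore_errors=True)
shutil.rmtree("./pipeline_cache", ignore_errors=True)  # assumed cache dir; adjust to yours
```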
4) Chunking changes without rebuilding embeddings
Chunk size changes do not directly change embedding dimension, but they often trigger partial rebuilds where old and new nodes coexist in storage.
```python
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=512)  # changed from previous run
```
If you changed chunking logic, rebuild the entire index so every node has embeddings from the same pipeline run.
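A full rebuild with the new splitter might look like this (a sketch, again assuming source documents in ./data):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=512)
documents = SimpleDirectoryReader("./data").load_data()

# Re-chunk and re-embed every document in one pass so no stale nodes survive
index = VectorStoreIndex.from_documents(documents, transformations=[splitter])
index.storage_context.persist(persist_dir="./storage")
```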
How to Debug It
- Print the embedding model name at ingest and query time.
  - Verify both sides use the same class and model.
  - Check for `OpenAIEmbedding`, `HuggingFaceEmbedding`, `OllamaEmbedding`, etc.
- Inspect vector dimensions.
  - Log a sample embedding length from both pipelines:

    ```python
    emb = embed_model.get_text_embedding("test")
    print(len(emb))
    ```

  - If the lengths differ, that is your root cause.
- Check persisted storage.
  - If you are using `StorageContext.from_defaults(persist_dir=...)`, assume old vectors may be present.
  - Delete the directory and rebuild if you recently changed models.
- Verify vector store constraints.
  - For Pinecone, Qdrant, Weaviate, Chroma, or pgvector-backed setups, confirm the collection/index dimension matches your current embedder (a concrete check is sketched below).
  - If needed, create a fresh namespace or collection.
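To make that last check concrete, here is one way to compare a Qdrant collection's configured size against your current embedder. A sketch: the client URL and collection name are assumptions, and the config field path shown applies to single unnamed vector configs:

```python
from llama_index.embeddings.openai import OpenAIEmbedding
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Dimension the collection was created with (single unnamed vector config)
store_dim = client.get_collection("support_docs").config.params.vectors.size
# Dimension the current embedder actually produces
model_dim = len(embed_model.get_text_embedding("test"))

assert store_dim == model_dim, f"store={store_dim}, model={model_dim}"
```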
Prevention
- Keep embedding config in one place (see the sketch after this list):
  - model name
  - provider class
  - normalization settings
  - persist directory / collection name
- Version your indexes by embedding model:
  - `support_docs_openai_1536`
  - `support_docs_bge_384`
  - Never reuse a collection across incompatible dimensions.
- Rebuild after any embedding-related change:
  - model swap
  - provider swap
  - vector DB migration
  - major chunking pipeline changes
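One lightweight way to enforce all three habits is a single config object that derives the collection name from the model, so a mismatched combination cannot even be expressed. EmbeddingConfig below is a hypothetical helper, not a LlamaIndex class:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmbeddingConfig:
    """Hypothetical helper: one source of truth for model and storage naming."""
    model_name: str
    dim: int
    base_collection: str = "support_docs"

    @property
    def collection_name(self) -> str:
        # Version the collection by model and dimension
        slug = self.model_name.replace("-", "_").replace("/", "_")
        return f"{self.base_collection}_{slug}_{self.dim}"

cfg = EmbeddingConfig(model_name="text-embedding-3-small", dim=1536)
print(cfg.collection_name)  # support_docs_text_embedding_3_small_1536
```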
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.