How to Fix 'embedding dimension mismatch when scaling' in CrewAI (TypeScript)
If you’re seeing embedding dimension mismatch when scaling in CrewAI TypeScript, it usually means your vector store was built with one embedding model and later queried or upserted with another. The symptom shows up when CrewAI tries to retrieve memories, tools, or knowledge from a store whose vector size no longer matches the current embedding output.
This is almost always a data consistency problem, not an agent logic problem. In practice, it happens after switching models, changing providers, or reusing an old persisted index.
The Most Common Cause
The #1 cause is mixing embedding models across runs. For example, you indexed documents with OpenAI text-embedding-3-small and later switched your agent to text-embedding-3-large, or moved from OpenAI embeddings to local nomic-embed-text.
Here’s the broken pattern:
| Broken | Fixed |
|---|---|
| Persisted index created with one embedding model | Rebuild the index or keep the same embedding model |
| Querying with a different embedding dimension | Use the exact same embedding config for both ingest and query |
```typescript
// BROKEN: index built with one embedding model, queried with another
import { Agent } from "crewai";
import { OpenAIEmbeddings } from "@langchain/openai";
import { Chroma } from "@langchain/community/vectorstores/chroma";

const ingestEmbeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small", // 1536 dims
});
const queryEmbeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-large", // 3072 dims — does not match the index
});

// Ingestion run: collection is persisted with 1536-dim vectors
await Chroma.fromDocuments(docs, ingestEmbeddings, {
  collectionName: "company_knowledge",
});

const agent = new Agent({
  role: "Support Analyst",
  goal: "Answer questions from knowledge base",
  backstory: "Uses CrewAI memory and retrieval",
});

// Later run: the same collection is reopened with the larger model
const vectorStore = await Chroma.fromExistingCollection(queryEmbeddings, {
  collectionName: "company_knowledge",
});
const results = await vectorStore.similaritySearch("refund policy", 5);
// Runtime error from the underlying vector DB:
// Error: embedding dimension mismatch when scaling
```
```typescript
// FIXED: same embedding model used for ingest and query
import { OpenAIEmbeddings } from "@langchain/openai";
import { Chroma } from "@langchain/community/vectorstores/chroma";

const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});

const vectorStore = await Chroma.fromDocuments(docs, embeddings, {
  collectionName: "company_knowledge",
});

// Reuse the same embeddings instance/config everywhere
const results = await vectorStore.similaritySearch("refund policy", 5);
```
If you changed models intentionally, delete and rebuild the collection. A persisted store does not auto-migrate dimensions.
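A minimal sketch of that decision, assuming you can read the stored vector dimension from your collection's metadata (the `needsRebuild` helper and dimension values here are illustrative; the actual deletion call needs a live Chroma server, so it is shown in a comment):

```typescript
// A persisted store will not migrate vectors. If the stored dimension no
// longer matches the current model's output, drop the collection and
// re-ingest. `storedDim` is undefined for a fresh collection.
function needsRebuild(storedDim: number | undefined, currentDim: number): boolean {
  return storedDim !== undefined && storedDim !== currentDim;
}

// With the chromadb JS client (assumes a reachable Chroma server):
//   const client = new ChromaClient();
//   if (needsRebuild(storedDim, 3072)) {
//     await client.deleteCollection({ name: "company_knowledge" });
//     // ...then re-ingest via Chroma.fromDocuments(docs, newEmbeddings, ...)
//   }
```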
Other Possible Causes
1) Mixing providers in the same pipeline
You might embed documents with OpenAI and queries with Azure OpenAI, Cohere, or a local Ollama model. Even if the text looks compatible, the output vectors are not guaranteed to have the same size.
```typescript
// Bad: different providers in one collection
const docEmbeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small" });
const queryEmbeddings = new CohereEmbeddings({ model: "embed-english-v3.0" });
```
Keep ingestion and retrieval on the same provider/model pair unless you explicitly verified identical dimensions.
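One cheap way to verify this up front is a guard that compares known output sizes before any vectors are written. The dimension table below covers only the models named in this article (sizes per the providers' docs); for anything else, embed a probe string with both configs and compare lengths:

```typescript
// Known output sizes for the models discussed above.
// text-embedding-3-* sizes are OpenAI defaults; embed-english-v3.0 is Cohere.
const KNOWN_DIMS: Record<string, number> = {
  "text-embedding-3-small": 1536,
  "text-embedding-3-large": 3072,
  "embed-english-v3.0": 1024,
};

// Throws before ingest/query if the two configs cannot share one index.
function assertSameDimension(ingestModel: string, queryModel: string): void {
  const a = KNOWN_DIMS[ingestModel];
  const b = KNOWN_DIMS[queryModel];
  if (a === undefined || b === undefined) {
    throw new Error("unknown model: embed a probe string and compare lengths");
  }
  if (a !== b) {
    throw new Error(`dimension mismatch: ${ingestModel}=${a} vs ${queryModel}=${b}`);
  }
}
```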
2) Reusing an old persisted vector DB after changing models
This is common with Chroma, Pinecone, Weaviate, or Qdrant collections left on disk or in a shared environment. The index still contains vectors of the old size.
```typescript
// Old collection already exists on disk/in cloud
await Chroma.fromDocuments(newDocs, newEmbeddings, {
  collectionName: "crew_memory",
});
```
Fix by dropping the collection or creating a versioned namespace:
```typescript
await Chroma.fromDocuments(newDocs, embeddings, {
  collectionName: "crew_memory_v2",
});
```
3) Custom embedder returns inconsistent dimensions
If you wrapped a local model or custom HTTP endpoint, make sure it always returns the same-length array. A bug in batching or fallback logic can return mixed sizes.
```typescript
// Bad custom embedder example: dimension depends on input length
async function embed(text: string): Promise<number[]> {
  if (text.length > 1000) return largeModelEmbed(text); // maybe 1024 dims
  return smallModelEmbed(text); // maybe 768 dims
}
```
Every call must return identical dimensionality for that index.
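A defensive wrapper can enforce that invariant at runtime. This is a sketch: `makeCheckedEmbedder` is a hypothetical helper that wraps whatever raw embed function you already have, locks onto the first dimension it observes, and rejects anything different:

```typescript
// Wrap any embed function so mixed dimensions fail loudly at the source
// instead of corrupting the index silently.
function makeCheckedEmbedder(
  rawEmbed: (text: string) => Promise<number[]>,
): (text: string) => Promise<number[]> {
  let expectedDim: number | null = null;
  return async (text: string): Promise<number[]> => {
    const vec = await rawEmbed(text);
    if (expectedDim === null) expectedDim = vec.length; // first call sets the contract
    if (vec.length !== expectedDim) {
      throw new Error(`embedder returned ${vec.length} dims, expected ${expectedDim}`);
    }
    return vec;
  };
}
```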
4) Wrong collection reused across environments
A dev container may point at production storage by accident. Then your local code inserts vectors into a collection that already contains incompatible embeddings.
```bash
# Dangerous if shared across environments
CHROMA_PERSIST_DIR=/data/vectorstore
COLLECTION_NAME=crew_memory
```
Use environment-specific namespaces:
```bash
COLLECTION_NAME=crew_memory_dev_2026_04_22
```
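You can also derive the name in code so it is impossible to forget. A small sketch, where the `collectionNameFor` helper, the `NODE_ENV` fallback, and the model-version suffix are all illustrative choices:

```typescript
// Build an environment- and model-specific collection name so dev and
// prod never write into the same index.
function collectionNameFor(base: string, env: string, modelTag: string): string {
  return `${base}_${env}_${modelTag}`;
}

const collectionName = collectionNameFor(
  "crew_memory",
  process.env.NODE_ENV ?? "dev",
  "te3small", // bump this tag whenever the embedding model changes
);
```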
How to Debug It
1) Print the embedding length before any upsert
- If your embedder returns 1536 but the store expects 3072, you found it.
- Log both document embeddings and query embeddings.

```typescript
const vec = await embeddings.embedQuery("refund policy");
console.log("embedding length:", vec.length);
```

2) Check what model was used to build the existing index
- Look at seed scripts, migrations, CI jobs, and old deploys.
- Search for `text-embedding-3-small`, `text-embedding-3-large`, `nomic-embed-text`, or any custom provider.

3) Delete and rebuild one test collection
- If rebuilding fixes it immediately, your issue is stale persisted data.
- Don't guess; isolate by using a fresh collection name.

4) Verify every CrewAI component uses one embedding config
- Agents may share memory tools, knowledge stores, or retrievers.
- Make sure all of them point to the same embedder instance or identical configuration.
If you’re using LangChain wrappers inside CrewAI TypeScript, inspect where your VectorStore is created versus where retrieval happens. The mismatch often comes from two separate files instantiating different embedding configs.
Prevention
- Use one `embeddings.ts` module and import it everywhere.
- Version your collections when changing models: `crew_memory_v1`, `crew_memory_v2`.
- Add a startup check that logs:
  - provider name
  - model name
  - expected dimension if your SDK exposes it
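The startup check can go one step further and embed a probe string, failing fast before any upsert. A minimal sketch, where `verifyEmbeddingDim` is a hypothetical helper you would call once at boot with your real `embedQuery` function:

```typescript
// Embed a probe string at startup and compare the actual dimension to
// the one the config declares. Throws before any vectors are written.
async function verifyEmbeddingDim(
  embedQuery: (text: string) => Promise<number[]>,
  expectedDim: number,
  label: string,
): Promise<void> {
  const vec = await embedQuery("dimension probe");
  console.log(`[startup] ${label}: got ${vec.length} dims, expected ${expectedDim}`);
  if (vec.length !== expectedDim) {
    throw new Error(`${label}: embedding dimension ${vec.length} != expected ${expectedDim}`);
  }
}
```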
A simple rule works here: if you change embedding model name, assume every persisted vector index is invalid until rebuilt. That prevents this class of CrewAI errors before they hit production.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies

By Cyprian Aarons, AI Consultant at Topiax.