How to Fix 'embedding dimension mismatch during development' in CrewAI (TypeScript)

By Cyprian Aarons · Updated 2026-04-22

Embedding dimension mismatch in CrewAI usually means your vector store was built with one embedding model, then queried with another model that returns a different vector size. In TypeScript projects, this often shows up during development when you swap models, change providers, or reuse an old persisted index.

The failure is not in CrewAI itself so much as in the contract between your embedding model and your store. If the stored vectors are 1536 dimensions and your query embeddings are 3072, you’ll get errors like embedding dimension mismatch, expected 1536 dimensions but got 3072, or a vector database-specific variant of the same thing.
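The contract is easy to see in isolation: similarity math is only defined for vectors of equal length. A minimal cosine-similarity sketch (illustrative plain TypeScript, not CrewAI or vector-DB code) fails with the same message shape:

```typescript
// Cosine similarity is undefined for vectors of different lengths —
// this is the contract a vector store enforces at query time.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`expected ${a.length} dimensions but got ${b.length}`);
  }
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Same length: comparison works.
cosineSimilarity(new Array(1536).fill(0.1), new Array(1536).fill(0.1));
// Mismatched length: throws, exactly like the store does.
// cosineSimilarity(new Array(1536).fill(0.1), new Array(3072).fill(0.1));
```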

The Most Common Cause

The #1 cause is mixing embedding models between indexing and querying.

A common pattern is: you ingest documents with one embedder, then later run your CrewAI agent with a different embedder configured somewhere else in the app. The vector DB still contains old embeddings, so retrieval fails when RAGTool, VectorStoreTool, or your custom search layer tries to compare incompatible vectors.

Broken vs fixed

Broken pattern → Fixed pattern

  • Index with text-embedding-3-small, query with text-embedding-3-large → use the same embedding model for both indexing and querying
  • Persist old vectors, then switch models without reindexing → rebuild the index after changing embedding dimensions
// BROKEN: document ingestion and query use different embedding models

import { Chroma } from "@langchain/community/vectorstores/chroma";
import { OpenAIEmbeddings } from "@langchain/openai";

const ingestEmbeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small", // 1536 dims
});

const queryEmbeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-large", // 3072 dims
});

await Chroma.fromTexts(
  ["policy renewal rules"],
  [{ id: "doc1" }],
  ingestEmbeddings,
  { collectionName: "knowledge_base" }
);

// Later in the app: same collection reopened with the other embedder
const queryStore = await Chroma.fromExistingCollection(queryEmbeddings, {
  collectionName: "knowledge_base",
});
const results = await queryStore.similaritySearch("renewal policy", 4);
// -> expected 1536 dimensions but got 3072
// FIXED: one embedding config used everywhere

import { Chroma } from "@langchain/community/vectorstores/chroma";
import { OpenAIEmbeddings } from "@langchain/openai";

const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});

const vectorStore = await Chroma.fromTexts(
  ["policy renewal rules"],
  [{ id: "doc1" }],
  embeddings,
  { collectionName: "knowledge_base" }
);

// Query uses the same embeddings instance/model
const results = await vectorStore.similaritySearch("renewal policy", 4);

If you changed models recently, delete the old collection and re-ingest. Same code, same provider, same model name, same dimensions.

Other Possible Causes

1) Reusing a persisted index after changing models

If you persist a local Chroma/Pinecone/Qdrant collection and later change the embedding model, old vectors remain on disk or in the remote index.

// config changed from small to large, but existing collection stays unchanged
const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-large" });

await Chroma.fromExistingCollection(embeddings, {
  collectionName: "knowledge_base",
});

Fix:

// delete/rebuild the collection before re-ingesting
// (client is a ChromaClient from the "chromadb" package)
await client.deleteCollection({ name: "knowledge_base" });
// then ingest again with the new embeddings model

2) Mixing providers across environments

Dev might use OpenAI embeddings while staging uses Azure OpenAI or Cohere. The API call succeeds, but returned dimensions differ.

// dev
new OpenAIEmbeddings({ model: "text-embedding-3-small" });

// staging
new AzureOpenAIEmbeddings({ azureOpenAIApiDeploymentName: "embed-v2" });

Make provider and model explicit per environment:

const embeddings =
  process.env.EMBEDDING_PROVIDER === "openai"
    ? new OpenAIEmbeddings({ model: "text-embedding-3-small" })
    : new AzureOpenAIEmbeddings({ azureOpenAIApiDeploymentName: "embed-v2" });

3) Hardcoding chunked data into one store and querying another

This happens when your ingestion script writes to one namespace/collection and your agent queries a different one. The error can look like a dimension issue because the wrong store is being hit.

// ingest writes to prod namespace
namespace: "prod"

// agent queries dev namespace by mistake
namespace: "dev"

Check that your collectionName, namespace, and tenant/project IDs match exactly.
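One way to rule this out is a single source of truth that both the ingestion script and the agent import, so the names cannot drift between code paths. A minimal sketch (file and field names are illustrative):

```typescript
// kbConfig.ts — single source of truth for index identity.
// Both ingest.ts and the agent's retrieval tool import from here.

// Read env without assuming Node typings are installed.
const env: Record<string, string | undefined> =
  (globalThis as any).process?.env ?? {};

export const KB_CONFIG = {
  collectionName: "knowledge_base",
  namespace: env.KB_NAMESPACE ?? "dev",
} as const;
```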

4) One tool uses its own internal embedder

Some CrewAI tools wrap their own retrieval layer. Your agent may be configured correctly, but the tool creates a separate default embedder under the hood.

// tool internally defaults to its own embeddings config
const ragTool = new RAGTool({
  knowledgeBasePath: "./kb",
});

Fix by injecting the exact same embedder into the tool if supported:

const ragTool = new RAGTool({
  knowledgeBasePath: "./kb",
  embeddings,
});

How to Debug It

  1. Print the embedding dimension at runtime

    • Log both ingestion and query-time models.
    • For most SDKs, inspect a sample vector length.
    const vec = await embeddings.embedQuery("test");
    console.log("embedding length:", vec.length);
    
  2. Check what is already stored

    • Look at your vector DB schema or metadata.
    • Confirm whether existing vectors were built with a different model/version.
  3. Trace every place embeddings are created

    • Search for new OpenAIEmbeddings, new AzureOpenAIEmbeddings, new CohereEmbeddings.
    • In TypeScript codebases, this often appears in both app startup and tool initialization.
  4. Rebuild from scratch

    • Delete the collection/index.
    • Re-ingest with one known-good embedding config.
    • If retrieval works after rebuild, you found a stale-index problem.

Prevention

  • Centralize embedding config in one module and import it everywhere.
  • Version your indexes by embedding model name, for example kb_text-embedding-3-small_v1.
  • Add a startup assertion that checks stored dimension vs current dimension before agents run.
  • When changing embedding models, treat it like a schema migration: reindex first, deploy second.
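The startup assertion from the list above only takes a few lines. A sketch, assuming you can read one stored vector's length from your DB and one probe embedding from the current model (the helper name is hypothetical):

```typescript
// assertDimensionMatch: run once at startup, before any agent executes.
// storedDim:  length of a sample vector already in the index
//             (e.g. from collection metadata or a peek query)
// currentDim: (await embeddings.embedQuery("probe")).length
export function assertDimensionMatch(storedDim: number, currentDim: number): void {
  if (storedDim !== currentDim) {
    throw new Error(
      `Embedding dimension mismatch: index holds ${storedDim}-dim vectors ` +
        `but the current model returns ${currentDim}. Reindex before deploying.`
    );
  }
}
```

Failing fast here turns a confusing retrieval error mid-run into a clear deploy-time failure.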

The practical fix is simple: one embedder per index lifecycle. If CrewAI says embedding dimension mismatch during development, assume stale vectors or split configuration until proven otherwise.


By Cyprian Aarons, AI Consultant at Topiax.
