CrewAI Tutorial (TypeScript): caching embeddings for advanced developers
This tutorial shows how to add a real embedding cache to a CrewAI TypeScript workflow so repeated semantic lookups stop wasting tokens and adding latency. You need this when your agents keep re-embedding the same documents, prompts, or knowledge chunks across runs and you want predictable performance in production.
What You'll Need
- Node.js 18+ and npm
- A TypeScript project with `ts-node` or `tsx`
- The CrewAI TypeScript package installed
- An embeddings provider API key:
  - OpenAI: `OPENAI_API_KEY`
  - Or another provider supported by your stack
- A cache store:
  - Redis for production
  - Or an in-memory Map for local testing
- Basic familiarity with:
  - CrewAI agents/tasks
  - Embeddings and vector similarity
  - Async TypeScript
Step-by-Step
- Install the dependencies you need for CrewAI, embeddings, and cache access. I’m using Redis here because it’s the right default once you move past local experiments.

```bash
npm install @crewai/crewai openai redis dotenv
npm install -D typescript tsx @types/node
```
- Create a small cache wrapper that stores embedding vectors under a stable hash of the input text. The important part is normalizing before hashing; otherwise trivial whitespace changes produce cache misses.
```ts
// cache-key.ts
import crypto from "crypto";

// Collapse whitespace and case so cosmetically different inputs share one key.
export function normalizeText(text: string): string {
  return text.trim().replace(/\s+/g, " ").toLowerCase();
}

// The key embeds the model name: vectors from different models are not interchangeable.
export function embeddingCacheKey(model: string, text: string): string {
  const normalized = normalizeText(text);
  const hash = crypto.createHash("sha256").update(normalized).digest("hex");
  return `emb:${model}:${hash}`;
}
```
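As a quick sanity check on the normalization, keys for whitespace-or-case variants of the same text should collide, while a real wording change should not. A small sketch using Node's built-in `assert` (the example strings are mine):

```ts
import assert from "assert";
import { embeddingCacheKey } from "./cache-key";

const model = "text-embedding-3-small";
const a = embeddingCacheKey(model, "  Settlement   Timing  Exposure ");
const b = embeddingCacheKey(model, "settlement timing exposure");
const c = embeddingCacheKey(model, "settlement date exposure");

assert.strictEqual(a, b);    // whitespace/case variants share one key
assert.notStrictEqual(a, c); // a meaningful word change is a distinct key
```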
- Add an embedding service that checks Redis first, then falls back to the provider, then writes the result back into cache. This keeps your agent code clean and makes caching reusable across multiple crews.
```ts
// embedding-cache.ts
import { createClient } from "redis";
import OpenAI from "openai";
import { embeddingCacheKey } from "./cache-key";

const redis = createClient({ url: process.env.REDIS_URL });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Top-level await requires ESM; tsx handles this out of the box.
await redis.connect();

export async function getCachedEmbedding(text: string): Promise<number[]> {
  const model = "text-embedding-3-small";
  const key = embeddingCacheKey(model, text);

  // Cache hit: return the stored vector without touching the provider.
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached) as number[];

  // Cache miss: embed via the provider.
  const response = await openai.embeddings.create({ model, input: text });
  const vector = response.data[0].embedding;

  // Write back with a one-week TTL so stale vectors eventually expire.
  await redis.set(key, JSON.stringify(vector), { EX: 60 * 60 * 24 * 7 });
  return vector;
}
```
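One gap in the service above is concurrency: if several agents request the same uncached text at once, each one misses and calls the provider. A minimal coalescing sketch that could sit in the same module (the `inFlight` map and the `getCachedEmbeddingOnce` name are my additions, not part of the code above):

```ts
// Coalesce concurrent misses so parallel callers for one key share a single provider call.
const inFlight = new Map<string, Promise<number[]>>();

export function getCachedEmbeddingOnce(text: string): Promise<number[]> {
  const key = embeddingCacheKey("text-embedding-3-small", text);
  const pending = inFlight.get(key);
  if (pending) return pending;

  const promise = getCachedEmbedding(text).finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}
```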
- Wire the cached embeddings into a CrewAI task flow. The pattern below uses the cached vector for retrieval-style preprocessing before the agent answers, which is where repeated embedding calls usually pile up.
```ts
import { Agent, Task, Crew } from "@crewai/crewai";
import { getCachedEmbedding } from "./embedding-cache";

const analyst = new Agent({
  role: "Risk Analyst",
  goal: "Answer questions using cached semantic context",
  backstory: "You work on banking workflows where latency matters.",
});

const task = new Task({
  description: "Explain whether this policy clause increases operational risk.",
  expectedOutput: "A concise risk assessment with reasoning.",
});

async function run() {
  const query = "Does this clause create settlement timing exposure?";

  // First run pays for the embedding; every later run with this query is a Redis hit.
  const vector = await getCachedEmbedding(query);

  const crew = new Crew({
    agents: [analyst],
    tasks: [task],
    verbose: true,
  });

  // This is where you'd use `vector` to rank and select context chunks before kickoff.
  console.log("Embedding length:", vector.length);

  const result = await crew.kickoff();
  console.log(result);
}

run();
```
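To make the retrieval-style preprocessing concrete, here is one way to rank ingested chunks against the query vector with plain cosine similarity and hand the best match to the agent as context. This is a sketch under my own assumptions (an in-memory chunk list, no vector store), not CrewAI's built-in retrieval:

```ts
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank candidate chunks against the query; both sides go through the cache,
// so a repeated run embeds nothing new.
async function topChunk(query: string, chunks: string[]): Promise<string> {
  const queryVec = await getCachedEmbedding(query);
  let best = chunks[0];
  let bestScore = -Infinity;
  for (const chunk of chunks) {
    const score = cosine(queryVec, await getCachedEmbedding(chunk));
    if (score > bestScore) {
      bestScore = score;
      best = chunk;
    }
  }
  return best; // prepend this to the task description as context
}
```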
- If you want stronger reuse across document-heavy workflows, cache embeddings for each chunk during ingestion instead of only at query time. That gives you stable retrieval performance when the same policy pages or claims notes are processed repeatedly.
```ts
import { getCachedEmbedding } from "./embedding-cache";

const chunks = [
  "Policy exclusions include intentional loss and fraud.",
  "Claims must be reported within thirty days of discovery.",
];

async function indexChunks() {
  for (const chunk of chunks) {
    // First ingestion pays for the embedding; re-processing the same chunk is a cache hit.
    const embedding = await getCachedEmbedding(chunk);
    console.log(chunk.slice(0, 40), embedding.slice(0, 3));
    // Persist chunk + embedding in your vector store here.
    // Example target stores: pgvector, Pinecone, Weaviate.
  }
}

indexChunks();
```
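For larger ingestion jobs, the one-chunk-at-a-time loop gets slow. A batched sketch, assuming the same `redis` client and `embeddingCacheKey` helper from earlier (the `getCachedEmbeddings` name is mine): it reads every key in one `mGet`, then sends only the misses to the provider in a single request.

```ts
export async function getCachedEmbeddings(texts: string[]): Promise<number[][]> {
  const model = "text-embedding-3-small";
  const keys = texts.map((t) => embeddingCacheKey(model, t));

  // One Redis round trip for all keys.
  const cached = await redis.mGet(keys);
  const results: (number[] | null)[] = cached.map((c) => (c ? (JSON.parse(c) as number[]) : null));

  const missIndexes = results.flatMap((r, i) => (r === null ? [i] : []));
  if (missIndexes.length > 0) {
    // The embeddings endpoint accepts an array input, so all misses fit in one call.
    const response = await openai.embeddings.create({
      model,
      input: missIndexes.map((i) => texts[i]),
    });
    await Promise.all(
      response.data.map((d, j) => {
        const i = missIndexes[j];
        results[i] = d.embedding;
        return redis.set(keys[i], JSON.stringify(d.embedding), { EX: 60 * 60 * 24 * 7 });
      })
    );
  }
  return results as number[][];
}
```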
- For local development without Redis, use an in-memory Map so you can validate behavior quickly. Keep this only for tests; it resets on process restart and won’t help across worker processes.
```ts
// Process-local cache: fine for tests, lost on restart, not shared across workers.
const memoryCache = new Map<string, number[]>();

export async function getCachedEmbeddingLocal(
  key: string,
  loader: () => Promise<number[]>
): Promise<number[]> {
  const existing = memoryCache.get(key);
  if (existing) return existing;

  const value = await loader();
  memoryCache.set(key, value);
  return value;
}
```
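Wiring it up looks like this; the loader closure stands in for whatever provider call you would normally make (the `fakeEmbed` stub is an illustration, not a real API):

```ts
import { embeddingCacheKey } from "./cache-key";

// Deterministic stand-in for a provider call during tests.
async function fakeEmbed(text: string): Promise<number[]> {
  return Array.from({ length: 8 }, (_, i) => (text.length + i) % 10);
}

const text = "Claims must be reported within thirty days.";
const key = embeddingCacheKey("text-embedding-3-small", text);

const first = await getCachedEmbeddingLocal(key, () => fakeEmbed(text));  // miss: loader runs
const second = await getCachedEmbeddingLocal(key, () => fakeEmbed(text)); // hit: loader skipped
console.log(first === second); // true: the same cached array is returned
```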
Testing It
Run the same input twice and confirm the second call skips the provider request. With Redis enabled, you should see a fast hit on the second run and no new embedding API usage for that exact normalized text.
Check that whitespace-only changes still map to the same cache key because of normalization. Then change one meaningful word and verify it becomes a miss.
If you’re using logs or metrics, count cache hits and misses per model name. In production, that ratio tells you whether your ingestion pipeline or query layer is actually benefiting from caching.
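A minimal way to get that ratio without a full metrics stack is a pair of in-process counters; the `recordCacheAccess` helper below is my addition, meant to be called right after the `redis.get` inside `getCachedEmbedding` with `cached !== null`:

```ts
// Per-model hit/miss counters; log or export these periodically.
const cacheStats = new Map<string, { hits: number; misses: number }>();

export function recordCacheAccess(model: string, hit: boolean): void {
  const stats = cacheStats.get(model) ?? { hits: 0, misses: 0 };
  if (hit) stats.hits += 1;
  else stats.misses += 1;
  cacheStats.set(model, stats);
}

export function hitRate(model: string): number {
  const stats = cacheStats.get(model);
  if (!stats) return 0;
  return stats.hits / (stats.hits + stats.misses);
}
```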
Next Steps
- Add TTL policies by document class so stale legal or policy content expires correctly (see the sketch after this list).
- Store embeddings in pgvector or another vector DB alongside the cache key for durable retrieval.
- Wrap cache metrics with OpenTelemetry so you can trace embedding latency inside CrewAI runs.
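For the first item, a per-class TTL table is enough to start. The class names and durations below are illustrative assumptions, not recommendations:

```ts
// TTLs in seconds, keyed by document class (names and durations are examples).
const TTL_BY_CLASS: Record<string, number> = {
  policy: 60 * 60 * 24 * 30, // policy wording changes rarely
  claims: 60 * 60 * 24 * 7,  // claims notes churn weekly
  regulatory: 60 * 60 * 24,  // regulatory text: re-embed daily
};

export function ttlFor(docClass: string): number {
  return TTL_BY_CLASS[docClass] ?? 60 * 60 * 24 * 7; // default: one week
}

// On write: await redis.set(key, JSON.stringify(vector), { EX: ttlFor(docClass) });
```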
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.