CrewAI Tutorial (TypeScript): caching embeddings for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to cache embeddings in a CrewAI TypeScript workflow so repeated document retrieval does not keep paying the embedding cost. You need this when your agents repeatedly search the same knowledge base, especially in support, compliance, or claims workflows where latency and API spend matter.

What You'll Need

  • Node.js 18+
  • A TypeScript project with ts-node or a build step
  • CrewAI TypeScript packages installed
  • An OpenAI API key
  • A local Redis instance for caching embeddings
  • A .env file with your secrets

Install the dependencies:

npm install @crewai/crewai @crewai/knowledge @crewai/memory openai redis dotenv
npm install -D typescript ts-node @types/node
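
The code below uses top-level await, which requires an ESM-aware TypeScript setup. A minimal tsconfig.json along these lines should work, though the exact settings depend on your build (this config is an assumption, not something the CrewAI packages mandate):

{
  "compilerOptions": {
    "module": "nodenext",
    "moduleResolution": "nodenext",
    "target": "es2022",
    "strict": true,
    "esModuleInterop": true
  }
}

You may also need "type": "module" in package.json, and ts-node's ESM mode (ts-node --esm) if you run the files directly.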

Set your environment variables:

OPENAI_API_KEY=your_openai_key
REDIS_URL=redis://localhost:6379

Step-by-Step

  1. Start by wiring Redis as a persistent cache for embedding vectors. The pattern is simple: generate an embedding once, store it under a stable key, and reuse it on later runs.
import 'dotenv/config';
import { createClient } from 'redis';
import OpenAI from 'openai';

const redis = createClient({ url: process.env.REDIS_URL });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

await redis.connect();

export async function getCachedEmbedding(text: string): Promise<number[]> {
  // Key on the exact text so identical inputs always map to the same entry.
  const key = `embedding:${Buffer.from(text).toString('base64')}`;
  const cached = await redis.get(key);

  if (cached) return JSON.parse(cached);

  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  });

  const embedding = response.data[0].embedding;
  // Persist for 24 hours so repeat runs skip the embedding API entirely.
  await redis.set(key, JSON.stringify(embedding), { EX: 60 * 60 * 24 });
  return embedding;
}
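
One caveat with this key scheme: base64 grows with the input, so long chunks produce very large Redis keys. A common alternative, shown here as a sketch using Node's built-in crypto module rather than anything from the original setup, is to hash the text:

import { createHash } from 'node:crypto';

// SHA-256 keeps every key at 64 hex characters regardless of chunk size.
function embeddingKey(text: string): string {
  return `embedding:${createHash('sha256').update(text).digest('hex')}`;
}

If you adopt this, swap embeddingKey(text) in for the base64 expression inside getCachedEmbedding().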
  2. Next, wrap that cache in a small retriever utility. In production you want this layer isolated so the rest of your agent code never cares whether vectors came from Redis or the embedding API.
type Chunk = {
  id: string;
  text: string;
};

const docs: Chunk[] = [
  { id: 'policy-1', text: 'Claims must be reviewed within 48 hours.' },
  { id: 'policy-2', text: 'Escalate fraud indicators to the compliance queue.' },
];

function cosineSimilarity(a: number[], b: number[]) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
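
A quick sanity check confirms the math behaves as expected: identical vectors score 1, orthogonal vectors score 0.

console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
console.log(cosineSimilarity([1, 1], [1, 0])); // ≈ 0.707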
  3. Build the actual retrieval function using the cached embeddings. This is where you save money: document embeddings are computed once, then reused across every query.
const docEmbeddings = new Map<string, number[]>();

async function embedDoc(doc: Chunk) {
  const existing = docEmbeddings.get(doc.id);
  if (existing) return existing;

  const embedding = await getCachedEmbedding(doc.text);
  docEmbeddings.set(doc.id, embedding);
  return embedding;
}

export async function retrieve(query: string) {
  const queryEmbedding = await getCachedEmbedding(query);

  const scored = await Promise.all(
    docs.map(async (doc) => ({
      doc,
      score: cosineSimilarity(queryEmbedding, await embedDoc(doc)),
    }))
  );

  return scored.sort((a, b) => b.score - a.score).slice(0, 2);
}
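
Before wiring this into an agent, you can exercise retrieval on its own. Exact scores depend on the embedding model, so treat the expected ranking as an assumption:

const hits = await retrieve('How fast must claims be reviewed?');
for (const { doc, score } of hits) {
  console.log(doc.id, score.toFixed(3)); // policy-1 should rank first here
}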
  4. Now plug retrieval into a CrewAI agent task. The agent gets only the top matches, which keeps prompts smaller and makes the system easier to control.
import { Agent, Task, Crew } from '@crewai/crewai';

const analyst = new Agent({
  name: 'Policy Analyst',
  role: 'Answer policy questions from retrieved context',
  goal: 'Return concise answers grounded in policy text',
});

export async function answerQuestion(question: string) {
  const matches = await retrieve(question);
  const context = matches.map((m) => `(${m.doc.id}) ${m.doc.text}`).join('\n');

  const task = new Task({
    description: `Answer this question using only the provided context:\n${question}\n\nContext:\n${context}`,
    expectedOutput: 'A short answer with citations',
    agent: analyst,
  });

  const crew = new Crew({ agents: [analyst], tasks: [task] });
  return crew.kickoff();
}
  5. Add a small runner so you can verify cache hits locally. Run the same question twice and watch Redis eliminate repeated embedding calls for both the query and document chunks.
async function main() {
  const first = await answerQuestion('How long do claims have before review?');
  const second = await answerQuestion('How long do claims have before review?');

  console.log('First run:', first);
  console.log('Second run:', second);

  await redis.disconnect();
}

main().catch(async (err) => {
  console.error(err);
  await redis.disconnect();
  process.exit(1);
});

Testing It

Run your script once and confirm Redis gets keys like embedding:*. Then run it again with the same inputs and check that no new embedding requests are made to OpenAI.

If you want proof at the code level, add logging inside getCachedEmbedding() for cache hits and misses. On the second run you should see mostly hits for both document chunks and identical queries.
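
A minimal way to add that logging, dropped into getCachedEmbedding() in place of the current hit check (the log format is just a suggestion):

const cached = await redis.get(key);
if (cached) {
  console.log(`[embedding cache] hit  ${key.slice(0, 32)}...`);
  return JSON.parse(cached);
}
console.log(`[embedding cache] miss ${key.slice(0, 32)}...`);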

For stronger validation, delete one Redis key and rerun just that path. You should see exactly one fresh embedding call and everything else served from cache.
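
One way to evict a single entry from code, assuming you know which chunk to invalidate:

// Drop the cached vector for one chunk to force exactly one fresh API call.
const staleKey = `embedding:${Buffer.from(docs[0].text).toString('base64')}`;
await redis.del(staleKey);

You can also list keys with redis-cli --scan --pattern 'embedding:*' and delete one by hand.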

Next Steps

  • Add TTL policies by document type so stale policies expire faster than stable reference content (a sketch follows this list).
  • Move from in-memory Map storage to fully persistent vector storage if your corpus grows.
  • Add batch precomputation for all known documents during deployment instead of waiting for first query traffic.
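
For the TTL idea above, a small lookup table keeps the policy in one place. The categories, durations, and the cacheEmbedding() helper are illustrative assumptions, not part of the tutorial's code:

// Illustrative TTLs: volatile policy text expires daily, stable reference monthly.
const TTL_BY_TYPE: Record<string, number> = {
  policy: 60 * 60 * 24,         // 1 day
  reference: 60 * 60 * 24 * 30, // 30 days
};

async function cacheEmbedding(key: string, embedding: number[], docType: string) {
  const ttl = TTL_BY_TYPE[docType] ?? 60 * 60 * 24;
  await redis.set(key, JSON.stringify(embedding), { EX: ttl });
}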

By Cyprian Aarons, AI Consultant at Topiax.
