AutoGen Tutorial (TypeScript): caching embeddings for advanced developers
This tutorial shows you how to cache embeddings in a TypeScript AutoGen setup so repeated semantic lookups stop burning tokens and adding latency. You need this when your agent repeatedly embeds the same documents, prompts, or retrieval chunks across runs and you want deterministic reuse instead of recomputing every time.
What You'll Need
- Node.js 18+
- TypeScript 5+
- npm or pnpm
- An OpenAI API key
- AutoGen packages:
  - `@autogen/core`
  - `@autogen/openai`
- A place to persist cache data:
  - local filesystem for development
  - Redis, SQLite, or Postgres for production
- Basic familiarity with AutoGen agents, models, and async TypeScript
Step-by-Step
- Install the packages and set up your environment.
The important part here is that the embedding client and the cache live in the same process, so repeated calls can be intercepted before they ever hit the API.
```bash
npm init -y
npm install @autogen/core @autogen/openai dotenv
npm install -D typescript tsx @types/node
```
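Then create a `.env` file at the project root; the `import "dotenv/config"` line in the later snippets reads it at startup. Only one variable is required for this tutorial:

```bash
# .env — loaded by dotenv at startup
OPENAI_API_KEY=sk-...
```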
- Create a small cache wrapper for embeddings.
This example uses a file-backed JSON cache so you can run it locally without standing up infrastructure. In production, swap the storage layer for Redis or a database but keep the same keying strategy; a Redis sketch follows the code below.
```typescript
import { readFileSync, writeFileSync, existsSync } from "node:fs";
import { createHash } from "node:crypto";

type EmbeddingCache = Record<string, number[]>;

const CACHE_FILE = "./embedding-cache.json";

function loadCache(): EmbeddingCache {
  if (!existsSync(CACHE_FILE)) return {};
  return JSON.parse(readFileSync(CACHE_FILE, "utf8")) as EmbeddingCache;
}

function saveCache(cache: EmbeddingCache) {
  writeFileSync(CACHE_FILE, JSON.stringify(cache, null, 2));
}

// Hash the model name and input together: changing either produces a new key.
function cacheKey(model: string, input: string) {
  return createHash("sha256").update(`${model}:${input}`).digest("hex");
}
```
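For reference, here is a minimal sketch of what the production swap could look like, using the `redis` npm package (not installed above) and its v4 client API. The keying strategy stays identical; only the whole-file load/save turns into per-key reads and writes:

```typescript
import { createClient } from "redis";
import { createHash } from "node:crypto";

// Hypothetical Redis-backed variant: same sha256 keys, per-key reads/writes
// instead of rewriting one JSON file on every miss. REDIS_URL is assumed;
// createClient falls back to localhost:6379 when it is unset.
const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

function cacheKey(model: string, input: string) {
  return createHash("sha256").update(`${model}:${input}`).digest("hex");
}

export async function getCached(
  model: string,
  input: string
): Promise<number[] | null> {
  const hit = await redis.get(cacheKey(model, input));
  return hit ? (JSON.parse(hit) as number[]) : null;
}

export async function putCached(model: string, input: string, vector: number[]) {
  // Vectors serialize cleanly as JSON arrays; store under the same key scheme.
  await redis.set(cacheKey(model, input), JSON.stringify(vector));
}
```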
- Wire AutoGen to an OpenAI embedding model and check the cache before calling the model.
The key point is to hash both the model name and the exact text input. If either changes, you want a miss; otherwise you want a stable hit. (A batched variant for embedding many chunks at once is sketched after the code.)
import "dotenv/config";
import { OpenAIClient } from "@autogen/openai";
const client = new OpenAIClient({
apiKey: process.env.OPENAI_API_KEY!,
});
const EMBEDDING_MODEL = "text-embedding-3-small";
const cache = loadCache();
export async function getEmbedding(text: string): Promise<number[]> {
const key = cacheKey(EMBEDDING_MODEL, text);
if (cache[key]) {
console.log("cache hit:", key);
return cache[key];
}
console.log("cache miss:", key);
const result = await client.embeddings.create({
model: EMBEDDING_MODEL,
input: text,
});
const vector = result.data[0].embedding;
cache[key] = vector;
saveCache(cache);
return vector;
}
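If you embed many chunks at once, you can extend the same pattern to batch the misses into a single request. The sketch below lives in the same module as `getEmbedding` and assumes the client mirrors OpenAI's embeddings API, where `input` accepts an array of strings and `data` comes back in the same order; verify that against the `@autogen/openai` version you're using:

```typescript
// Hypothetical batched variant: answer hits from the cache, send only the
// misses to the API in one call, then write the new vectors back.
export async function getEmbeddings(texts: string[]): Promise<number[][]> {
  const keys = texts.map((t) => cacheKey(EMBEDDING_MODEL, t));
  const missIdx = keys.flatMap((k, i) => (cache[k] ? [] : [i]));

  if (missIdx.length > 0) {
    const result = await client.embeddings.create({
      model: EMBEDDING_MODEL,
      input: missIdx.map((i) => texts[i]), // only the uncached strings
    });
    // result.data is assumed to follow input order, so data[j] pairs
    // with missIdx[j].
    missIdx.forEach((origIdx, j) => {
      cache[keys[origIdx]] = result.data[j].embedding;
    });
    saveCache(cache);
  }
  return keys.map((k) => cache[k]);
}
```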
- Use cached embeddings inside an agent workflow.
This is where caching starts paying off: document retrieval, classification pre-processing, deduplication checks, and any tool that repeatedly embeds stable text. Keep the embedding call outside your agent prompt path so it stays deterministic and testable. (A concurrency-safe variant is sketched after the code.)
```typescript
async function main() {
  const docs = [
    "Claims escalation policy requires human review after three failed attempts.",
    "Claims escalation policy requires human review after three failed attempts.",
    "Fraud signals include mismatched address history and device fingerprint drift.",
  ];

  const vectors: number[][] = [];
  for (const doc of docs) {
    vectors.push(await getEmbedding(doc));
  }

  console.log("embeddings computed:", vectors.length);
  console.log("first vector dimensions:", vectors[0].length);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
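One subtlety at this point: if the same string is embedded concurrently (say, `Promise.all` over documents containing duplicates), every call can miss the cache before the first response lands, and you pay for duplicate requests. A small in-flight map fixes that; this is a sketch layered on top of `getEmbedding` in the same module, with the wrapper name made up for illustration:

```typescript
// Coalesce concurrent requests for the same key: the first caller creates the
// promise, later callers await that same promise instead of hitting the API.
const inFlight = new Map<string, Promise<number[]>>();

export async function getEmbeddingCoalesced(text: string): Promise<number[]> {
  const key = cacheKey(EMBEDDING_MODEL, text);
  if (cache[key]) return cache[key];

  let pending = inFlight.get(key);
  if (!pending) {
    pending = getEmbedding(text).finally(() => inFlight.delete(key));
    inFlight.set(key, pending);
  }
  return pending;
}
```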
- Add a simple similarity check so you can confirm cached vectors are still useful downstream.
This is not required for caching itself, but it proves the output is being reused correctly in an actual retrieval flow.
```typescript
function cosineSimilarity(a: number[], b: number[]) {
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

async function compareDocs() {
  const x = await getEmbedding("fraud signals include device fingerprint drift");
  const y = await getEmbedding("fraud indicators include device fingerprint drift");
  console.log("similarity:", cosineSimilarity(x, y).toFixed(4));
}

compareDocs().catch(console.error);
```
Testing It
Run the script twice with the same inputs. The first run should print `cache miss` for each unique string; the second run should print `cache hit` for those same strings and complete faster.
Check that `embedding-cache.json` is created and populated with SHA-256 keys mapped to numeric arrays. If you change even one character in the input text or switch models, you should see a miss again.
For a real integration test, point this at a list of stable policy paragraphs or knowledge-base chunks and confirm token usage drops across repeated runs. That’s the behavior you want before moving this into an agent pipeline.
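If you want harder numbers than console output, a pair of counters makes the assertion explicit. A minimal instrumentation sketch, again layered on the earlier module (the wrapper and function names here are made up for illustration):

```typescript
// Count hits and misses so a test can assert that a second run over the same
// inputs produces zero misses.
let hits = 0;
let misses = 0;

export async function getEmbeddingCounted(text: string): Promise<number[]> {
  const key = cacheKey(EMBEDDING_MODEL, text);
  if (cache[key]) {
    hits++;
    return cache[key];
  }
  misses++;
  return getEmbedding(text);
}

export function cacheStats() {
  return { hits, misses };
}
```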
Next Steps
- Replace the JSON file with Redis so multiple workers share one embedding cache.
- Add TTLs and versioned keys so model upgrades don’t poison old embeddings; a keying sketch follows this list.
- Put this behind a retrieval tool in AutoGen so agents never call embeddings directly from prompt logic.
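For the versioned-key idea, it can be as simple as folding a cache version into the hash, so bumping the version (or changing models) invalidates everything at once. A sketch, assuming the file-based cache from earlier:

```typescript
import { createHash } from "node:crypto";

// Bump CACHE_VERSION whenever the embedding pipeline changes in a way that
// should invalidate old vectors (model upgrade, new text normalization, etc.).
const CACHE_VERSION = "v1";

function versionedCacheKey(model: string, input: string) {
  return createHash("sha256")
    .update(`${CACHE_VERSION}:${model}:${input}`)
    .digest("hex");
}
```

With Redis, TTLs come essentially for free: the v4 node client accepts an expiry option on writes, e.g. `redis.set(key, value, { EX: 86400 })`.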
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.