AutoGen Tutorial (TypeScript): caching embeddings for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to cache embeddings in a TypeScript AutoGen workflow so repeated similarity lookups stop burning tokens and latency. You need this when you chunk the same documents repeatedly, re-run retrieval across sessions, or want deterministic performance for agents that depend on vector search.

What You'll Need

  • Node.js 18+
  • A TypeScript project with ts-node or a build step
  • AutoGen for TypeScript:
    • @autogen/core
    • @autogen/openai
  • An OpenAI API key set as OPENAI_API_KEY
  • A local cache store:
    • simple file cache for this tutorial
    • Redis or PostgreSQL later if you need multi-process sharing
  • A text embedding model:
    • text-embedding-3-small is enough for most internal RAG workflows

Step-by-Step

  1. Start by installing the packages and setting up a small TypeScript project. The cache will sit between your document processing pipeline and the embedding model, so we keep it explicit and easy to swap later.
npm init -y
npm install @autogen/core @autogen/openai
npm install -D typescript ts-node @types/node
  2. Create a tiny file-backed embedding cache. This version uses a SHA-256 hash of the text as the cache key, which is stable across runs and avoids storing duplicate embeddings for identical chunks.
import { createHash } from "node:crypto";
import { readFileSync, writeFileSync, existsSync } from "node:fs";

export type EmbeddingVector = number[];

const CACHE_FILE = "./embedding-cache.json";

// Stable key for a chunk: identical text always maps to the same entry across runs.
export function hashText(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

export function loadCache(): Record<string, EmbeddingVector> {
  if (!existsSync(CACHE_FILE)) return {};
  return JSON.parse(readFileSync(CACHE_FILE, "utf8"));
}

export function saveCache(cache: Record<string, EmbeddingVector>): void {
  writeFileSync(CACHE_FILE, JSON.stringify(cache, null, 2));
}

export function getCachedEmbedding(
  cache: Record<string, EmbeddingVector>,
  text: string
): EmbeddingVector | undefined {
  return cache[hashText(text)];
}
  3. Wire AutoGen’s OpenAI client into a cached embedding function. The important part is that your application calls embedWithCache() instead of calling the model directly, so repeated chunks are served from disk.
import { OpenAIClient } from "@autogen/openai";
import { hashText, loadCache, saveCache } from "./cache.js";

const client = new OpenAIClient({
  apiKey: process.env.OPENAI_API_KEY!,
});

const cache = loadCache();

async function embedText(text: string): Promise<number[]> {
  const response = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });

  return response.data[0].embedding;
}

export async function embedWithCache(text: string): Promise<number[]> {
  // Hash the trimmed text so the key matches the scheme defined in cache.ts.
  const key = hashText(text.trim());
  if (cache[key]) return cache[key];

  const embedding = await embedText(text);
  cache[key] = embedding;
  saveCache(cache);

  return embedding;
}
  4. Use the cached embeddings in a real retrieval flow. Here we embed a small set of chunks and compute cosine similarity locally, which is enough to prove the cache works before you move to a vector database.
import { embedWithCache } from "./embed-with-cache.js";

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let magA = 0;
  let magB = 0;

  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }

  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

async function main() {
  const chunks = [
    "Claims must be filed within thirty days.",
    "Policy exclusions apply to pre-existing conditions.",
    "The deductible resets every calendar year.",
  ];

  const query = "When does the deductible reset?";
  const queryEmbedding = await embedWithCache(query);

  const scored = await Promise.all(
    chunks.map(async (chunk) => ({
      chunk,
      score: cosineSimilarity(queryEmbedding, await embedWithCache(chunk)),
    }))
  );

  console.log(scored.sort((a, b) => b.score - a.score)[0]);
}

main().catch(console.error);
  5. Add one more check so you can see the cache hit behavior during development. In production you would replace this with metrics, but logging is enough to confirm that repeated inputs no longer call the embeddings endpoint.
import { existsSync } from "node:fs";
import { hashText, loadCache } from "./cache.js";

const cacheBefore = loadCache();

console.log("cache file exists:", existsSync("./embedding-cache.json"));
console.log("cached entries:", Object.keys(cacheBefore).length);

// Keys are SHA-256 hashes of the trimmed chunk text, so hash before looking up.
const sampleText = "The deductible resets every calendar year.";
console.log("has sample:", Boolean(cacheBefore[hashText(sampleText.trim())]));

Testing It

Run the retrieval script twice with the same inputs. On the first run, embedWithCache() should call the embeddings endpoint and write embedding-cache.json; on the second run, the same chunks should be served from disk.

If you want to verify it more strictly, delete one cached entry and rerun only that input. You should see exactly one new embedding request instead of recomputing everything.
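
If you would rather script that eviction than edit the JSON by hand, a small sketch like the following works; evict-entry.ts and the chunk text are hypothetical, and it assumes the hashText export from cache.ts above.
// evict-entry.ts (hypothetical helper): remove a single entry from the file cache.
import { hashText, loadCache, saveCache } from "./cache.js";

const textToEvict = "The deductible resets every calendar year.";

const cache = loadCache();
delete cache[hashText(textToEvict.trim())];
saveCache(cache);

console.log("evicted one entry; expect exactly one cache miss on the next run");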

For production-grade validation, add timing around embedWithCache() and track the hit rate. If your corpus is stable, latency should drop sharply after warm-up.
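
A rough sketch of that instrumentation, assuming the hashText and loadCache exports from cache.ts; timedEmbedWithCache and the hit/miss counters are illustrative names, not part of AutoGen.
// timed-embed.ts (hypothetical): wrap embedWithCache() with timing and a hit/miss count.
import { hashText, loadCache } from "./cache.js";
import { embedWithCache } from "./embed-with-cache.js";

let hits = 0;
let misses = 0;

export async function timedEmbedWithCache(text: string): Promise<number[]> {
  // Peek at the on-disk cache to classify the call; fine for dev, too slow for production.
  const cached = Boolean(loadCache()[hashText(text.trim())]);
  if (cached) hits++;
  else misses++;

  const start = performance.now();
  const embedding = await embedWithCache(text);
  const ms = (performance.now() - start).toFixed(1);

  console.log(`${cached ? "hit" : "miss"} in ${ms} ms (hits=${hits}, misses=${misses})`);
  return embedding;
}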

Next Steps

  • Move the file cache to Redis so multiple agent workers share embeddings
  • Add chunk normalization rules before hashing so whitespace-only changes do not miss the cache (see the sketch after this list)
  • Store metadata alongside vectors so you can version prompts, models, and document sources
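
As a starting point for the normalization idea, here is a minimal sketch; normalizeChunk is a hypothetical helper, and the exact rules (Unicode folding, whitespace collapsing, lowercasing) are assumptions to tune against your corpus.
// normalize.ts (hypothetical): canonicalize chunk text before hashing so
// formatting-only changes still hit the same cache entry.
export function normalizeChunk(text: string): string {
  return text
    .normalize("NFKC") // fold Unicode variants of the same characters
    .replace(/\s+/g, " ") // collapse runs of whitespace and newlines
    .trim()
    .toLowerCase();
}

// In cache.ts, hash the normalized form instead of the raw text:
// createHash("sha256").update(normalizeChunk(text)).digest("hex")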

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

