LangChain Tutorial (TypeScript): caching embeddings for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to cache embeddings in a TypeScript LangChain app so repeated requests for the same text stop costing you extra API calls and latency. You need this when you re-index documents often, rerun tests, or have users uploading duplicate content.

What You'll Need

  • Node.js 18+
  • A TypeScript project with ts-node or tsx
  • An OpenAI API key
  • These packages:
    • langchain
    • @langchain/openai
    • @langchain/community
    • ioredis (only needed for the shared Redis cache step)
    • typescript
    • tsx or ts-node

Install them like this:

npm install langchain @langchain/openai @langchain/community ioredis
npm install -D typescript tsx @types/node

Set your API key:

export OPENAI_API_KEY="your-key-here"

Step-by-Step

  1. Start with a plain embedding model.
    This is the baseline: every call hits the provider, which is exactly what we want to avoid once caching is added.
import { OpenAIEmbeddings } from "@langchain/openai";

const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});

async function main() {
  // Both strings are identical, but without a cache each one is sent to the API.
  const vectors = await embeddings.embedDocuments([
    "LangChain caching example",
    "LangChain caching example",
  ]);

  console.log(vectors.length);
  console.log(vectors[0].slice(0, 5));
}

main().catch(console.error);
  2. Add a cache-backed store for embeddings.
    LangChain exposes this through the CacheBackedEmbeddings wrapper: it sits in front of any embedding model and stores vectors in a key-value byte store. For beginners, an in-memory store is the simplest way to prove the pattern before moving to Redis or another shared store.
import { OpenAIEmbeddings } from "@langchain/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache_backed";
import { InMemoryStore } from "langchain/storage/in_memory";

const underlyingEmbeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});

// Wrap the model: vectors are written to the store, keyed by the input text
// and namespaced by model so different models never collide.
const embeddings = CacheBackedEmbeddings.fromBytesStore(
  underlyingEmbeddings,
  new InMemoryStore(),
  { namespace: "text-embedding-3-small" },
);

async function main() {
  const texts = ["cached embedding demo"];

  // The first call computes and stores the vector; the second is a cache hit.
  const first = await embeddings.embedDocuments(texts);
  const second = await embeddings.embedDocuments(texts);

  console.log(first[0].slice(0, 5));
  console.log(second[0].slice(0, 5));
}

main().catch(console.error);
  3. Wrap the embedding calls with a stable cache key strategy.
    The cache is keyed on the exact text you pass in, so in real systems you usually want control over normalization. Trimming whitespace and lowercasing before embedding avoids accidental misses for inputs that differ only in formatting.
import { OpenAIEmbeddings } from "@langchain/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache_backed";
import { InMemoryStore } from "langchain/storage/in_memory";

const embeddings = CacheBackedEmbeddings.fromBytesStore(
  new OpenAIEmbeddings({ model: "text-embedding-3-small" }),
  new InMemoryStore(),
  { namespace: "text-embedding-3-small" },
);

// Normalize before embedding so "Hello World " and "hello world"
// resolve to the same cache entry.
function normalizeText(text: string) {
  return text.trim().toLowerCase();
}

async function cachedEmbed(texts: string[]) {
  return embeddings.embedDocuments(texts.map(normalizeText));
}

async function main() {
  const [a] = await cachedEmbed(["Hello World "]);
  const [b] = await cachedEmbed(["hello world"]); // cache hit

  console.log(a.length);
  console.log(b.length);
}

main().catch(console.error);
  4. Use the cached embedder inside a document pipeline.
    This is where caching pays off: chunking and reprocessing the same docs reuses stored vectors instead of recomputing them on every run.
import { Document } from "@langchain/core/documents";
import { OpenAIEmbeddings } from "@langchain/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache_backed";
import { InMemoryStore } from "langchain/storage/in_memory";

const embeddings = CacheBackedEmbeddings.fromBytesStore(
  new OpenAIEmbeddings({ model: "text-embedding-3-small" }),
  new InMemoryStore(),
  { namespace: "text-embedding-3-small" },
);

const docs = [
  new Document({ pageContent: "Policy terms and conditions." }),
  new Document({ pageContent: "Policy terms and conditions." }),
];

async function main() {
  for (const doc of docs) {
    // The duplicate document is served from the cache on the second iteration.
    const [vector] = await embeddings.embedDocuments([doc.pageContent]);
    console.log(doc.pageContent, vector.slice(0, 3));
  }
}

main().catch(console.error);
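
In most pipelines you won't call embedDocuments yourself; you hand the embedder to a vector store and it embeds chunks for you. The cached wrapper drops straight into that flow. Below is a minimal sketch that uses the in-memory MemoryVectorStore purely as a stand-in for whatever vector store you actually deploy:

import { Document } from "@langchain/core/documents";
import { OpenAIEmbeddings } from "@langchain/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache_backed";
import { InMemoryStore } from "langchain/storage/in_memory";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

const embeddings = CacheBackedEmbeddings.fromBytesStore(
  new OpenAIEmbeddings({ model: "text-embedding-3-small" }),
  new InMemoryStore(),
  { namespace: "text-embedding-3-small" },
);

const docs = [
  new Document({ pageContent: "Policy terms and conditions." }),
  new Document({ pageContent: "Refund policy for annual plans." }),
];

async function main() {
  // First index: each unique chunk is embedded and written to the cache.
  await MemoryVectorStore.fromDocuments(docs, embeddings);

  // Re-index: the same texts come back from the cache, no new embedding calls.
  const store = await MemoryVectorStore.fromDocuments(docs, embeddings);

  const results = await store.similaritySearch("refund policy", 1);
  console.log(results[0].pageContent);
}

main().catch(console.error);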
  5. Move to a shared cache when you have multiple processes.
    In-memory caching only helps inside one Node process: it resets on restart and does nothing across workers. In production, swap the in-memory store for Redis or another external store; the wrapper stays the same.
import { OpenAIEmbeddings } from "@langchain/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache_backed";
import { RedisByteStore } from "@langchain/community/storage/ioredis";
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

const embeddings = CacheBackedEmbeddings.fromBytesStore(
  new OpenAIEmbeddings({ model: "text-embedding-3-small" }),
  new RedisByteStore({ client: redis }),
  { namespace: "text-embedding-3-small" },
);

async function main() {
  const [vector] = await embeddings.embedDocuments(["shared cache demo"]);

  console.log(vector.length);

  // Close the connection so the script can exit cleanly.
  redis.disconnect();
}

main().catch(console.error);

Testing It

Embed the same input twice in a single run and watch the second call return almost immediately if your cache is working locally. With the in-memory store, restart the process and confirm the cache is gone; that tells you it's only process-local, which is expected.

If you switch to Redis, run the script in two separate Node processes against the same text: both should get a vector back, but only the first run should actually compute it. In practice, add logging around your embedding calls so you can see whether your app is hitting the provider or serving from cache.
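
One simple way to get that visibility is a small wrapper around the underlying embeddings that counts how many texts actually reach the provider. This is a sketch, not a LangChain feature: CountingEmbeddings and providerCalls are illustrative names, and the setup assumes the in-memory store from step 2.

import { Embeddings } from "@langchain/core/embeddings";
import { OpenAIEmbeddings } from "@langchain/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache_backed";
import { InMemoryStore } from "langchain/storage/in_memory";

// Illustrative wrapper: counts how many texts actually reach the provider.
class CountingEmbeddings extends Embeddings {
  providerCalls = 0;

  constructor(private inner: OpenAIEmbeddings) {
    super({});
  }

  embedDocuments(texts: string[]) {
    this.providerCalls += texts.length;
    return this.inner.embedDocuments(texts);
  }

  embedQuery(text: string) {
    this.providerCalls += 1;
    return this.inner.embedQuery(text);
  }
}

const counting = new CountingEmbeddings(
  new OpenAIEmbeddings({ model: "text-embedding-3-small" }),
);

const embeddings = CacheBackedEmbeddings.fromBytesStore(
  counting,
  new InMemoryStore(),
  { namespace: "text-embedding-3-small" },
);

async function main() {
  await embeddings.embedDocuments(["cache hit or miss?"]);
  await embeddings.embedDocuments(["cache hit or miss?"]);

  // Expect 1: the second call is served entirely from the cache.
  console.log("texts sent to the provider:", counting.providerCalls);
}

main().catch(console.error);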

A good smoke test is to embed duplicate strings in a loop and compare latency between the first pass and subsequent passes. If your cache is wired correctly, repeated inputs should become much cheaper than unique ones.
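
Here is a minimal version of that smoke test, again assuming the in-memory setup from step 2. With a warm cache, the second pass should finish in milliseconds instead of a network round trip.

import { OpenAIEmbeddings } from "@langchain/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache_backed";
import { InMemoryStore } from "langchain/storage/in_memory";

const embeddings = CacheBackedEmbeddings.fromBytesStore(
  new OpenAIEmbeddings({ model: "text-embedding-3-small" }),
  new InMemoryStore(),
  { namespace: "text-embedding-3-small" },
);

// Twenty copies of the same sentence: only the first pass should pay for them.
const texts = Array.from({ length: 20 }, () => "duplicate smoke test sentence");

async function main() {
  console.time("first pass (cold cache)");
  await embeddings.embedDocuments(texts);
  console.timeEnd("first pass (cold cache)");

  console.time("second pass (warm cache)");
  await embeddings.embedDocuments(texts);
  console.timeEnd("second pass (warm cache)");
}

main().catch(console.error);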

Next Steps

  • Replace the in-memory store with Redis in a real service so multiple workers share the same embedding results.
  • Add normalization rules for whitespace, casing, and punctuation before caching.
  • Cache chunk hashes instead of raw text if you're building a document ingestion pipeline (see the sketch after this list).
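
For that last point, here is a rough sketch of hash-keyed caching: normalize each chunk, hash it with Node's built-in crypto module, and use the digest as the store key, so chunks with identical content always hit the same entry. It talks to the byte store's mget/mset directly instead of going through CacheBackedEmbeddings, and chunkKey and embedChunks are illustrative helpers, not LangChain APIs.

import { createHash } from "node:crypto";
import { OpenAIEmbeddings } from "@langchain/openai";
import { InMemoryStore } from "langchain/storage/in_memory";

const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small" });

// Vectors are stored under a content hash rather than the raw chunk text.
const store = new InMemoryStore<number[]>();

function chunkKey(chunk: string): string {
  const normalized = chunk.trim().toLowerCase();
  return createHash("sha256").update(normalized).digest("hex");
}

async function embedChunks(chunks: string[]): Promise<number[][]> {
  const keys = chunks.map(chunkKey);
  const cached = await store.mget(keys);

  // Only chunks whose hash is missing from the store get embedded.
  const missing = cached
    .map((vector, i) => (vector === undefined ? i : -1))
    .filter((i) => i !== -1);

  if (missing.length > 0) {
    const fresh = await embeddings.embedDocuments(missing.map((i) => chunks[i]));
    await store.mset(
      missing.map((i, j): [string, number[]] => [keys[i], fresh[j]]),
    );
    missing.forEach((i, j) => {
      cached[i] = fresh[j];
    });
  }

  return cached as number[][];
}

async function main() {
  const [first] = await embedChunks(["Chapter 1: Introduction"]);
  // Same content after normalization, so this call is served from the store.
  const [second] = await embedChunks(["chapter 1: introduction "]);

  console.log(first.length, second.length);
}

main().catch(console.error);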

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
