LangGraph Tutorial (TypeScript): caching embeddings for beginners

By Cyprian Aarons · Updated 2026-04-22
Tags: langgraph, caching-embeddings-for-beginners, typescript

This tutorial shows you how to cache embeddings in a LangGraph TypeScript workflow so repeated inputs do not keep paying the cost of re-embedding the same text. You need this when your app processes duplicate queries, document chunks, or customer messages and you want lower latency and lower embedding API spend.

What You'll Need

  • Node.js 18+
  • A TypeScript project with a runner such as tsx (used below) or a build step
  • Packages:
    • @langchain/langgraph
    • @langchain/openai
    • @langchain/core
    • dotenv
  • An OpenAI API key in .env
  • Basic familiarity with LangGraph nodes and edges
  • A place to store cache data:
    • for this tutorial, a local in-memory Map
    • in production, swap this for Redis, Postgres, or another shared store

Step-by-Step

  1. Set up your project and install the dependencies. This example uses OpenAI embeddings and a simple local cache so you can see the pattern without adding infrastructure first.
npm init -y
npm install @langchain/langgraph @langchain/openai @langchain/core dotenv
npm install -D typescript tsx @types/node
  2. Create a small embedding cache wrapper. The important part is the key: normalize the text before hashing it, then store the vector by that key. If the same text comes back later, you return the cached vector instead of calling the embedding model again.
import "dotenv/config";
import crypto from "node:crypto";
import { OpenAIEmbeddings } from "@langchain/openai";

const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});

const cache = new Map<string, number[]>();

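// Normalizing before hashing lets inputs that differ only in case or
// surrounding whitespace share one cache entry. Drop the lowercasing
// if case differences matter for your content.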
function cacheKey(text: string): string {
  return crypto.createHash("sha256").update(text.trim().toLowerCase()).digest("hex");
}

export async function embedWithCache(text: string): Promise<number[]> {
  const key = cacheKey(text);
  const cached = cache.get(key);
  if (cached) return cached;

  const vector = await embeddings.embedQuery(text);
  cache.set(key, vector);
  return vector;
}
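If you are indexing document chunks, you will usually embed many texts at once. Below is a minimal batch variant of the same pattern, assuming it lives in the same file as the wrapper above; embedManyWithCache is an illustrative name, and it only sends cache misses to the model via embedDocuments before merging results back in input order.

export async function embedManyWithCache(texts: string[]): Promise<number[][]> {
  const keys = texts.map(cacheKey);
  // Collect the positions of texts whose vectors are not cached yet.
  const missIndexes = keys
    .map((key, i) => (cache.has(key) ? -1 : i))
    .filter((i) => i !== -1);

  if (missIndexes.length > 0) {
    // One batched API call for all misses instead of one call per text.
    const fresh = await embeddings.embedDocuments(
      missIndexes.map((i) => texts[i]),
    );
    missIndexes.forEach((i, j) => cache.set(keys[i], fresh[j]));
  }

  // Every key is now cached, so this preserves the original input order.
  return keys.map((key) => cache.get(key)!);
}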
  3. Build a LangGraph workflow that uses the cached embedding function inside a node. Here we keep the graph small: one node embeds the input text, another node prints metadata so you can confirm that caching is working.
import { StateGraph, Annotation } from "@langchain/langgraph";
import { embedWithCache } from "./embed-cache.js";

const State = Annotation.Root({
  text: Annotation<string>(),
  embedding: Annotation<number[]>(),
});

const graph = new StateGraph(State)
  .addNode("embed", async (state) => {
    const embedding = await embedWithCache(state.text);
    return { embedding };
  })
  .addNode("log", async (state) => {
    console.log(`Embedded "${state.text}" -> ${state.embedding.length} dimensions`);
    return {};
  })
  .addEdge("__start__", "embed")
  .addEdge("embed", "log")
  .addEdge("log", "__end__")
  .compile();

export { graph };
  4. Run the graph twice with the same input. The first run should call the OpenAI embeddings API and store the result; the second run should hit your local cache immediately.
import { graph } from "./graph.js";

async function main() {
  const input = { text: "LangGraph caching embeddings is useful" };

  console.time("first");
  await graph.invoke(input);
  console.timeEnd("first");

  console.time("second");
  await graph.invoke(input);
  console.timeEnd("second");
}

main().catch(console.error);
  5. Add a tiny test to prove repeated inputs reuse cached vectors. In production you would also track cache hit rate and add eviction, but this check is enough to validate behavior locally.
import assert from "node:assert/strict";
import { embedWithCache } from "./embed-cache.js";

async function testCache() {
  const a = await embedWithCache("Hello world");
  const b = await embedWithCache("Hello world");

  assert.equal(a.length, b.length);
  // With an in-memory Map, a cache hit returns the same array instance,
  // so strict reference equality proves the vector was reused.
  assert.equal(a, b);

  console.log("cache test passed");
}

testCache().catch(console.error);

Testing It

Run your script with npx tsx src/run.ts or whatever entry file you created. On the first invocation, you should see normal latency from the embedding call; on the second invocation with identical text, it should be noticeably faster because it never leaves your process.

If you want to verify it more directly, add a log inside embedWithCache right before embedQuery and print "cache miss" there. You should only see that message once for repeated inputs.
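Concretely, that change is a single log line before the API call:

export async function embedWithCache(text: string): Promise<number[]> {
  const key = cacheKey(text);
  const cached = cache.get(key);
  if (cached) return cached;

  console.log("cache miss"); // should print once per unique input
  const vector = await embeddings.embedQuery(text);
  cache.set(key, vector);
  return vector;
}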

For real apps, do not rely on an in-memory Map unless you are fine with per-process caching only. If you run multiple Node instances behind a load balancer, each instance gets its own cache unless you move this into Redis or another shared store.
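Here is a minimal sketch of that swap, reusing cacheKey and embeddings from the wrapper above and assuming ioredis as the client and a REDIS_URL environment variable; both choices are illustrative, not requirements.

import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

export async function embedWithSharedCache(text: string): Promise<number[]> {
  const key = `emb:${cacheKey(text)}`;
  const hit = await redis.get(key);
  if (hit) return JSON.parse(hit) as number[];

  const vector = await embeddings.embedQuery(text);
  // "EX" sets a TTL in seconds so stale entries eventually expire on their own.
  await redis.set(key, JSON.stringify(vector), "EX", 60 * 60 * 24);
  return vector;
}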

Next Steps

  • Replace the local Map with Redis and use TTL-based eviction
  • Cache embeddings by normalized document chunk ID instead of raw text when indexing files
  • Add observability for cache hit rate, miss rate, and embedding cost per request (a starting point is sketched below)
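For that last item, one minimal starting point is a pair of in-process counters around the wrapper; the names are illustrative, and a real app would export these to its metrics system instead of reading them locally.

let hits = 0;
let misses = 0;

export async function embedWithCacheStats(text: string): Promise<number[]> {
  const key = cacheKey(text);
  const cached = cache.get(key);
  if (cached) {
    hits += 1;
    return cached;
  }
  misses += 1;
  const vector = await embeddings.embedQuery(text);
  cache.set(key, vector);
  return vector;
}

// Hit rate over the life of the process; 0 until the first lookup.
export function cacheHitRate(): number {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}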

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.
