LangGraph Tutorial (TypeScript): caching embeddings for advanced developers
This tutorial shows how to cache embeddings inside a LangGraph workflow in TypeScript, so repeated document chunks do not get re-embedded on every run. You need this when your graph processes the same inputs often, because embedding calls are expensive, slow, and easy to duplicate if you do not persist them.
What You'll Need
- Node.js 18+
- TypeScript 5+
- @langchain/langgraph
- @langchain/openai
- @langchain/core
- zod
- An OpenAI API key set as OPENAI_API_KEY
- A project with "type": "module" in package.json, or equivalent ESM support
Step-by-Step
- Start by defining a small cache layer that keys embeddings by stable content. In production, this should be Redis, Postgres, or a vector-store-adjacent metadata table; for this tutorial, an in-memory map keeps the example executable.
import { OpenAIEmbeddings } from "@langchain/openai";
const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small" });
const embeddingCache = new Map<string, number[]>();
export async function getCachedEmbedding(text: string): Promise<number[]> {
  const key = text.trim().toLowerCase();
  const cached = embeddingCache.get(key);
  if (cached) return cached;
  const vector = await embeddings.embedQuery(text);
  embeddingCache.set(key, vector);
  return vector;
}
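Exact-match keys are brittle: whitespace or casing differences silently miss the cache, and long chunks make long keys. A content hash of the normalized text gives a compact, stable key instead. A minimal sketch using Node's built-in crypto module; the cacheKey name and whitespace-collapsing normalization are illustrative choices, not part of any library API:

```typescript
import { createHash } from "node:crypto";

// Normalize, then hash: identical content always yields the same key,
// and the key stays short regardless of chunk length.
export function cacheKey(text: string): string {
  const normalized = text.trim().toLowerCase().replace(/\s+/g, " ");
  return createHash("sha256").update(normalized).digest("hex");
}
```

This is also the shape you want if the cache later moves to Redis, where a bounded key length matters.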
- Define the graph state with the original text, the computed embedding, and a similarity score. LangGraph works best when state is explicit and serializable.
import { z } from "zod";
export const StateSchema = z.object({
  input: z.string(),
  candidate: z.string(),
  inputEmbedding: z.array(z.number()).optional(),
  candidateEmbedding: z.array(z.number()).optional(),
  similarity: z.number().optional(),
});
export type GraphState = z.infer<typeof StateSchema>;
- Add nodes that fetch cached embeddings and compute cosine similarity. The important part is that the embedding nodes are deterministic: same text in, same cache key out.
import { cosineSimilarity } from "@langchain/core/utils/math";
export async function embedInput(state: GraphState): Promise<Partial<GraphState>> {
  return { inputEmbedding: await getCachedEmbedding(state.input) };
}
export async function embedCandidate(state: GraphState): Promise<Partial<GraphState>> {
  return { candidateEmbedding: await getCachedEmbedding(state.candidate) };
}
export async function scoreSimilarity(state: GraphState): Promise<Partial<GraphState>> {
  if (!state.inputEmbedding || !state.candidateEmbedding) {
    throw new Error("Missing embeddings");
  }
  // cosineSimilarity operates on matrices (number[][]) and returns a matrix,
  // so wrap each vector in an array and unwrap the single result.
  return {
    similarity: cosineSimilarity([state.inputEmbedding], [state.candidateEmbedding])[0][0],
  };
}
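For reference, cosine similarity between two vectors is just the dot product divided by the product of their norms. A standalone sketch, useful if you want to drop the library helper or verify its output; the cosine name is our own:

```typescript
// Cosine similarity: dot(a, b) / (||a|| * ||b||).
// Returns a value in [-1, 1]; 1 means the vectors point the same way.
export function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```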
- Wire those nodes into a LangGraph workflow. This graph runs both embedding steps in parallel, then joins them before scoring.
import { StateGraph, START, END } from "@langchain/langgraph";
const graph = new StateGraph(StateSchema)
  .addNode("embedInput", embedInput)
  .addNode("embedCandidate", embedCandidate)
  .addNode("scoreSimilarity", scoreSimilarity)
  .addEdge(START, "embedInput")
  .addEdge(START, "embedCandidate")
  .addEdge("embedInput", "scoreSimilarity")
  .addEdge("embedCandidate", "scoreSimilarity")
  .addEdge("scoreSimilarity", END);
export const app = graph.compile();
- Run the graph with repeated inputs to confirm the cache is doing work. On the second run with the same strings, no new embedding call should happen for those texts.
async function main() {
  const first = await app.invoke({
    input: "KYC document verification",
    candidate: "identity verification for onboarding",
  });
  const second = await app.invoke({
    input: "KYC document verification",
    candidate: "identity verification for onboarding",
  });
  console.log("First:", first.similarity);
  console.log("Second:", second.similarity);
}
main().catch(console.error);
Testing It
Run the script twice and watch the behavior of your cache layer. On the first execution, both texts should trigger embedQuery; on subsequent executions within the same process, they should hit embeddingCache instead.
If you want stronger proof, add a log line inside getCachedEmbedding when a cache hit occurs. In a real service, replace the Map with Redis and verify hit rate under repeated traffic patterns like duplicate policy clauses or repeated claim notes.
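Going one step further than a log line, hit/miss counters let you assert on cache behavior in a test. A sketch with a hypothetical instrumentedCache wrapper around the same lookup pattern as getCachedEmbedding; the embedder is injected as a function so the instrumentation can be exercised without an API key:

```typescript
type Embedder = (text: string) => Promise<number[]>;

// Wraps any embedder with an in-memory cache and hit/miss counters,
// so you can verify the cache is actually absorbing repeat traffic.
export function instrumentedCache(embed: Embedder) {
  const cache = new Map<string, number[]>();
  const stats = { hits: 0, misses: 0 };
  async function get(text: string): Promise<number[]> {
    const key = text.trim().toLowerCase();
    const cached = cache.get(key);
    if (cached) {
      stats.hits++;
      return cached;
    }
    stats.misses++;
    const vector = await embed(text);
    cache.set(key, vector);
    return vector;
  }
  return { get, stats };
}
```

In a service, the same counters can feed a metrics endpoint so you can watch the hit rate under real traffic.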
A good sanity check is to compare similarity scores across runs. Within a single process they should be identical for identical inputs, because the cached vectors are reused verbatim; without the cache, repeated API calls for the same text can drift slightly.
Next Steps
- Move the cache behind Redis with TTLs and per-tenant namespaces
- Add batch embedding for multiple chunks before they enter the graph
- Persist chunk hashes alongside vector IDs so you can invalidate stale embeddings when source documents change
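Before reaching for Redis, the TTL idea can be prototyped in-process. A minimal sketch with an illustrative TtlCache class that stamps each entry with an expiry; entries past their deadline are treated as misses, roughly mirroring Redis EXPIRE semantics. The injectable clock is there purely to make expiry testable:

```typescript
// In-memory cache whose entries expire after ttlMs milliseconds.
// A stand-in for Redis TTLs while prototyping; not production-grade.
export class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() > entry.expiresAt) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```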
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.