LangGraph Tutorial (Python): caching embeddings for advanced developers

By Cyprian Aarons · Updated 2026-04-22

This tutorial shows you how to cache embeddings inside a LangGraph workflow so repeated queries don’t re-embed the same text. You need this when your graph processes duplicate documents, reruns nodes during retries, or handles high-volume retrieval where embedding calls become a cost and latency bottleneck.

What You'll Need

  • Python 3.10+
  • langgraph
  • langchain-openai
  • langchain-core
  • openai API key
  • A Redis instance if you want shared cache across processes
  • Basic familiarity with LangGraph nodes, state, and edges

Install the packages:

pip install langgraph langchain-openai langchain-core openai redis

Set your API key:

export OPENAI_API_KEY="your-key"

Step-by-Step

  1. Start by defining a small state model that carries the input text, the embedding vector, and a cache hit flag. In production, keep the cached value outside the graph state if you need persistence across runs; for this tutorial, we’ll use an in-memory cache so the pattern is easy to see.
from typing import TypedDict, Optional, List

class EmbeddingState(TypedDict):
    text: str
    embedding: Optional[List[float]]
    cache_hit: bool
  2. Build an embedding cache wrapper around your model call. The key point is deterministic hashing of normalized text, so whitespace-only differences do not create duplicate entries.
import hashlib
from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings(model="text-embedding-3-small")
embedding_cache: dict[str, list[float]] = {}

def cache_key(text: str) -> str:
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def get_embedding(text: str) -> tuple[list[float], bool]:
    key = cache_key(text)
    if key in embedding_cache:
        return embedding_cache[key], True

    vector = embeddings_model.embed_query(text)
    embedding_cache[key] = vector
    return vector, False
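As a quick sanity check before wiring this into a graph, you can confirm the hit flag behaves as expected. This snippet makes one real embedding call and one cache lookup; the exact strings are just illustrative.
vector, hit = get_embedding("hello world")
print(hit)  # False on a cold cache: the text was embedded and stored

vector_again, hit_again = get_embedding("  hello   world ")  # whitespace-only variant
print(hit_again)  # True: normalization maps both inputs to the same key
assert vector == vector_again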
  3. Create a LangGraph node that reads the text from state and writes back the embedding plus whether it came from cache. This keeps caching logic isolated in one node, which makes retries and observability much easier to manage.
from langgraph.graph import StateGraph, START, END

def embed_node(state: EmbeddingState) -> EmbeddingState:
    vector, hit = get_embedding(state["text"])
    return {
        "text": state["text"],
        "embedding": vector,
        "cache_hit": hit,
    }
  4. Wire the node into a simple graph and compile it. This is intentionally minimal: one input node, one output path, no branching yet.
graph_builder = StateGraph(EmbeddingState)
graph_builder.add_node("embed", embed_node)
graph_builder.add_edge(START, "embed")
graph_builder.add_edge("embed", END)

graph = graph_builder.compile()
  5. Run the graph twice with the same input and inspect the result. The first call should populate the cache; the second should reuse it without calling OpenAI again.
input_state = {"text": "LangGraph caching embeddings example"}

first = graph.invoke(input_state)
second = graph.invoke(input_state)

print("First run cache hit:", first["cache_hit"])
print("Second run cache hit:", second["cache_hit"])
print("Embedding length:", len(first["embedding"]))
  6. If you want shared caching across workers or containers, swap the in-memory dict for Redis. This is the version you want when your app runs behind a queue or multiple API replicas.
import json
import os

import redis

redis_client = redis.Redis.from_url(os.getenv("REDIS_URL", "redis://localhost:6379/0"))

def get_embedding_redis(text: str) -> tuple[list[float], bool]:
    key = f"emb:{cache_key(text)}"
    cached = redis_client.get(key)
    if cached is not None:
        # json round-trips the vector safely; never eval data read from a shared cache.
        return json.loads(cached), True

    vector = embeddings_model.embed_query(text)
    redis_client.set(key, json.dumps(vector))
    return vector, False
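To use the shared cache inside the graph, point the node at get_embedding_redis instead of the in-memory helper; everything else stays the same. A minimal sketch (the name embed_node_redis is just illustrative):
def embed_node_redis(state: EmbeddingState) -> EmbeddingState:
    # Same node shape as before; only the cache backend changes.
    vector, hit = get_embedding_redis(state["text"])
    return {
        "text": state["text"],
        "embedding": vector,
        "cache_hit": hit,
    }

graph_builder = StateGraph(EmbeddingState)
graph_builder.add_node("embed", embed_node_redis)
graph_builder.add_edge(START, "embed")
graph_builder.add_edge("embed", END)
graph = graph_builder.compile()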

Testing It

Run the script once and confirm that both invocations return a valid embedding list. The first run should print False for cache_hit, and the second should print True.

To verify it is actually saving money and time, add timing around each graph.invoke() call and compare first-run latency with second-run latency. If you switch to Redis, restart your Python process and confirm the second process still gets a cache hit for identical text.
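A minimal timing harness might look like this (the numbers will vary with network latency, and the variable names are just illustrative):
import time

timed_input = {"text": "LangGraph caching embeddings example"}

start = time.perf_counter()
first = graph.invoke(timed_input)
cold_ms = (time.perf_counter() - start) * 1000

start = time.perf_counter()
second = graph.invoke(timed_input)
warm_ms = (time.perf_counter() - start) * 1000

# The warm run skips the embedding API call entirely, so it should be much faster.
print(f"cold: {cold_ms:.1f} ms (cache_hit={first['cache_hit']})")
print(f"warm: {warm_ms:.1f} ms (cache_hit={second['cache_hit']})")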

For more confidence, test normalization by sending "LangGraph caching embeddings example" and " LangGraph caching embeddings example "; both should resolve to the same cache key. If they do not, fix normalization before adding this to production.
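That check fits in a one-line assertion (the second string just adds surrounding whitespace):
assert cache_key("LangGraph caching embeddings example") == cache_key("  LangGraph caching embeddings example  ")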

Next Steps

  • Add TTL-based eviction so stale embeddings expire automatically (a sketch combining this with model-name keys follows this list).
  • Move from in-memory/Redis caching to a versioned semantic cache keyed by model name plus normalization rules.
  • Extend the graph with retrieval nodes that reuse cached embeddings for document chunks and user queries separately.
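As a rough sketch of the first two ideas, assuming the Redis setup above (the one-week TTL and the key layout are arbitrary choices, and the name get_embedding_versioned is just illustrative):
CACHE_TTL_SECONDS = 60 * 60 * 24 * 7  # one week; tune to how often your corpus changes

def get_embedding_versioned(text: str) -> tuple[list[float], bool]:
    # Including the model name in the key prevents reusing vectors after a model switch.
    key = f"emb:{embeddings_model.model}:{cache_key(text)}"
    cached = redis_client.get(key)
    if cached is not None:
        return json.loads(cached), True

    vector = embeddings_model.embed_query(text)
    # setex stores the value with a TTL, so stale entries expire automatically.
    redis_client.setex(key, CACHE_TTL_SECONDS, json.dumps(vector))
    return vector, False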


By Cyprian Aarons, AI Consultant at Topiax.
