LangGraph Tutorial (Python): caching embeddings for intermediate developers
This tutorial shows you how to cache embeddings in a LangGraph pipeline so repeated inputs do not keep hitting your embedding model. You need this when your graph processes duplicate or near-duplicate text, because embeddings are one of the easiest places to burn latency and API cost.
What You'll Need
- Python 3.10+
- `langgraph`
- `langchain-openai`
- `langchain-core`
- `openai`, with an API key set as `OPENAI_API_KEY`
- A working network connection for the first embedding call
- Basic familiarity with LangGraph: `StateGraph`, nodes, and edges
Install the packages:
```shell
pip install langgraph langchain-openai langchain-core openai
```
Step-by-Step
- First, define a tiny cache layer and a state object for the graph. The key idea is simple: hash the input text, look up the embedding in memory, and only call the model on a cache miss.

```python
import hashlib
from typing import TypedDict, Optional

from langchain_openai import OpenAIEmbeddings


class GraphState(TypedDict):
    text: str
    embedding: Optional[list[float]]
    cache_hit: bool


# In-memory cache: maps a content hash to its embedding vector.
embedding_cache: dict[str, list[float]] = {}

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")


def cache_key(text: str) -> str:
    # Normalize whitespace and casing so trivially different inputs share a key.
    return hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
```
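To see what the normalization buys you, here is a quick standalone check (re-declaring `cache_key` so the snippet runs on its own; the example strings are arbitrary):

```python
import hashlib


def cache_key(text: str) -> str:
    return hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()


# Whitespace and casing differences collapse to the same key...
k1 = cache_key("Hello World")
k2 = cache_key("  hello world\n")
# ...but different content still produces a different key.
k3 = cache_key("hello, world")

print(k1 == k2)  # True
print(k1 == k3)  # False
```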
- Next, write the node that resolves embeddings from the cache. This node returns both the embedding and whether it was a hit, which makes testing and observability much easier later.

```python
def get_embedding(state: GraphState) -> dict:
    key = cache_key(state["text"])
    if key in embedding_cache:
        # Cache hit: reuse the stored vector without calling the API.
        return {
            "embedding": embedding_cache[key],
            "cache_hit": True,
        }
    # Cache miss: embed once, store the result, and flag the miss.
    vector = embeddings.embed_query(state["text"])
    embedding_cache[key] = vector
    return {
        "embedding": vector,
        "cache_hit": False,
    }
```
- Add a second node that uses the cached embedding for downstream work. In real systems this is where you would run similarity search, classification, clustering, or routing logic.

```python
def summarize_embedding(state: GraphState) -> dict:
    vector = state["embedding"] or []
    # Return only the key this node updates; LangGraph merges
    # partial updates into the shared state.
    return {"preview": vector[:5]}
```
- Now wire the graph together with LangGraph. This is standard LangGraph syntax: define a state schema, add nodes, connect them with edges, then compile.

```python
from langgraph.graph import StateGraph, START, END


class AppState(GraphState):
    # Extends GraphState with the preview field that
    # summarize_embedding produces.
    preview: list[float]


builder = StateGraph(AppState)
builder.add_node("get_embedding", get_embedding)
builder.add_node("summarize_embedding", summarize_embedding)
builder.add_edge(START, "get_embedding")
builder.add_edge("get_embedding", "summarize_embedding")
builder.add_edge("summarize_embedding", END)
graph = builder.compile()
```
- Run it twice with the same input so you can see the cache behavior. The first call should miss and populate the cache; the second should hit without calling the embedding API again.

```python
if __name__ == "__main__":
    sample_text = "LangGraph caching embeddings is useful for repeated document chunks."
    first = graph.invoke({"text": sample_text})
    second = graph.invoke({"text": sample_text})
    print("First run:", first["cache_hit"], len(first["embedding"]))
    print("Second run:", second["cache_hit"], len(second["embedding"]))
    print("Preview:", second["preview"])
```
- If you want this to survive process restarts, swap the in-memory dictionary for Redis or Postgres-backed storage. The graph code stays almost identical; only the lookup and write functions change.

```python
# Replace embedding_cache with a persistent backend in production.
# Example shape:
#
# def get_from_cache(key: str) -> Optional[list[float]]:
#     ...
#
# def save_to_cache(key: str, vector: list[float]) -> None:
#     ...
#
# Then keep get_embedding() exactly the same structurally.
```
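As a minimal sketch of that shape, here is a persistent cache backed by SQLite from the standard library, standing in for Redis or Postgres. The table and helper names are illustrative, not part of any LangGraph API:

```python
import json
import sqlite3
from typing import Optional

# A single file on disk; embeddings survive process restarts.
conn = sqlite3.connect("embedding_cache.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS embeddings (key TEXT PRIMARY KEY, vector TEXT)"
)


def get_from_cache(key: str) -> Optional[list[float]]:
    row = conn.execute(
        "SELECT vector FROM embeddings WHERE key = ?", (key,)
    ).fetchone()
    # Vectors are stored as JSON text; decode on the way out.
    return json.loads(row[0]) if row else None


def save_to_cache(key: str, vector: list[float]) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO embeddings (key, vector) VALUES (?, ?)",
        (key, json.dumps(vector)),
    )
    conn.commit()
```

Inside `get_embedding`, the `if key in embedding_cache` check becomes a `get_from_cache(key)` call and the dictionary write becomes `save_to_cache(key, vector)`; nothing else changes.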
Testing It
Run the script and confirm that both `First run: False` and `Second run: True` appear in the output: the first invoke misses and populates the cache, and the second hits within the same process. Because the cache is in-memory, restarting the script starts cold again, so both lines come from a single run.
If you want to be strict, temporarily wrap `embeddings.embed_query()` with a counter or logger so you can verify it only executes on cache misses. In production, add metrics for cache hit rate and embedding latency so you can tell whether your cache is actually saving money.
Also test whitespace and casing differences if your normalization strategy uses `.strip().lower()`. If that behavior is too aggressive for your use case, remove it and hash the raw text instead.
Next Steps
- Move the cache to Redis so embeddings survive restarts and scale across workers.
- Add TTLs and versioned keys so you can invalidate embeddings when models change.
- Extend the graph with a retrieval node that uses cached embeddings for similarity search or routing.
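The versioned-key idea in the list above can be sketched by folding a cache version and the model name into the hash, so switching models automatically produces fresh keys. The helper and constant names are illustrative:

```python
import hashlib

# Bump this when the key format or normalization strategy changes.
CACHE_VERSION = "v1"


def versioned_cache_key(text: str, model: str) -> str:
    # Keys from one model never collide with keys from another,
    # so changing models invalidates old entries automatically.
    payload = f"{CACHE_VERSION}:{model}:{text.strip().lower()}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


old = versioned_cache_key("some chunk", "text-embedding-3-small")
new = versioned_cache_key("some chunk", "text-embedding-3-large")
print(old == new)  # False: a model change produces a fresh key
```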
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.