LangGraph Tutorial (Python): caching embeddings for beginners
This tutorial shows you how to build a small LangGraph workflow in Python that caches text embeddings so repeated inputs do not hit the embedding API again. You need this when the same documents, prompts, or chunks are embedded repeatedly and you want lower latency, fewer API calls, and more predictable costs.
What You'll Need
- Python 3.10+
- A working OpenAI API key
- `langgraph`
- `langchain-openai`
- `langchain-core`
- `python-dotenv` if you want to load secrets from a `.env` file
Install the packages:
```bash
pip install langgraph langchain-openai langchain-core python-dotenv
```
Set your API key:
```bash
export OPENAI_API_KEY="your-key-here"
```
Step-by-Step
- Start with a tiny embedding cache.
For beginners, an in-memory dictionary is enough to prove the pattern. In production you would swap this for Redis, Postgres, or a key-value store.
```python
from typing import TypedDict

class State(TypedDict):
    text: str
    embedding: list[float] | None
    cache_hit: bool

embedding_cache: dict[str, list[float]] = {}
```
- Create a node that checks the cache before calling the model.
The key idea is simple: use the input text as the cache key, return immediately on a hit, and only call the embedding model on a miss.
```python
from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings(model="text-embedding-3-small")

def get_embedding(state: State) -> State:
    text = state["text"]
    if text in embedding_cache:
        return {"text": text, "embedding": embedding_cache[text], "cache_hit": True}
    vector = embeddings_model.embed_query(text)
    embedding_cache[text] = vector
    return {"text": text, "embedding": vector, "cache_hit": False}
```
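If you want to verify the cache-then-call logic without spending API credits, you can swap in a deterministic stand-in for the embedding model. The `FakeEmbedder` class below is a hypothetical helper written for this sketch, not a LangChain class; it derives a small pseudo-vector from a hash of the text:

```python
import hashlib

class FakeEmbedder:
    """Deterministic stand-in for an embedding model, for offline testing."""

    def embed_query(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Map the first 8 bytes of the hash to floats in [0, 1].
        return [b / 255 for b in digest[:8]]

embedding_cache: dict[str, list[float]] = {}
embedder = FakeEmbedder()

def get_embedding(state: dict) -> dict:
    # Same cache-first logic as the real node, minus the API call.
    text = state["text"]
    if text in embedding_cache:
        return {"text": text, "embedding": embedding_cache[text], "cache_hit": True}
    vector = embedder.embed_query(text)
    embedding_cache[text] = vector
    return {"text": text, "embedding": vector, "cache_hit": False}
```

Because the fake vectors are deterministic, two calls with the same text must return identical embeddings, which makes the cache behavior easy to assert in a unit test.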
- Build a LangGraph with one node and an entry point.
This is enough to learn the pattern before adding routing or downstream retrieval logic. LangGraph still gives you a clean stateful execution model even for a small workflow like this.
```python
from langgraph.graph import StateGraph, END

graph_builder = StateGraph(State)
graph_builder.add_node("embed", get_embedding)
graph_builder.set_entry_point("embed")
graph_builder.add_edge("embed", END)
app = graph_builder.compile()
```
- Run the graph twice with the same input and compare results.
The first call should populate the cache. The second call should return the same vector but mark it as a cache hit.
```python
input_state = {"text": "LangGraph caching embeddings", "embedding": None, "cache_hit": False}

first_run = app.invoke(input_state)
second_run = app.invoke(input_state)

print("First run cache hit:", first_run["cache_hit"])
print("Second run cache hit:", second_run["cache_hit"])
print("Same embedding:", first_run["embedding"] == second_run["embedding"])
```
- Make the cache key safer for real usage.
Raw text keys work for demos, but production systems often normalize whitespace and hash the content. That avoids huge dictionary keys and makes storage cleaner.
```python
import hashlib

def make_cache_key(text: str) -> str:
    # Collapse whitespace and lowercase before hashing, so trivially
    # different copies of the same text share one cache entry.
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def get_embedding_hashed(state: State) -> State:
    text = state["text"]
    key = make_cache_key(text)
    if key in embedding_cache:
        return {"text": text, "embedding": embedding_cache[key], "cache_hit": True}
    vector = embeddings_model.embed_query(text)
    embedding_cache[key] = vector
    return {"text": text, "embedding": vector, "cache_hit": False}
```
- Swap in persistent storage when you move past local testing.
The LangGraph code does not change much; only your cache layer changes. That is what makes this pattern useful: your graph stays stable while storage evolves underneath it.
```python
# Replace this dict with Redis/Postgres/etc.
# Keep the same get/set interface.
class SimpleCache:
    def __init__(self):
        self.store: dict[str, list[float]] = {}

    def get(self, key: str):
        return self.store.get(key)

    def set(self, key: str, value: list[float]):
        self.store[key] = value

cache = SimpleCache()
```
Testing It
Run the script once and confirm the first invocation prints `False` for `cache_hit` while the second prints `True`. If both runs show `False`, your cache lookup is failing; if both show `True`, you may be reusing state incorrectly.
You should also verify that both returned embeddings are identical for the same input text. If you change one word in the input, you should get a different cache key and a new embedding call.
For extra confidence, add logging around the cache branch so you can see whether each request was served from memory or fetched from OpenAI.
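A lightweight way to add that logging with the standard `logging` module might look like the sketch below. To keep it self-contained, the embedding call is passed in as `embed_fn`, a stand-in parameter for `embeddings_model.embed_query`; in your graph you would pass the real method:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("embedding_cache")

embedding_cache: dict[str, list[float]] = {}

def get_embedding_logged(state: dict, embed_fn) -> dict:
    text = state["text"]
    if text in embedding_cache:
        # Served from memory: no API call happens on this branch.
        logger.info("cache HIT for %r", text[:40])
        return {"text": text, "embedding": embedding_cache[text], "cache_hit": True}
    # Cache miss: this is the only branch that costs an API call.
    logger.info("cache MISS for %r, calling the embedding model", text[:40])
    vector = embed_fn(text)
    embedding_cache[text] = vector
    return {"text": text, "embedding": vector, "cache_hit": False}
```

Watching the HIT/MISS lines during a test run makes it obvious whether your cache key logic is working before you look at any numbers.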
Next Steps
- Replace the in-memory dict with Redis and keep the same graph node logic.
- Add a retriever node after embeddings so cached vectors feed similarity search.
- Use LangGraph checkpoints if you want to persist workflow state across runs instead of just caching embeddings.
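As a taste of the retriever step, cosine similarity over cached vectors can be computed in plain Python. This is only a sketch with a made-up two-dimensional corpus; real workloads would use numpy or a vector store, and real embeddings have hundreds of dimensions:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], cached: dict[str, list[float]], k: int = 3):
    # Rank every cached vector by its similarity to the query vector.
    scored = [(key, cosine_similarity(query_vec, vec)) for key, vec in cached.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

A retriever node would call `top_k` with the embedding produced by the `embed` node and write the matches back into the graph state.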
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.