LangGraph Tutorial (Python): caching embeddings for beginners

By Cyprian Aarons · Updated 2026-04-22

This tutorial shows you how to build a small LangGraph workflow in Python that caches text embeddings so repeated inputs do not hit the embedding API again. You need this when the same documents, prompts, or chunks are embedded repeatedly and you want lower latency, fewer API calls, and more predictable costs.

What You'll Need

  • Python 3.10+
  • A working OpenAI API key
  • langgraph
  • langchain-openai
  • langchain-core
  • python-dotenv if you want to load secrets from a .env file

Install the packages:

pip install langgraph langchain-openai langchain-core python-dotenv

Set your API key:

export OPENAI_API_KEY="your-key-here"
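
If you prefer the .env route, a minimal loader with python-dotenv looks like this (it assumes a .env file in your working directory containing your OPENAI_API_KEY):

from dotenv import load_dotenv

# Reads .env from the current directory and populates os.environ,
# so OpenAIEmbeddings can pick up OPENAI_API_KEY automatically.
load_dotenv()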

Step-by-Step

  1. Start with a tiny embedding cache.
    For beginners, an in-memory dictionary is enough to prove the pattern. In production you would swap this for Redis, Postgres, or another key-value store.
from typing import TypedDict

class State(TypedDict):
    text: str
    embedding: list[float] | None
    cache_hit: bool

embedding_cache: dict[str, list[float]] = {}
  2. Create a node that checks the cache before calling the model.
    The key idea is simple: use the input text as the cache key, return immediately on a hit, and only call the embedding model on a miss.
from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings(model="text-embedding-3-small")

def get_embedding(state: State) -> State:
    text = state["text"]

    if text in embedding_cache:
        return {"text": text, "embedding": embedding_cache[text], "cache_hit": True}

    vector = embeddings_model.embed_query(text)
    embedding_cache[text] = vector
    return {"text": text, "embedding": vector, "cache_hit": False}
  3. Build a LangGraph with one node and an entry point.
    This is enough to learn the pattern before adding routing or downstream retrieval logic. LangGraph still gives you a clean stateful execution model even for a small workflow like this.
from langgraph.graph import StateGraph, END

graph_builder = StateGraph(State)
graph_builder.add_node("embed", get_embedding)
graph_builder.set_entry_point("embed")
graph_builder.add_edge("embed", END)

app = graph_builder.compile()
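
On recent LangGraph releases you can also declare the entry point as an edge from the START constant instead of calling set_entry_point; both wire up the same thing, so pick one:

from langgraph.graph import START

# Alternative to graph_builder.set_entry_point("embed") above.
graph_builder.add_edge(START, "embed")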
  4. Run the graph twice with the same input and compare results.
    The first call should populate the cache. The second call should return the same vector but mark it as a cache hit.
input_state = {"text": "LangGraph caching embeddings", "embedding": None, "cache_hit": False}

first_run = app.invoke(input_state)
second_run = app.invoke(input_state)

print("First run cache hit:", first_run["cache_hit"])
print("Second run cache hit:", second_run["cache_hit"])
print("Same embedding:", first_run["embedding"] == second_run["embedding"])
  5. Make the cache key safer for real usage.
    Raw text keys work for demos, but production systems often normalize whitespace and hash the content. That avoids huge dictionary keys and makes storage cleaner.
import hashlib

def make_cache_key(text: str) -> str:
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def get_embedding_hashed(state: State) -> State:
    text = state["text"]
    key = make_cache_key(text)

    if key in embedding_cache:
        return {"text": text, "embedding": embedding_cache[key], "cache_hit": True}

    vector = embeddings_model.embed_query(text)
    embedding_cache[key] = vector
    return {"text": text, "embedding": vector, "cache_hit": False}
  6. Swap in persistent storage when you move past local testing.
    The LangGraph code does not change much; only your cache layer changes. That is what makes this pattern useful: your graph stays stable while storage evolves underneath it.
# Replace this dict with Redis/Postgres/etc.
# Keep the same get/set interface.

class SimpleCache:
    def __init__(self):
        self.store: dict[str, list[float]] = {}

    def get(self, key: str):
        return self.store.get(key)

    def set(self, key: str, value: list[float]):
        self.store[key] = value

cache = SimpleCache()
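
As an illustration of that swap, here is a hypothetical Redis-backed version with the same get/set interface. It assumes a local Redis server and the redis-py client (pip install redis); vectors are JSON-encoded because Redis stores bytes, not Python lists.

import json

import redis

class RedisCache:
    def __init__(self, url: str = "redis://localhost:6379/0"):
        self.client = redis.Redis.from_url(url)

    def get(self, key: str) -> list[float] | None:
        raw = self.client.get(key)
        # Decode the stored JSON back into a list of floats on a hit.
        return json.loads(raw) if raw is not None else None

    def set(self, key: str, value: list[float]) -> None:
        # JSON-encode the vector; Redis values are byte strings.
        self.client.set(key, json.dumps(value))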

Testing It

Run the script once and confirm the first invocation prints False for cache_hit while the second prints True. If both runs show False, your cache lookup is failing; if both show True, you may be reusing state incorrectly.

You should also verify that both returned embeddings are identical for the same input text. If you change one word in the input, you should get a different cache key and a new embedding call.

For extra confidence, add logging around the cache branch so you can see whether each request was served from memory or fetched from OpenAI.
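
One way to wire that up, reusing the hashed node from step 5 with the standard logging module:

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("embedding_cache")

def get_embedding_logged(state: State) -> State:
    text = state["text"]
    key = make_cache_key(text)

    if key in embedding_cache:
        logger.info("cache HIT for key %s", key[:12])
        return {"text": text, "embedding": embedding_cache[key], "cache_hit": True}

    logger.info("cache MISS for key %s, calling OpenAI", key[:12])
    vector = embeddings_model.embed_query(text)
    embedding_cache[key] = vector
    return {"text": text, "embedding": vector, "cache_hit": False}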

Next Steps

  • Replace the in-memory dict with Redis and keep the same graph node logic.
  • Add a retriever node after embeddings so cached vectors feed similarity search.
  • Use LangGraph checkpoints if you want to persist workflow state across runs instead of just caching embeddings; a minimal sketch follows below.
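
A minimal checkpointing sketch, assuming a LangGraph version that ships MemorySaver (the import path can differ across releases):

from langgraph.checkpoint.memory import MemorySaver

# Compile with a checkpointer; each thread_id gets its own persisted state.
checkpointed_app = graph_builder.compile(checkpointer=MemorySaver())

config = {"configurable": {"thread_id": "demo-thread"}}
result = checkpointed_app.invoke(input_state, config=config)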

By Cyprian Aarons, AI Consultant at Topiax.