LangGraph Tutorial (Python): caching embeddings for intermediate developers

By Cyprian Aarons · Updated 2026-04-22
Tags: langgraph, caching-embeddings-for-intermediate-developers, python

This tutorial shows you how to cache embeddings in a LangGraph pipeline so repeated inputs do not keep hitting your embedding model. You need this when your graph repeatedly processes the same text, or text that differs only in whitespace and casing, because embedding calls are one of the easiest places to burn latency and API cost.

What You'll Need

  • Python 3.10+
  • langgraph
  • langchain-openai
  • langchain-core
  • openai API key set as OPENAI_API_KEY
  • A working network connection for the first embedding call
  • Basic familiarity with LangGraph StateGraph, nodes, and edges

Install the packages:

pip install langgraph langchain-openai langchain-core openai

Step-by-Step

  1. First, define a tiny cache layer and a state object for the graph. The key idea is simple: hash the input text, look up the embedding in memory, and only call the model on a cache miss.
import hashlib
from typing import TypedDict, Optional

from langchain_openai import OpenAIEmbeddings


class GraphState(TypedDict):
    text: str
    embedding: Optional[list[float]]
    cache_hit: bool


embedding_cache: dict[str, list[float]] = {}
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")


def cache_key(text: str) -> str:
    return hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
  2. Next, write the node that resolves embeddings from cache. This node returns both the embedding and whether it was a hit, which makes testing and observability much easier later.
def get_embedding(state: GraphState) -> dict:
    key = cache_key(state["text"])

    if key in embedding_cache:
        return {
            "embedding": embedding_cache[key],
            "cache_hit": True,
        }

    vector = embeddings.embed_query(state["text"])
    embedding_cache[key] = vector

    return {
        "embedding": vector,
        "cache_hit": False,
    }
  3. Add a second node that uses the cached embedding for downstream work. In real systems this is where you would run similarity search, classification, clustering, or routing logic.
def summarize_embedding(state: GraphState) -> dict:
    vector = state["embedding"] or []
    preview = vector[:5]

    return {
        "embedding": state["embedding"],
        "cache_hit": state["cache_hit"],
        "preview": preview,
    }
  4. Now wire the graph together with LangGraph. This is standard LangGraph syntax: define a state schema, add nodes, connect them with edges, then compile. Note that AppState repeats GraphState's fields and adds the preview key that summarize_embedding writes, so the compiled graph can carry it.
from langgraph.graph import StateGraph, START, END


class AppState(TypedDict):
    text: str
    embedding: Optional[list[float]]
    cache_hit: bool
    preview: list[float]


builder = StateGraph(AppState)
builder.add_node("get_embedding", get_embedding)
builder.add_node("summarize_embedding", summarize_embedding)

builder.add_edge(START, "get_embedding")
builder.add_edge("get_embedding", "summarize_embedding")
builder.add_edge("summarize_embedding", END)

graph = builder.compile()
  5. Run it twice with the same input so you can see the cache behavior. The first call should miss and populate the cache; the second call should hit without calling the embedding API again.
if __name__ == "__main__":
    sample_text = "LangGraph caching embeddings is useful for repeated document chunks."

    first = graph.invoke({"text": sample_text})
    second = graph.invoke({"text": sample_text})

    print("First run:", first["cache_hit"], len(first["embedding"]))
    print("Second run:", second["cache_hit"], len(second["embedding"]))
    print("Preview:", second["preview"])
  6. If you want this to survive process restarts, swap the in-memory dictionary for Redis or Postgres-backed storage. The graph code stays almost identical; only the lookup and write functions change.
# Replace embedding_cache with a persistent backend in production.
# Example shape:
#
# def get_from_cache(key: str) -> Optional[list[float]]:
#     ...
#
# def save_to_cache(key: str, vector: list[float]) -> None:
#     ...
#
# Then keep get_embedding() exactly the same structurally.
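
As a concrete illustration of that shape, here is a minimal Redis-backed sketch. It assumes the redis package (redis-py) is installed and a Redis server is reachable on localhost:6379; the emb: key prefix is just a naming choice, and the functions reuse cache_key, embeddings, and GraphState from the earlier steps.

import json
from typing import Optional

import redis  # pip install redis

redis_client = redis.Redis(host="localhost", port=6379, db=0)


def get_from_cache(key: str) -> Optional[list[float]]:
    raw = redis_client.get(f"emb:{key}")
    return json.loads(raw) if raw is not None else None


def save_to_cache(key: str, vector: list[float]) -> None:
    # Store the vector as JSON; pass ex=<seconds> if you want entries to expire.
    redis_client.set(f"emb:{key}", json.dumps(vector))


def get_embedding(state: GraphState) -> dict:
    key = cache_key(state["text"])

    cached = get_from_cache(key)
    if cached is not None:
        return {"embedding": cached, "cache_hit": True}

    vector = embeddings.embed_query(state["text"])
    save_to_cache(key, vector)

    return {"embedding": vector, "cache_hit": False}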

Testing It

Run the script once and confirm that the output shows First run: False followed by Second run: True: the first invoke misses the cache and calls the embedding API, while the second invoke is served from the in-memory dictionary in the same process. Keep in mind that restarting the script clears that dictionary, so a fresh run starts with a miss again.

If you want to be strict, temporarily wrap embeddings.embed_query() with a counter or logger so you can verify it only executes on cache misses. In production, add metrics for cache hit rate and embedding latency so you can tell whether your cache is actually saving money.
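
One way to do that, sketched here with unittest.mock rather than any LangGraph-specific hook, is to wrap embed_query on the embeddings class and count how often the real API is reached:

from unittest.mock import patch

from langchain_openai import OpenAIEmbeddings

# Wrap embed_query so every real embedding call is counted while still
# delegating to the original implementation via `wraps`.
with patch.object(
    OpenAIEmbeddings, "embed_query", wraps=embeddings.embed_query
) as counted:
    graph.invoke({"text": "count my embedding calls"})
    graph.invoke({"text": "count my embedding calls"})

print("Embedding API calls:", counted.call_count)  # Expect 1: one miss, then one hit.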

Also test whitespace and casing differences if your normalization strategy uses .strip().lower(). If that behavior is too aggressive for your use case, remove it and hash raw text instead.
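
A couple of throwaway assertions make the current normalization behavior explicit; they are illustrative checks, not part of the graph:

# With .strip().lower(), inputs that differ only in whitespace or casing share one key.
assert cache_key("LangGraph Caching") == cache_key("  langgraph caching  ")

# Hashing the raw text instead would keep them as separate cache entries.
raw_a = hashlib.sha256("LangGraph Caching".encode("utf-8")).hexdigest()
raw_b = hashlib.sha256("  langgraph caching  ".encode("utf-8")).hexdigest()
assert raw_a != raw_b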

Next Steps

  • Move the cache to Redis so embeddings survive restarts and scale across workers.
  • Add TTLs and versioned keys so you can invalidate embeddings when models change (see the sketch after this list).
  • Extend the graph with a retrieval node that uses cached embeddings for similarity search or routing.
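
For the versioned-key idea, a small change to the key function is enough. The model name and version tag below are illustrative choices, not anything LangGraph prescribes:

EMBEDDING_MODEL = "text-embedding-3-small"
CACHE_VERSION = "v1"  # Bump this to invalidate every existing cache entry.


def versioned_cache_key(text: str) -> str:
    digest = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
    # Keys like "text-embedding-3-small:v1:<sha256>" record which model and
    # key schema produced each cached vector.
    return f"{EMBEDDING_MODEL}:{CACHE_VERSION}:{digest}"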

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

