# How to Fix 'memory not persisting when scaling' in LangGraph (Python)
When LangGraph memory stops persisting after you scale from one process to multiple workers, the problem is usually not the graph logic itself. It’s almost always a checkpointing or thread identity issue: each worker is writing to its own local state, or your app is starting a fresh thread on every request.
You’ll typically see behavior like this:
- First message in a conversation works
- Second request comes back “blank” or forgets prior state
- It works on one pod, then fails once you add more replicas
## The Most Common Cause
The #1 cause is using an in-memory checkpointer, or creating a new checkpointer per process. MemorySaver and other local-memory patterns work in a single Python process, but they do not persist across workers, pods, or restarts.
Here’s the broken pattern versus the fixed pattern.
| Broken | Fixed |
|---|---|
| `MemorySaver()` inside the app process | Shared persistent checkpointer like Postgres/Redis |
| New graph/checkpointer per worker | One durable store used by all workers |
| No stable `thread_id` | Same `thread_id` for the same conversation |
```python
# BROKEN: state lives only in this Python process
from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()
graph = builder.compile(checkpointer=checkpointer)  # builder: your StateGraph builder

# Every worker has its own isolated memory.
result = graph.invoke(
    {"messages": [("user", "hello")]},
    config={"configurable": {"thread_id": "abc123"}},
)
```
```python
# FIXED: use a persistent checkpointer shared across workers
from langgraph.checkpoint.postgres import PostgresSaver

DB_URI = "postgresql://user:pass@postgres:5432/langgraph"

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # run once to create the checkpoint tables
    graph = builder.compile(checkpointer=checkpointer)
    result = graph.invoke(
        {"messages": [("user", "hello")]},
        config={"configurable": {"thread_id": "abc123"}},
    )
```
If you’re deploying behind Gunicorn, Uvicorn workers, Kubernetes, ECS, or Cloud Run, this is the first thing to fix. A MemorySaver checkpoint stored in worker A will never be visible to worker B.
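One way to catch this before it reaches production is a startup guard. The sketch below is hypothetical: `assert_durable_checkpointer` and the stand-in `MemorySaver` class are not LangGraph APIs; they just mirror the class-name check you would run against your real checkpointer at boot.

```python
# Hypothetical startup guard: refuse to boot a multi-worker deployment
# with an in-process checkpointer.

class MemorySaver:  # stand-in for langgraph.checkpoint.memory.MemorySaver
    pass

def assert_durable_checkpointer(checkpointer, workers: int) -> str:
    """Return the checkpointer class name, or raise if it is process-local."""
    name = type(checkpointer).__name__
    if workers > 1 and name in {"MemorySaver", "InMemorySaver"}:
        raise RuntimeError(
            f"{name} is process-local; use a shared store (Postgres/Redis) "
            f"when running {workers} workers"
        )
    return name

# Single worker is fine; multiple workers with MemorySaver should fail fast.
print(assert_durable_checkpointer(MemorySaver(), workers=1))  # MemorySaver
```

Run the guard wherever you compile the graph, passing the same worker count you hand to Gunicorn or your orchestrator.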
## Other Possible Causes
### 1) You are not passing a stable `thread_id`

LangGraph uses `thread_id` to load the correct checkpoint. If you generate a new ID on every request, persistence will look broken even with a real database behind it.
```python
import uuid

# BROKEN: new thread every request
graph.invoke(
    input_data,
    config={"configurable": {"thread_id": str(uuid.uuid4())}},
)
```
```python
# FIXED: reuse the same conversation/thread identifier
graph.invoke(
    input_data,
    config={"configurable": {"thread_id": conversation_id}},
)
```
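If your client cannot store a conversation ID, you can derive a stable one from identifiers you already have. A minimal sketch using only the standard library; `thread_id_for` and the namespace value are assumptions, not LangGraph APIs, and `uuid5` is deterministic, so the same inputs always produce the same ID.

```python
import uuid

# Assumed app-specific namespace; any fixed UUID works.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "chat.example.com")

def thread_id_for(user_id: str, conversation_id: str) -> str:
    """Derive a stable thread_id from user + conversation identifiers."""
    return str(uuid.uuid5(NAMESPACE, f"{user_id}:{conversation_id}"))

a = thread_id_for("user-42", "conv-7")
b = thread_id_for("user-42", "conv-7")
print(a == b)  # True: the same conversation always maps to the same thread
```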
### 2) You compiled the graph inside the request handler

Compiling per request can hide state bugs and create inconsistent runtime behavior. Build the graph once at startup and reuse it across requests.
```python
# BROKEN: compiled on every request
@app.post("/chat")
def chat(req: ChatRequest):
    graph = builder.compile(checkpointer=checkpointer)
    return graph.invoke(req.input, config=req.config)
```
```python
# FIXED: compile once at startup, reuse everywhere
graph = builder.compile(checkpointer=checkpointer)

@app.post("/chat")
def chat(req: ChatRequest):
    return graph.invoke(req.input, config=req.config)
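Outside a web framework, the same compile-once idea can be sketched with a cached factory. `get_graph` is a hypothetical helper and the returned `object()` is a stand-in for `builder.compile(...)`; the point is that every caller gets the same instance.

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_graph():
    """Build and compile the graph once per process."""
    print("compiling graph")  # should appear exactly once per process
    return object()           # stand-in for builder.compile(checkpointer=...)

# Every request reuses the same compiled graph.
assert get_graph() is get_graph()
```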
### 3) Your database is not actually shared by all replicas

This shows up when each container points at localhost or an ephemeral volume. One pod writes checkpoints; another pod reads from a different place.
```yaml
# BROKEN: each pod talks to its own localhost
env:
  - name: DATABASE_URL
    value: postgresql://user:pass@localhost:5432/langgraph
```
```yaml
# FIXED: all pods read the same URL from a shared secret
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: langgraph-db-secret
        key: DATABASE_URL
```
If your URL says localhost, assume it is wrong unless Postgres is running in the same container.
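As a cheap safety net, you can flag suspicious URLs at startup. This is a minimal standard-library sketch; `is_suspicious_db_url` is a hypothetical helper, not part of LangGraph.

```python
from urllib.parse import urlparse

# Hosts that usually mean "each replica has its own database".
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}

def is_suspicious_db_url(url: str) -> bool:
    """Return True when the database URL points at the local machine."""
    host = urlparse(url).hostname or ""
    return host in LOCAL_HOSTS

print(is_suspicious_db_url("postgresql://user:pass@localhost:5432/langgraph"))  # True
print(is_suspicious_db_url("postgresql://user:pass@postgres:5432/langgraph"))   # False
```

Log a loud warning (or refuse to start) when the check fires in a multi-replica environment.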
### 4) You are mixing sync and async incorrectly

If you call an async graph without awaiting it, or forget to await async persistence calls in the surrounding code, checkpoints may never flush when expected.
```python
# BROKEN: missing await, so this returns a coroutine that never runs
result = graph.ainvoke(input_data, config=config)

# FIXED: keep invocation style consistent end-to-end
result = await graph.ainvoke(input_data, config=config)
# ...and make sure the enclosing route/function is async too
```

Also make sure your checkpointer supports the async path you’re using (for Postgres that means `AsyncPostgresSaver` rather than `PostgresSaver`). Don’t mix sync and async APIs casually.
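The failure mode is easy to reproduce without LangGraph at all. In this sketch, `fake_ainvoke` stands in for `graph.ainvoke`: calling it without `await` yields a coroutine object, so nothing executes and no state is written.

```python
import asyncio

async def fake_ainvoke(payload: dict) -> dict:
    """Stand-in for graph.ainvoke: pretend to write a checkpoint, then reply."""
    await asyncio.sleep(0)  # simulate the async checkpoint write
    return {"messages": payload["messages"] + ["reply"]}

async def main() -> None:
    broken = fake_ainvoke({"messages": ["hello"]})  # coroutine, not a result
    print(asyncio.iscoroutine(broken))              # True; nothing ran yet
    broken.close()                                  # avoid "never awaited" warning

    fixed = await fake_ainvoke({"messages": ["hello"]})
    print(fixed["messages"])                        # ['hello', 'reply']

asyncio.run(main())
```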
## How to Debug It
1. **Confirm which checkpointer you compiled with.**
   - Log the class name at startup, e.g. `print(type(checkpointer).__name__)`.
   - If you see `MemorySaver`, that’s your answer.
2. **Verify the same `thread_id` is reused.**
   - Log it on every request.
   - Send two requests with the exact same ID and compare results.
   - If state resets between requests, your client is generating new IDs.
3. **Check whether all workers point at the same store.**
   - Inspect environment variables on every replica.
   - Confirm DB hostnames are identical and reachable.
   - In Kubernetes, exec into two pods and compare `DATABASE_URL`.
4. **Inspect checkpoints directly.**
   - Query your backing store for saved threads/checkpoints.
   - If nothing is being written, your persistence layer isn’t configured correctly.
   - If writes exist but reads don’t match, your thread mapping is wrong.
## Prevention

- **Use a durable checkpointer in any multi-worker deployment:**
  - `PostgresSaver`
  - Redis-based persistence if that fits your architecture better
- **Treat `thread_id` as part of your API contract:**
  - stable per user session/conversation
  - never random per request
- **Initialize LangGraph once at process startup:**
  - build nodes once
  - compile once
  - reuse across requests
If you want one sentence to remember this by: LangGraph memory only persists at scale when both the checkpointer and the `thread_id` are stable across workers.
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.