# How to Fix 'memory not persisting in production' in LangGraph (Python)
## What this error usually means

If your LangGraph agent works locally but “forgets” state in production, the issue is almost always with checkpointing or thread identity. In practice, you’ll see symptoms like:

- a new conversation starts on every request
- `MessagesState` is empty after the first turn
- the graph runs, but prior messages never reappear

The most common runtime clue is that you’re using `MemorySaver` in a place where it can’t persist across process restarts, or you’re invoking the graph without a stable `thread_id`.
## The Most Common Cause

The #1 cause is this pattern: you created a checkpointer, but your production deployment does not keep process memory alive.

`MemorySaver` is an in-memory checkpointer. It works for local testing, notebooks, and single-process dev servers. It does not survive container restarts, autoscaling, multiple workers, or serverless cold starts.
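To see why this fails, it helps to model it. The sketch below is a toy stand-in for per-process checkpointing, not LangGraph's actual internals: each restart or cold start gets a fresh, empty store.

```python
# Toy model of process-local checkpointing: each dict stands in for a
# fresh MemorySaver instance inside a new process. This is NOT the real
# LangGraph implementation, just an illustration of the failure mode.

def new_worker():
    # Every restart / cold start creates a brand-new, empty store.
    return {}

worker_a = new_worker()
worker_a["user-123"] = ["Hello"]        # turn 1 lands on worker A

worker_b = new_worker()                 # restart, scale-out, or cold start
history = worker_b.get("user-123", [])  # turn 2 lands on worker B

print(history)  # [] -- the conversation is gone
```

The conversation written by the first process simply does not exist in the second one; no amount of `thread_id` hygiene fixes that.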
### Broken vs fixed

| Broken pattern | Right pattern |
|---|---|
| Uses `MemorySaver()` in production | Uses a persistent checkpointer |
| No stable `thread_id` | Passes a consistent `thread_id` per user/session |
| State disappears after restart | State survives restarts |
```python
# BROKEN: memory only lives inside this Python process
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, MessagesState, START, END

builder = StateGraph(MessagesState)
builder.add_node("chat", chat_node)
builder.add_edge(START, "chat")
builder.add_edge("chat", END)

graph = builder.compile(checkpointer=MemorySaver())

# This may work locally, then fail in production after restart / scale-out
result = graph.invoke(
    {"messages": [{"role": "user", "content": "Hello"}]},
    config={"configurable": {"thread_id": "user-123"}},
)
```
```python
# FIXED: use a persistent checkpointer (example: Postgres)
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.graph import StateGraph, MessagesState, START, END

builder = StateGraph(MessagesState)
builder.add_node("chat", chat_node)
builder.add_edge(START, "chat")
builder.add_edge("chat", END)

with PostgresSaver.from_conn_string(DB_URL) as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first run
    graph = builder.compile(checkpointer=checkpointer)

    result = graph.invoke(
        {"messages": [{"role": "user", "content": "Hello"}]},
        config={"configurable": {"thread_id": "user-123"}},
    )
```
If you’re deploying to Kubernetes, ECS, Cloud Run, or serverless Python functions, this is usually the whole problem. In-memory state is not persistence.
## Other Possible Causes

### 1) You forgot to pass `thread_id`

LangGraph uses `thread_id` to look up prior state. If every request gets a new ID, you’ve effectively told the graph to start fresh every time.
```python
# Broken
graph.invoke(input_data)

# Fixed
graph.invoke(
    input_data,
    config={"configurable": {"thread_id": "acct-48291"}},
)
```
Use a real conversation key:

- user ID
- account ID
- case ID
- session ID from your app layer
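One way to make the thread identity hard to get wrong is to centralize it in one small helper and pass its output everywhere. The `thread_config` helper below is an illustrative name, not a LangGraph API:

```python
def thread_config(conversation_key: str) -> dict:
    # Hypothetical helper: builds the config dict LangGraph expects,
    # keyed by a stable identifier from your app layer (user, account,
    # case, or session ID). One place to construct it, zero drift.
    return {"configurable": {"thread_id": f"conv-{conversation_key}"}}

# The same conversation key always maps to the same thread_id:
config = thread_config("acct-48291")
# graph.invoke(input_data, config=config)
```

Because the mapping is deterministic, every request for the same conversation loads the same checkpoint, no matter which handler built the config.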
### 2) Your deployment has multiple workers and each one has its own memory

This shows up with Gunicorn/Uvicorn workers or multiple pods. One request hits worker A, the next hits worker B. Each worker has its own `MemorySaver`, so state appears random.
```shell
# Bad for in-memory checkpointing
gunicorn app:app --workers 4
```
If you must run multiple workers:

- use a persistent checkpointer like Postgres or Redis-backed storage
- do not rely on process-local memory for conversation state
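The contrast can be sketched in a few lines. The dicts below stand in for per-worker `MemorySaver` instances, and `SHARED` stands in for a Postgres- or Redis-backed store; this is an illustration of the routing problem, not real checkpointer code:

```python
# Two requests for the same thread land on different workers.
SHARED = {}  # stand-in for a persistent backend all workers can reach

def handle_turn(local_store, shared_store, thread_id, msg):
    # Writes go to both stores so we can compare their views afterwards.
    local_store.setdefault(thread_id, []).append(msg)
    shared_store.setdefault(thread_id, []).append(msg)

worker_a_local, worker_b_local = {}, {}
handle_turn(worker_a_local, SHARED, "user-1", "hi")     # request 1 -> worker A
handle_turn(worker_b_local, SHARED, "user-1", "again")  # request 2 -> worker B

print(worker_b_local["user-1"])  # ['again']       -- local view is partial
print(SHARED["user-1"])          # ['hi', 'again'] -- shared view is complete
```

Whichever worker serves the next request, only the shared backend has the full conversation; each local store holds a random slice of it.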
### 3) You are recreating the graph on every request with fresh state assumptions
Rebuilding the graph per request is fine. Rebuilding it with an ephemeral checkpointer is not.
```python
# Broken if paired with MemorySaver and multi-worker deployment
def get_graph():
    builder = StateGraph(MessagesState)
    builder.add_node("chat", chat_node)
    return builder.compile(checkpointer=MemorySaver())
```
Better:
```python
# Keep durable storage outside the request path.
# Note: in recent langgraph versions PostgresSaver.from_conn_string()
# returns a context manager, so a long-lived checkpointer is built from
# an open psycopg connection instead.
from psycopg import Connection
from psycopg.rows import dict_row
from langgraph.checkpoint.postgres import PostgresSaver

conn = Connection.connect(DB_URL, autocommit=True, row_factory=dict_row)
checkpointer = PostgresSaver(conn)
checkpointer.setup()

def get_graph():
    builder = StateGraph(MessagesState)
    builder.add_node("chat", chat_node)
    return builder.compile(checkpointer=checkpointer)
```
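The point of the pattern is the ownership split: the durable checkpointer lives once per process, while the graph can be rebuilt cheaply per request because every rebuild points at the same storage. A minimal sketch of that split, with `FakeSaver` standing in for a real persistent checkpointer:

```python
class FakeSaver:
    # Stand-in for a durable checkpointer (e.g. Postgres-backed).
    def __init__(self):
        self.store = {}

CHECKPOINTER = FakeSaver()  # module-level: created once per process

def get_graph():
    # Cheap per-request object; always wired to the shared checkpointer.
    return {"checkpointer": CHECKPOINTER}

g1, g2 = get_graph(), get_graph()
print(g1["checkpointer"] is g2["checkpointer"])  # True
```

Rebuilding the graph costs little; what must never be rebuilt per request is the storage it checkpoints into.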
### 4) You are mixing up state and message history
A common mistake is expecting arbitrary Python variables to persist between turns. LangGraph persists checkpointed state, not local variables inside your node function.
```python
# Broken: local variable resets on every run
def chat_node(state):
    seen = []
    seen.append(state["messages"][-1].content)
    return {"messages": []}
```
Store what matters in graph state:
```python
# Fixed: use the graph state itself
def chat_node(state):
    last_msg = state["messages"][-1].content
    return {"messages": [{"role": "assistant", "content": f"You said: {last_msg}"}]}
```
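The reason this works is the reducer: LangGraph merges each node's returned update into the checkpointed state, so history accumulates across turns. The toy below mimics `MessagesState`'s append behavior with plain dicts (it is an illustration, not the real implementation, which uses message objects with a `.content` attribute):

```python
# Toy reducer: merge a node's update into saved state by appending messages.
def apply_update(state: dict, update: dict) -> dict:
    return {"messages": state["messages"] + update["messages"]}

def chat_node(state: dict) -> dict:
    last = state["messages"][-1]["content"]
    return {"messages": [{"role": "assistant", "content": f"You said: {last}"}]}

state = {"messages": [{"role": "user", "content": "Hello"}]}
state = apply_update(state, chat_node(state))   # turn 1
state["messages"].append({"role": "user", "content": "Again"})
state = apply_update(state, chat_node(state))   # turn 2

print(len(state["messages"]))  # 4 -- full history lives in state, not locals
```

Anything you want to survive to the next turn must flow through the returned update; locals inside the node vanish when the function returns.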
## How to Debug It

1. Check whether you are using `MemorySaver`.
   - If yes and this is production, assume that’s the bug.
   - Search for `from langgraph.checkpoint.memory import MemorySaver` and `checkpointer=MemorySaver()`.
2. Verify that every call includes the same `thread_id`.
   - Log it at the API boundary.
   - If it changes between requests, LangGraph will load a different thread checkpoint.
3. Inspect whether your app runs with more than one process.
   - Look at Gunicorn workers, Uvicorn workers, Kubernetes replicas, and serverless invocations.
   - If yes and you’re using in-memory storage, persistence will fail by design.
4. Read back checkpoints directly.
   - With persistent storage, confirm data exists after each turn.
   - If nothing is stored, your issue is wiring.
   - If data exists but isn’t loaded, your issue is usually a `thread_id` mismatch.
A useful symptom map:
| Symptom | Likely cause |
|---|---|
| Works locally only | `MemorySaver` in prod |
| First message persists, second doesn’t | Missing or changing `thread_id` |
| Random behavior across requests | Multiple workers/pods with local memory |
| Checkpoints exist but aren’t used | Wrong config path or thread identity |
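The last row is easy to reproduce: LangGraph reads the thread ID from `config["configurable"]`, not from the top level of the config dict. A minimal sketch of that lookup (the `load_checkpoint` helper is illustrative, not LangGraph's API):

```python
# Checkpoints exist in the store, but a misplaced thread_id never finds them.
store = {"user-123": ["turn 1"]}  # stand-in for persisted checkpoints

def load_checkpoint(store, config):
    # Mimics the lookup path: thread_id must sit under "configurable".
    thread_id = config.get("configurable", {}).get("thread_id")
    return store.get(thread_id, [])

print(load_checkpoint(store, {"thread_id": "user-123"}))                  # []
print(load_checkpoint(store, {"configurable": {"thread_id": "user-123"}}))  # ['turn 1']
```

If your storage backend shows rows but the graph still starts fresh, diff the exact config dict you pass against the one that wrote the checkpoint.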
## Prevention

- Use a persistent checkpointer from day one in any deployed agent:
  - Postgres for standard web apps
  - Redis if you already operate it as shared infra and understand its durability tradeoffs
- Treat `thread_id` as part of your API contract.
  - Generate it once per conversation/session.
  - Never let clients invent new IDs on every request.
- Add an integration test that runs two consecutive invocations against the same thread.
  - First turn writes state.
  - Second turn must read it back.
  - Run that test against the same storage backend you use in production.
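The two-turn test can be sketched as follows. `FakeBackend` is an in-memory stand-in used here to keep the sketch self-contained; in the real test, wire the checkpointer to the same backend you deploy with:

```python
class FakeBackend:
    # Stand-in for your production storage backend.
    def __init__(self):
        self.checkpoints = {}

    def put(self, thread_id, state):
        self.checkpoints[thread_id] = state

    def get(self, thread_id):
        return self.checkpoints.get(thread_id)

def run_turn(backend, thread_id, user_msg):
    # Load prior state (or start fresh), append the turn, persist it.
    state = backend.get(thread_id) or {"messages": []}
    state["messages"].append({"role": "user", "content": user_msg})
    backend.put(thread_id, state)
    return state

backend = FakeBackend()
run_turn(backend, "t-1", "first")           # turn 1 writes state
state = run_turn(backend, "t-1", "second")  # turn 2 must read it back

assert [m["content"] for m in state["messages"]] == ["first", "second"]
```

If this assertion fails against your real backend, you have reproduced the production bug in CI instead of in front of a user.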
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.