# How to Fix 'memory not persisting when scaling' in LangGraph (TypeScript)
When LangGraph memory stops persisting after you scale from one Node process to multiple replicas, the issue is almost always not “memory” itself. It’s usually your checkpointer, thread IDs, or deployment topology.
This shows up when a graph works locally, then loses conversation state behind a load balancer, Kubernetes deployment, serverless runtime, or horizontal autoscaling setup.
## The Most Common Cause
The #1 cause is using an in-memory checkpointer like MemorySaver in a multi-instance deployment. MemorySaver keeps state inside the current process, so the next request may hit a different pod and see an empty store.
You’ll often see behavior like:
- first turn works
- second turn forgets previous messages
- logs show `thread_id` changing or missing
- no explicit exception, just a silent state reset
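The failure mode is easy to reproduce without LangGraph at all. Here is a minimal sketch in plain TypeScript (not LangGraph code) of two replicas that each hold their own in-process store:

```typescript
// Sketch: two "replicas" with process-local stores and no shared backend.
// This is plain TypeScript illustrating the failure, not LangGraph code.
type Store = Map<string, string[]>;

function handleTurn(store: Store, threadId: string, message: string): string[] {
  const history = store.get(threadId) ?? [];
  const updated = [...history, message];
  store.set(threadId, updated);
  return updated;
}

const replicaA: Store = new Map();
const replicaB: Store = new Map();

// Turn 1 lands on replica A; the load balancer routes turn 2 to replica B.
handleTurn(replicaA, "user-123", "hello");
const turn2 = handleTurn(replicaB, "user-123", "how are you?");
// turn2 contains only one message: replica B never saw "hello"
```

A shared store (Postgres, Redis, etc.) removes the problem because both replicas read and write the same history for `user-123`.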
Here’s the broken pattern versus the correct pattern.
| Broken | Fixed |
|---|---|
| Uses MemorySaver in production | Uses a shared persistent checkpointer |
| Works on one local process only | Survives scaling and pod restarts |
| State disappears across replicas | State is stored in Redis/Postgres/etc. |
```typescript
// ❌ Broken: MemorySaver only persists inside one process
import { StateGraph, MemorySaver } from "@langchain/langgraph";

const checkpointer = new MemorySaver();

const graph = new StateGraph({
  channels: {
    messages: {
      value: (left: any[], right: any[]) => left.concat(right),
      default: () => [],
    },
  },
})
  .addNode("agent", async (state) => {
    return { messages: [{ role: "assistant", content: "ok" }] };
  })
  .addEdge("__start__", "agent")
  .addEdge("agent", "__end__")
  .compile({ checkpointer });

// This may work locally, then fail to persist across replicas
await graph.invoke(
  { messages: [{ role: "user", content: "hello" }] },
  { configurable: { thread_id: "user-123" } }
);
```
```typescript
// ✅ Fixed: use a shared durable checkpointer
import { StateGraph } from "@langchain/langgraph";
import { PostgresSaver } from "@langchain/langgraph-checkpoint-postgres";
// or Redis-based persistence if that matches your stack

const checkpointer = await PostgresSaver.fromConnString(process.env.POSTGRES_URL!);
await checkpointer.setup();

const graph = new StateGraph({
  channels: {
    messages: {
      value: (left: any[], right: any[]) => left.concat(right),
      default: () => [],
    },
  },
})
  .addNode("agent", async (state) => {
    return { messages: [{ role: "assistant", content: "ok" }] };
  })
  .addEdge("__start__", "agent")
  .addEdge("agent", "__end__")
  .compile({ checkpointer });

await graph.invoke(
  { messages: [{ role: "user", content: "hello" }] },
  { configurable: { thread_id: "user-123" } }
);
```
MemorySaver is a local development tool, not a scaling solution.
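One way to enforce that rule is a boot-time guard that refuses to start a production process with an in-memory checkpointer. A sketch in plain TypeScript; `CHECKPOINTER_TYPE` is an illustrative env var name from the deployment examples below, not a LangGraph setting:

```typescript
// Sketch: refuse to boot in production with a process-local checkpointer.
// CHECKPOINTER_TYPE and NODE_ENV are illustrative names, not LangGraph settings.
function assertDurableCheckpointer(
  env: Record<string, string | undefined>
): void {
  const type = env.CHECKPOINTER_TYPE ?? "memory";
  if (env.NODE_ENV === "production" && type === "memory") {
    throw new Error(
      "MemorySaver is process-local; configure a shared checkpointer (e.g. Postgres) before scaling."
    );
  }
}

// Call once at startup, before compiling the graph:
// assertDurableCheckpointer(process.env);
```

Crashing at boot is cheap; silently losing conversation state in production is not.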
## Other Possible Causes
### Missing or unstable `thread_id`
LangGraph checkpoints are keyed by thread identity. If you generate a new ID per request, memory will look like it is not persisting.
```typescript
// ❌ Broken
await graph.invoke(input, {
  configurable: { thread_id: crypto.randomUUID() },
});
```

```typescript
// ✅ Fixed
await graph.invoke(input, {
  configurable: { thread_id: `customer:${customerId}` },
});
```
Use a stable business key:
- customer ID
- session ID
- case ID
- conversation ID
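A small helper makes the stability guarantee explicit and catches empty keys before they reach the graph. A sketch; `threadIdFor` is a hypothetical helper, not a LangGraph API:

```typescript
// Sketch: derive a deterministic thread_id from a business key.
// threadIdFor is a hypothetical helper, not a LangGraph API.
type ThreadKind = "customer" | "session" | "case" | "conversation";

function threadIdFor(kind: ThreadKind, id: string): string {
  const key = id.trim();
  if (!key) {
    throw new Error("thread_id requires a non-empty business key");
  }
  return `${kind}:${key}`;
}

// Same inputs always produce the same thread_id, so every turn of a
// conversation lands on the same checkpoint thread:
// threadIdFor("customer", "123") -> "customer:123"
```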
### Your load balancer is sending requests to different pods with no shared store
If each replica has its own local filesystem or in-memory state, scaling breaks persistence even if your code looks correct.
```yaml
# ❌ Broken for memory persistence
replicas: 3
env:
  - name: CHECKPOINTER_TYPE
    value: memory
```
Fix it by using shared infrastructure:
```yaml
# ✅ Shared persistence backend
env:
  - name: POSTGRES_URL
    valueFrom:
      secretKeyRef:
        name: app-secrets
        key: postgres-url
```
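Alongside the manifest change, it helps to fail fast at boot when the shared store is unconfigured, rather than falling back to local state. A minimal sketch in plain TypeScript; `requireEnv` is a hypothetical helper, not part of LangGraph:

```typescript
// Sketch: crash at boot, not mid-conversation, if the store is unconfigured.
// requireEnv is a hypothetical helper, not a LangGraph API.
function requireEnv(
  env: Record<string, string | undefined>,
  key: string
): string {
  const value = env[key];
  if (!value) {
    throw new Error(`Missing required env var: ${key}`);
  }
  return value;
}

// Usage at startup:
// const postgresUrl = requireEnv(process.env, "POSTGRES_URL");
```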
### You are re-compiling the graph incorrectly on every request
If you create a fresh graph and fresh checkpointer per request, you can accidentally bypass previously saved state or connect to the wrong store.
```typescript
// ❌ Broken pattern inside handler
app.post("/chat", async (req, res) => {
  const checkpointer = new MemorySaver();
  const graph = buildGraph().compile({ checkpointer });
  const result = await graph.invoke(req.body, {
    configurable: { thread_id: req.body.threadId },
  });
  res.json(result);
});
```
```typescript
// ✅ Better pattern: create the checkpointer and graph once at process startup
const checkpointer = await PostgresSaver.fromConnString(process.env.POSTGRES_URL!);
await checkpointer.setup();
const graph = buildGraph().compile({ checkpointer });

app.post("/chat", async (req, res) => {
  const result = await graph.invoke(req.body, {
    configurable: { thread_id: req.body.threadId },
  });
  res.json(result);
});
```
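If your runtime makes true top-level startup code awkward (some serverless bundlers, for example), memoizing the initialization gives the same effect. A sketch in plain TypeScript, with a stand-in object instead of a real compiled graph:

```typescript
// Sketch: initialize shared resources once per process, not once per request.
function once<T>(init: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (!cached) {
      cached = { value: init() };
    }
    return cached.value;
  };
}

// Stand-in for the real startup work, e.g. buildGraph().compile({ checkpointer }).
let initCount = 0;
const getGraph = once(() => {
  initCount += 1;
  return { name: "shared-graph" };
});

// Every request handler calls getGraph(); initialization still runs once.
```

For async initialization, cache the promise instead of the value so concurrent requests share a single in-flight setup.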
### Your state schema does not actually merge memory fields
A common TypeScript mistake is overwriting state instead of merging it. In LangGraph terms, your reducer must preserve previous values.
```typescript
// ❌ Broken reducer overwrites history
messages: {
  value: (_left, right) => right,
}
```

```typescript
// ✅ Fixed reducer appends history
messages: {
  value: (left, right) => [...left, ...right],
}
```
If your reducer drops prior values, the checkpoint may exist but your conversation still looks empty.
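The difference is easy to see in isolation, outside any graph:

```typescript
type Message = { role: string; content: string };

// Broken: the reducer returns only the newest update.
const overwrite = (_left: Message[], right: Message[]): Message[] => right;

// Fixed: the reducer appends, preserving earlier turns.
const append = (left: Message[], right: Message[]): Message[] => [
  ...left,
  ...right,
];

const history: Message[] = [{ role: "user", content: "hello" }];
const update: Message[] = [{ role: "assistant", content: "hi there" }];

const lost = overwrite(history, update); // 1 message: prior turn gone
const kept = append(history, update);    // 2 messages: history preserved
```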
## How to Debug It

1. Confirm whether the problem happens only after scaling.
   - Run one replica locally.
   - Then run two replicas behind the same entry point.
   - If it breaks only with multiple instances, suspect `MemorySaver` or local-only storage first.
2. Log the `thread_id` on every request.
   - Print it before invoking the graph.
   - Make sure the same conversation always uses the same ID.
   - If it changes between turns, LangGraph will treat each turn as a new thread.
3. Inspect which checkpointer you compiled with.
   - Search for `compile({ checkpointer })`.
   - If you see `new MemorySaver()`, that is your answer for production scaling.
   - If you already use Postgres/Redis, verify connectivity and initialization with `setup()` where required.
4. Read back the checkpoint directly.
   - Query your backing store for the same `thread_id`.
   - If there is no row/document/key after invocation, persistence never happened.
   - If there is data but the app still forgets context, your retrieval path or reducer is wrong.
## Prevention

- Use a durable shared checkpointer in any environment with more than one instance.
- Treat `thread_id` as part of your domain model, not an implementation detail.
- Add an integration test that runs two separate app processes against the same conversation and verifies state survives across requests.
- Never ship `MemorySaver` outside local development unless you explicitly want ephemeral state.
If you want one sentence to remember this by:

> LangGraph memory does not persist across scaling unless both your checkpoint store and your thread identity are stable.
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.