# How to Fix 'state not updating when scaling' in LangChain (Python)
If you’re seeing state not updating when scaling in a LangChain Python app, the usual meaning is simple: your chain or agent works on one process, but the state stops being shared once you add concurrency, workers, or horizontal scaling. In practice, this shows up with message history, conversation memory, counters, or tool state that looks fine locally and then resets or diverges under load.
This is almost always a state persistence problem, not a model problem. The failure mode usually appears when using ConversationBufferMemory, in-memory dicts, or per-process globals with multiple workers.
## The Most Common Cause
The #1 cause is keeping agent or conversation state in process memory instead of a shared backend.
That works in a single Python process. It breaks as soon as you run multiple Uvicorn workers, Celery tasks, Kubernetes replicas, or any setup where requests can land on different processes.
### Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| Stores memory in local Python objects | Stores memory in Redis/Postgres/DB-backed history |
| Works on one worker | Works across scaled workers |
| State disappears after restart | State survives restarts |
```python
# BROKEN: state lives only in this process
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(model="gpt-4o-mini")
memory = ConversationBufferMemory()

chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True,
)

print(chain.predict(input="My name is Alice"))
print(chain.predict(input="What's my name?"))
```
```python
# FIXED: persist chat history outside the process
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(model="gpt-4o-mini")

history = RedisChatMessageHistory(
    url="redis://localhost:6379/0",
    session_id="customer-123",
)

memory = ConversationBufferMemory(
    chat_memory=history,
    return_messages=True,
)

chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True,
)

print(chain.predict(input="My name is Alice"))
print(chain.predict(input="What's my name?"))
```
If you’re using `RunnableWithMessageHistory`, the same rule applies: the history backend must be shared and keyed correctly per session.
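The factory contract behind that rule can be illustrated without LangChain at all. In this sketch, `SharedHistory` is a hypothetical stand-in for a `BaseChatMessageHistory` implementation, and the class-level dict plays the role a shared Redis or Postgres backend would play in production:

```python
# Minimal stand-in for the history interface that RunnableWithMessageHistory
# expects from its get_session_history factory. The class-level dict simulates
# a shared backend (Redis/Postgres in production); names are illustrative.
class SharedHistory:
    _store: dict[str, list[str]] = {}  # shared across all instances

    def __init__(self, session_id: str):
        self.session_id = session_id

    @property
    def messages(self) -> list[str]:
        # Same session_id -> same stored messages, regardless of which
        # request (or instance) is asking.
        return self._store.setdefault(self.session_id, [])

    def add_message(self, message: str) -> None:
        self._store.setdefault(self.session_id, []).append(message)


def get_session_history(session_id: str) -> SharedHistory:
    # This is the shape of the factory you pass to RunnableWithMessageHistory.
    return SharedHistory(session_id)


h1 = get_session_history("customer-123")
h1.add_message("My name is Alice")

h2 = get_session_history("customer-123")  # a later request, same session
print(h2.messages)  # the earlier message survives across instances
```

The point of the sketch: as long as the factory resolves the same `session_id` to the same shared backend, a later request sees the earlier turn, no matter which worker handled it.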
## Other Possible Causes
### 1) Wrong session keying
If every request gets a new `session_id`, LangChain will look like it’s “not updating” because it’s writing to a brand-new conversation each time.
```python
import uuid

# BAD: random session id per request
session_id = str(uuid.uuid4())

# GOOD: stable session id from authenticated user / tenant / conversation
session_id = f"{tenant_id}:{user_id}:{conversation_id}"
```
For multi-tenant systems, include tenant scope. Otherwise one customer can read another customer’s history if IDs collide.
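One way to make that keying rule concrete is a small validating helper. The names here are hypothetical, not a LangChain API; the separator check prevents two different identifier triples from colliding into the same key:

```python
# Hypothetical helper -- names are illustrative, not a LangChain API.
def make_session_id(tenant_id: str, user_id: str, conversation_id: str) -> str:
    """Build a stable, tenant-scoped session id like 'acme:alice:conv-42'."""
    parts = {
        "tenant_id": tenant_id,
        "user_id": user_id,
        "conversation_id": conversation_id,
    }
    for name, value in parts.items():
        if not value or ":" in value:
            # Reject empty parts and the separator itself so that two
            # different triples can never produce the same id.
            raise ValueError(f"{name} must be non-empty and must not contain ':'")
    return f"{tenant_id}:{user_id}:{conversation_id}"


print(make_session_id("acme", "alice", "conv-42"))  # acme:alice:conv-42
```

Derive the inputs from authenticated request context, never from anything the client can freely choose.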
### 2) Using deprecated memory classes incorrectly
A lot of older examples use ConversationBufferMemory directly inside chains. That still works in some cases, but it becomes fragile with modern Runnable patterns and distributed deployments.
```python
# Fragile pattern for scaled apps
memory = ConversationBufferMemory()
chain = ConversationChain(llm=llm, memory=memory)
```
Prefer explicit message history storage and request-scoped loading/saving. If you are on newer LangChain versions, move toward `RunnableWithMessageHistory` and an external store.
### 3) Parallel requests overwriting each other
Two requests for the same session can race. One writes history after the other has already loaded stale state, so the final transcript looks incomplete.
```python
# Two concurrent calls with the same session_id can interleave badly.
# (Awaiting them one after the other would be sequential; gather runs
# them concurrently, which is how the race actually appears.)
await asyncio.gather(
    chain.ainvoke(
        {"input": "Update address"},
        config={"configurable": {"session_id": "cust-1"}},
    ),
    chain.ainvoke(
        {"input": "Change phone number"},
        config={"configurable": {"session_id": "cust-1"}},
    ),
)
```
Fix this with:
- per-session locking
- optimistic concurrency control
- atomic append operations in your history store
Atomic Redis list appends (`RPUSH`) are safer than read-modify-write dict updates.
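The per-session locking option can be sketched with `asyncio`. This only serializes turns within one process; across workers you would need a distributed lock or atomic appends in the store. `process_turn` is a hypothetical stand-in for loading history, calling the chain, and saving history:

```python
import asyncio
from collections import defaultdict

# Sketch of per-session locking (single-process only). The dicts stand in
# for the history backend; names are illustrative.
session_locks = defaultdict(asyncio.Lock)
transcript = defaultdict(list)


async def process_turn(session_id: str, message: str) -> None:
    async with session_locks[session_id]:  # serialize turns per session
        history = list(transcript[session_id])        # load
        await asyncio.sleep(0)                        # simulated I/O (LLM call)
        transcript[session_id] = history + [message]  # save full transcript


async def main() -> None:
    # Two concurrent turns for the same session; the lock prevents the
    # lost-update race described above.
    await asyncio.gather(
        process_turn("cust-1", "Update address"),
        process_turn("cust-1", "Change phone number"),
    )


asyncio.run(main())
print(transcript["cust-1"])
```

Without the lock, both turns would load the empty transcript and the second save would overwrite the first, losing a message.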
### 4) Worker-local caches or globals
This one is common in FastAPI + LangChain apps.
```python
# BAD: global mutable state per worker
chat_state = {}

def save_state(session_id: str, value: str):
    chat_state[session_id] = value
```
Each Gunicorn/Uvicorn worker has its own copy. Turn 1 may land on worker A and turn 2 on worker B, so the second request sees empty state.
Use Redis, Postgres, DynamoDB, or another shared store instead of module-level globals.
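As a sketch of the shared-store shape, here is a SQLite-backed version of the same `save_state`. SQLite stands in for Redis or Postgres, and the function and table names are illustrative; the point is that state lives outside any one worker's memory:

```python
import json
import sqlite3

# Sketch of a shared, process-safe store. SQLite stands in here for
# Redis/Postgres; the function and table names are illustrative.
def connect(path="chat_state.db"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chat_state ("
        "session_id TEXT PRIMARY KEY, value TEXT)"
    )
    return conn


def save_state(conn, session_id, value):
    with conn:  # commits the write atomically
        conn.execute(
            "INSERT INTO chat_state (session_id, value) VALUES (?, ?) "
            "ON CONFLICT(session_id) DO UPDATE SET value = excluded.value",
            (session_id, json.dumps(value)),
        )


def load_state(conn, session_id):
    row = conn.execute(
        "SELECT value FROM chat_state WHERE session_id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = connect(":memory:")  # in-memory DB just for this demo
save_state(conn, "cust-1", {"name": "Alice"})
print(load_state(conn, "cust-1"))  # {'name': 'Alice'}
```

With a file-backed path, any worker on the same host reads the same rows; for true horizontal scaling you would point the same interface at a networked store.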
## How to Debug It
- **Print the session identifier on every request.** Confirm it is stable across turns; if it changes between requests, you found the bug.
- **Log where history is stored.** If you see `ConversationBufferMemory()` with no external backend, assume it will fail under scale. Check whether the store is Redis, Postgres, or in-memory.
- **Run two requests against different workers.** In Kubernetes or Gunicorn with multiple workers, send turn 1 and turn 2. If turn 2 forgets turn 1 only sometimes, you likely have process-local state.
- **Inspect LangChain traces.** Turn on verbose logging and watch for messages like `Loaded chat history for session_id=...` and `Saving context`. Missing saves usually mean your callback/history hook is not wired correctly.
A useful quick test:

```python
print("session_id:", session_id)
print("history backend:", type(history).__name__)
print("messages:", len(history.messages))
```

If `messages` resets unexpectedly between calls, your backend or keying is wrong.
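The "two requests against different workers" step can be reproduced locally with `multiprocessing`, where each child process plays the role of a worker. This sketch uses the POSIX `fork` start method to stay self-contained (it will not run on Windows); the names are illustrative:

```python
import multiprocessing as mp

# Demonstration of process-local state: each worker process gets its own
# copy of a module-level dict, just like separate Gunicorn/Uvicorn workers.
chat_state = {}


def handle_request(session_id, value, results):
    # Runs in a child process with its OWN copy of chat_state.
    if value is not None:
        chat_state[session_id] = value       # write is local to this process
    results.put(chat_state.get(session_id))  # report what this process sees


ctx = mp.get_context("fork")  # POSIX-only; keeps the demo self-contained
results = ctx.Queue()

# "Turn 1" lands on worker A and writes state.
worker_a = ctx.Process(target=handle_request, args=("cust-1", "Alice", results))
worker_a.start()
worker_a.join()

# "Turn 2" lands on worker B, which never saw worker A's write.
worker_b = ctx.Process(target=handle_request, args=("cust-1", None, results))
worker_b.start()
worker_b.join()

seen = [results.get(), results.get()]
print(seen)  # worker A sees 'Alice', worker B sees None
```

This is exactly the symptom from the article: the state "updates" on one worker and silently vanishes on the next, because the write never left worker A's memory.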
## Prevention
- **Use a shared persistence layer for all conversational state:**
  - Redis for short-lived chat history
  - Postgres for durable, audit-grade storage
- **Treat `session_id` as part of your application contract:**
  - stable
  - namespaced by tenant/user/conversation
  - never random per request unless that is intentional
- **Avoid module globals and process-local caches for anything user-facing:**
  - they pass local tests
  - they fail under horizontal scaling
If you want one rule to keep in mind: LangChain chains are stateless unless you make state explicit. Once you move memory out of process and key it correctly, this error usually disappears fast.
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.