How to Fix 'state not updating when scaling' in LangChain (Python)

By Cyprian Aarons · Updated 2026-04-21

If you’re seeing state not updating when scaling in a LangChain Python app, the usual meaning is simple: your chain or agent works in a single process, but state stops being shared once you add concurrency, workers, or horizontal scaling. In practice, this shows up as message history, conversation memory, counters, or tool state that looks fine locally and then resets or diverges under load.

This is almost always a state persistence problem, not a model problem. The failure mode usually appears when using ConversationBufferMemory, in-memory dicts, or per-process globals with multiple workers.

The Most Common Cause

The #1 cause is keeping agent or conversation state in process memory instead of a shared backend.

That works in a single Python process. It breaks as soon as you run multiple Uvicorn workers, Celery tasks, Kubernetes replicas, or any setup where requests can land on different processes.

Broken vs fixed pattern

Broken pattern                            Fixed pattern
Stores memory in local Python objects     Stores memory in Redis/Postgres/DB-backed history
Works on one worker                       Works across scaled workers
State disappears after restart            State survives restarts
# BROKEN: state lives only in this process
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(model="gpt-4o-mini")
memory = ConversationBufferMemory()

chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True,
)

print(chain.predict(input="My name is Alice"))
print(chain.predict(input="What's my name?"))

# FIXED: persist chat history outside the process
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(model="gpt-4o-mini")

history = RedisChatMessageHistory(
    url="redis://localhost:6379/0",
    session_id="customer-123"
)

memory = ConversationBufferMemory(
    chat_memory=history,
)

chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True,
)

print(chain.predict(input="My name is Alice"))
print(chain.predict(input="What's my name?"))

If you’re using RunnableWithMessageHistory, the same rule applies: the history backend must be shared and keyed correctly per session.
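
A minimal sketch of that setup with a Redis-backed history, assuming a local Redis instance; the "input" and "history" keys are arbitrary names that just have to match the prompt:

from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

llm = ChatOpenAI(model="gpt-4o-mini")

def get_session_history(session_id: str) -> RedisChatMessageHistory:
    # Shared backend: every worker reads and writes the same Redis keys
    return RedisChatMessageHistory(session_id=session_id, url="redis://localhost:6379/0")

chat = RunnableWithMessageHistory(
    prompt | llm,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

chat.invoke(
    {"input": "My name is Alice"},
    config={"configurable": {"session_id": "customer-123"}},
)

Because get_session_history rebuilds the history from Redis on every call, whichever worker receives the request sees the same transcript.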

Other Possible Causes

1) Wrong session keying

If every request gets a new session_id, LangChain will look like it’s “not updating” because it’s writing to a brand-new conversation each time.

import uuid

# BAD: random session id per request
session_id = str(uuid.uuid4())

# GOOD: stable session id from authenticated user / tenant / conversation
session_id = f"{tenant_id}:{user_id}:{conversation_id}"

For multi-tenant systems, include tenant scope. Otherwise one customer can read another customer’s history if IDs collide.
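
As a sketch of wiring that key into a request handler, here is a hypothetical FastAPI endpoint; the route, the header names, and the chat runnable from the sketch above are assumptions, not a required convention:

from fastapi import FastAPI, Header
from pydantic import BaseModel

class ChatRequest(BaseModel):
    input: str

app = FastAPI()

@app.post("/chat/{conversation_id}")
async def chat_turn(
    conversation_id: str,
    body: ChatRequest,
    x_tenant_id: str = Header(...),
    x_user_id: str = Header(...),
):
    # Stable, namespaced key: the same conversation always maps to the same history
    session_id = f"{x_tenant_id}:{x_user_id}:{conversation_id}"
    # `chat` is the RunnableWithMessageHistory runnable from the earlier sketch
    result = await chat.ainvoke(
        {"input": body.input},
        config={"configurable": {"session_id": session_id}},
    )
    return {"output": result.content}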

2) Using deprecated memory classes incorrectly

A lot of older examples use ConversationBufferMemory directly inside chains. That still works in some cases, but it becomes fragile with modern Runnable patterns and distributed deployments.

# Fragile pattern for scaled apps
memory = ConversationBufferMemory()
chain = ConversationChain(llm=llm, memory=memory)

Prefer explicit message history storage and request-scoped loading/saving. If you are on a newer LangChain version, move toward RunnableWithMessageHistory and an external store, as in the Redis-backed sketch earlier.

3) Parallel requests overwriting each other

Two requests for the same session can race. One writes history after the other has already loaded stale state, so the final transcript looks incomplete.

import asyncio

# Two concurrent calls with the same session_id can interleave badly
await asyncio.gather(
    chain.ainvoke({"input": "Update address"}, config={"configurable": {"session_id": "cust-1"}}),
    chain.ainvoke({"input": "Change phone number"}, config={"configurable": {"session_id": "cust-1"}}),
)

Fix this with:

  • per-session locking
  • optimistic concurrency control
  • atomic append operations in your history store

Redis list appends are better than read-modify-write dict updates.
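
A minimal per-session locking sketch using redis-py's built-in lock; the lock name, the timeouts, and the chat runnable passed in are placeholders to adapt:

import redis

r = redis.Redis.from_url("redis://localhost:6379/0")

def handle_turn(chat, session_id: str, user_input: str):
    # Distributed lock: only one worker at a time loads and saves history for this session
    lock = r.lock(f"chat-lock:{session_id}", timeout=30, blocking_timeout=10)
    with lock:
        return chat.invoke(
            {"input": user_input},
            config={"configurable": {"session_id": session_id}},
        )

If the lock is not acquired within blocking_timeout, redis-py raises LockError, so decide whether to retry the turn or reject it.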

4) Worker-local caches or globals

This one is common in FastAPI + LangChain apps.

# BAD: global mutable state per worker
chat_state = {}

def save_state(session_id: str, value: str):
    chat_state[session_id] = value

Each Gunicorn/Uvicorn worker has its own copy. The request may hit worker A first and worker B next, so the second request sees empty state.

Use Redis, Postgres, DynamoDB, or another shared store instead of module-level globals.
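
A sketch of the same helpers backed by Redis instead of a module-level dict; the key prefix and URL are placeholders:

import redis

r = redis.Redis.from_url("redis://localhost:6379/0", decode_responses=True)

def save_state(session_id: str, value: str) -> None:
    # Every worker writes to the same Redis key, so reads are consistent across processes
    r.set(f"chat-state:{session_id}", value)

def load_state(session_id: str) -> str | None:
    # Returns None if nothing has been written for this session yet
    return r.get(f"chat-state:{session_id}")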

How to Debug It

  1. Print the session identifier on every request

    • Confirm it is stable across turns.
    • If it changes between requests, you found the bug.
  2. Log where history is stored

    • If you see ConversationBufferMemory() with no external backend, assume it will fail under scale.
    • Check whether the store is Redis/Postgres/in-memory.
  3. Run two requests against different workers

    • In Kubernetes or Gunicorn with multiple workers, send turn 1 and turn 2.
    • If turn 2 forgets turn 1 only sometimes, you likely have process-local state (a reproduction sketch follows this list).
  4. Inspect LangChain traces

    • Turn on verbose logging.
    • Watch for messages like:
      • Loaded chat history for session_id=...
      • Saving context
    • Missing saves usually mean your callback/history hook is not wired correctly.
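
To make step 3 concrete, here is a small reproduction sketch; the URL, headers, and response shape are assumptions (they mirror the hypothetical FastAPI endpoint above), so adapt them to your routes:

import requests

BASE = "http://localhost:8000"  # assumed app URL behind your worker pool or load balancer
HEADERS = {"X-Tenant-Id": "acme", "X-User-Id": "alice"}  # hypothetical auth headers

def turn(text: str) -> str:
    # Both turns use the same conversation, so only the history backend decides what is remembered
    resp = requests.post(f"{BASE}/chat/conv-42", json={"input": text}, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["output"]

print("turn 1:", turn("My name is Alice"))
answer = turn("What's my name?")
print("turn 2:", answer)
if "Alice" not in answer:
    print("Forgot turn 1: state is probably process-local")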

A useful quick test:

print("session_id:", session_id)
print("history backend:", type(history).__name__)
print("messages:", len(history.messages))

If the message count resets unexpectedly between calls, your backend or keying is wrong.

Prevention

  • Use a shared persistence layer for all conversational state:

    • Redis for short-lived chat history (see the TTL sketch after this list)
    • Postgres for durable audit-grade storage
  • Treat session_id as part of your application contract:

    • stable
    • namespaced by tenant/user/conversation
    • never random per request unless that is intentional
  • Avoid module globals and process-local caches for anything user-facing:

    • they pass local tests
    • they fail under horizontal scaling
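
As one illustration of the short-lived-in-Redis idea, RedisChatMessageHistory accepts a ttl in seconds, so idle sessions expire on their own; the 24-hour value here is only an example:

from langchain_community.chat_message_histories import RedisChatMessageHistory

history = RedisChatMessageHistory(
    session_id="acme:alice:conv-42",
    url="redis://localhost:6379/0",
    ttl=60 * 60 * 24,  # keys expire 24 hours after the last write
)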

If you want one rule to keep in mind: LangChain chains are stateless unless you make state explicit. Once you move memory out of process and key it correctly, this error usually disappears fast.

