How to Fix 'memory not persisting when scaling' in LangChain (Python)
When memory stops persisting after you scale a LangChain app, it usually means your agent state is living in process-local memory instead of a shared store. It works on one worker, then falls apart as soon as requests land on another pod, container, or gunicorn worker.
In LangChain Python, this usually shows up as ConversationBufferMemory or ConversationSummaryMemory “working locally” but losing history once you add concurrency, multiple replicas, or serverless execution.
The Most Common Cause
The #1 cause is using in-memory state in a horizontally scaled app.
ConversationBufferMemory, InMemoryChatMessageHistory, and similar classes are fine for a single process. They are not shared across workers, so each replica sees a different memory buffer.
Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| Store chat history in process memory | Store chat history in Redis, Postgres, or another shared backend |
| Instantiate memory inside the request handler | Reuse a persistent ChatMessageHistory per user/session |
| Assume one Python process owns the conversation | Assume any request can hit any worker |
# BROKEN: memory lives only inside this process
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI
def handle_request(user_input: str):
    memory = ConversationBufferMemory()  # resets on every request / worker
    llm = ChatOpenAI(model="gpt-4o-mini")
    chain = ConversationChain(llm=llm, memory=memory)
    return chain.predict(input=user_input)
# FIXED: use shared persistence for message history
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_community.chat_message_histories import RedisChatMessageHistory
def handle_request(user_id: str, user_input: str):
    history = RedisChatMessageHistory(
        session_id=user_id,
        url="redis://localhost:6379/0",
    )
    memory = ConversationBufferMemory(
        chat_memory=history,
        return_messages=True,
        memory_key="history",
    )
    llm = ChatOpenAI(model="gpt-4o-mini")
    chain = ConversationChain(llm=llm, memory=memory)
    return chain.predict(input=user_input)
If you are using newer LangChain patterns, the same rule applies: keep state outside the worker and load it by session key. The class names change, but the failure mode does not.
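For example, with LCEL you can wrap a chain in RunnableWithMessageHistory and pass the session ID through the request config instead of holding it in object state. A minimal sketch, assuming the same Redis backend as above:
# Sketch: the same fix with LCEL's RunnableWithMessageHistory.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o-mini")

def get_history(session_id: str) -> RedisChatMessageHistory:
    # Any worker rebuilds the same history from the shared store.
    return RedisChatMessageHistory(session_id=session_id, url="redis://localhost:6379/0")

chat = RunnableWithMessageHistory(
    chain,
    get_history,
    input_messages_key="input",
    history_messages_key="history",
)

# The session ID travels in the config, not in Python object state.
chat.invoke({"input": "hi"}, config={"configurable": {"session_id": "user:123"}})
Because get_history rebuilds the history from Redis on every call, any worker can serve any turn of the conversation.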
Other Possible Causes
1) You are creating a new session ID on every request
If your session_id changes, persistence looks broken even when the backend is correct.
# BAD: new UUID every call means no continuity
import uuid
session_id = str(uuid.uuid4())
history = RedisChatMessageHistory(session_id=session_id, url=REDIS_URL)
Use a stable identifier:
session_id = f"user:{user_id}" # or account_id / conversation_id
history = RedisChatMessageHistory(session_id=session_id, url=REDIS_URL)
2) Your deployment scales replicas without shared storage
This happens in Kubernetes, ECS, Cloud Run, and gunicorn with multiple workers.
# BAD if your app relies on InMemoryChatMessageHistory
replicas: 3
If you must scale horizontally:
- Use Redis for short-lived chat history
- Use Postgres if you need auditability and long-term retention (see the sketch below)
- Do not rely on Python object state across requests
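For the Postgres option, swapping the backend is a one-class change. A minimal sketch, assuming langchain_community's PostgresChatMessageHistory; the connection string and table name are placeholders:
# Sketch: durable history in Postgres instead of Redis.
# Connection string and table name are placeholders for your own database.
from langchain_community.chat_message_histories import PostgresChatMessageHistory

history = PostgresChatMessageHistory(
    session_id="user:123",
    connection_string="postgresql://postgres:password@localhost/chat_history",
    table_name="message_store",
)
history.add_user_message("hello")
print(history.messages)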
3) You are mixing sync and async paths incorrectly
A common bug is saving messages in one path and reading from another with different session handling.
# BAD: sync and async code using different session keys / stores
history_sync = RedisChatMessageHistory(session_id=user_id, url=REDIS_URL)
history_async = RedisChatMessageHistory(session_id=f"async:{user_id}", url=REDIS_URL)
Keep one canonical store and one canonical session key strategy.
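One way to enforce that: route every path, sync or async, through a single key builder. The session_key and get_history helpers below are hypothetical, not LangChain APIs:
# Sketch: one canonical key builder shared by sync and async paths,
# so the session key can never drift between them.
from langchain_community.chat_message_histories import RedisChatMessageHistory

REDIS_URL = "redis://localhost:6379/0"

def session_key(user_id: str) -> str:
    return f"user:{user_id}"

def get_history(user_id: str) -> RedisChatMessageHistory:
    return RedisChatMessageHistory(session_id=session_key(user_id), url=REDIS_URL)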
4) Your chain is not actually wired to memory
This is easy to miss when refactoring to Runnables or custom chains.
# BAD: memory created but never attached to the chain that runs
memory = ConversationBufferMemory()
llm = ChatOpenAI(model="gpt-4o-mini")
chain = llm # no memory integration here
Make sure the runnable/chain that executes has access to the message history through the expected interface. If you moved from ConversationChain to LCEL, verify you are explicitly loading and saving messages.
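If you dropped the wrapper classes entirely, "explicitly loading and saving" looks roughly like this. A sketch of manual wiring, assuming the Redis backend from earlier:
# Sketch: manual load/save around a bare model call.
from langchain_core.messages import HumanMessage
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
history = RedisChatMessageHistory(session_id="user:123", url="redis://localhost:6379/0")

user_input = "What did I just say?"
# Load prior turns, run the model, then persist both sides of the turn.
response = llm.invoke(history.messages + [HumanMessage(content=user_input)])
history.add_user_message(user_input)
history.add_ai_message(response.content)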
How to Debug It
1) Check whether the failure only appears after scaling
- Run one worker locally.
- Then run two gunicorn workers or two pods.
- If history disappears only after scaling, you have a process-local state problem.
2) Log the session ID on every request (see the logging sketch after this list)
- Print or trace user_id, conversation_id, or session_id.
- If the value changes between turns, persistence will never work.
3) Inspect where messages are stored
- If you use InMemoryChatMessageHistory, it resets on every process restart.
- If you use Redis/Postgres but still lose data, verify writes happen before reads.
4) Confirm your chain is loading the same history object
- In older APIs, check memory.chat_memory.
- In newer APIs, check that your runnable reads from the same backend keyed by the same session ID.
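For step 2, a minimal sketch; the logger name and key format are illustrative:
# Sketch: log the session key at the start of every turn so drift
# between requests shows up in your traces.
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("chat")

def resolve_session_id(user_id: str) -> str:
    session_id = f"user:{user_id}"
    logger.info("chat turn: session_id=%s", session_id)
    return session_id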
A quick sanity check that storage is actually shared:
from langchain_community.chat_message_histories import RedisChatMessageHistory

REDIS_URL = "redis://localhost:6379/0"

# Worker A writes a message:
history = RedisChatMessageHistory(session_id="user:123", url=REDIS_URL)
history.add_user_message("hello")
print(history.messages)

# Simulate a restart or another replica by rebuilding from the backend:
history = RedisChatMessageHistory(session_id="user:123", url=REDIS_URL)
print(history.messages)  # if empty here, storage is not shared or session key changed
Prevention
- Use shared persistence from day one:
  - Redis for ephemeral conversational state
  - Postgres for durable conversation records
- Treat session_id as part of your API contract:
  - Stable across retries
  - Stable across workers
  - Stable across deploys
- Add an integration test that simulates scale (a sketch follows this list):
  - First request writes a message
  - Second request hits a different worker/process
  - Assert the previous message is still present
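That test only needs a few lines. A pytest sketch, assuming a reachable Redis instance and simulating the second worker with a fresh history object:
# Sketch: simulate two workers by building two independent history
# objects against the same shared backend and session ID.
from langchain_community.chat_message_histories import RedisChatMessageHistory

REDIS_URL = "redis://localhost:6379/0"

def test_history_survives_worker_switch():
    session_id = "user:test-scale"
    worker_1 = RedisChatMessageHistory(session_id=session_id, url=REDIS_URL)
    worker_1.clear()
    worker_1.add_user_message("hello from worker 1")

    # A fresh object stands in for a different process or pod.
    worker_2 = RedisChatMessageHistory(session_id=session_id, url=REDIS_URL)
    assert any("hello from worker 1" in m.content for m in worker_2.messages)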
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.