# How to Fix 'memory not persisting in production' in LangChain (Python)
If your LangChain memory works locally but resets in production, the issue is usually not “memory” itself. It’s almost always that you’re storing state in a process-local object, then deploying behind multiple workers, serverless invocations, or short-lived containers.
The symptom looks like this: one request sees the chat history, the next request starts from zero. In LangChain Python, that usually means ConversationBufferMemory or another in-memory store is being recreated per request instead of being backed by durable storage.
## The Most Common Cause
The #1 cause is using ConversationBufferMemory as if it were persistent storage. It is not. It keeps state in RAM for the lifetime of that Python process, so it disappears on restart and won’t be shared across workers.
Here’s the broken pattern and the fixed pattern side by side:
| Broken pattern | Fixed pattern |
|---|---|
| Memory created inside the request handler | Memory loaded from durable storage by session/user ID |
| Works only in local dev with a single process; breaks with multiple workers or cold starts | Works across workers, restarts, and cold starts |
| Uses `ConversationBufferMemory` as persistence | Uses external persistence such as Redis/Postgres/vector store + session key |
```python
# BROKEN: memory resets every request
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain

def chat_endpoint(user_input: str):
    llm = ChatOpenAI(model="gpt-4o-mini")
    memory = ConversationBufferMemory()  # new instance every call
    chain = ConversationChain(llm=llm, memory=memory, verbose=True)
    return chain.predict(input=user_input)
```
```python
# FIXED: persist by session_id using an external store
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import RedisChatMessageHistory

llm = ChatOpenAI(model="gpt-4o-mini")

# The prompt needs a placeholder where past messages get injected
prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

def get_history(session_id: str):
    return RedisChatMessageHistory(
        session_id=session_id,
        url="redis://localhost:6379/0",
        key_prefix="chat:",
    )

chain_with_history = RunnableWithMessageHistory(
    prompt | llm,
    get_session_history=get_history,
    input_messages_key="input",
    history_messages_key="history",
)

def chat_endpoint(user_input: str, session_id: str):
    return chain_with_history.invoke(
        {"input": user_input},
        config={"configurable": {"session_id": session_id}},
    )
```
If you’re seeing errors like `ValueError: Missing keys ['history'] in input` or behavior where `ConversationBufferMemory` returns empty history after deployment, this is where to look first.
## Other Possible Causes
### 1) You’re running multiple workers or replicas

Each worker has its own memory space. Gunicorn with `--workers 4`, Kubernetes replicas, or Cloud Run instances will not share Python objects.
```bash
# This will break in-memory state sharing
gunicorn app:app --workers 4
```
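To see the failure concretely, here is a stdlib-only sketch (no LangChain, all names illustrative) that models two worker processes as two independent state objects behind a round-robin load balancer:

```python
# Each Gunicorn worker is a separate Python process with its own heap.
# This models two workers as two independent dicts: a round-robin balancer
# sends turn 1 to worker A and turn 2 to worker B, which has no record of turn 1.
class Worker:
    """Models one process: conversation state lives in this instance only."""
    def __init__(self):
        self.memory = {}  # analogous to a per-process ConversationBufferMemory

    def handle_turn(self, session_id, text):
        history = self.memory.setdefault(session_id, [])
        history.append(text)
        return len(history)  # how many turns this worker can "see"

workers = [Worker(), Worker()]  # --workers 2

# Round-robin: same session, different worker each turn
seen = [workers[i % 2].handle_turn("user-1", t)
        for i, t in enumerate(["hi", "how are you?"])]
print(seen)  # [1, 1] — the second turn starts from empty history
```

Both workers report a history of length 1: neither ever sees the turn the other handled, which is exactly the "conversation resets" symptom.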
Fix it by moving conversation state to Redis, Postgres, or another shared backend.
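As a minimal sketch of that fix, here is a shared history keyed by `session_id`. SQLite stands in for Redis/Postgres, and the table and helper names are hypothetical, not a LangChain API:

```python
# Shared, durable history keyed by session_id. In production this connection
# would point at Redis/Postgres shared by all workers; ":memory:" is used here
# only to keep the sketch self-contained.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS chat_history (session_id TEXT, message TEXT)"
)

def append_message(session_id, role, content):
    # Every worker writes to the same table, so state survives restarts
    conn.execute(
        "INSERT INTO chat_history VALUES (?, ?)",
        (session_id, json.dumps({"role": role, "content": content})),
    )
    conn.commit()

def load_history(session_id):
    rows = conn.execute(
        "SELECT message FROM chat_history WHERE session_id = ?",
        (session_id,),
    ).fetchall()
    return [json.loads(r[0]) for r in rows]

append_message("user-1", "human", "hi")
print(load_history("user-1"))  # [{'role': 'human', 'content': 'hi'}]
```

The key design point: the store is addressed by `session_id`, not by which Python object happens to be alive, so any worker or freshly started container reads the same conversation.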
### 2) Serverless cold starts are wiping your state
If you deploy on Lambda, Cloud Functions, or similar platforms, each invocation may start a new container. Anything stored in ConversationBufferMemory, module globals, or singleton objects can vanish.
```python
# BROKEN on serverless if used as "persistent" state
memory = ConversationBufferMemory()
```
Use a persistent store keyed by user/session instead:
```python
history = RedisChatMessageHistory(session_id=session_id, url=REDIS_URL)
```
### 3) Your session ID changes between requests
This happens when the frontend does not send a stable identifier. If each request gets a new UUID, LangChain correctly creates a new conversation every time.
```python
import uuid

# BAD: generating a new session id per request
session_id = str(uuid.uuid4())
```
Use a stable key from auth/session context:
```python
session_id = request.headers["X-Session-Id"]  # stable across turns
```
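If the client cannot be trusted to send a stable header, the server can derive a deterministic ID itself. A sketch using `uuid.uuid5` (the namespace constant and helper name are illustrative):

```python
# Derive one stable conversation ID per authenticated user instead of minting
# a fresh UUID each request. uuid5 is deterministic: same inputs -> same ID
# on every request, worker, and cold start.
import uuid

NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")  # app-chosen constant

def session_id_for(user_id: str, conversation: str = "default") -> str:
    return str(uuid.uuid5(NAMESPACE, f"{user_id}:{conversation}"))

a = session_id_for("alice@example.com")
b = session_id_for("alice@example.com")
print(a == b)  # True — stable across turns, workers, and cold starts
```

Contrast with `uuid.uuid4()`, which is random: two calls never match, so every request looks like a brand-new conversation.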
### 4) You are mixing sync and async code incorrectly
Some apps instantiate memory in one path and read it in another async path. That can lead to inconsistent behavior when combined with per-request object creation.
```python
# Example smell: separate objects in sync/async handlers
async def chat_async():
    memory = ConversationBufferMemory()  # private to this handler
    ...
```
Keep the history backend shared and use one access path for both sync and async handlers.
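A sketch of that shared access path, using an in-process dict as the backend for brevity (in production it would be the Redis/Postgres-backed history; all names here are illustrative):

```python
# One shared history store, one access path, used by both sync and async
# handlers — neither path creates its own private memory object.
import asyncio

class HistoryStore:
    def __init__(self):
        self._store = {}

    def get(self, session_id):
        return self._store.setdefault(session_id, [])

    def append(self, session_id, msg):
        self.get(session_id).append(msg)

store = HistoryStore()  # one shared instance, not one per handler

def chat_sync(session_id, text):
    store.append(session_id, text)
    return len(store.get(session_id))

async def chat_async(session_id, text):
    # Same store, same access path as the sync handler
    store.append(session_id, text)
    return len(store.get(session_id))

n1 = chat_sync("user-1", "hi")
n2 = asyncio.run(chat_async("user-1", "how are you?"))
print(n1, n2)  # 1 2 — the async path sees the sync path's turn
```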
## How to Debug It
- **Print the session ID on every request.** If it changes between turns, you found the bug.

  ```python
  print("session_id =", session_id)
  ```

- **Log the memory backend type.** If you see `ConversationBufferMemory`, `InMemoryChatMessageHistory`, or another local-only class in production code, that's a red flag.

  ```python
  print(type(memory))
  ```

- **Check whether your app runs with more than one process.** Look at Gunicorn workers, Docker replicas, Kubernetes deployments, or serverless logs. If there's more than one instance, process-local memory will not persist across requests.

- **Inspect the actual stored messages.** For Redis-backed history:

  ```python
  history = get_history(session_id)
  print(history.messages)
  ```

  If this is empty after a previous turn succeeded, your persistence layer is misconfigured or your key changed.
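The checks above can be bundled into one small logging helper, shown here with a stand-in history object (the helper and `FakeHistory` are illustrative, not a LangChain API):

```python
# Surfaces the three usual suspects at once: session ID, backend class,
# and stored message count. `history` is whatever get_history(session_id)
# returns in your app.
def debug_memory(session_id, history):
    messages = getattr(history, "messages", [])
    print(f"session_id={session_id} "
          f"backend={type(history).__name__} "
          f"messages={len(messages)}")
    return type(history).__name__, len(messages)

class FakeHistory:  # stand-in for RedisChatMessageHistory in this sketch
    messages = ["hi", "hello!"]

print(debug_memory("user-1", FakeHistory()))
```

Call it at the top of every chat handler while debugging: a backend name ending in `Buffer` or `InMemory`, or a message count stuck at zero, points straight at the cause.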
## Prevention
- Use `ConversationBufferMemory` only for single-process prototyping. For production chat state, back it with Redis, Postgres, or another shared store.
- Make `session_id` explicit and stable. Tie it to authenticated user identity or a server-issued conversation ID.
- Add an integration test that sends two requests to different worker instances and verifies the second response sees prior context.
If you want a simple rule: if the conversation matters after the current Python process exits, it does not belong in LangChain RAM objects. Use durable storage and pass a stable session key every time.
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.