How to Fix 'memory not persisting when scaling' in LangChain (Python)
When memory stops persisting after you scale a LangChain app, it usually means your agent state is living in process-local memory instead of a shared store. It works on one worker, then falls apart as soon as requests land on another pod, container, or gunicorn worker.
In LangChain Python, this usually shows up as ConversationBufferMemory or ConversationSummaryMemory “working locally” but losing history once you add concurrency, multiple replicas, or serverless execution.
The Most Common Cause
The #1 cause is using in-memory state in a horizontally scaled app.
ConversationBufferMemory, InMemoryChatMessageHistory, and similar classes are fine for a single process. They are not shared across workers, so each replica sees a different memory buffer.
Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| Store chat history in process memory | Store chat history in Redis, Postgres, or another shared backend |
| Instantiate memory inside the request handler | Reuse a persistent ChatMessageHistory per user/session |
| Assume one Python process owns the conversation | Assume any request can hit any worker |
# BROKEN: memory lives only inside this process
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI
def handle_request(user_input: str):
    memory = ConversationBufferMemory()  # resets on every request / worker
    llm = ChatOpenAI(model="gpt-4o-mini")
    chain = ConversationChain(llm=llm, memory=memory)
    return chain.predict(input=user_input)
# FIXED: use shared persistence for message history
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_community.chat_message_histories import RedisChatMessageHistory
def handle_request(user_id: str, user_input: str):
    history = RedisChatMessageHistory(
        session_id=user_id,
        url="redis://localhost:6379/0",
    )
    memory = ConversationBufferMemory(
        chat_memory=history,
        return_messages=True,
        memory_key="history",
    )
    llm = ChatOpenAI(model="gpt-4o-mini")
    chain = ConversationChain(llm=llm, memory=memory)
    return chain.predict(input=user_input)
If you are using newer LangChain patterns, the same rule applies: keep state outside the worker and load it by session key. The class names change, but the failure mode does not.
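For example, with LCEL you can wrap a chain in RunnableWithMessageHistory and pass the session ID through the request config instead of holding it in object state. A minimal sketch, assuming the same Redis backend as above:
# Sketch: the same fix with LCEL's RunnableWithMessageHistory.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o-mini")

def get_history(session_id: str) -> RedisChatMessageHistory:
    # Any worker rebuilds the same history from the shared store.
    return RedisChatMessageHistory(session_id=session_id, url="redis://localhost:6379/0")

chat = RunnableWithMessageHistory(
    chain,
    get_history,
    input_messages_key="input",
    history_messages_key="history",
)

# The session ID travels in the config, not in Python object state.
chat.invoke({"input": "hi"}, config={"configurable": {"session_id": "user:123"}})
Because get_history rebuilds the history from Redis on every call, any worker can serve any turn of the conversation.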
Other Possible Causes
1) You are creating a new session ID on every request
If your session_id changes, persistence looks broken even when the backend is correct.
# BAD: new UUID every call means no continuity
import uuid
session_id = str(uuid.uuid4())
history = RedisChatMessageHistory(session_id=session_id, url=REDIS_URL)
Use a stable identifier:
session_id = f"user:{user_id}" # or account_id / conversation_id
history = RedisChatMessageHistory(session_id=session_id, url=REDIS_URL)
2) Your deployment scales replicas without shared storage
This happens in Kubernetes, ECS, Cloud Run, and gunicorn with multiple workers.
# BAD if your app relies on InMemoryChatMessageHistory
replicas: 3
If you must scale horizontally:
- Use Redis for short-lived chat history
- Use Postgres if you need auditability and long-term retention (see the sketch below)
- Do not rely on Python object state across requests
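For the Postgres option, swapping the backend is a one-class change. A minimal sketch, assuming langchain_community's PostgresChatMessageHistory; the connection string and table name are placeholders:
# Sketch: durable history in Postgres instead of Redis.
# Connection string and table name are placeholders for your own database.
from langchain_community.chat_message_histories import PostgresChatMessageHistory

history = PostgresChatMessageHistory(
    session_id="user:123",
    connection_string="postgresql://postgres:password@localhost/chat_history",
    table_name="message_store",
)
history.add_user_message("hello")
print(history.messages)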
3) You are mixing sync and async paths incorrectly
A common bug is saving messages in one path and reading from another with different session handling.
# BAD: sync and async code using different session keys / stores
history_sync = RedisChatMessageHistory(session_id=user_id, url=REDIS_URL)
history_async = RedisChatMessageHistory(session_id=f"async:{user_id}", url=REDIS_URL)
Keep one canonical store and one canonical session key strategy.
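One way to enforce that: route every path, sync or async, through a single key builder. The session_key and get_history helpers below are hypothetical, not LangChain APIs:
# Sketch: one canonical key builder shared by sync and async paths,
# so the session key can never drift between them.
from langchain_community.chat_message_histories import RedisChatMessageHistory

REDIS_URL = "redis://localhost:6379/0"

def session_key(user_id: str) -> str:
    return f"user:{user_id}"

def get_history(user_id: str) -> RedisChatMessageHistory:
    return RedisChatMessageHistory(session_id=session_key(user_id), url=REDIS_URL)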
4) Your chain is not actually wired to memory
This is easy to miss when refactoring to Runnables or custom chains.
# BAD: memory created but never attached to the chain that runs
memory = ConversationBufferMemory()
llm = ChatOpenAI(model="gpt-4o-mini")
chain = llm # no memory integration here
Make sure the runnable/chain that executes has access to the message history through the expected interface. If you moved from ConversationChain to LCEL, verify you are explicitly loading and saving messages.
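If you dropped the wrapper classes entirely, "explicitly loading and saving" looks roughly like this. A sketch of manual wiring, assuming the Redis backend from earlier:
# Sketch: manual load/save around a bare model call.
from langchain_core.messages import HumanMessage
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
history = RedisChatMessageHistory(session_id="user:123", url="redis://localhost:6379/0")

user_input = "What did I just say?"
# Load prior turns, run the model, then persist both sides of the turn.
response = llm.invoke(history.messages + [HumanMessage(content=user_input)])
history.add_user_message(user_input)
history.add_ai_message(response.content)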
How to Debug It
1) Check whether the failure only appears after scaling
- Run one worker locally.
- Then run two gunicorn workers or two pods.
- If history disappears only after scaling, you have a process-local state problem.
2) Log the session ID on every request (see the logging sketch after this list)
- Print or trace user_id, conversation_id, or session_id.
- If the value changes between turns, persistence will never work.
3) Inspect where messages are stored
- If you use InMemoryChatMessageHistory, it resets on every process restart.
- If you use Redis/Postgres but still lose data, verify writes happen before reads.
4) Confirm your chain is loading the same history object
- In older APIs, check memory.chat_memory.
- In newer APIs, check that your runnable reads from the same backend keyed by the same session ID.
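For step 2, a minimal sketch; the logger name and key format are illustrative:
# Sketch: log the session key at the start of every turn so drift
# between requests shows up in your traces.
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("chat")

def resolve_session_id(user_id: str) -> str:
    session_id = f"user:{user_id}"
    logger.info("chat turn: session_id=%s", session_id)
    return session_id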
A quick sanity check that storage is actually shared:
from langchain_community.chat_message_histories import RedisChatMessageHistory

REDIS_URL = "redis://localhost:6379/0"

# Worker A writes a message:
history = RedisChatMessageHistory(session_id="user:123", url=REDIS_URL)
history.add_user_message("hello")
print(history.messages)

# Simulate a restart or another replica by rebuilding from the backend:
history = RedisChatMessageHistory(session_id="user:123", url=REDIS_URL)
print(history.messages)  # if empty here, storage is not shared or session key changed
Prevention
- Use shared persistence from day one:
  - Redis for ephemeral conversational state
  - Postgres for durable conversation records
- Treat session_id as part of your API contract:
  - Stable across retries
  - Stable across workers
  - Stable across deploys
- Add an integration test that simulates scale (a sketch follows this list):
  - First request writes a message
  - Second request hits a different worker/process
  - Assert the previous message is still present
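That test only needs a few lines. A pytest sketch, assuming a reachable Redis instance and simulating the second worker with a fresh history object:
# Sketch: simulate two workers by building two independent history
# objects against the same shared backend and session ID.
from langchain_community.chat_message_histories import RedisChatMessageHistory

REDIS_URL = "redis://localhost:6379/0"

def test_history_survives_worker_switch():
    session_id = "user:test-scale"
    worker_1 = RedisChatMessageHistory(session_id=session_id, url=REDIS_URL)
    worker_1.clear()
    worker_1.add_user_message("hello from worker 1")

    # A fresh object stands in for a different process or pod.
    worker_2 = RedisChatMessageHistory(session_id=session_id, url=REDIS_URL)
    assert any("hello from worker 1" in m.content for m in worker_2.messages)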
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.