How to Fix 'memory not persisting in production' in LangChain (Python)

By Cyprian Aarons · Updated 2026-04-21
Tags: memory-not-persisting-in-production, langchain, python

If your LangChain memory works locally but resets in production, the issue is usually not “memory” itself. It’s almost always that you’re storing state in a process-local object, then deploying behind multiple workers, serverless invocations, or short-lived containers.

The symptom looks like this: one request sees the chat history, the next request starts from zero. In LangChain Python, that usually means ConversationBufferMemory or another in-memory store is being recreated per request instead of being backed by durable storage.

The Most Common Cause

The #1 cause is using ConversationBufferMemory as if it were persistent storage. It is not. It keeps state in RAM for the lifetime of that Python process, so it disappears on restart and won’t be shared across workers.

Here’s the broken pattern and the fixed pattern side by side:

Broken pattern | Fixed pattern
Memory created inside the request handler | Memory loaded from durable storage by session/user ID
Works in local dev with one process | Breaks in production with multiple workers or cold starts
Uses ConversationBufferMemory as persistence | Uses external persistence such as Redis/Postgres/vector store + session key
# BROKEN: memory resets every request
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain

def chat_endpoint(user_input: str):
    llm = ChatOpenAI(model="gpt-4o-mini")
    memory = ConversationBufferMemory()  # new instance every call
    chain = ConversationChain(llm=llm, memory=memory, verbose=True)
    return chain.predict(input=user_input)
# FIXED: persist by session_id using an external store
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import RedisChatMessageHistory

llm = ChatOpenAI(model="gpt-4o-mini")

# The "history" placeholder is where stored messages get injected on every turn
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

def get_history(session_id: str):
    return RedisChatMessageHistory(
        session_id=session_id,
        url="redis://localhost:6379/0",
        key_prefix="chat:",
    )

chain_with_history = RunnableWithMessageHistory(
    prompt | llm,
    get_session_history=get_history,
    input_messages_key="input",
    history_messages_key="history",
)

def chat_endpoint(user_input: str, session_id: str):
    return chain_with_history.invoke(
        {"input": user_input},
        config={"configurable": {"session_id": session_id}},
    ).content

If you’re seeing errors like ValueError: Missing keys ['history'] in input or behavior where ConversationBufferMemory returns empty history after deployment, this is where to look first.

Other Possible Causes

1) You’re running multiple workers or replicas

Each worker has its own memory space. Gunicorn with --workers 4, Kubernetes replicas, or Cloud Run instances will not share Python objects.

# This will break in-memory state sharing
gunicorn app:app --workers 4

Fix it by moving conversation state to Redis, Postgres, or another shared backend.
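
If Redis isn't part of your stack, a shared SQL database works too. Here's a minimal sketch using langchain_community's SQLChatMessageHistory with a Postgres connection string; the host, credentials, and database name are placeholders, and depending on your langchain_community version the parameter may be named connection instead of connection_string:

from langchain_community.chat_message_histories import SQLChatMessageHistory

def get_history(session_id: str):
    # Messages land in a shared table, so every worker reads the same state
    return SQLChatMessageHistory(
        session_id=session_id,
        connection_string="postgresql://user:pass@db-host:5432/chatdb",
    )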

2) Serverless cold starts are wiping your state

If you deploy on Lambda, Cloud Functions, or similar platforms, each invocation may start a new container. Anything stored in ConversationBufferMemory, module globals, or singleton objects can vanish.

# BROKEN on serverless if used as "persistent" state
memory = ConversationBufferMemory()

Use a persistent store keyed by user/session instead:

history = RedisChatMessageHistory(session_id=session_id, url=REDIS_URL)
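
For example, a minimal Lambda-style handler might look like the sketch below. The handler name, event shape, and REDIS_URL environment variable are assumptions; the point is that nothing conversational lives in the container itself:

# Hypothetical Lambda handler: all chat state lives in Redis, none in the container
import json
import os
from langchain_community.chat_message_histories import RedisChatMessageHistory

REDIS_URL = os.environ["REDIS_URL"]  # assumed to be set in the function config

def handler(event, context):
    body = json.loads(event["body"])
    history = RedisChatMessageHistory(session_id=body["session_id"], url=REDIS_URL)
    history.add_user_message(body["input"])  # survives even if this container dies
    return {"statusCode": 200, "body": json.dumps({"messages": len(history.messages)})}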

3) Your session ID changes between requests

This happens when the frontend does not send a stable identifier. If each request gets a new UUID, LangChain correctly creates a new conversation every time.

# BAD: generating a new session id per request
session_id = str(uuid.uuid4())

Use a stable key from auth/session context:

session_id = request.headers["X-Session-Id"]  # stable across turns
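
If the frontend can't be trusted to send that header, issue the ID server-side. A hedged sketch with FastAPI middleware (the cookie name and middleware approach are illustrative, not the only option):

# Hypothetical FastAPI middleware: issue a session id once, reuse it on later turns
import uuid
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def ensure_session_id(request: Request, call_next):
    session_id = request.cookies.get("session_id") or str(uuid.uuid4())
    request.state.session_id = session_id  # handlers read the stable id from here
    response = await call_next(request)
    response.set_cookie("session_id", session_id, httponly=True)
    return response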

4) You are mixing sync and async code incorrectly

Some apps instantiate memory in a sync handler and read it from a separate async handler. Combined with per-request object creation, the two paths end up holding different objects and seeing inconsistent state.

# Example smell: separate objects in sync/async handlers
async def chat_async():
    memory = ConversationBufferMemory()
    ...

Keep the history backend shared and use one access path for both sync and async handlers.
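
A minimal sketch of that, reusing chain_with_history from the fixed pattern above: both handlers go through the same Redis-backed history, and only the invocation path differs.

def chat_sync(user_input: str, session_id: str):
    return chain_with_history.invoke(
        {"input": user_input},
        config={"configurable": {"session_id": session_id}},
    ).content

async def chat_async(user_input: str, session_id: str):
    # Same chain, same Redis backend; ainvoke runs the async path
    result = await chain_with_history.ainvoke(
        {"input": user_input},
        config={"configurable": {"session_id": session_id}},
    )
    return result.content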

How to Debug It

  1. Print the session ID on every request
    If it changes between turns, you found the bug.

    print("session_id =", session_id)
    
  2. Log the memory backend type
    If you see ConversationBufferMemory, InMemoryChatMessageHistory, or another local-only class in production code, that’s a red flag.

    print(type(memory))
    
  3. Check whether your app runs with more than one process
    Look at Gunicorn workers, Docker replicas, Kubernetes deployments, or serverless logs. If there’s more than one instance, process-local memory will not persist across requests.
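
    A quick way to confirm is to log the process ID on every request:

    import os
    print("pid =", os.getpid())  # different PIDs across turns mean multiple workers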

  4. Inspect the actual stored messages
    For Redis-backed history:

    history = get_history(session_id)
    print(history.messages)
    

    If this is empty after a previous turn succeeded, your persistence layer is misconfigured or your key changed.

Prevention

  • Use ConversationBufferMemory only for single-process prototyping. For production chat state, back it with Redis, Postgres, or another shared store.
  • Make session_id explicit and stable. Tie it to authenticated user identity or a server-issued conversation ID.
  • Add an integration test that sends two requests to different worker instances and verifies the second response sees prior context (see the sketch below).
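
A hedged sketch of such a test, assuming a /chat endpoint that accepts {"input": ...} and returns {"output": ...}; the URL, payload shape, and header name are placeholders for your real API. Run it against a deployment with multiple workers so successive requests can land on different processes.

# Hypothetical integration test: the second turn must see the first turn's context
import requests

BASE_URL = "http://localhost:8000/chat"  # placeholder endpoint

def test_memory_survives_across_workers():
    headers = {"X-Session-Id": "test-session-123"}
    requests.post(BASE_URL, json={"input": "My name is Ada."}, headers=headers, timeout=10)
    reply = requests.post(BASE_URL, json={"input": "What is my name?"}, headers=headers, timeout=10)
    assert "ada" in reply.json()["output"].lower()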

If you want a simple rule: if the conversation matters after the current Python process exits, it does not belong in LangChain RAM objects. Use durable storage and pass a stable session key every time.


By Cyprian Aarons, AI Consultant at Topiax.