How to Fix 'state not updating in production' in LlamaIndex (Python)
If you’re seeing “state not updating in production” in a LlamaIndex app, it usually means your agent, workflow, or chat state is being mutated in memory, but that mutation does not survive the runtime boundary. In practice, this shows up when code works locally and then fails behind FastAPI, Celery, Docker, serverless, or multiple worker processes.
In LlamaIndex Python apps, the root issue is usually one of these: you’re recreating the object per request, storing state on a local variable instead of durable storage, or relying on process memory in an environment that does not guarantee one process per user.
The Most Common Cause
The #1 cause is storing state inside a Python object that gets recreated on every request.
This happens a lot with ChatEngine, ReActAgent, Workflow, or any custom wrapper around Memory. Locally, the object stays alive long enough to look correct. In production, your web server may spin up new workers or rebuild dependencies per request, so the “updated” state disappears.
Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| State lives in request-scoped object | State lives in persistent store keyed by session/user |
| New agent created on every call | Reuse agent and inject durable memory |
| Works in single-process dev | Breaks with Gunicorn/Uvicorn workers |
```python
# BROKEN: state resets every request
from fastapi import FastAPI
from llama_index.core.chat_engine import SimpleChatEngine
from llama_index.core.memory import ChatMemoryBuffer

app = FastAPI()

@app.post("/chat")
def chat(payload: dict):
    # A fresh memory buffer is built per request, so history never accumulates
    memory = ChatMemoryBuffer.from_defaults(token_limit=4000)
    chat_engine = SimpleChatEngine.from_defaults(memory=memory)
    response = chat_engine.chat(payload["message"])
    return {"answer": str(response), "memory_size": len(memory.get_all())}
```
```python
# FIXED: persist memory by session_id
from fastapi import FastAPI
from llama_index.core.chat_engine import SimpleChatEngine
from llama_index.core.memory import ChatMemoryBuffer

app = FastAPI()

# Process-local store; swap for Redis/Postgres when running multiple workers
session_memory_store: dict[str, ChatMemoryBuffer] = {}

def get_memory(session_id: str) -> ChatMemoryBuffer:
    if session_id not in session_memory_store:
        session_memory_store[session_id] = ChatMemoryBuffer.from_defaults(token_limit=4000)
    return session_memory_store[session_id]

@app.post("/chat")
def chat(payload: dict):
    session_id = payload["session_id"]
    memory = get_memory(session_id)  # same buffer across requests for this session
    chat_engine = SimpleChatEngine.from_defaults(memory=memory)
    response = chat_engine.chat(payload["message"])
    return {"answer": str(response), "memory_size": len(memory.get_all())}
```
That fix is still only process-local. If you run multiple workers or containers, replace session_memory_store with Redis, Postgres, DynamoDB, or another shared backend.
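To make that concrete, here is a minimal sketch of a durable, session-keyed store. All names here (`DurableChatStore`, the schema) are hypothetical; SQLite from the standard library stands in for Redis or Postgres purely to demonstrate the property that matters, namely that state lives outside the request-scoped Python objects.

```python
import sqlite3

# Hypothetical durable message store keyed by session_id.
# SQLite is used for illustration only; in production you would
# point the same interface at Redis, Postgres, or DynamoDB.
class DurableChatStore:
    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages "
            "(session_id TEXT, role TEXT, content TEXT)"
        )

    def append(self, session_id: str, role: str, content: str) -> None:
        self.conn.execute(
            "INSERT INTO messages VALUES (?, ?, ?)",
            (session_id, role, content),
        )
        self.conn.commit()

    def history(self, session_id: str) -> list[dict]:
        rows = self.conn.execute(
            "SELECT role, content FROM messages WHERE session_id = ?",
            (session_id,),
        ).fetchall()
        return [{"role": r, "content": c} for r, c in rows]

store = DurableChatStore()
store.append("abc", "user", "hello")
store.append("abc", "assistant", "hi there")
print(len(store.history("abc")))  # 2
```

With a file path (or a real database) instead of `:memory:`, the history survives worker restarts; on each request you rebuild the LlamaIndex memory object from `history(session_id)` rather than trusting a cached one.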
A real symptom here is seeing logs like:
- `ValueError: No existing state found for workflow_id=...`
- `KeyError: session_id`
- `RuntimeError: Workflow state was not persisted`
- “it works locally but every message is treated as the first message”
Other Possible Causes
1) Multiple workers are splitting your state
If you run Gunicorn/Uvicorn with more than one worker, each worker has its own memory space.
```bash
gunicorn app:app -k uvicorn.workers.UvicornWorker --workers 4
```
If worker A stores the state and worker B handles the next request, your app looks broken. Use shared storage for anything user-specific.
2) You are using async code but mutating shared objects unsafely
A shared Workflow or Memory instance can be corrupted if concurrent requests mutate it at the same time.
```python
# risky
shared_memory.put(ChatMessage(role="user", content="hello"))
```
Use per-session locking or isolate state updates behind a persistence layer. If you must keep an in-memory cache, guard writes with an async lock.
```python
import asyncio

lock = asyncio.Lock()

async def handle_chat(message):
    async with lock:  # serialize writes to the shared memory object
        shared_memory.put(message)
```
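A single global lock serializes every request. One common refinement, sketched here with hypothetical names, is a lock per session, so unrelated sessions never block each other while writes within one session stay ordered:

```python
import asyncio
from collections import defaultdict

# Hypothetical per-session locks: writes for one session are serialized,
# while different sessions proceed concurrently.
session_locks: defaultdict = defaultdict(asyncio.Lock)
store: dict[str, list[str]] = {}

async def append_message(session_id: str, message: str) -> None:
    async with session_locks[session_id]:
        store.setdefault(session_id, []).append(message)

async def main() -> None:
    # Ten concurrent "requests" against the same session.
    await asyncio.gather(*(append_message("abc", f"m{i}") for i in range(10)))

asyncio.run(main())
print(len(store["abc"]))  # 10
```

Note this still only protects writes within one process; across workers you need the durable backend to provide the consistency.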
3) Your workflow state is never saved after mutation
With LlamaIndex workflows, it is easy to mutate step-local state and assume it persists automatically. It does not: nothing survives the request unless you explicitly persist it.
```python
# example shape of the bug
ctx.state["approved"] = True
# later request sees empty/default state again
```

Persist after mutation:

```python
ctx.state["approved"] = True
await workflow_store.save(ctx.run_id, ctx.state)
```
If you use Context or workflow checkpoints, verify that checkpointing is actually enabled and writing to a durable backend.
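The `workflow_store` above is whatever durable backend you choose. Here is a minimal sketch of that save/load contract, using a JSON file store purely for illustration (`FileWorkflowStore` is a hypothetical name, not a LlamaIndex API; in production you would likely back it with Redis or Postgres):

```python
import asyncio
import json
import tempfile
from pathlib import Path

# Hypothetical durable workflow-state store: JSON files keyed by run_id.
class FileWorkflowStore:
    def __init__(self, root: Path) -> None:
        self.root = root

    async def save(self, run_id: str, state: dict) -> None:
        (self.root / f"{run_id}.json").write_text(json.dumps(state))

    async def load(self, run_id: str) -> dict:
        path = self.root / f"{run_id}.json"
        return json.loads(path.read_text()) if path.exists() else {}

async def demo() -> dict:
    store = FileWorkflowStore(Path(tempfile.mkdtemp()))
    await store.save("run-1", {"approved": True})
    # A later request (or another process) reloads the state by run_id.
    return await store.load("run-1")

reloaded = asyncio.run(demo())
print(reloaded)  # {'approved': True}
```

The key property is that `load` works in a different process than `save`: that is what the in-memory `ctx.state` mutation on its own cannot give you.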
4) Your container restarts between requests
In Kubernetes, ECS, Cloud Run, or serverless deployments, ephemeral instances lose all memory on restart.
```yaml
# bad assumption: local RAM survives the deployment lifecycle
env:
  - name: WORKER_COUNT
    value: "1"
```
Even one worker doesn’t help if the pod restarts. Store conversation state externally and treat process memory as disposable cache only.
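One way to honor “process memory is a disposable cache” is a write-through pattern: every mutation goes to the durable store first, and the local dict is only a read cache that can vanish at any time. A minimal sketch with stand-in dicts (all names hypothetical; the durable dict stands in for Redis/Postgres):

```python
# Hypothetical write-through pattern: durable store is the source of
# truth, local_cache is a disposable per-process read cache.
durable_store: dict[str, list[str]] = {}   # stands in for Redis/Postgres
local_cache: dict[str, list[str]] = {}     # wiped by any pod restart

def append_message(session_id: str, message: str) -> None:
    # On a cache miss, hydrate from the durable store.
    history = local_cache.setdefault(
        session_id, durable_store.get(session_id, []).copy()
    )
    history.append(message)
    durable_store[session_id] = history  # write-through: persist every mutation

def simulate_restart() -> None:
    local_cache.clear()  # a container restart loses all process memory

append_message("abc", "hello")
simulate_restart()
append_message("abc", "again")
print(durable_store["abc"])  # ['hello', 'again']
```

Because every write lands in the durable store before the request returns, the restart between the two messages loses nothing.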
How to Debug It
1. Print the process ID and worker identity. Log `os.getpid()` and any worker metadata. If consecutive requests hit different PIDs, your “state” is trapped in separate processes.
2. Log the session key before and after mutation. Confirm the same `session_id`, `user_id`, or `conversation_id` arrives on every request. If it changes or is missing, your lookup key is wrong.
3. Inspect where LlamaIndex is storing memory. Check whether you’re using `ChatMemoryBuffer`, workflow context state, or a custom store. If it’s just a Python dict or object field inside a request handler, that’s your bug.
4. Force single-worker mode locally. Run one worker and compare behavior. If it works with one worker but fails with two or more, you have a shared-state problem.
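The first two checks can be folded into one tiny helper (hypothetical, not part of LlamaIndex) that you call before and after each mutation:

```python
import os

# Hypothetical debug helper: log the worker PID, the session key, and
# the message count around every state mutation, so a multi-worker
# split shows up immediately in the logs.
def log_state_access(tag: str, session_id: str, store: dict) -> str:
    count = len(store.get(session_id, []))
    line = f"[{tag}] pid={os.getpid()} session={session_id} messages={count}"
    print(line)
    return line

store: dict[str, list[str]] = {}
before = log_state_access("before", "abc", store)
store.setdefault("abc", []).append("hello")
after = log_state_access("after", "abc", store)
```

If the “before” line of request N+1 does not match the “after” line of request N, or the PID keeps changing between requests, state is being split across processes.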
Prevention
- Keep all user/session state outside request-scoped objects.
- Use Redis/Postgres/etc. for conversation memory and workflow checkpoints.
- Treat LlamaIndex objects like `SimpleChatEngine`, `ReActAgent`, and workflow contexts as runtime wrappers, not durable storage.
- Add tests that simulate two requests hitting different processes or containers.
- Make the session key explicit in every API call; don’t infer it from globals.
If you want a simple rule: if losing the Python process loses your user’s conversation history, your design is wrong for production.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist + starter code
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.