# How to Fix 'state not updating in production' in LlamaIndex (TypeScript)
If your LlamaIndex TypeScript app works locally but state stops updating once you deploy, you're usually dealing with a state lifecycle bug, not a model bug. In practice, this happens when your agent, memory, or workflow state is recreated per request, cached incorrectly, or mutated in a way that never survives the serverless/runtime boundary.
The key clue is that the LlamaIndex classes still run, but your stateful objects behave like they’re reset between calls. You’ll often see symptoms like WorkflowContext losing values, ChatMemoryBuffer coming back empty, or an agent that “forgets” prior tool results after the first request.
## The Most Common Cause
The #1 cause is creating stateful LlamaIndex objects inside a request handler instead of keeping them stable across the session or persisting them explicitly.
This bites hard in production because local dev often runs a single long-lived process, while production may use serverless functions, edge runtimes, multiple instances, or hot reload boundaries.
### Broken vs. fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| Recreates `OpenAI`, `VectorStoreIndex`, `ChatMemoryBuffer`, or `Workflow` on every request | Initializes once and reuses the instance, or persists state externally |
| Stores conversation state in local variables | Stores state in Redis/DB/session store |
| Assumes process memory survives between requests | Treats process memory as ephemeral |
```ts
// ❌ Broken: state gets recreated on every request
import { NextRequest, NextResponse } from "next/server";
import { ChatMemoryBuffer, OpenAI } from "llamaindex";

export async function POST(req: NextRequest) {
  const { message } = await req.json();

  const llm = new OpenAI({ model: "gpt-4o-mini" });
  const memory = ChatMemoryBuffer.fromDefaults(); // resets every call

  await memory.put({ role: "user", content: message });

  // Only ever sees the single message added above
  const history = await memory.getAll();
  const response = await llm.complete({
    prompt: `Conversation so far:\n${history
      .map((m) => `${m.role}: ${m.content}`)
      .join("\n")}\nReply to user.`,
  });

  return NextResponse.json({ response: response.text });
}
```
```ts
// ✅ Fixed: persist memory outside the request lifecycle
import { NextRequest, NextResponse } from "next/server";
import { ChatMemoryBuffer, OpenAI } from "llamaindex";

// Module scope: created once per warm process, reused across requests
const llm = new OpenAI({ model: "gpt-4o-mini" });

// Replace this with Redis/Postgres/session storage in production
const memoryBySession = new Map<string, ChatMemoryBuffer>();

function getMemory(sessionId: string) {
  let memory = memoryBySession.get(sessionId);
  if (!memory) {
    memory = ChatMemoryBuffer.fromDefaults();
    memoryBySession.set(sessionId, memory);
  }
  return memory;
}

export async function POST(req: NextRequest) {
  const { message, sessionId } = await req.json();
  const memory = getMemory(sessionId);

  await memory.put({ role: "user", content: message });

  const history = await memory.getAll();
  const response = await llm.complete({
    prompt: `Conversation so far:\n${history
      .map((m) => `${m.role}: ${m.content}`)
      .join("\n")}\nReply to user.`,
  });

  return NextResponse.json({ response: response.text });
}
```
If you’re using Workflow or AgentWorkflow, the same rule applies. Don’t create a fresh workflow object for each turn if you expect it to retain context.
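As a minimal sketch of that rule (the `buildWorkflow` factory and its return type are placeholders for however you actually construct your workflow), scope the instance to the session the same way as the memory above:

```ts
// Placeholder for your real Workflow/AgentWorkflow construction;
// the per-session caching is the part that matters.
type SessionWorkflow = { turnCount: number };
const buildWorkflow = (): SessionWorkflow => ({ turnCount: 0 });

// One workflow instance per session, created lazily and reused across turns
const workflowBySession = new Map<string, SessionWorkflow>();

function getWorkflow(sessionId: string): SessionWorkflow {
  let workflow = workflowBySession.get(sessionId);
  if (!workflow) {
    workflow = buildWorkflow(); // runs once per session, not once per request
    workflowBySession.set(sessionId, workflow);
  }
  return workflow;
}
```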
## Other Possible Causes
### 1. Serverless cold starts and multi-instance routing
If you deploy to Lambda, Vercel Functions, Cloud Run autoscaling, or any environment with multiple instances, in-memory state is not guaranteed to survive.
```ts
// Bad assumption: module-level state survives between invocations
let currentState: { lastMessage?: string } = {};

export async function handler(_req: Request) {
  currentState.lastMessage = "hello"; // may vanish on next invocation
}
```
Use Redis, DynamoDB, Postgres JSONB, or another external store for anything that must survive requests.
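For example, here is a minimal sketch of a Redis-backed history store using the `redis` npm client. The `chat:` key prefix, message shape, and TTL are assumptions; adapt them to your schema:

```ts
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

type StoredMessage = { role: string; content: string };

// Load the session's history; an empty array if the key doesn't exist yet
async function loadHistory(sessionId: string): Promise<StoredMessage[]> {
  const raw = await redis.get(`chat:${sessionId}`);
  return raw ? (JSON.parse(raw) as StoredMessage[]) : [];
}

// Append a message and write the whole history back, with a TTL
async function appendMessage(sessionId: string, message: StoredMessage) {
  const history = await loadHistory(sessionId);
  history.push(message);
  await redis.set(`chat:${sessionId}`, JSON.stringify(history), {
    EX: 60 * 60 * 24, // expire after 24h; tune for your app
  });
}
```

Because every instance reads and writes the same Redis key, it no longer matters which process or cold-started function handles a given request.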
### 2. Mutating nested state without reassigning it
Some app frameworks only detect updates when the reference changes. If you mutate an object in place and then serialize it later, your UI or workflow layer may not notice.
```ts
// ❌ In-place mutation: the reference never changes
state.messages.push(newMessage);

// ✅ Reassign a new object/array so change detection can see it
state = {
  ...state,
  messages: [...state.messages, newMessage],
};
```
This matters if your app wraps LlamaIndex inside React server actions, Zustand stores, or custom event-driven orchestration.
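With Zustand, for instance, the same reassignment rule looks like this (a minimal sketch; the store shape is an assumption):

```ts
import { create } from "zustand";

type Message = { role: string; content: string };

type ChatState = {
  messages: Message[];
  addMessage: (m: Message) => void;
};

// Zustand only notifies subscribers when set() produces new references;
// pushing into state.messages in place would not trigger a re-render.
const useChatStore = create<ChatState>((set) => ({
  messages: [],
  addMessage: (m) =>
    set((state) => ({ messages: [...state.messages, m] })),
}));
```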
### 3. Mismatched async flow
If you forget to await a write before reading state back, production latency makes the race condition visible.
```ts
// ❌ Race condition: the write may not have landed before the read
memory.put({ role: "user", content: message });
const history = await memory.getAll();
```

```ts
// ✅ Correct: await the write before reading back
await memory.put({ role: "user", content: message });
const history = await memory.getAll();
```
This can show up as empty chat history or missing tool outputs right after a write.
### 4. Version mismatch between llamaindex packages
A common production-only issue is running different versions of core packages locally vs deployed. That can produce odd behavior around workflows and storage adapters.
Check what your `package.json` declares:

```json
{
  "dependencies": {
    "llamaindex": "^0.4.0"
  }
}
```
Then verify lockfile consistency and deployment install mode. If your CI installs with one version and production resolves another, fix that first.
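Assuming npm (pnpm and yarn have equivalents), two commands cover most of this check:

```bash
# Show which llamaindex version actually resolved in this install
npm ls llamaindex

# Install exactly what the lockfile pins, instead of re-resolving ranges
npm ci
```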
## How to Debug It
- **Log object identity and lifecycle.** Add logs where you create `ChatMemoryBuffer`, `Workflow`, `AgentWorkflow`, or index instances. If those logs appear on every request, you found the problem (a logging sketch follows this list).
- **Check whether state lives in process memory.** Search for plain variables like `let sessionState = {}`. If the value disappears after redeploys or across concurrent requests, move it to durable storage.
- **Verify async writes are awaited.** Look for missing `await` on `.put()`, `.update()`, `.persist()`, or database writes. A lot of "state not updating" bugs are just read-after-write races.
- **Reproduce under production-like conditions.** Run with multiple workers or restart between requests. If it breaks only when the process restarts or scales horizontally, your state is not persisted correctly.
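For the first two checks, one cheap trick is to log a module-scoped ID on every request; if the ID changes between calls, your process memory is being recycled. A minimal sketch for a Next.js route handler (the route itself is an assumption):

```ts
import { NextResponse } from "next/server";
import { randomUUID } from "node:crypto";

// Module scope: minted once per process. A new value between requests
// means a cold start, a redeploy, or a different instance.
const instanceId = randomUUID();

export async function GET() {
  console.log(`instance ${instanceId} handled a request`);
  return NextResponse.json({ instanceId });
}
```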
## Prevention
- Keep LlamaIndex runtime objects stateless unless they are intentionally scoped to a session.
- Persist conversation/workflow state in Redis, Postgres, DynamoDB, or another external store.
- Treat serverless and autoscaled deployments as ephemeral by default.
- Pin package versions and lockfile behavior in CI/CD so local and prod resolve the same LlamaIndex build.
If you want one rule to remember: don’t trust process memory for agent state. In TypeScript production apps with LlamaIndex, that’s the fastest path to “works locally, breaks in prod.”
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.