How to Fix 'memory not persisting' in LlamaIndex (TypeScript)
What “memory not persisting” actually means
In LlamaIndex TypeScript, this usually means your chat history or agent state exists for one request, then disappears on the next one. The common symptom is that ChatMemoryBuffer looks fine in memory, but after a new HTTP request, chatHistory comes back empty or your agent behaves like it forgot the conversation.
You’ll usually hit this when building:
- Express or Next.js API routes
- Serverless functions
- Multi-turn chat apps where you recreate the LlamaIndex objects on every call
The Most Common Cause
The #1 cause is simple: you are creating a new memory instance on every request.
ChatMemoryBuffer is not durable storage. It holds messages in process memory unless you explicitly persist them somewhere and reload them. If your code instantiates it inside the handler, it gets reset every time.
Broken pattern vs fixed pattern
| Broken | Fixed |
|---|---|
| Memory created inside request handler | Memory loaded from persistent store by session ID |
| State lost on every invocation | State survives across requests |
| Works in local tests, fails in production | Works consistently across restarts |
```ts
// ❌ Broken: memory is recreated on every request
import { ChatMemoryBuffer } from "llamaindex";

export async function POST(req: Request) {
  const { message } = await req.json();

  // A brand-new buffer with no history, on every single request
  const memory = new ChatMemoryBuffer({
    tokenLimit: 3000,
  });

  await memory.put({ role: "user", content: message });
  const history = await memory.getMessages();

  return Response.json({ history });
}
```
```ts
// ✅ Fixed: load and save memory per session
import { ChatMemoryBuffer } from "llamaindex";

const memoryStore = new Map<string, string[]>();

export async function POST(req: Request) {
  const { message, sessionId } = await req.json();

  // Rehydrate the buffer from previously saved messages
  const previousMessages = memoryStore.get(sessionId) ?? [];
  const memory = new ChatMemoryBuffer({
    tokenLimit: 3000,
    initialMessages: previousMessages.map((content) => ({
      role: "user",
      content,
    })),
  });

  await memory.put({ role: "user", content: message });
  const messages = await memory.getMessages();

  // Persist serialized state somewhere real in production:
  memoryStore.set(
    sessionId,
    messages.map((m) => m.content ?? "")
  );

  return Response.json({ messages });
}
```
If you’re building a real application, don’t ship a Map like this as your final solution. It lives in one process and dies with it. Use Redis, Postgres, DynamoDB, or another shared store.
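One way to make that swap painless is to hide the storage behind a small interface, so only one class changes when you move from the Map to Redis. This is a minimal sketch: the `SessionStore` interface, `InMemorySessionStore`, and `ChatTurn` shape are all hypothetical names, not LlamaIndex API.

```ts
type ChatTurn = { role: "user" | "assistant"; content: string };

// Hypothetical storage interface; the request handler only depends on this.
interface SessionStore {
  load(sessionId: string): Promise<ChatTurn[]>;
  save(sessionId: string, messages: ChatTurn[]): Promise<void>;
}

// In-memory stand-in so the sketch is runnable. In production you would
// implement the same interface with a Redis or Postgres client instead,
// keeping the JSON serialization but writing to a shared store.
class InMemorySessionStore implements SessionStore {
  private data = new Map<string, string>();

  async load(sessionId: string): Promise<ChatTurn[]> {
    const raw = this.data.get(sessionId);
    return raw ? (JSON.parse(raw) as ChatTurn[]) : [];
  }

  async save(sessionId: string, messages: ChatTurn[]): Promise<void> {
    // Serialize the full transcript, not just the latest message.
    this.data.set(sessionId, JSON.stringify(messages));
  }
}
```

The handler then calls `store.load(sessionId)` before building the buffer and `store.save(sessionId, messages)` after each turn, regardless of which backend sits behind the interface.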
Other Possible Causes
1. You are not passing the same sessionId
If each request gets a new session key, persistence will look broken even if your storage works.
```ts
// ❌ Broken: a fresh session key is generated on every request
const sessionId = crypto.randomUUID();

// ✅ Fixed: reuse the id the client already holds
const sessionId = req.headers.get("x-session-id") ?? body.sessionId;
```
2. You are using serverless or stateless deployment
In AWS Lambda, Vercel functions, or similar environments, in-memory objects can disappear between invocations. A global variable may work locally and fail under real traffic.
```ts
// ❌ Broken assumption: module-level state survives invocations
let sharedMemory = new ChatMemoryBuffer({ tokenLimit: 3000 });
```

Use external storage instead:

```ts
// ✅ Better pattern: reload state from a shared store on each invocation
const savedState = await redis.get(`memory:${sessionId}`);
```
3. You are only persisting the latest user message, not the full conversation
A lot of implementations store just one prompt and forget assistant replies. That breaks multi-turn context.
```ts
// ❌ Broken: only the latest user message survives
await db.save(sessionId, { lastUserMessage: message });

// ✅ Fixed: store the whole conversation, both roles included
await db.save(sessionId, {
  messages: [
    ...existingMessages,
    { role: "user", content: message },
    { role: "assistant", content: reply },
  ],
});
```
4. Your token limit is truncating history too aggressively
ChatMemoryBuffer trims old messages when it hits its token limit. If your limit is too low, it will look like memory is not persisting.
```ts
// ❌ Too small for real conversations
const memory = new ChatMemoryBuffer({
  tokenLimit: 500,
});

// ✅ More realistic for multi-turn chat
const memory = new ChatMemoryBuffer({
  tokenLimit: 4000,
});
```
If you need long-term retention, use a summary strategy or external persistence instead of relying on raw buffer growth.
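One summary strategy looks like this sketch: when the transcript exceeds a rough token budget, collapse the oldest messages into a single summary turn instead of silently dropping them. Everything here is hypothetical (`compactHistory`, the 4-characters-per-token estimate), and `summarize` stands in for an LLM call.

```ts
type Msg = { role: string; content: string };

// Hypothetical compaction: keep as many recent messages as fit in the
// budget, and replace everything older with one summary turn.
function compactHistory(
  messages: Msg[],
  tokenBudget: number,
  summarize: (dropped: Msg[]) => string,
  // Crude estimate: ~4 characters per token. Swap in a real tokenizer.
  estimateTokens: (m: Msg) => number = (m) => Math.ceil(m.content.length / 4)
): Msg[] {
  const total = messages.reduce((n, m) => n + estimateTokens(m), 0);
  if (total <= tokenBudget) return messages;

  // Walk from newest to oldest, keeping whatever still fits.
  const kept: Msg[] = [];
  let keptCost = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i]);
    if (keptCost + cost > tokenBudget) break;
    kept.unshift(messages[i]);
    keptCost += cost;
  }

  // Everything older than the kept window gets summarized, not lost.
  const dropped = messages.slice(0, messages.length - kept.length);
  return dropped.length
    ? [{ role: "system", content: summarize(dropped) }, ...kept]
    : kept;
}
```

Run this before writing the transcript back to your store, so the persisted history stays bounded while older context survives as a summary.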
How to Debug It
1. Log the session ID on every request. Confirm it stays constant across turns; if it changes, your storage lookup will always miss.
2. Inspect what you actually save. Print the serialized messages before writing to Redis or the database, and make sure both user and assistant messages are being stored.
3. Check whether the process restarts. In dev servers and serverless deployments, restart behavior can wipe in-memory state. If logs show fresh initialization on every request, you found the issue.
4. Verify the `ChatMemoryBuffer` length before and after each turn. Call `getMessages()` after each `put()`. If messages exist immediately but vanish later, your persistence layer is wrong. If they never accumulate past a few turns, your token limit is probably too low.
Example debug snippet:
```ts
const messagesBefore = await memory.getMessages();
console.log("before:", messagesBefore.length);

await memory.put({ role: "user", content: message });

const messagesAfter = await memory.getMessages();
console.log("after:", messagesAfter.length);
```
Prevention
- Use a real persistence layer from day one:
  - Redis for short-lived chat state
  - Postgres for durable conversation history
- Key everything by stable identifiers: `tenantId`, `userId`, `sessionId`
- Treat `ChatMemoryBuffer` as an in-process cache, not your source of truth
If you’re building anything that needs conversation continuity across requests, assume process memory will fail you eventually. Persist the transcript yourself and hydrate LlamaIndex from that state on each turn.
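That hydrate-then-persist loop can be sketched as one function per turn. The names here (`handleTurn`, `loadTranscript`, `saveTranscript`) are hypothetical, and `generateReply` stands in for the LlamaIndex chat call; the point is the ordering, with the durable store as the source of truth.

```ts
type Turn = { role: "user" | "assistant"; content: string };

// Hypothetical per-turn flow: hydrate from durable storage, run the turn,
// persist the updated transcript before responding.
async function handleTurn(
  sessionId: string,
  userMessage: string,
  loadTranscript: (id: string) => Promise<Turn[]>,
  saveTranscript: (id: string, t: Turn[]) => Promise<void>,
  generateReply: (history: Turn[]) => Promise<string>
): Promise<string> {
  // 1. Hydrate: the store, not process memory, holds the real history.
  const history = await loadTranscript(sessionId);

  // 2. Run the turn against the full hydrated history.
  const withUser = [...history, { role: "user" as const, content: userMessage }];
  const reply = await generateReply(withUser);

  // 3. Persist both sides of the exchange before returning.
  await saveTranscript(sessionId, [
    ...withUser,
    { role: "assistant", content: reply },
  ]);

  return reply;
}
```

With this shape, the LlamaIndex objects themselves can stay request-scoped: they are rebuilt from `loadTranscript` every turn, so nothing of value lives only in process memory.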
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.