How to Fix 'memory not persisting' in LlamaIndex (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: memory-not-persisting, llamaindex, typescript

What “memory not persisting” actually means

In LlamaIndex TypeScript, this usually means your chat history or agent state exists for one request, then disappears on the next one. The common symptom is that ChatMemoryBuffer looks fine in memory, but after a new HTTP request, chatHistory comes back empty or your agent behaves like it forgot the conversation.

You’ll usually hit this when building:

  • Express or Next.js API routes
  • Serverless functions
  • Multi-turn chat apps where you recreate the LlamaIndex objects on every call

The Most Common Cause

The #1 cause is simple: you are creating a new memory instance on every request.

ChatMemoryBuffer is not durable storage. It holds messages in process memory unless you explicitly persist them somewhere and reload them. If your code instantiates it inside the handler, it gets reset every time.

Broken pattern vs fixed pattern

Broken                                    | Fixed
Memory created inside request handler     | Memory loaded from a persistent store by session ID
State lost on every invocation            | State survives across requests
Works in local tests, fails in production | Works consistently across restarts
// ❌ Broken: memory is recreated on every request
import { ChatMemoryBuffer } from "llamaindex";

export async function POST(req: Request) {
  const { message } = await req.json();

  const memory = new ChatMemoryBuffer({
    tokenLimit: 3000,
  });

  await memory.put({ role: "user", content: message });

  const history = await memory.getMessages();
  return Response.json({ history });
}
// ✅ Fixed: load and save memory per session
import { ChatMemoryBuffer } from "llamaindex";
import type { ChatMessage } from "llamaindex";

const memoryStore = new Map<string, ChatMessage[]>();

export async function POST(req: Request) {
  const { message, sessionId } = await req.json();

  // Rehydrate the buffer from whatever this session has already said
  const previousMessages = memoryStore.get(sessionId) ?? [];

  const memory = new ChatMemoryBuffer({
    tokenLimit: 3000,
    initialMessages: previousMessages,
  });

  await memory.put({ role: "user", content: message });

  const messages = await memory.getMessages();

  // Persist the full message objects (roles included) somewhere real in production:
  memoryStore.set(sessionId, messages);

  return Response.json({ messages });
}

If you’re building a real app, don’t use an in-process Map like this as your final solution. Use Redis, Postgres, DynamoDB, or another shared store.

Other Possible Causes

1. You are not passing the same sessionId

If each request gets a new session key, persistence will look broken even if your storage works.

// ❌ Broken: a fresh ID on every request means an empty history on every request
const sessionId = crypto.randomUUID();
// ✅ Fixed: reuse the ID the client already sent (body is the parsed request body)
const sessionId = req.headers.get("x-session-id") ?? body.sessionId;
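One way to keep that ID stable across turns is to round-trip it in a cookie and only generate a new one when the client doesn’t have one yet. The sketch below assumes a fetch-style Request handler and a hypothetical cookie name, chat-session-id; adapt both to your setup.

// Sketch: keep the session ID stable by round-tripping it in a cookie.
// The cookie name "chat-session-id" is just an example.
export async function POST(req: Request) {
  const cookieHeader = req.headers.get("cookie") ?? "";
  const existing = cookieHeader
    .split("; ")
    .find((c) => c.startsWith("chat-session-id="))
    ?.split("=")[1];

  const sessionId = existing ?? crypto.randomUUID();

  // ... load memory for sessionId, run the turn, persist it again ...

  return new Response(JSON.stringify({ sessionId }), {
    headers: {
      "Content-Type": "application/json",
      // Send the ID back so the next request reuses it
      "Set-Cookie": `chat-session-id=${sessionId}; Path=/; HttpOnly`,
    },
  });
}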

2. You are using serverless or stateless deployment

In AWS Lambda, Vercel functions, or similar environments, in-memory objects can disappear between invocations. A global variable may work locally and fail under real traffic.

// ❌ Broken assumption: module scope is not durable between serverless invocations
let sharedMemory = new ChatMemoryBuffer({ tokenLimit: 3000 });

Use external storage instead:

// ✅ Better pattern
const savedState = await redis.get(`memory:${sessionId}`);
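As a rough sketch, the per-request flow with Redis looks like the following. It assumes an ioredis-style client exposing get and set with string values, plus the same ChatMemoryBuffer options and ChatMessage type used earlier; swap in whatever client and options you actually use.

// Sketch: rehydrate memory from Redis before the turn, write it back after.
import { ChatMemoryBuffer } from "llamaindex";
import type { ChatMessage } from "llamaindex";

type StringStore = {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<unknown>;
};

async function loadMemory(store: StringStore, sessionId: string) {
  const raw = await store.get(`memory:${sessionId}`);
  const previousMessages: ChatMessage[] = raw ? JSON.parse(raw) : [];
  return new ChatMemoryBuffer({ tokenLimit: 3000, initialMessages: previousMessages });
}

async function saveMemory(store: StringStore, sessionId: string, messages: ChatMessage[]) {
  // Store the whole transcript, not just the latest user message
  await store.set(`memory:${sessionId}`, JSON.stringify(messages));
}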

3. You are only persisting the latest user message, not the full conversation

A lot of implementations store just one prompt and forget assistant replies. That breaks multi-turn context.

// ❌ Broken
await db.save(sessionId, { lastUserMessage: message });
// ✅ Fixed
await db.save(sessionId, {
  messages: [
    ...existingMessages,
    { role: "user", content: message },
    { role: "assistant", content: reply },
  ],
});

4. Your token limit is truncating history too aggressively

ChatMemoryBuffer trims old messages when it hits its token limit. If your limit is too low, it will look like memory is not persisting.

// ❌ Too small for real conversations
const memory = new ChatMemoryBuffer({
  tokenLimit: 500,
});
// ✅ More realistic for multi-turn chat
const memory = new ChatMemoryBuffer({
  tokenLimit: 4000,
});

If you need long-term retention, use a summary strategy or external persistence instead of relying on raw buffer growth.
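A library-agnostic way to do the summary approach is to keep the last few messages verbatim and fold everything older into a running summary before you persist. In this sketch, summarize is a placeholder for whatever LLM call you use to compress old turns.

// Sketch: compact old history into a summary before persisting it.
// `summarize` is a placeholder for your own LLM call.
import type { ChatMessage } from "llamaindex";

async function compactHistory(
  messages: ChatMessage[],
  keepLast: number,
  summarize: (older: ChatMessage[]) => Promise<string>
): Promise<ChatMessage[]> {
  if (messages.length <= keepLast) return messages;

  const older = messages.slice(0, messages.length - keepLast);
  const recent = messages.slice(-keepLast);
  const summary = await summarize(older);

  // Carry the compressed context forward as a single system message
  return [{ role: "system", content: `Conversation so far: ${summary}` }, ...recent];
}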

How to Debug It

  1. Log the session ID on every request

    • Confirm it stays constant across turns.
    • If it changes, your storage lookup will always miss.
  2. Inspect what you actually save

    • Print the serialized messages before writing to Redis or DB.
    • Make sure both user and assistant messages are being stored.
  3. Check whether the process restarts

    • In dev servers and serverless deployments, restart behavior can wipe in-memory state.
    • If logs show fresh initialization every request, you found the issue.
  4. Verify ChatMemoryBuffer length before and after each turn

    • Call getMessages() after each put().
    • If messages exist immediately but vanish later, your persistence layer is wrong.
    • If they never accumulate past a few turns, your token limit is probably too low.

Example debug snippet:

const messagesBefore = await memory.getMessages();
console.log("before:", messagesBefore.length);

await memory.put({ role: "user", content: message });

const messagesAfter = await memory.getMessages();
console.log("after:", messagesAfter.length);

Prevention

  • Use a real persistence layer from day one:
    • Redis for short-lived chat state
    • Postgres for durable conversation history
  • Key everything by stable identifiers:
    • tenantId
    • userId
    • sessionId
  • Treat ChatMemoryBuffer as an in-process cache, not your source of truth

If you’re building anything that needs conversation continuity across requests, assume process memory will fail you eventually. Persist the transcript yourself and hydrate LlamaIndex from that state on each turn.
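If it helps, derive one storage key from those identifiers so every read and write hits the same record; the prefix below is just a convention.

// One stable key per conversation, built from identifiers you already have
function memoryKey(tenantId: string, userId: string, sessionId: string): string {
  return `memory:${tenantId}:${userId}:${sessionId}`;
}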

