How to Fix 'memory not persisting during development' in LlamaIndex (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

When people say “memory not persisting during development” in LlamaIndex TypeScript, they usually mean this: your chat history works for one request, then resets on the next reload, hot restart, or API call. In practice, the app is creating a fresh in-memory store every time, so ChatMemoryBuffer starts empty on each request and nothing survives a process restart.

This shows up most often in local dev with Next.js, Express hot reload, serverless handlers, or any setup where the module is reloaded between requests.

The Most Common Cause

The #1 cause is instantiating memory inside a request handler or component instead of reusing a persistent store. In LlamaIndex TypeScript, ChatMemoryBuffer is not magic persistence by itself. If you create it per request, you get a new empty buffer every time.

Here’s the broken pattern:

Broken                         | Fixed
------------------------------ | ------------------------------
Memory created inside handler  | Memory created once and reused
Resets on every reload/request | Persists via storage context
// ❌ Broken: memory recreated on every request
import { ChatMemoryBuffer } from "llamaindex";

export async function POST(req: Request) {
  const memory = new ChatMemoryBuffer({
    tokenLimit: 4000,
  });

  const body = await req.json();
  const userMessage = body.message;

  memory.put({ role: "user", content: userMessage });

  return Response.json({
    message: "Stored",
    memorySize: memory.getMessages().length,
  });
}

// ✅ Fixed: reuse storage-backed memory across requests
import { ChatMemoryBuffer, SimpleChatStore } from "llamaindex";

const chatStore = new SimpleChatStore();

// create once at module scope
const memory = new ChatMemoryBuffer({
  tokenLimit: 4000,
  chatStore,
  chatStoreKey: "dev-session-1",
});

export async function POST(req: Request) {
  const body = await req.json();
  const userMessage = body.message;

  memory.put({ role: "user", content: userMessage });

  return Response.json({
    message: "Stored",
    memorySize: memory.getMessages().length,
  });
}

If you are using ContextChatEngine, the same rule applies. Don’t rebuild the engine and its memory on every call unless you also restore state from persistent storage.
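
Here is a sketch of that pattern, with the engine built once at module scope. The retriever and chatModel declarations stand in for objects you have configured elsewhere:

import {
  ChatMemoryBuffer,
  ContextChatEngine,
  SimpleChatStore,
} from "llamaindex";
import type { BaseRetriever, LLM } from "llamaindex";

declare const retriever: BaseRetriever; // assumed: built from your index at startup
declare const chatModel: LLM; // assumed: your configured chat LLM

const chatStore = new SimpleChatStore();

// Built once at module scope; every request reuses the same engine and memory.
const chatEngine = new ContextChatEngine({
  retriever,
  chatModel,
  memory: new ChatMemoryBuffer({
    tokenLimit: 4000,
    chatStore,
    chatStoreKey: "dev-session-1",
  }),
});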

Other Possible Causes

1. You are using ephemeral storage only

If your setup uses SimpleChatStore without saving it to disk or a database, it will disappear when the process exits.

import { SimpleChatStore } from "llamaindex";

const chatStore = new SimpleChatStore();
// This exists only in RAM unless you persist it yourself.

Fix it by loading and saving state explicitly:

import fs from "node:fs";
import { SimpleChatStore } from "llamaindex";

const path = "./chat-store.json";

// Restore the previous snapshot if one exists; fromDict is a static
// factory, so reassign rather than calling it on the instance.
let chatStore = new SimpleChatStore();
if (fs.existsSync(path)) {
  chatStore = SimpleChatStore.fromDict(JSON.parse(fs.readFileSync(path, "utf-8")));
}

// ...later, after each update:
fs.writeFileSync(path, JSON.stringify(chatStore.toDict(), null, 2));

2. Hot reload is recreating module state

In Next.js dev mode or with nodemon/tsx watch, module-level variables can still reset when files change. You’ll see behavior like:

  • first message persists
  • save file changes
  • second message starts from empty history

Use an external store if you need state across reloads:

// better than relying on module globals during dev
const sessionId = req.headers.get("x-session-id") ?? "default";
const key = `chat:${sessionId}`;

Then map that key to Redis, SQLite, Postgres, or file-backed JSON.
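
One low-friction way to keep that mapping alive across dev reloads, before you wire up Redis or a database, is the globalThis singleton trick commonly used for DB clients in Next.js dev mode. The memoryForSession helper below is a hypothetical name, and it assumes the storage-backed ChatMemoryBuffer options used earlier:

import { ChatMemoryBuffer, SimpleChatStore } from "llamaindex";

// Stash the store on globalThis so it survives module reloads within
// the same dev-server process (the usual DB-client singleton trick).
const g = globalThis as typeof globalThis & { __chatStore?: SimpleChatStore };
const chatStore = (g.__chatStore ??= new SimpleChatStore());

export function memoryForSession(sessionId: string) {
  // One buffer per session key, all backed by the shared store.
  return new ChatMemoryBuffer({
    tokenLimit: 4000,
    chatStore,
    chatStoreKey: `chat:${sessionId}`,
  });
}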

3. Wrong session key

A very common bug is generating a new key per request.

// ❌ broken
const key = crypto.randomUUID();

That guarantees every request gets a fresh conversation. Use a stable identifier:

// ✅ fixed
const key = req.headers.get("x-session-id") ?? userId;

If the key changes, LlamaIndex will behave like there is no prior history.
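
A quick sanity check makes the failure mode obvious. This assumes the memoryForSession helper sketched above and the synchronous put()/getMessages() calls used throughout this guide:

// Same key -> shared history; new key -> empty history.
const a = memoryForSession("user-42");
a.put({ role: "user", content: "hello" });

const b = memoryForSession("user-42");
console.log(b.getMessages().length); // 1 — same backing entry

const c = memoryForSession(crypto.randomUUID());
console.log(c.getMessages().length); // 0 — fresh conversation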

4. You are not actually reading from memory before answering

Sometimes the messages are stored correctly, but your query engine never uses them. For example, you append messages to ChatMemoryBuffer, then call the LLM directly instead of a chat engine that includes history.

// ❌ broken flow
memory.put({ role: "user", content: input });
const response = await llm.complete({ prompt: input }); // ignores memory entirely

Use ContextChatEngine or pass retrieved chat history into the prompt:

import { ContextChatEngine } from "llamaindex";

const chatEngine = new ContextChatEngine({
  retriever,
  chatModel,
  memory,
});
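
And a usage sketch; the exact response shape differs between llamaindex versions, so treat response.response as an assumption:

// The engine reads history from memory before answering, so follow-up
// questions can reference earlier turns.
const response = await chatEngine.chat({
  message: "What did I ask you about earlier?",
});
console.log(response.response);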

How to Debug It

  1. Print the session key

    • Log the exact key used for storage.
    • If it changes between requests, that’s your bug.
  2. Inspect stored messages before and after each call

    • Check memory.getMessages().length.
    • If it always starts at 0, your store is being recreated.
  3. Verify persistence layer writes

    • If using file/DB/Redis storage, confirm the write happens.
    • Add logs around save/load calls and inspect the actual backing store (a save wrapper is sketched below).
  4. Disable hot reload temporarily

    • Run the app in production mode locally.
    • If persistence suddenly works, your dev server reload cycle is resetting state.

A good debug log looks like this:

console.log("sessionKey =", key);
console.log("before =", memory.getMessages().length);

memory.put({ role: "user", content: userMessage });

console.log("after =", memory.getMessages().length);

If you see before = 0 every time for the same session key, you are not reusing the same backing store.
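
For step 3, wrapping the save call makes every write visible. saveChatStore is a hypothetical helper name:

import fs from "node:fs";
import { SimpleChatStore } from "llamaindex";

// Hypothetical helper: log each snapshot so a missing write stands out.
function saveChatStore(path: string, chatStore: SimpleChatStore) {
  const data = JSON.stringify(chatStore.toDict(), null, 2);
  fs.writeFileSync(path, data);
  console.log(`chat store: wrote ${data.length} bytes to ${path}`);
}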

Prevention

  • Create memory and storage at the right scope:

    • per session/user for conversations
    • not per request unless you restore state manually
  • Use stable identifiers:

    • userId
    • conversationId
    • signed session cookie value
  • Back chat state with something durable:

    • Redis for short-lived sessions (sketched after this list)
    • Postgres or SQLite for durable dev/test data
    • file-backed JSON only for local experiments
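
For the Redis option, here is a minimal sketch using ioredis. The withChatStore helper and the chat: key layout are assumptions, and the snapshot round-trip mirrors the fromDict/toDict pattern shown earlier:

import Redis from "ioredis";
import { SimpleChatStore } from "llamaindex";

const redis = new Redis(); // assumes a local Redis instance in dev

// Hypothetical helper: load one conversation, apply an update, save it back.
async function withChatStore(
  conversationId: string,
  update: (store: SimpleChatStore) => void,
) {
  const key = `chat:${conversationId}`;
  const raw = await redis.get(key);
  const store = raw
    ? SimpleChatStore.fromDict(JSON.parse(raw))
    : new SimpleChatStore();
  update(store);
  await redis.set(key, JSON.stringify(store.toDict()));
}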

The practical fix is simple: stop treating ChatMemoryBuffer like persistence and start treating it like an interface over a real store. Once you give it a stable session key and durable backing storage, “memory not persisting during development” stops being mysterious and becomes just another state-management bug.

