# How to Fix 'memory not persisting when scaling' in LlamaIndex (TypeScript)
If your LlamaIndex TypeScript app works on one request but loses chat history or state as soon as you scale to multiple workers, pods, or serverless invocations, the problem is usually not “memory” in the abstract: the memory object lives in process-local RAM, so each instance sees a different copy.
You’ll usually notice this after moving from local dev to Docker, PM2 cluster mode, Vercel/Netlify functions, or Kubernetes replicas. The common symptom is: ChatMemoryBuffer looks fine in logs, but subsequent turns come back empty or inconsistent.
## The Most Common Cause
The #1 cause is creating memory inside the request handler or inside each agent run. That gives every request a fresh ChatMemoryBuffer, so nothing persists across instances.
In LlamaIndex TypeScript, this often shows up with classes like ChatMemoryBuffer, OpenAIAgent, or ReActAgent where the memory object is instantiated too late.
| Broken pattern | Fixed pattern |
|---|---|
| Memory created per request | Memory created once and reused |
| Works only within a single local process | Works across workers, pods, and replicas |
| State stored in process RAM only | State backed by shared storage |
```typescript
// ❌ Broken: new memory every request
import { ChatMemoryBuffer } from "llamaindex";

export async function POST(req: Request) {
  const memory = new ChatMemoryBuffer({
    tokenLimit: 2000,
  });

  const body = await req.json();
  await memory.put({ role: "user", content: body.message });
  const chatHistory = await memory.getMessages();
  return Response.json({ chatHistory });
}
```
```typescript
// ✅ Fixed: reuse a shared memory instance or persist it externally
import { ChatMemoryBuffer } from "llamaindex";

const memory = new ChatMemoryBuffer({
  tokenLimit: 2000,
});

export async function POST(req: Request) {
  const body = await req.json();
  await memory.put({ role: "user", content: body.message });
  const chatHistory = await memory.getMessages();
  return Response.json({ chatHistory });
}
```
That fixed version only helps if you have a single long-lived Node process. If you are scaling across multiple instances, you still need external persistence. In that case, keep the ChatMemoryBuffer as a working cache and store messages in Redis, Postgres, or another shared backend.
A more realistic production pattern is:
```typescript
import { ChatMemoryBuffer } from "llamaindex";

// One buffer per session, cached at module level
const memoryBySession = new Map<string, ChatMemoryBuffer>();

function getMemory(sessionId: string): ChatMemoryBuffer {
  let memory = memoryBySession.get(sessionId);
  if (!memory) {
    memory = new ChatMemoryBuffer({ tokenLimit: 2000 });
    memoryBySession.set(sessionId, memory);
  }
  return memory;
}
```
That still breaks across pods, but it fixes the “I accidentally recreated it every request” bug.
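To survive multiple pods, the same per-session lookup can hydrate from a shared store on each request instead of trusting the local Map. Below is a minimal sketch of that split; `MessageStore` and its in-memory implementation are illustrative stand-ins for a Redis- or Postgres-backed store (the in-memory version exists only so the sketch runs without a server):

```typescript
// Illustrative stand-in for a shared backend; in production this would
// wrap Redis or Postgres, not a local Map.
type ChatMessage = { role: "user" | "assistant"; content: string };

interface MessageStore {
  load(sessionId: string): Promise<ChatMessage[]>;
  append(sessionId: string, msg: ChatMessage): Promise<void>;
}

class InMemoryStore implements MessageStore {
  private data = new Map<string, ChatMessage[]>();
  async load(sessionId: string): Promise<ChatMessage[]> {
    return [...(this.data.get(sessionId) ?? [])];
  }
  async append(sessionId: string, msg: ChatMessage): Promise<void> {
    const list = this.data.get(sessionId) ?? [];
    list.push(msg);
    this.data.set(sessionId, list);
  }
}

// Each request hydrates history from the store, runs the turn, and writes
// back, so no turn depends on process-local state surviving.
async function handleTurn(
  store: MessageStore,
  sessionId: string,
  userText: string,
): Promise<number> {
  const history = await store.load(sessionId);
  await store.append(sessionId, { role: "user", content: userText });
  // A real handler would pass `history` into the agent here.
  return history.length; // messages visible before this turn
}
```

Because every request rebuilds its view from the store, it no longer matters which worker or pod serves it.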
## Other Possible Causes

### 1. You are using serverless functions without external storage
Serverless runtimes do not guarantee process reuse. If your code depends on module-level state, it may appear to work during warm invocations and fail on cold starts.
```typescript
// ❌ Not durable in serverless
const memory = new ChatMemoryBuffer({ tokenLimit: 2000 });
```
Use Redis or a database keyed by user/session ID instead.
```typescript
// ✅ Persist messages externally (note: LPUSH prepends, newest first)
await redis.lpush(`chat:${sessionId}`, JSON.stringify(message));
```
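Because `lpush` stores newest-first, the read path has to reverse the list before replaying the conversation. A hedged sketch of that read, assuming an ioredis-style client where `lrange` resolves to an array of strings (the `RedisLike` interface below is only there so the sketch type-checks without a live client):

```typescript
// Reads history written with `lpush` (newest first) back in
// chronological order. The client shape mimics ioredis.
type StoredMessage = { role: string; content: string };

interface RedisLike {
  lrange(key: string, start: number, stop: number): Promise<string[]>;
}

async function loadHistory(
  redis: RedisLike,
  sessionId: string,
): Promise<StoredMessage[]> {
  const raw = await redis.lrange(`chat:${sessionId}`, 0, -1);
  // LPUSH stores newest-first, so reverse before replaying the chat.
  return raw.reverse().map((s) => JSON.parse(s) as StoredMessage);
}
```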
### 2. You are not passing a stable session key
If every request gets a new sessionId, you are effectively asking for a new conversation each time.
```typescript
// ❌ New ID every time
const sessionId = crypto.randomUUID();
```
Use an authenticated user ID or a real conversation ID from your app.
```typescript
// ✅ Stable per conversation/user
const sessionId = req.headers.get("x-session-id");
```
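A small guard keeps malformed or missing keys from silently starting new conversations. The sketch below is an illustrative policy, not a LlamaIndex API; the header name and the UUID shape check are assumptions:

```typescript
// Rejects requests that cannot be tied to a stable conversation.
// The 36-character UUID shape check is an illustrative policy.
function resolveSessionId(headers: Record<string, string | undefined>): string {
  const id = headers["x-session-id"];
  if (!id || !/^[0-9a-f-]{36}$/i.test(id)) {
    throw new Error("missing or malformed x-session-id header");
  }
  return id;
}
```

Failing loudly here is usually better than falling back to a random ID, which would quietly fork the conversation.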
### 3. You are mixing workers without shared state
If you run PM2 cluster mode or multiple Kubernetes replicas, each worker has its own heap. A module-level singleton will not be shared.
```bash
# ❌ Each worker has separate memory
pm2 start app.js -i max
```
Fix it by moving state out of process memory:
- Redis for low-latency session state
- Postgres for durable audit/history
- DynamoDB if you already live in AWS
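For the Postgres option, the shape is two parameterized queries keyed by session. A sketch under stated assumptions: the `chat_messages` table and its columns are invented for illustration, and `db` mimics the `pg` library's `query(text, values)` method so the sketch stays self-contained:

```typescript
// Minimal pg-style interface so the sketch is self-contained; a real app
// would pass a Pool from the `pg` package instead.
interface Db {
  query(
    text: string,
    values?: unknown[],
  ): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Append one turn durably, keyed by session.
async function appendMessage(
  db: Db,
  sessionId: string,
  role: string,
  content: string,
): Promise<void> {
  await db.query(
    "INSERT INTO chat_messages (session_id, role, content, created_at) VALUES ($1, $2, $3, now())",
    [sessionId, role, content],
  );
}

// Load a session's history in the order it was written.
async function loadMessages(db: Db, sessionId: string) {
  const { rows } = await db.query(
    "SELECT role, content FROM chat_messages WHERE session_id = $1 ORDER BY created_at",
    [sessionId],
  );
  return rows;
}
```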
### 4. Your agent is recreated with fresh tools and fresh memory
Sometimes the bug is not the buffer itself; it’s that the whole agent gets rebuilt every turn.
```typescript
// ❌ Agent rebuilt every request
export async function handleTurn(input: string) {
  const agent = new OpenAIAgent({
    tools,
    memory: new ChatMemoryBuffer({ tokenLimit: 2000 }),
  });
  return agent.chat({ message: input });
}
```
Keep the agent configuration stable and inject session-specific state separately.
```typescript
// ✅ Stable agent + session-backed state
const agent = new OpenAIAgent({ tools });

export async function handleTurn(sessionId: string, input: string) {
  const memory = getMemory(sessionId);
  return agent.chat({ message: input, chatHistory: await memory.getMessages() });
}
```
## How to Debug It

1. Log the session key and process identity.
   - Print `sessionId`, `process.pid`, and the pod name if available.
   - If the same user hits different PIDs/pods and history disappears, you have a distributed state problem.
2. Log when memory is constructed.
   - Add a log near `new ChatMemoryBuffer(...)`.
   - If it fires on every request, you found the bug.
3. Inspect whether messages are actually being stored.
   - After each turn, check `const messages = await memory.getMessages(); console.log(messages.length);`.
   - If the length stays at `0` or resets unexpectedly, your storage path is wrong.
4. Test with two concurrent requests.
   - Send two turns for the same conversation through different instances.
   - If one instance sees history and the other does not, move persistence outside process RAM.
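The first two checks can be wired in with a construction counter and a per-turn log line. A sketch; `POD_NAME` is an assumed environment variable (commonly injected in Kubernetes via the Downward API), and where you call these helpers is up to your app:

```typescript
// Counts memory-object constructions so per-request recreation
// shows up immediately in logs.
let constructions = 0;

function trackConstruction(): number {
  constructions += 1;
  console.log(
    `[memory] constructed #${constructions} pid=${process.pid} pod=${process.env.POD_NAME ?? "local"}`,
  );
  return constructions;
}

// Call once per turn: if the same session logs different pids while the
// history count resets, state is split across workers.
function logTurn(sessionId: string, historyLength: number): void {
  console.log(`[turn] session=${sessionId} pid=${process.pid} history=${historyLength}`);
}
```

If `constructed #N` climbs with every request, you are recreating memory; if `pid` varies for one session while `history` resets, the state is process-local.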
## Prevention

- Treat `ChatMemoryBuffer` as an in-process cache unless you have proven otherwise.
- Key all conversation state by a stable `sessionId` or `conversationId`.
- Use Redis/Postgres for any deployment with more than one worker, pod, or serverless invocation.
- Add an integration test that sends multiple turns through separate requests and asserts history survives.
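The integration test in that last point reduces to one invariant: a second request that shares nothing with the first except the external store must still see the first turn. A sketch with simulated requests; in a real test these would be HTTP calls routed to different replicas:

```typescript
// Each call mimics a fresh worker: no local state, only the shared store.
// The Map stands in for Redis/Postgres so the sketch runs anywhere.
async function simulateRequest(
  store: Map<string, string[]>,
  sessionId: string,
  message: string,
): Promise<number> {
  const history = store.get(sessionId) ?? [];
  history.push(message);
  store.set(sessionId, history);
  return history.length; // turns visible to this "worker"
}
```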
If you want one rule to remember: never store conversational state only in the Node heap if your app can scale horizontally. It works on localhost and fails everywhere else.
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.