# How to Fix 'memory not persisting in production' in AutoGen (TypeScript)
If your AutoGen TypeScript agent “works locally” but memory disappears in production, the problem is usually not the model. It’s almost always how the agent instance, storage backend, or message state is being created per request.
In practice, this shows up as memory not persisting in production, history resets after restart, or agent starts with empty context on every API call. The root cause is usually a stateless deployment pattern that recreates AssistantAgent, UserProxyAgent, or your memory store on every invocation.
## The Most Common Cause
The #1 cause is instantiating memory inside the request handler instead of creating a long-lived agent + persistent store.
If you do this in an API route, serverless function, or background job, each request gets a fresh in-memory object. That works in local dev because the process stays alive. In production, especially on Vercel, Lambda, Docker autoscaling, or multiple Node workers, it resets.
### Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| Memory created inside handler | Memory created once, backed by persistent storage |
| Uses process-local state | Uses Redis/Postgres/file-backed persistence |
| New `AssistantAgent` per request | Reuses agent or reloads conversation state |
```typescript
// ❌ Broken: memory is recreated on every request
import { AssistantAgent } from "@autogen/agent";
import { Memory } from "@autogen/memory";

export async function POST(req: Request) {
  const body = await req.json();

  const memory = new Memory(); // in-memory only
  const agent = new AssistantAgent({
    name: "support-agent",
    systemMessage: "You are a helpful assistant.",
    memory,
  });

  const result = await agent.run(body.message);
  return Response.json({ result });
}
```
```typescript
// ✅ Fixed: persistent store + shared initialization
import { AssistantAgent } from "@autogen/agent";
import { Memory } from "@autogen/memory";
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL!);

// create once at module scope
const memory = new Memory({
  store: {
    get: async (key: string) => redis.get(key),
    set: async (key: string, value: string) => redis.set(key, value),
  },
});

const agent = new AssistantAgent({
  name: "support-agent",
  systemMessage: "You are a helpful assistant.",
  memory,
});

export async function POST(req: Request) {
  const body = await req.json();
  const result = await agent.run(body.message);
  return Response.json({ result });
}
```
The key detail is not just "move it outside the handler." You need persistence behind the memory layer. If you only hoist `new Memory()` to module scope but still use an in-process array or `Map`, the state will still vanish on cold starts and scale-out.
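To make the trap concrete, here is a minimal sketch. The `KeyValueStore` interface mirrors the `get`/`set` shape used above, but these names are illustrative assumptions, not a real AutoGen API:

```typescript
// Illustrative only: a store interface matching the get/set shape above.
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

// ❌ Module-scope, but still process-local: every cold start or new
// replica constructs a fresh Map, so "persisted" data silently vanishes.
class InMemoryStore implements KeyValueStore {
  private data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

async function demo() {
  const storeA = new InMemoryStore();
  await storeA.set("conv:42", "hello");

  // Simulate a cold start / second replica: a brand-new process means a
  // brand-new Map, so the earlier write is gone.
  const storeB = new InMemoryStore();
  const recovered = await storeB.get("conv:42");
  console.log(recovered); // null — the data lived only in storeA's process
}

demo();
```

Any implementation of the same interface that talks to Redis or a database instead of a `Map` fixes this, because every replica reads from the same place.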
## Other Possible Causes
### 1) You're using ephemeral runtime storage
If your memory implementation writes to `/tmp`, local files, or process variables, production will eventually lose it.
```typescript
// ❌ Ephemeral file storage: wiped on redeploy, cold start, or scale-out
await fs.writeFile("/tmp/autogen-memory.json", JSON.stringify(state));
```

```typescript
// ✅ Real persistent storage (Prisma-style upsert shown)
await db.conversations.upsert({
  where: { conversationId },
  update: { messages: JSON.stringify(state) },
  create: { conversationId, messages: JSON.stringify(state) },
});
```
### 2) Conversation ID changes between requests
AutoGen can only resume memory if you load the same thread/session key every time. If you generate a new ID per request, you’ve effectively started a new conversation.
```typescript
// ❌ New ID each request: every turn starts a brand-new conversation
const conversationId = crypto.randomUUID();
```

```typescript
// ✅ Stable ID derived from auth/session/customer context
const conversationId = `${userId}:${ticketId}`;
```
If you're using an AutoGen class like `ConversableAgent`, make sure whatever wrapper you use maps messages to a stable thread key.
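A small, hedged sketch of that idea. The helper name and validation are hypothetical, not part of AutoGen; the point is that the key is derived from stable domain identifiers, never generated per request:

```typescript
// Hypothetical helper: derive the thread key from stable domain identifiers.
// Failing loudly on missing identifiers beats silently minting a new thread.
function threadKeyFor(userId: string, ticketId: string): string {
  if (!userId || !ticketId) {
    throw new Error("threadKeyFor requires stable userId and ticketId");
  }
  return `${userId}:${ticketId}`;
}

// Two requests in the same support ticket resolve to the same key,
// so the stored history can be found again.
console.log(threadKeyFor("user-7", "ticket-123")); // "user-7:ticket-123"
```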
### 3) Multiple replicas are not sharing state
This one bites teams deploying behind a load balancer. One request hits pod A, next request hits pod B. If memory lives in pod A only, the second request sees nothing.
```yaml
# ❌ No shared backing store for stateful agent data
replicas: 3
```
Fix it by moving message history and memory to Redis, Postgres, Cosmos DB, DynamoDB, or another shared datastore.
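A minimal sketch of what "shared" means in code. The `HistoryStore` interface and helper names are assumptions for illustration; in production the interface would be implemented against Redis, Postgres, or DynamoDB so every replica reads the same rows:

```typescript
// Assumed interface for illustration; back it with a shared datastore.
interface HistoryStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

type ChatMessage = { role: "user" | "assistant"; content: string };

async function appendMessage(
  store: HistoryStore,
  conversationId: string,
  message: ChatMessage,
): Promise<void> {
  // Read-modify-write shown for brevity; a real implementation should use
  // an atomic append (e.g. Redis RPUSH, or one INSERT per message) to
  // avoid races between concurrent requests.
  const raw = await store.get(conversationId);
  const messages: ChatMessage[] = raw ? JSON.parse(raw) : [];
  messages.push(message);
  await store.set(conversationId, JSON.stringify(messages));
}

async function loadMessages(
  store: HistoryStore,
  conversationId: string,
): Promise<ChatMessage[]> {
  const raw = await store.get(conversationId);
  return raw ? JSON.parse(raw) : [];
}
```

Because both pods A and B go through the same store, it no longer matters which replica the load balancer picks.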
### 4) You are not rehydrating state before calling run()
Some AutoGen setups require you to load prior messages into the agent before continuing the chat. If you skip that step, the agent behaves like it has no history even though you stored it elsewhere.
```typescript
// ❌ Stored history exists but is never loaded into the agent
const history = await db.getMessages(conversationId);
await agent.run(userMessage); // agent starts with empty context
```

```typescript
// ✅ Rehydrate before running
const history = await db.getMessages(conversationId);
for (const msg of history) {
  agent.addMessage(msg);
}
await agent.run(userMessage);
```
## How to Debug It
1. Log the conversation key on every request
   - Confirm the same `conversationId` is used across turns.
   - If it changes, your "memory issue" is really a session-key bug.
2. Check whether state survives process restarts
   - Restart your app and inspect whether prior messages still exist.
   - If they disappear after restart, your store is process-local.
3. Inspect where `Memory`, `AssistantAgent`, or `ConversableAgent` are instantiated
   - If they're created inside route handlers or lambdas, that's suspect.
   - Move initialization to module scope only if the backing store is shared.
4. Trace reads and writes to your persistence layer
   - Add logs around `get`, `set`, and message append operations.
   - You want to see: a write after the user turn, a read before the next turn, and the same key both times.
Example:
```typescript
console.log("conversationId", conversationId);
console.log("saving messages", messages.length);
console.log("loading messages", loaded.length);
```
If saving works but loading returns zero on the next request, the bug is almost always storage-related or key-related.
## Prevention
- Use a real persistence layer from day one:
  - Redis for short-lived conversational state
  - Postgres for durable, audit-friendly chat history
- Treat conversation IDs as part of your domain model:
  - user ID + ticket ID + channel ID is better than a random UUID per request
- Add an integration test that simulates two separate requests:
  - the first call stores memory
  - the second call runs in a fresh process and verifies history is restored
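That last test can be sketched as follows. A `Map` stands in for the shared store so the sketch runs anywhere; swap in your real Redis or Postgres client to make it a true integration test. All names here are illustrative, not an AutoGen API:

```typescript
// Minimal two-request persistence test. Only the store is shared between
// the two "requests"; everything else is rebuilt, like a fresh process.
type Store = {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
};

function mapStore(backing: Map<string, string>): Store {
  return {
    get: async (key) => backing.get(key) ?? null,
    set: async (key, value) => { backing.set(key, value); },
  };
}

async function handleTurn(
  store: Store,
  conversationId: string,
  userMessage: string,
): Promise<string[]> {
  // "Fresh process": rehydrate history from the store, never from locals.
  const raw = await store.get(conversationId);
  const history: string[] = raw ? JSON.parse(raw) : [];
  history.push(userMessage);
  await store.set(conversationId, JSON.stringify(history));
  return history;
}

async function run() {
  const shared = new Map<string, string>();
  // First "request" stores memory...
  await handleTurn(mapStore(shared), "user-7:ticket-1", "hello");
  // ...second "request" starts cold and must restore it from the store.
  const history = await handleTurn(mapStore(shared), "user-7:ticket-1", "any update?");
  if (history.length !== 2) throw new Error("history was not restored");
}

run();
```

If this test passes against your real backing store but production still loses context, suspect the conversation key, not the storage.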
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.