How to Fix 'OOM error during inference during development' in AutoGen (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

When AutoGen throws an OOM error during inference in development, it usually means your agent loop is generating more context than the model process can hold. In TypeScript projects, this often shows up after a few turns, as messages, tool outputs, and nested agent responses keep piling into the prompt.

The failure is usually not “the model is too big.” It’s your conversation state, tool payloads, or recursion pattern growing until inference runs out of memory.

The Most Common Cause

The #1 cause is uncontrolled message accumulation in a long-running AssistantAgent / UserProxyAgent loop. In AutoGen TypeScript, people often keep appending every turn to the same conversation state and then resend the entire history on each inference call.

Here’s the broken pattern:

Broken                           Fixed
Keeps full history forever       Trims or summarizes history
Resends large tool outputs       Stores tool output externally
No max turns / token guardrail   Adds limits before inference

// BROKEN: unbounded chat history
import { AssistantAgent, UserProxyAgent } from "@autogen/core";

const assistant = new AssistantAgent({
  name: "assistant",
  modelClient,
});

const user = new UserProxyAgent({
  name: "user",
});

const messages: any[] = [];

while (true) {
  const input = await getNextUserInput();

  messages.push({ role: "user", content: input });

  const result = await assistant.run(messages);
  messages.push(...result.messages); // grows forever

  console.log(result.messages.at(-1)?.content);
}

// FIXED: cap history and summarize old turns
import { AssistantAgent } from "@autogen/core";

const assistant = new AssistantAgent({
  name: "assistant",
  modelClient,
});

let messages: any[] = [];
const MAX_MESSAGES = 12;

while (true) {
  const input = await getNextUserInput();
  messages.push({ role: "user", content: input });

  if (messages.length > MAX_MESSAGES) {
    const older = messages.slice(0, -6);
    const recent = messages.slice(-6);

    const summary = await assistant.run([
      {
        role: "system",
        content: `Summarize this conversation in under 200 tokens:\n${JSON.stringify(older)}`,
      },
    ]);

    messages = [
      { role: "system", content: `Conversation summary: ${summary.messages.at(-1)?.content}` },
      ...recent,
    ];
  }

  const result = await assistant.run(messages);
  messages.push(...result.messages.slice(-2));

  console.log(result.messages.at(-1)?.content);
}

If your app keeps reusing the same array of messages, this is where the OOM starts. AutoGen doesn’t magically compress your history for you; you need to manage it.
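If summarization is more machinery than you want, even a plain sliding-window trim bounds memory. This is a generic sketch with a minimal local `Message` type, not an AutoGen export:

```typescript
// Sliding-window trim: keep any system prompt plus the most recent turns.
// `Message` is a minimal local type for illustration, not an AutoGen type.
type Message = { role: "system" | "user" | "assistant"; content: string };

function trimMessages(messages: Message[], maxMessages: number): Message[] {
  if (messages.length <= maxMessages) return messages;
  // Preserve system messages so instructions are never dropped.
  const systemMessages = messages.filter(m => m.role === "system");
  const rest = messages.filter(m => m.role !== "system");
  const keep = Math.max(0, maxMessages - systemMessages.length);
  return [...systemMessages, ...rest.slice(-keep)];
}
```

Call this on every loop iteration before `assistant.run(...)`; history can then never exceed `maxMessages` entries.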

Other Possible Causes

1. Huge tool outputs being injected into the prompt

A common trap is returning raw JSON, CSV blobs, or entire database rows from a tool call. AutoGen then feeds that payload back into the next inference step.

// BROKEN
const getClaimsTool = async () => {
  return JSON.stringify(await db.claims.findMany()); // massive payload
};

// FIXED
const getClaimsToolFixed = async () => {
  const claims = await db.claims.findMany({ take: 20 });
  return JSON.stringify(
    claims.map(c => ({
      id: c.id,
      status: c.status,
      updatedAt: c.updatedAt,
    }))
  );
};
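Beyond pagination, you can wrap every tool in a hard output cap so no single payload can blow up the prompt. This helper is a generic sketch (the 4096 limit and the truncation marker are arbitrary choices, not AutoGen defaults):

```typescript
// Hard-cap a tool's string output by character count — for ASCII JSON,
// characters are a close proxy for bytes. Limit is an illustrative default.
function capToolOutput(output: string, maxChars = 4096): string {
  if (output.length <= maxChars) return output;
  // Tell the model the payload was cut, so it can ask for a narrower
  // query instead of assuming it saw everything.
  return output.slice(0, maxChars) + `\n…[truncated ${output.length - maxChars} chars]`;
}
```

Wrap every tool's return value in this before it reaches the conversation, and the worst-case prompt growth per tool call becomes a known constant.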

2. Recursive agent handoffs without a stop condition

If one agent calls another agent and both are configured to continue on every response, you can create an infinite expansion loop.

// BROKEN
const resultA = await agentA.run([{ role: "user", content: "Investigate claim" }]);
const resultB = await agentB.run(resultA.messages);
const resultC = await agentA.run(resultB.messages); // no stop condition

// FIXED: track turnCount across handoffs and stop on a final answer
if (turnCount >= 4 || hasFinalAnswer(resultB)) {
  return resultB;
}
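A fuller version of that guardrail can be sketched as a bounded ping-pong loop. Here `RunFn` stands in for whatever your agent's run method looks like, and the 4-turn cap and the `FINAL` marker are illustrative conventions, not AutoGen APIs:

```typescript
// Bounded handoff between two agents: alternate until either a turn cap
// is hit or the latest message signals completion.
type Msg = { role: string; content: string };
type RunFn = (messages: Msg[]) => Promise<Msg[]>;

async function runBoundedHandoff(
  agentA: RunFn,
  agentB: RunFn,
  seed: Msg[],
  maxTurns = 4
): Promise<Msg[]> {
  let current = seed;
  for (let turn = 0; turn < maxTurns; turn++) {
    // Alternate between the two agents each turn.
    const run = turn % 2 === 0 ? agentA : agentB;
    current = await run(current);
    const last = current.at(-1)?.content ?? "";
    if (last.includes("FINAL")) break; // explicit stop condition
  }
  return current;
}
```

The key property is that the loop terminates no matter what the agents say: the turn cap is the floor under every other stop condition.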

3. Model context window mismatch

Sometimes the model client is configured for a smaller context than the prompts you’re sending. The runtime may surface this as memory pressure or inference failure.

// Example config issue
const modelClient = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini",
  maxTokens: 8000,
});

If your prompts are already huge, reducing maxTokens alone won’t fix it. You need to reduce input size too.
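A cheap pre-flight check catches oversized prompts before they hit the model. The ~4 characters per token figure is a common rough heuristic for English text, and the 8000 budget mirrors the config above; both numbers are illustrative:

```typescript
// Rough token estimate: ~4 chars per token is a common heuristic for
// English text. Use a real tokenizer if you need accuracy.
type ChatMsg = { role: string; content: string };

function estimateTokens(messages: ChatMsg[]): number {
  const chars = messages.reduce((n, m) => n + m.content.length, 0);
  return Math.ceil(chars / 4);
}

function assertWithinBudget(messages: ChatMsg[], budget = 8000): void {
  const estimate = estimateTokens(messages);
  if (estimate > budget) {
    // Fail fast in your code instead of failing inside inference.
    throw new Error(`Prompt ≈${estimate} tokens exceeds budget of ${budget}`);
  }
}
```

Failing fast here turns a silent memory blow-up into a clear error at the call site.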

4. Storing full documents in agent state

Developers often put PDFs, policy docs, or OCR output directly into state and reuse that state across turns.

// BROKEN
agentState.documentText = fullPdfText; // hundreds of KB or more

// BETTER
agentState.documentChunks = chunkText(fullPdfText, 1200);
agentState.activeChunkIds = [3, 4, 5];

Keep large artifacts outside the chat transcript. Pass only the relevant chunks into the prompt.
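A minimal version of the `chunkText` helper used above might look like the following. This is a plain fixed-size splitter for illustration; real chunkers usually split on paragraph or sentence boundaries instead:

```typescript
// Naive fixed-size chunker: slices text into pieces of at most
// `chunkSize` characters. Illustrative only — production chunkers
// typically respect paragraph/sentence boundaries.
function chunkText(text: string, chunkSize: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}
```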

How to Debug It

  1. Log prompt size before every inference call
    Print message count and approximate character length before calling assistant.run(...).

    console.log("messages:", messages.length);
    console.log("chars:", JSON.stringify(messages).length);
    
  2. Inspect the last tool output
    If the crash happens after a function call, dump that tool response size first. Large JSON is usually the culprit.

  3. Disable one feature at a time
    Turn off retrieval, tool calling, nested agents, and memory persistence separately. The feature that makes OOM disappear is your source.

  4. Set hard caps
    Add limits for:

    • max turns
    • max tokens
    • max tool output bytes
    • max retained messages
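The four caps above can live in one guard that runs before every inference call. Every limit in this sketch is an arbitrary illustrative default, and message length in characters stands in for bytes:

```typescript
// Enforce all four caps in one place before each inference call.
// All limits are illustrative defaults, not AutoGen settings.
type TurnMsg = { role: string; content: string };

interface Guardrails {
  maxTurns: number;
  maxRetainedMessages: number;
  maxToolOutputChars: number; // rough proxy for bytes
  maxPromptChars: number;     // rough proxy for tokens
}

const DEFAULTS: Guardrails = {
  maxTurns: 20,
  maxRetainedMessages: 12,
  maxToolOutputChars: 4096,
  maxPromptChars: 32000,
};

function enforceGuardrails(
  messages: TurnMsg[],
  turn: number,
  g: Guardrails = DEFAULTS
): TurnMsg[] {
  if (turn >= g.maxTurns) throw new Error(`Hit max turns (${g.maxTurns})`);
  // Drop the oldest messages beyond the retention cap.
  let kept = messages.slice(-g.maxRetainedMessages);
  // Truncate oversized individual messages (e.g. giant tool outputs).
  kept = kept.map(m =>
    m.content.length > g.maxToolOutputChars
      ? { ...m, content: m.content.slice(0, g.maxToolOutputChars) + "…[truncated]" }
      : m
  );
  const totalChars = kept.reduce((n, m) => n + m.content.length, 0);
  if (totalChars > g.maxPromptChars) {
    throw new Error(`Prompt size ${totalChars} chars exceeds cap`);
  }
  return kept;
}
```

Calling this on every turn means each failure mode surfaces as a named error in your code rather than an opaque OOM inside inference.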

If you’re seeing errors like JavaScript heap out of memory, OOM error during inference, or repeated failures inside AssistantAgent.run(), treat it as a data growth problem first.

Prevention

  • Keep chat history short. Summarize older turns instead of replaying everything.
  • Never return raw unbounded data from tools. Paginate and truncate aggressively.
  • Put guardrails on multi-agent loops with explicit stop conditions and turn limits.
  • Store documents and retrieval results outside message state, then inject only what’s needed for the current turn.

If you’re building production agents in TypeScript, assume every message will be sent to the model more than once. Anything large in your history hurts you twice: once in your app state and again during inference.



By Cyprian Aarons, AI Consultant at Topiax.
