How to Fix 'token limit exceeded when scaling' in AutoGen (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: token-limit-exceeded-when-scaling · autogen · typescript

When you see token limit exceeded when scaling in AutoGen TypeScript, it usually means the agent conversation grew past the model’s context window during a multi-agent run. This shows up most often when you scale from a small demo to a longer workflow with repeated turns, tool outputs, or nested agent handoffs.

The fix is rarely “use a bigger model” first. In most cases, you’re sending too much conversation history, too much tool output, or both.

The Most Common Cause

The #1 cause is unbounded message accumulation in an AssistantAgent or GroupChat flow. AutoGen keeps appending messages unless you explicitly trim history, summarize, or cap how much context gets forwarded.

Here’s the broken pattern:

import { AssistantAgent, UserProxyAgent } from "@autogen/core";

const assistant = new AssistantAgent({
  name: "assistant",
  modelClient,
});

const user = new UserProxyAgent({
  name: "user",
});

// This array grows without bound — every turn appends more messages.
const messages: { role: string; content: string }[] = [];

for (const task of tasks) {
  messages.push({ role: "user", content: task });
  const result = await assistant.onMessages(messages);
  messages.push(...result.messages);
}

And here’s the fixed pattern:

import { AssistantAgent } from "@autogen/core";

const assistant = new AssistantAgent({
  name: "assistant",
  modelClient,
});

for (const task of tasks) {
  const result = await assistant.onMessages([
    { role: "user", content: task },
  ]);

  // Keep only what you actually need.
  // Persist a summary or structured output instead of the full transcript.
  saveResult(result);
}

The difference is simple:

| Broken | Fixed |
| --- | --- |
| Reuses one ever-growing `messages` array | Sends only the current turn or a bounded window |
| Keeps full tool outputs and prior replies | Persists summaries or extracted state |
| Eventually triggers context overflow | Stays within token budget |

If you are using RoundRobinGroupChat, SelectorGroupChat, or any multi-agent orchestrator, the same rule applies: don’t let the full transcript grow forever.
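One way to enforce that rule in custom orchestration code is a small helper that bounds the window before any agent call. A minimal sketch — `ChatMessage` and `boundedWindow` are illustrative names, not AutoGen APIs:

```typescript
// Minimal message shape for illustration; real AutoGen messages carry more fields.
type ChatMessage = { role: string; content: string };

// Keep the first message (if it's a system prompt) plus the most
// recent `maxTurns` messages, and drop everything in between.
function boundedWindow(
  messages: ChatMessage[],
  maxTurns: number
): ChatMessage[] {
  if (messages.length <= maxTurns + 1) return messages;
  const system = messages[0].role === "system" ? [messages[0]] : [];
  return [...system, ...messages.slice(-maxTurns)];
}
```

Call this on the transcript right before each agent or group-chat turn, so the model always sees a fixed-size window regardless of how long the run has been going.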

Other Possible Causes

1. Tool output is too large

A common failure mode is dumping raw JSON, logs, PDFs, or search results into the chat. One tool call can blow your token budget by itself.

// Bad
return {
  content: JSON.stringify(largeResponse),
};

// Better
return {
  content: JSON.stringify({
    count: largeResponse.items.length,
    topItems: largeResponse.items.slice(0, 5),
    summary: largeResponse.summary,
  }),
};

If your tool returns megabytes of text, AutoGen will fail long before the model gets useful context.
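If you cannot restructure a tool's response, a defensive fallback is to hard-cap its size before it enters the chat. A sketch for plain string output — `capToolOutput` is an illustrative helper, not an AutoGen API:

```typescript
// Cap a tool result at roughly `maxChars` characters before it enters
// the chat. (~4 characters per token is a rough English-text heuristic.)
function capToolOutput(raw: string, maxChars = 4000): string {
  if (raw.length <= maxChars) return raw;
  return (
    raw.slice(0, maxChars) +
    `\n…[truncated ${raw.length - maxChars} chars]`
  );
}
```

The truncation marker tells the model (and you, when reading logs) that content was cut, which is far better than a silent context overflow.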

2. You are not trimming conversation history

Some AutoGen flows let you configure memory or message retention. If you never trim older turns, every retry adds more tokens.

const assistant = new AssistantAgent({
  name: "assistant",
  modelClient,
  // Example pattern depending on your setup:
  // maxMessages: 10,
});

If your version exposes message limits, use them. If not, trim upstream before calling the agent.
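Trimming upstream can be as simple as collapsing older turns into one summary message and keeping only the last few verbatim. A sketch, assuming you supply the `summarize` function yourself (in practice usually a cheap model call, and therefore async):

```typescript
type Msg = { role: string; content: string };

// Collapse everything but the last `keep` messages into a single
// summary message. `summarize` is supplied by the caller.
function trimWithSummary(
  messages: Msg[],
  keep: number,
  summarize: (text: string) => string
): Msg[] {
  if (messages.length <= keep) return messages;
  const older = messages.slice(0, messages.length - keep);
  const recent = messages.slice(-keep);
  const summary = summarize(
    older.map((m) => `${m.role}: ${m.content}`).join("\n")
  );
  return [
    { role: "system", content: `Summary of earlier turns: ${summary}` },
    ...recent,
  ];
}
```

This keeps the prompt size roughly constant per turn: one summary message plus a fixed number of recent turns, no matter how long the workflow runs.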

3. Model context window is smaller than your workload

This error can appear even with reasonable prompts if you picked a smaller context model. A GPT-4 class model with a larger window may handle the same flow that breaks on a smaller one.

const modelClient = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini", // smaller window than larger-tier models
});

Switching to a larger-context model can help, but it does not fix bad message growth.
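A cheap pre-flight check can catch overflow before the API call ever fails. The ~4-characters-per-token figure is a rough English-text heuristic, not the model's real tokenizer, and the 128K window below is an assumption you should adjust to your model:

```typescript
// Rough token estimate: ~4 characters per token for English text.
function estimateTokens(messages: { content: string }[]): number {
  return Math.ceil(messages.reduce((n, m) => n + m.content.length, 0) / 4);
}

const CONTEXT_WINDOW = 128_000; // adjust to your model's documented limit

// Leave headroom (`reserve`) for the system prompt and the reply.
function fitsBudget(
  messages: { content: string }[],
  reserve = 4_000
): boolean {
  return estimateTokens(messages) + reserve <= CONTEXT_WINDOW;
}
```

When `fitsBudget` returns false, trim or summarize before sending rather than letting the request fail server-side.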

4. Nested agent handoffs duplicate context

In multi-agent setups, one agent may forward the full transcript to another agent on every turn. That creates duplicated history and token inflation.

// Bad idea: forwarding everything repeatedly
await reviewer.onMessages(allMessages);
await planner.onMessages(allMessages);

Instead, pass only the relevant slice:

await reviewer.onMessages([
  { role: "user", content: latestDraft },
]);

How to Debug It

  1. Log token growth per turn

    • Print message counts and approximate token size before each onMessages() call.
    • If each turn grows linearly and never resets, you found the problem.
  2. Inspect tool payloads

    • Log every tool response size.
    • Look for huge JSON blobs, stack traces, HTML pages, or document text being injected into chat.
  3. Check which agent is forwarding what

    • In RoundRobinGroupChat or custom orchestration code, verify whether agents are receiving the full transcript or only recent messages.
    • Duplicate forwarding is easy to miss in multi-agent pipelines.
  4. Test with a tiny prompt

    • Replace real input with a short fixed message.
    • If it works with small input but fails in production data, your issue is almost certainly payload size or accumulated history.
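Step 1 above can be as little as a counter around each call. The estimator below uses the same rough ~4 chars/token heuristic, not a real tokenizer, so treat its output as a trend indicator rather than an exact count:

```typescript
function approxTokens(messages: { content: string }[]): number {
  // ~4 characters per token is a rough rule of thumb for English text.
  return Math.ceil(messages.reduce((n, m) => n + m.content.length, 0) / 4);
}

// Track growth between turns; a steadily rising delta that never resets
// points at accumulated history rather than one oversized prompt.
function makeTurnLogger() {
  let last = 0;
  return (turn: number, messages: { content: string }[]): number => {
    const size = approxTokens(messages);
    console.log(
      `turn ${turn}: ${messages.length} messages, ~${size} tokens (+${size - last})`
    );
    last = size;
    return size;
  };
}
```

Call the returned logger immediately before each `onMessages()` invocation and watch the `+delta` column across turns.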

A useful rule: if the failure happens after several successful turns and then suddenly throws something like Error: token limit exceeded when scaling, it is usually history growth rather than a single bad prompt.

Prevention

  • Bound every conversation

    • Keep only recent turns.
    • Summarize older context into structured state before continuing.
  • Sanitize tool outputs

    • Return compact summaries from tools.
    • Never dump raw documents unless absolutely necessary.
  • Design for state outside chat

    • Store workflow state in your app database.
    • Use AutoGen for reasoning and coordination, not as your primary persistence layer.
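The last point can be as small as extracting a compact record per task and persisting that instead of the transcript. The record shape below is illustrative — adapt the fields to your workflow:

```typescript
// Illustrative workflow record; persist this, not the chat transcript.
interface WorkflowState {
  taskId: string;
  status: "pending" | "done" | "failed";
  summary: string;
}

function toWorkflowState(taskId: string, finalReply: string): WorkflowState {
  return {
    taskId,
    status: "done",
    // Cap what gets persisted so stored state stays compact too.
    summary: finalReply.slice(0, 500),
  };
}
```

On the next task, seed the agent with this small record (or a query against your database) rather than replaying the previous conversation.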

If you are building bank or insurance workflows, this matters even more. Claims files, policy documents, and KYC artifacts are exactly the kind of data that will silently push an AutoGen conversation over the edge unless you control what enters the prompt.


By Cyprian Aarons, AI Consultant at Topiax.