How to Fix 'token limit exceeded' in AutoGen (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: token-limit-exceeded, autogen, typescript

What the error means

token limit exceeded in AutoGen usually means the messages you are sending to the model are longer than the context window allows. In TypeScript projects, this shows up when an agent keeps appending chat history, tool outputs, or retrieved documents until the request can’t fit.

You’ll usually hit it during long multi-turn chats, recursive tool calls, or when you pass large strings into AssistantAgent / UserProxyAgent without trimming them first.

The Most Common Cause

The #1 cause is unbounded chat history. AutoGen keeps adding prior messages to each model call, and if you never summarize or truncate, you eventually exceed the model’s token limit.

Here’s the broken pattern:

import { AssistantAgent } from "@autogen/agent";

const agent = new AssistantAgent({
  name: "support_agent",
  modelClient,
  systemMessage: "You are a helpful banking support assistant.",
});

const messages = [];

for (const turn of conversationTurns) {
  messages.push({ role: "user", content: turn });
}

const result = await agent.run(messages);

And here’s the fixed version:

import { AssistantAgent } from "@autogen/agent";

const agent = new AssistantAgent({
  name: "support_agent",
  modelClient,
  systemMessage: "You are a helpful banking support assistant.",
});

// Keep only the last few turns or summarize older context.
const recentTurns = conversationTurns.slice(-6);

const result = await agent.run(
  recentTurns.map((turn) => ({
    role: "user",
    content: turn,
  }))
);

If your workflow needs long context, don’t keep everything in raw chat history. Summarize older turns into a compact state object and pass that forward instead of replaying every message.
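
Here's a rough sketch of that approach. The summarizeTurns helper and the exact message shape are assumptions for illustration, not AutoGen APIs; in practice the summary might come from a cheap model call or a hand-written reducer.

// Keep the last few turns verbatim, fold everything older into a short summary.
const KEEP_RECENT = 6;
const olderTurns = conversationTurns.slice(0, -KEEP_RECENT);
const recentTurns = conversationTurns.slice(-KEEP_RECENT);

// Hypothetical helper: returns a short string like
// "Customer disputes claim #123; account verified; awaiting payout decision."
const summary = await summarizeTurns(olderTurns);

const result = await agent.run([
  { role: "user", content: `Summary of earlier conversation: ${summary}` },
  ...recentTurns.map((turn) => ({ role: "user", content: turn })),
]);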

Other Possible Causes

1) Tool output is too large

A common failure mode is returning full API payloads, PDFs, logs, or database dumps from tools. AutoGen then injects that output back into the conversation and blows past the limit.

// Bad: returning huge payloads
async function getClaimsData() {
  const data = await fetchClaims();
  return JSON.stringify(data); // the entire response is injected back into the conversation
}

// Better: return only what the agent needs
async function getClaimsData() {
  const data = await fetchClaims();
  return JSON.stringify({
    claimId: data.claimId,
    status: data.status,
    updatedAt: data.updatedAt,
  });
}

2) You are using a model with a smaller context window than expected

Sometimes the code is fine, but the deployment target changed. For example, switching from a larger-context model to a smaller one will surface errors like:

  • Error: token limit exceeded
  • OpenAIError: This model's maximum context length is ...
  • BadRequestError: Input exceeds maximum context length

Check your model config:

const modelClient = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini", // smaller window than some alternatives
});

If your prompts are large, move to a larger-context model or reduce prompt size.

3) Your system prompt is doing too much

Long policy blocks, compliance text, and giant instruction sets add up fast. I see this a lot in insurance and finance assistants where teams paste entire SOPs into systemMessage.

// Bad
systemMessage: `
You are an insurance assistant.
Follow these 40 rules...
[pages of policy text]
`

Trim it down and move static reference material into retrieval or external rules logic.

// Better
systemMessage: `
You are an insurance assistant.
Follow company policy and ask for clarification when required.
Use tools for policy lookup and claims status.
`

4) Recursive agent loops keep re-feeding their own output

If one agent calls another and both append full transcripts, token usage grows very quickly. This happens in planner/executor setups when each step includes the entire prior chain.

// Bad pattern: passing full transcript every time
await planner.run(fullConversationLog);
await executor.run(fullConversationLog);

Instead, pass only the current task state plus a short summary of prior work.
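
A rough sketch of that hand-off is below. The taskState shape and the idea of passing it as a single JSON message are assumptions for illustration; the exact result shape of run() depends on your AutoGen version.

// Keep a small structured state instead of the full transcript.
const taskState = {
  goal: "Resolve payout discrepancy on claim #123",
  completedSteps: ["fetched claim", "verified policy"],
  pendingStep: "calculate adjusted payout",
  priorWorkSummary: "Claim verified; payout formula disputed by customer.",
};

const planResult = await planner.run([
  { role: "user", content: JSON.stringify(taskState) },
]);

// Hand the executor the distilled plan, not the planner's full transcript.
const execResult = await executor.run([
  { role: "user", content: JSON.stringify({ ...taskState, plan: planResult }) },
]);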

How to Debug It

  1. Log prompt size before every model call
    Print message counts and approximate character length. If one request suddenly spikes, you've found your culprit.

    console.log("messages:", messages.length);
    console.log("chars:", messages.reduce((n, m) => n + m.content.length, 0));
    
  2. Inspect tool outputs
    If the error appears after a tool call, log the tool return value length. Large JSON blobs are usually obvious once you check them.

  3. Check which model you’re actually using
    Confirm your modelClient.model value in runtime logs. A config drift between environments can explain why local tests pass but staging fails (see the sketch after this list).

  4. Binary search the conversation
    Remove half of the prior turns and retry. If it works, keep narrowing down until you find whether it’s chat history, system prompt size, or tool output causing the overflow.
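
A small helper along these lines covers steps 1 and 3 in one place. The four-characters-per-token ratio is a rough heuristic, not an exact count; use a real tokenizer if you need precision.

// Log prompt size and the active model before each call.
function logPromptSize(label: string, messages: { content: string }[]) {
  const chars = messages.reduce((n, m) => n + m.content.length, 0);
  // ~4 chars per token is a coarse approximation for English text.
  console.log(`[${label}] messages=${messages.length} chars=${chars} ~tokens=${Math.round(chars / 4)}`);
}

logPromptSize("before agent.run", messages);
console.log("model in use:", modelClient.model); // catches config drift between environments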

Prevention

  • Keep a rolling window of recent messages instead of sending full transcripts forever.
  • Summarize old state into compact structured memory before continuing a long workflow.
  • Cap tool outputs aggressively; return IDs, statuses, and short excerpts instead of raw dumps (see the sketch after this list).
  • Choose models with enough context for your real payload size, not just your happy-path demo.
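
For the tool-output cap, here's a minimal sketch; the character budget and the capToolOutput helper name are arbitrary choices, not AutoGen conventions.

// Hard-cap whatever a tool returns before it re-enters the conversation.
const MAX_TOOL_OUTPUT_CHARS = 4000;

function capToolOutput(raw: string): string {
  if (raw.length <= MAX_TOOL_OUTPUT_CHARS) return raw;
  return raw.slice(0, MAX_TOOL_OUTPUT_CHARS) + "\n[truncated: tool output exceeded budget]";
}

async function getClaimsData() {
  const data = await fetchClaims();
  return capToolOutput(JSON.stringify(data));
}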

If you’re building production AutoGen agents in TypeScript, treat token budget like memory budget in backend systems. Measure it early, cap it everywhere, and don’t let agents carry around their entire life story on every request.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
