# How to Fix 'context length exceeded when scaling' in AutoGen (TypeScript)
## What the error means
`context length exceeded when scaling` in AutoGen usually means your agent conversation grew past the model’s token window while the framework was trying to continue a run. In TypeScript, this tends to show up after a few turns of tool use, long message history, or recursive agent-to-agent handoffs.

The failure is not random. It almost always means you are feeding too much conversation state back into the next `AssistantAgent`, `UserProxyAgent`, or `GroupChatManager` turn.
## The Most Common Cause
The #1 cause is unbounded message accumulation. You keep appending every message, tool result, and intermediate step into the same chat history, then reuse that history for the next call.
Here’s the pattern at a glance:

| Broken | Fixed |
|---|---|
| Reuses full history forever | Trims history or starts a fresh thread |
| Passes large tool outputs back into context | Summarizes or stores externally |
| Lets group chat grow without limits | Caps rounds and message count |

The broken version in code:
```typescript
// BROKEN: conversation keeps growing until the model context blows up
import { AssistantAgent, UserProxyAgent } from "@autogen/agentchat";

const assistant = new AssistantAgent({
  name: "assistant",
  modelClient,
});

const user = new UserProxyAgent({
  name: "user",
});

const messages: any[] = [];

async function runLoop(task: string) {
  messages.push({ role: "user", content: task });
  const result = await assistant.run(messages);
  messages.push(...result.messages);
  // Next call reuses everything again
  return assistant.run(messages);
}
```
```typescript
// FIXED: keep only the relevant window, or start a new thread per task
import { AssistantAgent } from "@autogen/agentchat";

const assistant = new AssistantAgent({
  name: "assistant",
  modelClient,
});

function trimMessages(messages: any[], maxMessages = 12) {
  return messages.slice(-maxMessages);
}

async function runTask(task: string) {
  const messages = [{ role: "user", content: task }];
  const result = await assistant.run(trimMessages(messages));
  return result;
}
```
If you are using `GroupChat`, the same issue appears when every agent reply is retained indefinitely. A `GroupChatManager` with no practical cap will eventually hit the model limit even if each individual message looks small.
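Here’s a minimal sketch of enforcing those caps at the application level. It assumes your agents expose the same `run(messages)` shape used in the examples above; `MAX_ROUNDS` and `MAX_MESSAGES` are illustrative app-level limits, not built-in AutoGen options.

```typescript
// Sketch: cap both rounds and retained messages in a multi-agent loop
type ChatMessage = { role: string; content?: string };

interface RunnableAgent {
  run(messages: ChatMessage[]): Promise<{ messages: ChatMessage[] }>;
}

const MAX_ROUNDS = 8;    // illustrative, not an AutoGen setting
const MAX_MESSAGES = 40; // illustrative, not an AutoGen setting

async function runCappedGroupChat(agents: RunnableAgent[], task: string) {
  let history: ChatMessage[] = [{ role: "user", content: task }];
  for (let round = 0; round < MAX_ROUNDS; round++) {
    for (const agent of agents) {
      const result = await agent.run(history);
      history.push(...result.messages);
      // Keep only the most recent window so no round can grow unbounded
      if (history.length > MAX_MESSAGES) {
        history = history.slice(-MAX_MESSAGES);
      }
      // Stop early if any agent signals completion
      if (result.messages.some(m => String(m.content).includes("DONE"))) {
        return history;
      }
    }
  }
  return history;
}
```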
## Other Possible Causes
### 1. Large tool outputs are being injected directly into chat
If a function returns a huge JSON payload, logs, or HTML blob, AutoGen will often place that text into the next prompt.
```typescript
// BAD: raw API response gets shoved into context
const weatherTool = async () => {
  return JSON.stringify(bigApiResponse); // thousands of tokens
};
```
Fix it by summarizing or storing raw output elsewhere:
```typescript
// GOOD: store raw payload externally, return only what matters
const weatherTool = async () => {
  const raw = await fetchWeather();
  saveRawPayload(raw);
  return JSON.stringify({
    summary: raw.summary,
    alerts: raw.alerts.slice(0, 5),
  });
};
```
### 2. Recursive agent loops with no stop condition
This happens when one agent keeps asking another agent to refine, retry, or validate forever.
```typescript
// BAD: no termination guard
while (true) {
  const result = await assistant.run(history);
  history.push(...result.messages);
}
```
Add a hard cap on iterations:
```typescript
// GOOD: bounded retries
for (let i = 0; i < 5; i++) {
  const result = await assistant.run(history);
  history.push(...result.messages);
  if (result.messages.some(m => String(m.content).includes("DONE"))) break;
}
```
### 3. Model context is smaller than you think
You may be using a smaller context model in TypeScript than in your Python setup. A switch from GPT-4-class models to a smaller deployment can surface this immediately.
```typescript
const modelClient = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini", // smaller window than larger variants depending on config/provider
});
```
Check the exact deployed model and its token limit. Don’t assume your provider alias maps to the same window across environments.
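One way to catch this early is a guard that estimates token usage before each call. This is a rough sketch: the chars/4 ratio is only a heuristic for English text, and the window size below is an assumption you should verify against your provider’s documentation for your exact deployment.

```typescript
// Sketch: fail fast before sending a request that cannot fit.
// Window sizes are assumptions; verify them for your deployment.
const CONTEXT_WINDOWS: Record<string, number> = {
  "gpt-4o-mini": 128_000, // check your provider's docs
};

function approxTokens(messages: { content?: string }[]): number {
  // ~4 characters per token is a rough heuristic for English text
  const chars = messages.reduce((n, m) => n + (m.content ?? "").length, 0);
  return Math.ceil(chars / 4);
}

function assertFits(model: string, messages: { content?: string }[]) {
  const limit = CONTEXT_WINDOWS[model];
  const estimate = approxTokens(messages);
  // Leave ~10% headroom for the system prompt and the response
  if (limit && estimate > limit * 0.9) {
    throw new Error(
      `~${estimate} tokens for ${model} (limit ${limit}); trim history first`
    );
  }
}
```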
### 4. Hidden prompt bloat in system messages or templates
A long system prompt, repeated policy block, or copied schema can eat a surprising amount of context before the first user turn even starts.
```typescript
const assistant = new AssistantAgent({
  name: "assistant",
  systemMessage: `
    ...500 lines of instructions...
    ...full API schema...
    ...multiple examples...
  `,
});
```
Move large reference material out of the system prompt. Keep only operational instructions there.
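For example, you might keep the system prompt short and expose reference material through a tool instead. This is a sketch, not the AutoGen tool API: the `tools` option shape and the `fetchReferenceSection` helper are hypothetical stand-ins for however your setup registers tools and stores docs.

```typescript
import { AssistantAgent } from "@autogen/agentchat";

// Hypothetical helper backed by your own docs store
declare function fetchReferenceSection(topic: string): Promise<string>;

const assistant = new AssistantAgent({
  name: "assistant",
  modelClient,
  // Operational instructions only; no schemas or examples inline
  systemMessage:
    "You are a support agent. Call lookupDocs when you need API details " +
    "instead of relying on inlined schemas.",
  // Assumed tool-registration shape, for illustration only
  tools: [
    {
      name: "lookupDocs",
      description: "Fetch one section of the API reference by topic.",
      run: async (topic: string) => fetchReferenceSection(topic),
    },
  ],
});
```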
## How to Debug It
- Log message count and approximate token size
  - Print every message role and length before calling `run()`.
  - If you see dozens of turns or huge tool payloads, you found your culprit.
- Inspect tool outputs
  - Search for functions returning raw JSON arrays, HTML, PDFs converted to text, or stack traces.
  - If one tool response is massive, trim it before sending it back to the agent.
- Check whether history is reused across tasks
  - A common bug is keeping one global `messages` array for every request.
  - Each user task should usually get its own bounded conversation state.
- Reduce to one agent and one turn
  - Remove `GroupChatManager`, retries, and tools.
  - If the error disappears, add pieces back until it breaks again.
A useful diagnostic helper:
```typescript
function debugMessages(messages: { role: string; content?: string }[]) {
  console.log("messageCount =", messages.length);
  console.table(
    messages.map((m, i) => ({
      i,
      role: m.role,
      chars: (m.content ?? "").length,
      // Rough token estimate: ~4 characters per token for English text
      approxTokens: Math.ceil((m.content ?? "").length / 4),
      preview: (m.content ?? "").slice(0, 80),
    }))
  );
}
```
## Prevention
- Keep chat history bounded. Use sliding windows, per-task threads, or explicit truncation before each `assistant.run()` call (see the sketch after this list).
- Never pass raw tool dumps back into context. Summarize results and persist full payloads in storage if you need them later.
- Put hard limits on loops and group chats. Cap retry counts, speaker rounds, and handoff depth so one bad workflow cannot grow forever.
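A minimal sketch of that truncation idea, using the same chars/4 heuristic as above; the default budget is illustrative and should be tuned to your deployed model’s actual window:

```typescript
// Sketch: trim oldest messages until the history fits a token budget
type ChatMessage = { role: string; content?: string };

function trimToBudget(
  messages: ChatMessage[],
  budgetTokens = 8_000 // illustrative; tune to your model
): ChatMessage[] {
  const cost = (m: ChatMessage) => Math.ceil((m.content ?? "").length / 4);
  const kept: ChatMessage[] = [];
  let used = 0;
  // Walk from newest to oldest, keeping messages while they fit
  for (let i = messages.length - 1; i >= 0; i--) {
    const c = cost(messages[i]);
    if (used + c > budgetTokens) break;
    kept.unshift(messages[i]);
    used += c;
  }
  return kept;
}
```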
If you’re seeing `context length exceeded when scaling` in AutoGen TypeScript, treat it as a state-management bug first and a model issue second. In production systems, bounded memory wins every time.
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.