How to Fix 'context length exceeded during development' in AutoGen (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: context-length-exceeded-during-development · autogen · typescript

When AutoGen throws a context length exceeded error during development, it usually means your agent conversation is growing faster than the model’s token window can handle. In TypeScript projects, this typically shows up after a few tool calls, long message histories, or when you keep feeding full transcripts back into the next turn.

The failure is not random. In most cases, you are sending too much conversation state to the model, or you are accidentally re-attaching old messages on every loop.

The Most Common Cause

The #1 cause is unbounded message accumulation in your AssistantAgent / UserProxyAgent loop.

A common anti-pattern is appending the entire transcript to every new request. That works for a few turns, then blows up with errors like:

  • Error: context length exceeded
  • 400 Bad Request: This model's maximum context length is ...
  • OpenAIError: Request too large for model gpt-4o-mini

Broken vs fixed pattern

Broken pattern                        Fixed pattern
Re-sends full history every turn      Sends only recent messages or summarized state
Keeps tool outputs forever            Trims or summarizes tool outputs
No token budgeting                    Enforces max history length

import { AssistantAgent } from "@autogen/core";

const agent = new AssistantAgent({
  name: "assistant",
  modelClient,
});

// ❌ Broken: keeps adding the entire transcript back into each call
let transcript: any[] = [];

for (const userInput of inputs) {
  transcript.push({ role: "user", content: userInput });

  const result = await agent.run({
    messages: transcript, // grows forever
  });

  transcript.push(...result.messages);
}

// ✅ Fixed: send only a bounded window of recent messages
import { AssistantAgent } from "@autogen/core";

const agent = new AssistantAgent({
  name: "assistant",
  modelClient,
});

const MAX_MESSAGES = 12;
let transcript: any[] = [];

for (const userInput of inputs) {
  transcript.push({ role: "user", content: userInput });

  const recentMessages = transcript.slice(-MAX_MESSAGES);

  const result = await agent.run({
    messages: recentMessages,
  });

  transcript.push(...result.messages);

  // Optional: keep only the latest window
  if (transcript.length > MAX_MESSAGES) {
    transcript = transcript.slice(-MAX_MESSAGES);
  }
}

If you need long-running memory, do not keep raw chat history forever. Summarize older turns into a compact system note or store them outside the prompt and retrieve only what matters.
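
Here is a minimal sketch of that idea: keep the last few turns verbatim and fold everything older into one compact system note. summarizeMessages is a hypothetical helper; in practice it would be one extra model call that condenses old turns into a short paragraph.

// summarizeMessages is hypothetical: one extra model call that condenses
// old turns into a short paragraph.
declare function summarizeMessages(msgs: any[]): Promise<string>;

const MAX_MESSAGES = 12;

// Fold older turns into one system note; keep only the recent window verbatim.
async function compactTranscript(transcript: any[]): Promise<any[]> {
  if (transcript.length <= MAX_MESSAGES) return transcript;

  const older = transcript.slice(0, -MAX_MESSAGES);
  const recent = transcript.slice(-MAX_MESSAGES);

  const summary = await summarizeMessages(older);

  return [
    { role: "system", content: "Summary of earlier conversation: " + summary },
    ...recent,
  ];
}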

Other Possible Causes

1) Tool outputs are too large

AutoGen agents often call tools that return JSON blobs, logs, HTML, or database rows. If you pass those raw results back into the conversation, token usage spikes fast.

// ❌ Bad: dumps full payload into chat
await agent.run({
  messages: [
    { role: "user", content: "Analyze this" },
    { role: "tool", content: JSON.stringify(hugeResult) },
  ],
});

Fix it by truncating or extracting only relevant fields.

// ✅ Better: reduce payload before sending to the model
const compactResult = {
  summary: hugeResult.summary,
  topErrors: hugeResult.errors?.slice(0, 5),
};

await agent.run({
  messages: [
    { role: "user", content: "Analyze this" },
    { role: "tool", content: JSON.stringify(compactResult) },
  ],
});

2) Your system prompt is oversized

I see this in enterprise apps all the time. People stuff policies, schemas, examples, and business rules into one giant system message.

// ❌ Too much instruction text in one prompt
const systemMessage = `
You are an insurance claims assistant...
[5000+ words of policy text]
[full schema docs]
[10 examples]
`;

Split static instructions from dynamic context. Keep the system prompt short and put reference data behind retrieval or tool calls.

// ✅ Short system prompt with scoped behavior
const systemMessage = `
You are an insurance claims assistant.
Use tools for policy lookup.
Keep answers concise and cite source IDs.
`;
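
What "behind retrieval or tool calls" can look like in practice, as a sketch: the bulky policy text lives in a store keyed by section ID, and the agent calls a small lookup function that returns only the section it needs. The store, the IDs, and the function are all illustrative; wire them up however your AutoGen setup registers tools.

// Illustrative only: reference text lives outside the prompt.
const policyStore: Record<string, string> = {
  "POL-104": "Water damage claims require photos submitted within 30 days.",
  // ...hundreds more sections that are never sent to the model wholesale
};

// Expose this as a tool so the agent fetches one section on demand.
function lookupPolicySection(id: string): string {
  return policyStore[id] ?? "No policy section found for " + id;
}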

3) Recursive agent loops

If one agent keeps calling another agent and both append full histories, the context balloons with every hop. This happens in multi-agent orchestration when every hop copies the entire conversation.

// ❌ Every agent receives full upstream history
await planner.run({ messages });
await executor.run({ messages });
await reviewer.run({ messages });

Instead, pass only the minimal handoff state between agents.

// ✅ Pass a compact task object between agents
const plan = await planner.run({
  messages: [{ role: "user", content: taskBrief }],
});

// Hand the executor only the planner's condensed output, not the full history.
await executor.run({
  messages: [{ role: "user", content: plan.summary }],
});
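
Making the handoff an explicit type keeps it from quietly growing back into a full transcript; if a field is not in the type, it does not cross the hop. A sketch of what that object might contain:

// Illustrative handoff shape: only condensed state crosses agent boundaries.
interface Handoff {
  taskBrief: string;      // one-paragraph statement of the job
  planSummary: string;    // the planner's output, condensed
  artifactIds: string[];  // IDs or URLs, never raw payloads
}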

4) Model context window mismatch

Sometimes your code is fine, but the selected model has a smaller context window than you assumed. A local dev config might point to a model with fewer tokens than production.

const modelClient = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini", // smaller window than you expected in some setups
});

Check your actual deployed model and its max input size. If your app depends on long transcripts, choose a larger-context model or enforce stricter trimming.
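
A cheap guard is to estimate prompt size before each call and fail loudly in development instead of waiting for a 400 from the API. The heuristic of roughly four characters per token is crude but good enough for trend-spotting, and the window sizes below are placeholders; check your provider's documentation for the real limits.

// Rough token estimate: ~4 characters per token for English text.
const estimateTokens = (messages: { content: string }[]): number =>
  Math.ceil(messages.map((m) => m.content).join("").length / 4);

// Placeholder limits; replace with your provider's documented numbers.
const CONTEXT_WINDOWS: Record<string, number> = {
  "gpt-4o-mini": 128000,
  "my-local-dev-model": 8000, // hypothetical constrained dev model
};

function assertPromptFits(model: string, messages: { content: string }[]): void {
  const limit = CONTEXT_WINDOWS[model] ?? 8000;
  const used = estimateTokens(messages);
  if (used > limit * 0.9) {
    throw new Error(`Prompt is ~${used} tokens, near the ${limit}-token limit of ${model}`);
  }
}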

How to Debug It

  1. Log token growth per turn

    • Log the message count and approximate token usage before each run() (see the sketch after this list).
    • If it climbs monotonically, you have a history retention problem.
  2. Inspect tool payloads

    • Print tool return sizes.
    • If one tool returns megabytes of JSON or logs, that is your culprit.
  3. Disable memory temporarily

    • Run one request with only the latest user message.
    • If the error disappears, your issue is accumulated context rather than prompt content.
  4. Binary search the prompt

    • Remove half of your system instructions.
    • Remove half of your history.
    • Keep cutting until the request succeeds; then isolate the offending block.
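
For step 1, a couple of lines inside the loop are enough to make the growth visible; the same rough four-characters-per-token heuristic applies, and exact counts would need a real tokenizer.

// Log message count and rough token usage before every run() call.
const roughTokens = (msgs: { content: string }[]): number =>
  Math.ceil(msgs.map((m) => m.content).join("").length / 4);

let turn = 0;
// ...inside your agent loop, just before agent.run():
console.log(
  `turn=${++turn} messages=${recentMessages.length} ~tokens=${roughTokens(recentMessages)}`
);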

Prevention

  • Keep a hard cap on chat history size.
  • Summarize old turns instead of replaying raw transcripts.
  • Sanitize tool outputs before they re-enter the prompt.
  • Track prompt size in CI so regressions get caught before developers hit them locally (a sample check follows this list).
  • Use explicit handoff objects between agents instead of copying full conversations around.
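
For the CI bullet, here is one possible shape for that check: a tiny test that fails the build when the static system prompt drifts past a budget. The import path, the test runner (vitest), and the budget are all assumptions for illustration.

// ci/prompt-budget.test.ts: fails the build if the system prompt balloons.
import { test, expect } from "vitest"; // or your test runner of choice
import { systemMessage } from "../src/prompts"; // hypothetical prompt module

const TOKEN_BUDGET = 1000; // arbitrary; derive from your model's window

test("system prompt stays within its token budget", () => {
  const approxTokens = Math.ceil(systemMessage.length / 4);
  expect(approxTokens).toBeLessThanOrEqual(TOKEN_BUDGET);
});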

If you are seeing context length exceeded during development in AutoGen TypeScript, start with message accumulation. In real projects, that is usually where the bug lives.

