How to Fix 'context length exceeded during development' in AutoGen (TypeScript)
When AutoGen throws context length exceeded during development, it usually means your agent conversation is growing faster than the model’s token window can handle. In TypeScript projects, this typically shows up after a few tool calls, long message histories, or when you keep feeding full transcripts back into the next turn.
The failure is not random. In most cases, you are sending too much conversation state to the model, or you are accidentally re-attaching old messages on every loop.
The Most Common Cause
The #1 cause is unbounded message accumulation in your AssistantAgent / UserProxyAgent loop.
A common anti-pattern is appending the entire transcript to every new request. That works for a few turns, then blows up with errors like:
- `Error: context length exceeded`
- `400 Bad Request: This model's maximum context length is ...`
- `OpenAIError: Request too large for model gpt-4o-mini`
Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| Re-sends full history every turn | Sends only recent messages or summarized state |
| Keeps tool outputs forever | Trims or summarizes tool outputs |
| No token budgeting | Enforces max history length |
```typescript
import { AssistantAgent } from "@autogen/core";

const agent = new AssistantAgent({
  name: "assistant",
  modelClient,
});

// ❌ Broken: keeps adding the entire transcript back into each call
let transcript: any[] = [];

for (const userInput of inputs) {
  transcript.push({ role: "user", content: userInput });

  const result = await agent.run({
    messages: transcript, // grows forever
  });

  transcript.push(...result.messages);
}
```
```typescript
import { AssistantAgent } from "@autogen/core";

const agent = new AssistantAgent({
  name: "assistant",
  modelClient,
});

// ✅ Fixed: enforce a bounded history window
const MAX_MESSAGES = 12;
let transcript: any[] = [];

for (const userInput of inputs) {
  transcript.push({ role: "user", content: userInput });

  // Send only the most recent window, not the whole transcript
  const recentMessages = transcript.slice(-MAX_MESSAGES);

  const result = await agent.run({
    messages: recentMessages,
  });

  transcript.push(...result.messages);

  // Optional: keep only the latest window in memory too
  if (transcript.length > MAX_MESSAGES) {
    transcript = transcript.slice(-MAX_MESSAGES);
  }
}
```
If you need long-running memory, do not keep raw chat history forever. Summarize older turns into a compact system note or store them outside the prompt and retrieve only what matters.
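One way to do that summarization is a rolling window: collapse everything older than the last few turns into a single system note. The sketch below is illustrative; `ChatMessage` is a hypothetical shape, and the naive string-join "summary" stands in for what would normally be a call to a cheap model that produces real summary text.

```typescript
// Minimal sketch of a rolling-summary window over chat history.
type ChatMessage = { role: string; content: string };

const KEEP_RECENT = 6;

function compactHistory(transcript: ChatMessage[]): ChatMessage[] {
  if (transcript.length <= KEEP_RECENT) return transcript;

  const older = transcript.slice(0, -KEEP_RECENT);
  const recent = transcript.slice(-KEEP_RECENT);

  // Collapse everything older than the window into one system note.
  // In a real app, replace this join with a summarization model call.
  const note = older
    .map((m) => `${m.role}: ${m.content.slice(0, 80)}`)
    .join("\n");

  return [
    { role: "system", content: `Summary of earlier turns:\n${note}` },
    ...recent,
  ];
}
```

The prompt then carries one compact system note plus the recent turns, so token usage stays roughly constant no matter how long the session runs.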
Other Possible Causes
1) Tool outputs are too large
AutoGen agents often call tools that return JSON blobs, logs, HTML, or database rows. If you pass those raw results back into the conversation, token usage spikes fast.
```typescript
// ❌ Bad: dumps full payload into chat
await agent.run({
  messages: [
    { role: "user", content: "Analyze this" },
    { role: "tool", content: JSON.stringify(hugeResult) },
  ],
});
```
Fix it by truncating or extracting only relevant fields.
```typescript
// ✅ Better: reduce payload before sending to the model
const compactResult = {
  summary: hugeResult.summary,
  topErrors: hugeResult.errors?.slice(0, 5),
};

await agent.run({
  messages: [
    { role: "user", content: "Analyze this" },
    { role: "tool", content: JSON.stringify(compactResult) },
  ],
});
```
2) Your system prompt is oversized
I see this in enterprise apps all the time. People stuff policies, schemas, examples, and business rules into one giant system message.
```typescript
// ❌ Too much instruction text in one prompt
const systemMessage = `
You are an insurance claims assistant...
[5000+ words of policy text]
[full schema docs]
[10 examples]
`;
```
Split static instructions from dynamic context. Keep the system prompt short and put reference data behind retrieval or tool calls.
```typescript
// ✅ Short system prompt with scoped behavior
const systemMessage = `
You are an insurance claims assistant.
Use tools for policy lookup.
Keep answers concise and cite source IDs.
`;
```
3) Recursive agent loops
If one agent keeps calling another agent and both append full histories, context grows exponentially. This happens in multi-agent orchestration when every hop copies the entire conversation.
```typescript
// ❌ Every agent receives full upstream history
await planner.run({ messages });
await executor.run({ messages });
await reviewer.run({ messages });
```
Instead, pass only the minimal handoff state between agents.
```typescript
// ✅ Pass a compact task object between agents
await planner.run({
  messages: [{ role: "user", content: taskBrief }],
});
await executor.run({
  messages: [{ role: "user", content: plan.summary }],
});
```
4) Model context window mismatch
Sometimes your code is fine, but the selected model has a smaller context window than you assumed. A local dev config might point to a model with fewer tokens than production.
```typescript
const modelClient = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini", // smaller window than you expected in some setups
});
```
Check your actual deployed model and its max input size. If your app depends on long transcripts, choose a larger-context model or enforce stricter trimming.
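If you cannot switch models, enforce the trimming in code. The sketch below uses a crude 4-characters-per-token heuristic, which is only an approximation; for accurate counts you would use a real tokenizer (for example, a tiktoken port). The budget numbers and `Msg` shape are illustrative.

```typescript
// Rough guard against overshooting a model's context window.
type Msg = { role: string; content: string };

// Approximation only: real token counts require a tokenizer.
const estimateTokens = (text: string) => Math.ceil(text.length / 4);

function fitToBudget(messages: Msg[], maxTokens: number): Msg[] {
  const kept: Msg[] = [];
  let used = 0;

  // Walk from newest to oldest, keeping messages until the budget is spent.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > maxTokens) break;
    used += cost;
    kept.unshift(messages[i]);
  }
  return kept;
}
```

Because it walks newest-first, the most recent turns always survive trimming, which is usually what the model needs most.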
How to Debug It
- Log token growth per turn
  - Count messages and approximate token usage before each `run()`.
  - If it climbs monotonically, you have a history retention problem.
- Inspect tool payloads
  - Print tool return sizes.
  - If one tool returns megabytes of JSON or logs, that is your culprit.
- Disable memory temporarily
  - Run one request with only the latest user message.
  - If the error disappears, your issue is accumulated context rather than prompt content.
- Binary search the prompt
  - Remove half of your system instructions.
  - Remove half of your history.
  - Keep cutting until the request succeeds, then isolate the offending block.
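The first step above can be a one-line log per turn. This sketch uses the same character-based token approximation as before; `logTurnStats` is a hypothetical helper, not an AutoGen API.

```typescript
// Hypothetical per-turn logging of history growth.
type Turn = { role: string; content: string };

function logTurnStats(turn: number, transcript: Turn[]): string {
  const chars = transcript.reduce((n, m) => n + m.content.length, 0);
  const approxTokens = Math.ceil(chars / 4); // rough estimate, not a tokenizer
  return `turn=${turn} messages=${transcript.length} ~tokens=${approxTokens}`;
}

// Call console.log(logTurnStats(i, transcript)) before each agent.run().
```

If the `~tokens` value grows linearly with every turn, history is accumulating; if it jumps suddenly, a single tool payload is the likely cause.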
Prevention
- Keep a hard cap on chat history size.
- Summarize old turns instead of replaying raw transcripts.
- Sanitize tool outputs before they re-enter the prompt.
- Track prompt size in CI so regressions get caught before developers hit them locally.
- Use explicit handoff objects between agents instead of copying full conversations around.
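The CI check can be as simple as a size assertion on your static prompts. The threshold below is illustrative; pick one based on your model's window and typical dynamic context.

```typescript
// Sketch of a CI-style guard: fail the build if a static system prompt
// grows past a budget. The 2000-character limit is an arbitrary example.
function checkPromptBudget(prompt: string, maxChars = 2000): boolean {
  return prompt.length <= maxChars;
}

const SYSTEM_PROMPT = `
You are an insurance claims assistant.
Use tools for policy lookup.
Keep answers concise and cite source IDs.
`;

// In a test file: expect(checkPromptBudget(SYSTEM_PROMPT)).toBe(true);
```

A failing check forces the conversation ("should this text move behind a tool or retrieval?") to happen at review time instead of in production.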
If you are seeing context length exceeded during development in AutoGen TypeScript, start with message accumulation first. In real projects, that is usually where the bug lives.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.