# How to Fix 'context length exceeded' in AutoGen (TypeScript)
## What the error means
`context length exceeded` means AutoGen sent a prompt to the model that was larger than the model’s token window. In practice, this shows up when your agent chat history keeps growing, when you stuff too much data into a single message, or when tool output gets echoed back into the conversation.

With TypeScript AutoGen setups, this usually happens after a few turns of multi-agent back-and-forth, or when you pass large documents into `UserProxyAgent` / `AssistantAgent` without trimming.
## The Most Common Cause
The #1 cause is unbounded chat history. AutoGen keeps appending messages, and eventually `OpenAIChatCompletionClient` tries to send a request that blows past the model’s context limit.
Here’s the broken pattern:
```typescript
import { AssistantAgent, UserProxyAgent } from "@autogen/core";

const assistant = new AssistantAgent({
  name: "assistant",
  systemMessage: "You are a helpful assistant.",
});

const user = new UserProxyAgent({
  name: "user",
});

for (const chunk of largeDocumentChunks) {
  await user.send({ content: chunk });
  await assistant.run(); // history keeps growing
}
```
And here’s the fixed pattern:
```typescript
import { AssistantAgent, UserProxyAgent } from "@autogen/core";

const assistant = new AssistantAgent({
  name: "assistant",
  systemMessage: "You are a helpful assistant.",
});

const user = new UserProxyAgent({
  name: "user",
});

for (const chunk of largeDocumentChunks) {
  await user.send({ content: chunk });
  const result = await assistant.run({
    maxTurns: 1,
    // keep only what you need for this turn
    clearHistory: true,
  });
  console.log(result);
}
```
The important part is not letting every prior turn stay in memory forever. If you need long-running workflows, summarize state and pass forward only the summary plus the current task.
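One way to sketch "summary plus current task" in plain TypeScript, independent of any AutoGen API (the `Message` and `BoundedHistory` names here are illustrative, and a real implementation would produce the summary with an LLM call rather than the naive truncation shown):

```typescript
// Hypothetical sketch: bounded conversation memory that forwards a rolling
// summary plus only the most recent turns, instead of the full transcript.
type Message = { role: "user" | "assistant"; content: string };

class BoundedHistory {
  private turns: Message[] = [];
  private summary = "";

  constructor(private maxTurns: number) {}

  add(msg: Message): void {
    this.turns.push(msg);
    // When the window overflows, fold the oldest turn into the summary.
    // (A production version would summarize with a model call.)
    while (this.turns.length > this.maxTurns) {
      const oldest = this.turns.shift()!;
      this.summary += `${oldest.role}: ${oldest.content.slice(0, 80)}\n`;
    }
  }

  // What actually gets sent to the model: summary + recent turns only.
  toPrompt(): Message[] {
    const msgs: Message[] = [];
    if (this.summary) {
      msgs.push({
        role: "user",
        content: `Summary of earlier turns:\n${this.summary}`,
      });
    }
    return msgs.concat(this.turns);
  }
}
```

The point of the shape: the prompt size is now bounded by `maxTurns` plus one summary message, no matter how long the workflow runs.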
### Broken vs fixed
| Pattern | Broken | Fixed |
|---|---|---|
| Conversation memory | Keeps every message forever | Clears or trims history |
| Large inputs | Sends raw docs in full | Chunk, summarize, or retrieve relevant slices |
| Tool output | Dumps full JSON/text back into chat | Return only compact results |
## Other Possible Causes
### 1. Huge tool output getting injected back into the prompt
If your tool returns a giant JSON blob, AutoGen may add that result to the conversation context.
```typescript
// Bad: returns entire database export
return JSON.stringify(rows);
```
Fix it by returning only what the model needs:
```typescript
// Better: return a compact summary
return JSON.stringify({
  count: rows.length,
  sample: rows.slice(0, 5),
});
```
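If several tools need this, it can be pulled into a small reusable helper. A minimal sketch (the `compactToolResult` name and shape are illustrative, not part of AutoGen):

```typescript
// Compact an arbitrary row set before it re-enters the chat context:
// keep the count plus a small sample, never the full payload.
function compactToolResult<T>(rows: T[], sampleSize = 5): string {
  return JSON.stringify({
    count: rows.length,
    sample: rows.slice(0, sampleSize),
  });
}
```

Any tool that might return an unbounded list goes through this helper before its result is handed back to the agent.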
### 2. Oversized system message
A long `systemMessage` with policies, examples, and business rules can eat a big chunk of context before the user even speaks.
```typescript
const assistant = new AssistantAgent({
  name: "assistant",
  systemMessage: veryLongPolicyDoc,
});
```
Trim it down:
```typescript
const assistant = new AssistantAgent({
  name: "assistant",
  systemMessage: [
    "You are an underwriting assistant.",
    "Ask for missing fields before making assumptions.",
    "Return concise answers.",
  ].join("\n"),
});
```
### 3. Model context window too small
Not all models have the same token limit. If you’re using a smaller model behind `OpenAIChatCompletionClient`, your prompt may fit on one model and fail on another.
```typescript
import { OpenAIChatCompletionClient } from "@autogen/openai";

const client = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini", // context windows vary by model; check the documented limit for the one you use
});
```
If your workload is heavy, move to a model with more context or reduce prompt size.
### 4. Passing full documents instead of retrieved chunks
This is common in RAG flows. Developers load an entire PDF or policy pack into one message instead of retrieving just relevant sections.
```typescript
await assistant.run({
  input: fullPolicyText, // too large
});
```
Use retrieval or chunking:
```typescript
await assistant.run({
  input: relevantChunkSummary,
});
```
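If you don’t have a retrieval pipeline yet, even naive fixed-size chunking beats sending the whole document. A minimal sketch (character-based with overlap; a real RAG setup would chunk by tokens and retrieve by embedding similarity instead):

```typescript
// Split plain text into overlapping fixed-size chunks. Overlap keeps
// sentences that straddle a boundary visible in both neighboring chunks.
function chunkText(text: string, chunkSize = 2000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    start += chunkSize - overlap;
  }
  return chunks;
}
```

You would then score the chunks against the user’s question and pass only the top few into the agent, rather than the full document.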
## How to Debug It
- **Log prompt size before each call.** Measure how much text you’re actually sending. In TypeScript, log message lengths and approximate token counts before `run()`.
- **Inspect which message exploded.** Check whether it was:
  - a giant user upload
  - tool output
  - accumulated chat history
  - an oversized system prompt
- **Reduce to one turn.** Run the same flow with `maxTurns: 1` and no tools. If it passes, the issue is usually history growth or tool spam.
- **Binary search your inputs.** Cut the document/tool payload in half until the error disappears. That tells you whether the problem is one huge message or gradual accumulation across turns.
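For the logging step, a rough estimator is enough to spot a runaway prompt. A sketch using the common 4-characters-per-token heuristic (an approximation for English text; use a real tokenizer such as a tiktoken port when you need precision):

```typescript
// Rough token estimate: ~4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Log the approximate total before handing messages to the model client.
function logPromptSize(messages: { content: string }[]): number {
  const total = messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  console.log(`~${total} tokens across ${messages.length} messages`);
  return total;
}
```

Logging this number every turn makes the growth pattern obvious: a slow climb points at history accumulation, a sudden spike points at one oversized message or tool result.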
A typical OpenAI-side failure looks like this:
```
BadRequestError: This model's maximum context length is 128000 tokens.
However, your messages resulted in 131245 tokens.
Please reduce the length of the messages.
```
In AutoGen wrappers, you may also see errors bubble up through agent execution like:
```
Error during agent run:
BadRequestError: context length exceeded
```
## Prevention
- Keep agent memory bounded: summarize old turns and clear history between tasks.
- Never send raw large artifacts into chat: chunk PDFs, logs, and exports.
- Make tool outputs compact: return IDs, summaries, counts, and small samples instead of full payloads.
- Pick a model that matches your workload: don’t force long-document workflows onto a small context window.
If you’re building production agents in TypeScript, treat context as a budget. Once you start budgeting tokens explicitly, this error stops being random and becomes easy to control.
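Budgeting can be made literal with a small guard in front of the model client. A sketch (the `fitToBudget` name is illustrative, and the per-message estimate reuses the rough 4-chars-per-token assumption):

```typescript
type ChatMessage = { role: string; content: string };

// Drop the oldest messages until the estimated total fits the budget.
// The most recent message is always kept, even if it alone exceeds it.
function fitToBudget(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const est = (m: ChatMessage) => Math.ceil(m.content.length / 4);
  const kept = [...messages];
  let total = kept.reduce((sum, m) => sum + est(m), 0);
  while (total > maxTokens && kept.length > 1) {
    total -= est(kept.shift()!);
  }
  return kept;
}
```

Running every outgoing message list through a guard like this turns the error from a random failure into a deliberate trade-off you control.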
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.