# How to Fix 'token limit exceeded during development' in AutoGen (TypeScript)
When AutoGen throws token limit exceeded during development, it usually means your agent loop is feeding too much conversation history into the model. In TypeScript projects, this shows up fast when you keep appending messages to a single AssistantAgent or RoundRobinGroupChat session without trimming state.
The error is not about your app “using too many tokens overall”. It’s about the prompt payload for a single model call exceeding the model’s context window or your configured token budget.
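To see how close a single call is to the limit before you send it, you can estimate tokens from character counts. These helpers are not part of AutoGen; they are a rough sketch using the common ~4-characters-per-token heuristic for English text:

```typescript
type Msg = { role: string; content: string };

// Rough heuristic: ~4 characters per token for English text.
// Not exact, but good enough to catch a runaway prompt before the API rejects it.
function estimateTokens(messages: Msg[]): number {
  const chars = messages.reduce(
    (n, m) => n + m.role.length + m.content.length,
    0
  );
  return Math.ceil(chars / 4);
}

function fitsBudget(messages: Msg[], budget: number): boolean {
  return estimateTokens(messages) <= budget;
}
```

For exact counts you would use a real tokenizer for your model, but a character heuristic is enough to spot unbounded growth.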
## The Most Common Cause
The #1 cause is uncontrolled message accumulation in memory. A common pattern is to reuse the same chat object across many turns, while passing full history back into every request.
Here’s the broken pattern versus the fixed one:
| Broken | Fixed |
|---|---|
| Reuses one long-lived chat state forever | Trims, resets, or summarizes history |
| Sends every prior message back to the model | Keeps only the last N turns or a compact summary |
| Grows until "token limit exceeded during development" | Stays within a bounded context window |
```typescript
// BROKEN: history grows forever
import { AssistantAgent, UserProxyAgent } from "@autogen/agents";

const assistant = new AssistantAgent({
  name: "assistant",
  systemMessage: "You are a helpful banking support agent.",
});
const user = new UserProxyAgent({ name: "user" });

const conversation: Array<{ role: string; content: string }> = [];

async function ask(question: string) {
  conversation.push({ role: "user", content: question });
  const response = await assistant.run({
    messages: conversation, // keeps growing
  });
  conversation.push({ role: "assistant", content: response.message.content });
}
```
```typescript
// FIXED: keep the prompt bounded
import { AssistantAgent } from "@autogen/agents";

const assistant = new AssistantAgent({
  name: "assistant",
  systemMessage: "You are a helpful banking support agent.",
});

type Msg = { role: "user" | "assistant"; content: string };
const history: Msg[] = [];
const MAX_MESSAGES = 8;

async function ask(question: string) {
  history.push({ role: "user", content: question });
  const boundedHistory = history.slice(-MAX_MESSAGES); // last N turns only
  const response = await assistant.run({
    messages: boundedHistory,
  });
  history.push({ role: "assistant", content: response.message.content });
}
```
If you’re using RoundRobinGroupChat, the same issue happens when each agent keeps adding verbose outputs and tool traces into shared state. The fix is the same: bound what gets sent to the model.
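If a hard cap on message count loses too much context, you can replace old turns with a summary instead. The sketch below assumes a `summarize` callback you implement yourself (typically a cheap model call that compresses old turns into a short paragraph); it is not an AutoGen built-in:

```typescript
type Msg = { role: "user" | "assistant" | "system"; content: string };

// Keep the last `keepLast` turns verbatim and compress everything
// older into a single system message. `summarize` is a placeholder
// for whatever summarization you use (usually a cheap model call).
async function boundWithSummary(
  history: Msg[],
  keepLast: number,
  summarize: (old: Msg[]) => Promise<string>
): Promise<Msg[]> {
  if (history.length <= keepLast) return history;
  const old = history.slice(0, history.length - keepLast);
  const recent = history.slice(-keepLast);
  const summary = await summarize(old);
  return [
    { role: "system", content: `Summary of earlier turns: ${summary}` },
    ...recent,
  ];
}
```

The result is always at most `keepLast + 1` messages, regardless of how long the session runs.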
## Other Possible Causes

### 1) Tool output is too large
A single tool call can blow up your token budget if you dump raw JSON, HTML, logs, or database rows into the chat.
```typescript
// BAD
const result = await fetchCustomerRecords();
return JSON.stringify(result); // huge payload into context
```

```typescript
// GOOD
const result = await fetchCustomerRecords();
return JSON.stringify({
  count: result.length,
  sample: result.slice(0, 3), // a few rows are usually enough
});
```
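A defensive variant is to hard-cap every tool result's length before it enters the chat, so one oversized payload can never blow the budget. `truncateToolOutput` is a hypothetical helper, not an AutoGen API:

```typescript
// Cap any tool result at a fixed character budget before it
// enters the transcript, and mark how much was cut.
function truncateToolOutput(raw: string, maxChars = 2000): string {
  if (raw.length <= maxChars) return raw;
  const omitted = raw.length - maxChars;
  return raw.slice(0, maxChars) + `\n…[truncated ${omitted} chars]`;
}
```

Wrapping every tool's return value in a cap like this turns "one bad tool call kills the session" into a bounded, visible truncation.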
### 2) Your system prompt is bloated
Long policy blocks, duplicated instructions, and copied requirements add up quickly.
```typescript
// BAD
const assistant = new AssistantAgent({
  name: "assistant",
  systemMessage: `
    You are a support agent.
    Follow these rules...
    [200 lines of repeated policy text]
  `,
});
```

```typescript
// GOOD
const assistant = new AssistantAgent({
  name: "assistant",
  systemMessage:
    "You are a support agent. Ask for missing account details before taking action.",
});
```
### 3) You are not truncating tool-call traces
AutoGen can retain tool invocation metadata and intermediate reasoning artifacts depending on how you wire agents together. If those traces are re-injected on every turn, token usage climbs fast.
```typescript
// Example fix idea: drop tool messages before the next model call
const trimmedMessages = messages.filter((m) => m.role !== "tool");
```
If you need tool outputs later, store them outside the chat transcript and inject only a short summary back into context.
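One way to do that is a small in-memory store keyed by an ID, so the transcript carries only a one-line reference instead of the full payload. This is an illustration of the pattern, not an AutoGen feature:

```typescript
// Full tool outputs live outside the transcript; only a short
// summary plus a lookup ID goes back into model context.
const toolResultStore = new Map<string, unknown>();
let nextId = 0;

function stashToolResult(result: unknown, summary: string): string {
  const id = `tool-result-${nextId++}`;
  toolResultStore.set(id, result);
  // Only this one line enters the chat; the model can ask for
  // details by ID via a lookup tool if it needs them.
  return `${summary} (full result stored as ${id})`;
}
```

A later tool can then fetch `toolResultStore.get(id)` on demand, so the data is never lost, just kept out of every prompt.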
### 4) Model/context mismatch
Sometimes the issue is not your code but your model choice. If you point AutoGen at a smaller-context model and feed it large conversations, you’ll hit limits immediately.
```typescript
const config = {
  model: "gpt-4o-mini", // smaller context than larger models in some setups
};
```
Use a larger-context model for long-running multi-agent workflows, or aggressively trim state before each run.
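If you know your model's context size, you can trim to an explicit token budget rather than a fixed message count. The sketch below is an illustration (not an AutoGen API) that keeps system messages and drops the oldest turns first, using a rough 4-characters-per-token estimate:

```typescript
type Msg = { role: string; content: string };

// Rough per-message token estimate (~4 chars per token).
const approxTokens = (m: Msg) =>
  Math.ceil((m.role.length + m.content.length) / 4);

// Drop the oldest non-system messages until the prompt fits the budget.
function trimToBudget(messages: Msg[], budget: number): Msg[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  const total = (ms: Msg[]) => ms.reduce((n, m) => n + approxTokens(m), 0);
  while (rest.length > 0 && total(system) + total(rest) > budget) {
    rest.shift(); // oldest turn goes first
  }
  return [...system, ...rest];
}
```

Pick the budget well below the model's advertised context window so there is headroom for the response and any tool schemas.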
## How to Debug It
- **Log prompt size before every call.** Print the message count and approximate character length before `assistant.run()` or group chat execution:

  ```typescript
  console.log("messages:", messages.length);
  console.log("chars:", JSON.stringify(messages).length);
  ```

- **Remove tools first.** If the error disappears after disabling tools, your tool output is too big or too frequent.
- **Reset conversation state.** Start a fresh `AssistantAgent` session with no prior messages. If it works cleanly, your history growth is the problem.
- **Binary search the transcript.** Cut your message list in half and rerun. Keep halving until you find the turn that pushes you over the limit.
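That halving step can be automated with a small bisection helper. `failing` here is a placeholder predicate you would supply; in practice it reruns the agent with the given slice and reports whether the limit error recurs:

```typescript
// Find the length of the shortest transcript suffix that still fails.
// `failing` is a placeholder: "does rerunning with this slice still
// hit the token limit?" — in practice, a real agent call.
async function findBreakingPoint<T>(
  messages: T[],
  failing: (slice: T[]) => Promise<boolean>
): Promise<number> {
  let lo = 1;
  let hi = messages.length;
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (await failing(messages.slice(-mid))) {
      hi = mid; // this suffix already fails; try shorter
    } else {
      lo = mid + 1; // this suffix passes; the break is later
    }
  }
  return lo; // shortest suffix length that still fails
}
```

Note this assumes failures are monotonic in suffix length (longer prompts never fix the error), which holds for context-window limits.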
## Prevention
- Keep a hard cap on message history per agent session.
- Summarize old turns instead of replaying full transcripts.
- Return compact tool results; never inject raw dumps unless absolutely necessary.
- Set explicit token budgets in your AutoGen configuration and test against worst-case conversations early.
If you’re building multi-agent flows in TypeScript, assume every message will be repeated more than once across orchestration layers. That’s where these errors usually come from.
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies

By Cyprian Aarons, AI Consultant at Topiax.