# How to Fix 'rate limit exceeded during development' in AutoGen (TypeScript)
When AutoGen throws "rate limit exceeded" during development, it usually means your agent loop is making too many model calls in a short burst. In TypeScript projects, this often shows up during local testing when an assistant keeps retrying, a group chat spins too long, or your code creates new agents on every request.
The fix is usually not “increase the limit” first. It’s to find the call pattern that is flooding the model API and stop the repeated requests.
## The Most Common Cause
The #1 cause is an unbounded agent loop or repeated `run()` calls inside application code. With AutoGen, this often happens when you create a new `AssistantAgent` for every user message and then let it keep responding without a hard stop condition.
Here’s the broken pattern:
```typescript
import { AssistantAgent, UserProxyAgent } from "@autogen/core";

const assistant = new AssistantAgent({
  name: "assistant",
  modelClient,
  systemMessage: "You are a helpful assistant.",
});

const user = new UserProxyAgent({
  name: "user",
});

export async function handleRequest(message: string) {
  // Broken: this can trigger multiple model calls per request
  const result = await assistant.run({
    task: message,
    maxTurns: 20,
  });
  return result;
}
```
And here’s the fixed version:
```typescript
import { AssistantAgent, UserProxyAgent } from "@autogen/core";

const assistant = new AssistantAgent({
  name: "assistant",
  modelClient,
  systemMessage: "You are a helpful assistant.",
});

const user = new UserProxyAgent({
  name: "user",
});

export async function handleRequest(message: string) {
  const result = await assistant.run({
    task: message,
    maxTurns: 3, // hard cap
  });
  return result;
}
```
| Broken | Fixed |
|---|---|
| `maxTurns: 20` or no practical stop condition | Small bounded turn count |
| Repeated `run()` calls inside request handlers | Single controlled execution per request |
| Agents allowed to keep asking follow-up questions | Explicit stop conditions and limits |
If you’re using `GroupChatManager`, this gets worse fast because each agent reply can trigger more replies. A loose termination rule is enough to hit rate limits during development.
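One way to keep a dev session from burning through quota, regardless of how the conversation is orchestrated, is a hard budget on total model calls that fails fast when exhausted. Below is a minimal sketch; `CallBudget` is an illustrative helper of my own, not part of the AutoGen API:

```typescript
// Illustrative helper (not an AutoGen API): hard cap on model calls per session.
class CallBudget {
  private used = 0;

  constructor(private readonly max: number) {}

  // Call this immediately before each model request.
  take(): void {
    if (this.used >= this.max) {
      throw new Error(`Call budget of ${this.max} exhausted`);
    }
    this.used += 1;
  }

  get remaining(): number {
    return this.max - this.used;
  }
}
```

During development you would create one budget per session and call `budget.take()` before every model call, so a runaway group chat throws loudly instead of hammering the API.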
## Other Possible Causes
### 1) You are recreating agents on every render or request
If you’re in Next.js, Express middleware, or a serverless handler, don’t instantiate agents repeatedly unless you mean to.
```typescript
// Bad: a new agent per request
export async function POST(req: Request) {
  const assistant = new AssistantAgent({ name: "assistant", modelClient });
  return assistant.run({ task: await req.text() });
}
```

```typescript
// Better: one shared agent, bounded turns
const assistant = new AssistantAgent({ name: "assistant", modelClient });

export async function POST(req: Request) {
  return assistant.run({ task: await req.text(), maxTurns: 3 });
}
```
### 2) Your retry policy is too aggressive
Auto-retries plus your own retry wrapper can multiply requests. That turns one failure into five or ten rapid-fire calls.
```typescript
const response = await retry(async () => {
  return assistant.run({ task: input });
}, { retries: 5 });
```
Fix it by lowering retries and adding backoff:
```typescript
const response = await retry(async () => {
  return assistant.run({ task: input, maxTurns: 3 });
}, {
  retries: 2,
  minTimeout: 1000,
});
```
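If you don't want a retry library at all, a backoff helper is only a few lines. This is a minimal sketch with exponential backoff; the function name and defaults are my own, not an AutoGen or library API:

```typescript
// Illustrative sketch: retry with exponential backoff, no library assumed.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  retries = 2,
  minTimeoutMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      // Wait minTimeout, 2x, 4x, ... before the next attempt.
      const delay = minTimeoutMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

The point of the growing delay is that a 429 usually clears on its own; retrying immediately just extends the burst that caused it.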
### 3) Your model config points at a low-quota dev key
A lot of people test with the same API key they use for production or with a free-tier key that has very low limits.
```typescript
const modelClient = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini",
});
```
Check your environment variables and usage caps:
```bash
echo $OPENAI_API_KEY
```
Also verify you are not sharing one key across multiple local services.
### 4) Streaming or event handlers are triggering duplicate sends
A UI bug can call the same agent twice if both submit and change handlers fire. That looks like an AutoGen problem, but it’s really duplicate application events.
```typescript
button.onclick = sendMessage;
form.onsubmit = sendMessage; // duplicate trigger risk
```
Use one entry point and debounce if needed.
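A simple way to make duplicate events harmless is a single-flight guard: while one request is in flight, any duplicate trigger reuses the same promise instead of firing again. A minimal sketch (the `singleFlight` name is illustrative, not a library API):

```typescript
// Illustrative sketch: duplicate calls share one in-flight request.
function singleFlight<T>(fn: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (!inFlight) {
      inFlight = fn().finally(() => {
        // Allow a fresh request once this one settles.
        inFlight = null;
      });
    }
    return inFlight;
  };
}
```

Wrapping `sendMessage` with this guard means a click and a submit firing together still produce exactly one model call.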
## How to Debug It
- **Count actual model calls.** Add logging around the model client or agent entry point. If one user action causes multiple `run()` invocations, you found the issue:

  ```typescript
  console.log("assistant.run start", Date.now());
  ```

- **Check turn count and termination.** If you use `maxTurns`, group chat managers, or tool loops, verify they terminate quickly. A runaway conversation usually produces repeated messages like:
  - `Rate limit exceeded`
  - `429 Too Many Requests`
  - `OpenAI API error: Rate limit exceeded`
- **Inspect retries.** Search for custom retry wrappers and SDK retries. Make sure you are not stacking more than one of:
  - app-level retry
  - HTTP client retry
  - AutoGen retry behavior
- **Reduce concurrency to one.** Run a single request in isolation. If the error disappears, your problem is parallel requests sharing one key or one agent workflow.
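To make the first debugging step concrete, you can wrap whatever function actually hits the model and count invocations per user action. This is a hedged sketch; `makeCallCounter` is my own illustrative helper, not an AutoGen API:

```typescript
// Illustrative sketch: count and log model calls by wrapping the call site.
function makeCallCounter() {
  let count = 0;
  return {
    // Wrap the async function that actually performs the model call.
    wrap<A extends unknown[], R>(fn: (...args: A) => Promise<R>) {
      return async (...args: A): Promise<R> => {
        count += 1;
        console.log(`model call #${count}`, Date.now());
        return fn(...args);
      };
    },
    get count() {
      return count;
    },
  };
}
```

If one button click pushes the count up by five, the flood is in your control flow, not in your quota.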
## Prevention
- Keep `maxTurns` low during development, usually 2-5 unless you truly need more.
- Reuse agent instances where possible instead of creating them per request.
- Put rate-limit-aware backoff on retries and log every failed call with request IDs.
- Test group chats with a small message budget before turning on tool-heavy workflows.
If you’re still seeing "rate limit exceeded" during development after capping turns and removing duplicate calls, the next place to look is your orchestration layer. In AutoGen TypeScript projects, the error is usually a symptom of control flow, not just API quota.
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.