How to Fix 'rate limit exceeded' in AutoGen (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: rate-limit-exceeded, autogen, typescript

What the error means

rate limit exceeded in AutoGen usually means your app is sending too many model requests in a short window, or a single agent loop is generating more calls than your provider allows. In TypeScript projects, this often shows up during multi-agent chats, recursive tool calls, or when you reuse the same model client across concurrent requests.

The exact message varies by provider, but you’ll often see something like:

  • Error: 429 Rate limit exceeded
  • OpenAIError: Rate limit reached for requests
  • RateLimitError: 429 You have exceeded your current quota
  • AutoGen wrapping the provider error inside AssistantAgent.run() or RoundRobinGroupChat.run()

The Most Common Cause

The #1 cause is an unbounded agent loop that keeps calling the model without backoff, stop conditions, or message trimming.

In AutoGen TypeScript, this usually happens when you run a RoundRobinGroupChat or repeatedly call AssistantAgent.run() inside a loop and never cap the number of turns. One user prompt can turn into dozens of LLM calls.

Broken vs fixed pattern

  • Keeps re-running agents with no cap → limits turns and stops on completion
  • Reuses full conversation history forever → trims context or summarizes state
  • No retry/backoff on 429 → retries with exponential backoff

// ❌ Broken: unbounded loop can trigger rate limits fast
import { AssistantAgent } from "@autogen/core";

const agent = new AssistantAgent({
  name: "support_agent",
  modelClient,
});

while (true) {
  const result = await agent.run({
    task: "Answer the user's question",
    messages: conversation,
  });

  conversation.push(...result.messages);
}

// ✅ Fixed: cap iterations and stop when done
import { AssistantAgent } from "@autogen/core";

const agent = new AssistantAgent({
  name: "support_agent",
  modelClient,
});

for (let i = 0; i < 3; i++) {
  const result = await agent.run({
    task: "Answer the user's question",
    messages: conversation,
  });

  conversation.push(...result.messages);

  const lastMessage = result.messages[result.messages.length - 1];
  if (lastMessage?.content?.includes("DONE")) break;
}

If you are using group chat orchestration, the same rule applies. A RoundRobinGroupChat with no termination condition will keep cycling agents until the provider shuts you down.

// ✅ Prefer explicit termination conditions in group chat flows
import { RoundRobinGroupChat } from "@autogen/core";

const chat = new RoundRobinGroupChat({
  participants: [planner, executor, reviewer],
  maxTurns: 6,
});

Other Possible Causes

1) Too much concurrency

If you fire multiple requests in parallel, you can hit request-per-minute limits even if each individual call is valid.

// ❌ Broken: blasts the API with parallel calls
await Promise.all(users.map((u) => agent.run({ task: u.prompt })));

// ✅ Fixed: serialize or throttle requests
for (const u of users) {
  await agent.run({ task: u.prompt });
}

If you need throughput, add a queue and a concurrency limit instead of raw Promise.all.
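
Here is a minimal sketch of such a limiter, reusing the illustrative agent and users values from the snippet above (a library like p-limit gives you the same thing off the shelf):

// Sketch: cap in-flight model calls with a tiny hand-rolled limiter.
// `limit` workers pull tasks off a shared index; JS's single-threaded
// event loop makes the index bump safe between awaits.
async function runWithLimit<T>(tasks: (() => Promise<T>)[], limit: number): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  const workers = Array.from({ length: Math.min(limit, tasks.length) }, async () => {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  });

  await Promise.all(workers);
  return results;
}

// At most 2 model calls in flight at any moment
await runWithLimit(users.map((u) => () => agent.run({ task: u.prompt })), 2);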


2) Large prompts causing retries and extra token usage

Huge message histories increase token usage per call. That can push you into token-per-minute limits faster than expected.

// ❌ Broken: sends full history every time
const result = await agent.run({
  task,
  messages: fullConversationHistory,
});

// ✅ Fixed: trim history before each run
const trimmedHistory = fullConversationHistory.slice(-10);

const result = await agent.run({
  task,
  messages: trimmedHistory,
});

For production agents, summarize older turns instead of carrying every message forever.
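
A minimal compaction sketch, assuming a simple { role, content } message shape (the placeholder summary below just joins and truncates; in production, generate the summary with a cheap model call instead):

// Sketch: fold older turns into one summary message, keep the recent ones.
type ChatMessage = { role: string; content: string };

function compactHistory(history: ChatMessage[], keepLast = 6): ChatMessage[] {
  if (history.length <= keepLast) return history;

  const older = history.slice(0, -keepLast);
  const recent = history.slice(-keepLast);

  // Placeholder: join and truncate. Swap in a model-generated summary
  // for real use.
  const summary = older.map((m) => `${m.role}: ${m.content}`).join("\n").slice(0, 1000);

  return [{ role: "system", content: `Summary of earlier turns:\n${summary}` }, ...recent];
}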


3) Retry logic that retries too aggressively

Some apps wrap AutoGen calls in a generic retry helper that immediately retries on every failure. That turns one rate-limit event into five more failed requests.

// ❌ Broken: instant retries make rate limiting worse
for (let attempt = 0; attempt < 5; attempt++) {
  try {
    return await agent.run({ task });
  } catch (e) {}
}

// ✅ Fixed: exponential backoff with jitter
for (let attempt = 0; attempt < 5; attempt++) {
  try {
    return await agent.run({ task });
  } catch (e) {
    if (attempt === 4) throw e; // out of retries: surface the error
    const delayMs = Math.pow(2, attempt) * 500 + Math.random() * 250;
    await new Promise((r) => setTimeout(r, delayMs));
  }
}

If your provider returns a 429, back off. Don’t hammer it harder.
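
If your provider's SDK exposes the HTTP status on its errors (most OpenAI-style clients do), you can make the backoff smarter: retry only on 429 and honor a Retry-After header when one is sent. The status and headers fields below are assumptions; check your SDK's actual error shape:

// Sketch: compute a retry delay only for rate-limit errors.
// Returns null for non-429 errors so the caller rethrows immediately.
function retryDelayMs(e: unknown, attempt: number): number | null {
  const err = e as { status?: number; headers?: Record<string, string> };
  if (err.status !== 429) return null;

  // Prefer the server's own hint when it provides one
  const retryAfter = Number(err.headers?.["retry-after"]);
  if (Number.isFinite(retryAfter) && retryAfter > 0) return retryAfter * 1000;

  // Otherwise fall back to exponential backoff with jitter
  return Math.pow(2, attempt) * 500 + Math.random() * 250;
}

Inside the catch block above, that becomes: compute the delay, rethrow when it is null or you are out of attempts, otherwise sleep and loop.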


4) Shared model client across multiple workflows

Reusing one modelClient across many agents and jobs is fine, but if all jobs start at once, they compete for the same quota.

// ❌ Broken: multiple workflows start together on the same client quota
await Promise.all([
  workflowA(modelClient),
  workflowB(modelClient),
]);

// ✅ Fixed: queue work or isolate high-volume jobs
await workflowA(modelClient);
await workflowB(modelClient);

If you need separate quotas, use different API keys/projects where your provider supports it.
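
For example, a high-volume batch job can run on its own key while interactive traffic keeps the default one. The constructor options below are assumptions; match them to whatever model client you actually use (e.g. OpenAIChatCompletionClient):

// Sketch: separate quotas per workflow. Option names are illustrative;
// check your model client's real constructor signature.
const interactiveClient = new OpenAIChatCompletionClient({
  model: "gpt-4o",
  apiKey: process.env.OPENAI_KEY_INTERACTIVE, // hypothetical env var
});

const batchClient = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_KEY_BATCH, // hypothetical env var
});

// Now the workflows run in parallel without sharing one quota
await Promise.all([workflowA(interactiveClient), workflowB(batchClient)]);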

How to Debug It

  1. Check whether it’s a burst issue or a steady-state issue

    • If errors happen only under load, it’s usually concurrency.
    • If errors happen after several turns in one conversation, it’s probably an infinite loop or oversized context.
  2. Log every model call

    • Print agent name, turn number, prompt length, and timestamp.
    • For AutoGen flows, log around AssistantAgent.run() and group chat turn boundaries (see the wrapper sketch after this list).
  3. Inspect the raw provider error

    • Look for HTTP 429, request-per-minute headers, or token-per-minute headers.
    • If the stack trace includes OpenAIChatCompletionClient, AssistantAgent, or RoundRobinGroupChat, the issue is upstream quota pressure rather than an AutoGen bug.
  4. Temporarily reduce load

    • Set concurrency to 1.
    • Reduce max turns.
    • Trim messages to the last few exchanges.
    • If the error disappears, you’ve found the pressure point.
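
A thin wrapper makes steps 2 and 3 concrete: log every call and surface the raw provider error. agent.run and the error's status field are the same illustrative assumptions as in the snippets above:

// Sketch: log each model call and the raw error when one fails.
let turnCount = 0;

async function loggedRun(agentName: string, task: string, messages: unknown[]) {
  turnCount++;
  console.log(
    `[${new Date().toISOString()}] agent=${agentName} turn=${turnCount} promptChars=${JSON.stringify(messages).length}`
  );
  try {
    return await agent.run({ task, messages });
  } catch (e) {
    const err = e as { status?: number; message?: string };
    console.error(`[model error] agent=${agentName} turn=${turnCount} status=${err.status} msg=${err.message}`);
    throw e; // rethrow so callers still see the failure
  }
}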

Prevention

  • Set hard limits everywhere (see the sketch after this list):
    • max turns in group chats
    • max retries per request
    • max concurrent jobs per worker
  • Add backoff for transient failures:

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

  • Keep conversation state small:
    • summarize old messages
    • drop irrelevant tool outputs
    • avoid resending full history unless required
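
One way to keep those caps honest is to define them in one place and thread them through your orchestration code, e.g.:

// Sketch: one auditable place for every hard limit
const LIMITS = {
  maxTurnsPerChat: 6,      // RoundRobinGroupChat maxTurns
  maxRetriesPerRequest: 5, // backoff loop above
  maxConcurrentJobs: 2,    // queue/limiter concurrency
} as const;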

If you’re building production agents with AutoGen TypeScript, treat rate limits as a capacity problem first and a code problem second. Most fixes are about controlling call volume, not changing providers.


By Cyprian Aarons, AI Consultant at Topiax.