How to Fix 'rate limit exceeded during development' in AutoGen (Python)
What the error means
rate limit exceeded during development usually means your AutoGen agent is making more model calls than the API provider allows in a short window. In practice, this shows up when you run multi-agent loops, recursive tool calls, or retry-heavy workflows against OpenAI, Azure OpenAI, or another hosted LLM.
You’ll usually see it during local testing because dev setups often use small quotas, shared keys, or aggressive agent configurations that generate bursts of requests.
The Most Common Cause
The #1 cause is an agent loop that calls the model too many times without a stop condition. In AutoGen, this often happens when AssistantAgent and UserProxyAgent keep bouncing messages back and forth, or when max_consecutive_auto_reply is too high.
Here’s the broken pattern and the fixed pattern side by side.

Broken:

```python
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_KEY"}]},
)

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=50,
)

user_proxy.initiate_chat(
    assistant,
    message="Write a full insurance claims workflow and keep improving it.",
)
```

Fixed:

```python
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_KEY"}]},
)

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3,
)

user_proxy.initiate_chat(
    assistant,
    message="Write a concise insurance claims workflow.",
)
```
The broken version invites long back-and-forth exchanges with no practical cap. The fixed version lowers the auto-reply ceiling so the conversation terminates before you burn through your request budget.
A more production-safe pattern is to set both a hard turn limit and a termination check:
```python
def is_termination_msg(msg):
    # content can be None (e.g. pure tool-call messages), so guard before "in"
    return "DONE" in (msg.get("content") or "")

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3,
    is_termination_msg=is_termination_msg,
)
```
Other Possible Causes
1. Multiple agents are sharing one API key
If you spin up several AssistantAgent instances with the same key, they all count against the same rate bucket.
```python
from autogen import AssistantAgent

config_list = [{"model": "gpt-4o-mini", "api_key": "SHARED_KEY"}]

agent_a = AssistantAgent(name="a", llm_config={"config_list": config_list})
agent_b = AssistantAgent(name="b", llm_config={"config_list": config_list})
```
If both agents run in parallel, expect bursts. Use separate keys per environment or throttle concurrency.
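If separate keys aren't an option, you can at least gate concurrency in your own driver code. A minimal sketch, where `run_chat` is a hypothetical stand-in for whatever callable invokes `initiate_chat`:

```python
import threading

# Cap how many model-backed chats run at once against the shared key.
MAX_CONCURRENT = 2
_gate = threading.Semaphore(MAX_CONCURRENT)


def throttled(run_chat, *args, **kwargs):
    # Blocks when MAX_CONCURRENT chats are already in flight,
    # smoothing request bursts instead of firing them all at once.
    with _gate:
        return run_chat(*args, **kwargs)
```

Each worker thread then calls `throttled(user_proxy.initiate_chat, assistant, message=...)` instead of calling `initiate_chat` directly.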
2. Retry logic is multiplying requests
AutoGen may retry on transient failures, and your own wrapper may also retry. That doubles or triples traffic fast.
```python
# Bad: app-level retry around agent chat, stacked on AutoGen's built-in retries
for _ in range(3):
    try:
        user_proxy.initiate_chat(assistant, message="Summarize policy docs")
        break
    except Exception:
        pass  # retries on *any* error, multiplying traffic
```
Prefer one retry layer only. If you must retry, add exponential backoff and inspect the exception type before retrying.
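Here is one way to implement that single retry layer. This is a hedged sketch: the `RateLimitError` class below is a placeholder for your provider's actual exception (for example, `openai.RateLimitError`), and the delay values are illustrative.

```python
import random
import time


class RateLimitError(Exception):
    """Placeholder for your provider's rate-limit exception."""


def with_backoff(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry `call` only on rate-limit errors, with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # 1s, 2s, 4s, ... plus jitter so parallel workers don't retry in lockstep
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

Because it retries only on the rate-limit exception type, genuine bugs (bad arguments, auth failures) still fail fast instead of being retried three times.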
3. Tool execution triggers extra model calls
Tool-calling agents can create additional LLM turns after each tool result. If your tool returns large payloads or noisy output, the agent may keep asking follow-up questions.
```python
assistant = AssistantAgent(
    name="assistant",
    llm_config={
        "config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_KEY"}],
        "temperature": 0,
    },
)
```
Keep tool outputs short and structured. Return only what the next turn needs.
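For example, a tool can return a compact JSON summary rather than the raw record. The `lookup_claim` tool and its fields below are hypothetical, but the pattern applies to any tool:

```python
import json


def lookup_claim(claim_id, raw_record):
    # Return only the fields the next model turn actually needs,
    # not the full raw payload, so follow-up turns stay short.
    summary = {
        "claim_id": claim_id,
        "status": raw_record.get("status"),
        "amount": raw_record.get("amount"),
    }
    return json.dumps(summary)
```

A 50-token structured result keeps the follow-up turn cheap; dumping a full record into the chat can push every subsequent call toward the token limit and invite extra clarifying turns.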
4. Your provider quota is lower than you think
Sometimes this is not a code bug. Development keys often have low RPM/TPM limits or exhausted monthly credits.
Check your provider settings:
```python
import os

llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
            "base_url": os.getenv("OPENAI_BASE_URL"),
        }
    ]
}
```
If you’re on Azure OpenAI, verify deployment-level limits separately from subscription limits.
How to Debug It
- Count how many turns happen before failure. If the error appears after several rapid exchanges, it’s likely an agent loop or too many auto-replies.
- Log every model call. Enable AutoGen logging and print message lengths. Look for repeated prompts or repeated tool-result cycles:

  ```python
  import logging
  logging.basicConfig(level=logging.INFO)
  ```

- Check whether multiple agents share one key. Search your code for a reused `api_key`, a shared `config_list`, or parallel tasks using the same credentials.
- Temporarily disable retries and tools. Run a minimal chat with one `AssistantAgent` and one `UserProxyAgent`. If the error disappears, add features back one at a time until it returns.
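To make "count the turns" concrete, you can wrap whatever function fires requests in a small counter before reproducing the failure. This is a generic sketch, not an AutoGen API:

```python
import functools


def count_calls(fn):
    # Wrap any request-making callable; inspect wrapper.calls after the run
    # to see how many requests happened before the rate limit hit.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper
```

If the counter shows dozens of calls for what should be a three-turn chat, the problem is the loop, not the quota.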
Prevention
- Set explicit limits: `max_consecutive_auto_reply`, termination conditions, and bounded task scopes.
- Add rate-aware backoff around any custom retry logic.
- Keep tool outputs small and avoid letting agents generate unbounded follow-up loops.
- Use separate keys or deployments for dev, staging, and production when possible.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.