How to Fix 'rate limit exceeded during development' in CrewAI (Python)
What the error means
rate limit exceeded during development usually means your CrewAI agents are calling an LLM API too aggressively while you’re iterating locally. In practice, it shows up when multiple agents, repeated tool calls, or tight retry loops burn through provider limits fast.
You’ll usually see it with OpenAI, Anthropic, or other hosted models behind CrewAI when the same task is triggered repeatedly in a short window.
The Most Common Cause
The #1 cause is creating a crew inside a loop or re-running the same crew multiple times without caching or backoff. That pattern looks harmless in development, but each crew.kickoff() can trigger several model calls per agent.
Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| Recreates agents and crew on every iteration | Creates them once and reuses them |
| Calls kickoff() repeatedly in a loop | Batches inputs or throttles execution |
| No retry/backoff strategy | Adds controlled retries and delays |
```python
# broken.py
from crewai import Agent, Task, Crew, Process
from langchain_openai import ChatOpenAI

def run_job(user_inputs):
    for item in user_inputs:
        # Fresh agents, task, and crew on every iteration
        researcher = Agent(
            role="Researcher",
            goal="Research the topic",
            backstory="Senior analyst",
            llm=ChatOpenAI(model="gpt-4o-mini")
        )
        writer = Agent(
            role="Writer",
            goal="Write the summary",
            backstory="Technical writer",
            llm=ChatOpenAI(model="gpt-4o-mini")
        )
        task = Task(
            description=f"Summarize: {item}",
            expected_output="A short summary",
            agent=researcher
        )
        crew = Crew(
            agents=[researcher, writer],
            tasks=[task],
            process=Process.sequential
        )
        result = crew.kickoff()
        print(result)
```
```python
# fixed.py
import time

from crewai import Agent, Task, Crew, Process
from langchain_openai import ChatOpenAI

# Shared LLM client, created once
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

researcher = Agent(
    role="Researcher",
    goal="Research the topic",
    backstory="Senior analyst",
    llm=llm
)
writer = Agent(
    role="Writer",
    goal="Write the summary",
    backstory="Technical writer",
    llm=llm
)

def run_job(user_inputs):
    crew = Crew(
        agents=[researcher, writer],
        tasks=[],
        process=Process.sequential
    )
    for item in user_inputs:
        task = Task(
            description=f"Summarize: {item}",
            expected_output="A short summary",
            agent=researcher
        )
        crew.tasks = [task]
        result = crew.kickoff()
        print(result)
        time.sleep(2)  # basic throttle during dev
```
The broken version creates fresh objects and fires requests as fast as Python can loop. The fixed version reuses the LLM and adds a small delay so you stop hammering the provider during local testing.
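The fixed `time.sleep(2)` throttle is the bluntest option. A slightly smarter dev-time pattern is exponential backoff with jitter around the call. The sketch below is framework-agnostic: `kickoff_with_backoff` is a hypothetical helper name, not part of the CrewAI API, and it takes any zero-arg callable (for example `lambda: crew.kickoff()`).

```python
import random
import time

def kickoff_with_backoff(kickoff, max_attempts=4, base_delay=1.0):
    """Call a zero-arg callable, retrying failures with exponential
    backoff plus jitter so repeated runs don't hammer the provider."""
    for attempt in range(max_attempts):
        try:
            return kickoff()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # waits roughly base_delay * 1, 2, 4, ... plus jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

With the defaults, a persistently rate-limited call waits about 1s, 2s, then 4s before giving up, instead of retrying instantly.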
Other Possible Causes
1) Nested agent calls causing hidden fan-out
One task may trigger another task through tools or delegation. In CrewAI, that can multiply requests without looking obvious in your code.
```python
# too much fan-out: the manager model adds calls on top of each task
crew = Crew(
    agents=[researcher, writer],
    tasks=[task1, task2, task3],
    process=Process.hierarchical,
    manager_llm=ChatOpenAI(model="gpt-4o-mini")
)
```
If you do not need delegation during development, switch to `Process.sequential`:

```python
crew = Crew(
    agents=[researcher, writer],
    tasks=[task1],
    process=Process.sequential
)
```
2) Tool loops that keep calling the model
A tool that returns incomplete data can cause the agent to call it again and again. This often happens with web search, scraping, or database lookup tools.
```python
from crewai.tools import tool  # import path may vary by CrewAI version

@tool("search_docs")
def search_docs(query: str) -> str:
    return ""  # an empty result pushes the agent into repeated retries and tool calls
```
Return bounded, non-empty results with a clear stop condition instead:

```python
@tool("search_docs")
def search_docs(query: str) -> str:
    # Bounded output gives the agent something concrete to stop on
    return "Top 3 relevant snippets only"
```
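Beyond bounding the output, you can cache tool results during development so a retrying agent cannot re-trigger the same expensive lookup. This is a generic sketch using the standard library's `functools.lru_cache`; `cached_search` and the call counter are illustrative names, not CrewAI APIs.

```python
from functools import lru_cache

backend_calls = {"n": 0}  # counts real lookups, for demonstration

@lru_cache(maxsize=128)
def cached_search(query: str) -> str:
    backend_calls["n"] += 1
    # In a real tool this would hit a search API or database
    return f"Top 3 relevant snippets for: {query}"
```

Repeated identical queries now hit the cache instead of the backend, so a looping agent stops burning requests on lookups it has already made.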
3) No rate limiting around parallel execution
If you run multiple crews with asyncio.gather() or threads, you can exceed provider quotas quickly.
```python
# risky during dev: ten unthrottled crews at once
results = await asyncio.gather(*[crew.kickoff_async() for _ in range(10)])
```
Throttle concurrency with a semaphore:

```python
sem = asyncio.Semaphore(2)  # at most two crews in flight at once

async def limited_run():
    async with sem:
        return await crew.kickoff_async()

results = await asyncio.gather(*[limited_run() for _ in range(10)])
```
4) Retry policy is too aggressive
Some SDKs retry immediately on 429 Too Many Requests, which just turns one failed request into several rapid-fire failures.
```python
llm = ChatOpenAI(
    model="gpt-4o-mini",
    max_retries=10  # too high for local debugging
)
```
Use smaller retries and exponential backoff if your client supports it.
```python
llm = ChatOpenAI(
    model="gpt-4o-mini",
    max_retries=2,  # fail fast; back off in your own code instead
)
```
How to Debug It
- Count actual model calls. Add logging around every `crew.kickoff()` and every tool invocation. If one user action triggers 5–20 LLM calls, you have found your multiplier.
- Disable delegation first. Set `process=Process.sequential`, then remove manager agents and nested crews until the error disappears.
- Test with one task and one agent. Reduce to a single `Agent`, a single `Task`, and a single `kickoff()`. If it still fails, the issue is likely provider limits or retries rather than crew structure.
- Inspect provider response codes. Look for `429 Too Many Requests`, `RateLimitError`, or messages like `openai.RateLimitError: Error code: 429` and `anthropic.RateLimitError: Rate limit exceeded`. If you see those directly from the SDK, CrewAI is just surfacing the upstream limit.
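To put numbers behind the call counting, you can wrap `kickoff` and your tool functions in a small counting decorator. `count_calls` is a hypothetical debugging helper, not something CrewAI ships:

```python
import functools

def count_calls(fn, counter):
    """Wrap any callable and tally its invocations in `counter`."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        counter[fn.__name__] = counter.get(fn.__name__, 0) + 1
        return fn(*args, **kwargs)
    return wrapper
```

Wrap the suspects (for example `crew.kickoff = count_calls(crew.kickoff, counter)`), trigger one user action, then print the counter: anything in the 5–20 range per action is your multiplier.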
Prevention
- Reuse `Agent`, `Task`, and LLM instances instead of rebuilding them inside loops.
- Keep development crews small: one task at a time until behavior is stable.
- Add throttling for async runs and limit retries on your LLM client.
- Use cheaper models like `gpt-4o-mini` for local iteration before moving to heavier models.
By Cyprian Aarons, AI Consultant at Topiax.