How to Fix 'rate limit exceeded during development' in CrewAI (Python)
What the error means
rate limit exceeded during development usually means your CrewAI agents are calling an LLM API too aggressively while you’re iterating locally. In practice, it shows up when multiple agents, repeated tool calls, or tight retry loops burn through provider limits fast.
You’ll usually see it with OpenAI, Anthropic, or other hosted models behind CrewAI when the same task is triggered repeatedly in a short window.
The Most Common Cause
The #1 cause is creating a crew inside a loop or re-running the same crew multiple times without caching or backoff. That pattern looks harmless in development, but each crew.kickoff() can trigger several model calls per agent.
Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| Recreates agents and crew on every iteration | Creates them once and reuses them |
| Calls kickoff() repeatedly in a loop | Batches inputs or throttles execution |
| No retry/backoff strategy | Adds controlled retries and delays |
```python
# broken.py
from crewai import Agent, Task, Crew, Process
from langchain_openai import ChatOpenAI

def run_job(user_inputs):
    for item in user_inputs:
        # Fresh agents, task, and crew on every iteration
        researcher = Agent(
            role="Researcher",
            goal="Research the topic",
            backstory="Senior analyst",
            llm=ChatOpenAI(model="gpt-4o-mini")
        )
        writer = Agent(
            role="Writer",
            goal="Write the summary",
            backstory="Technical writer",
            llm=ChatOpenAI(model="gpt-4o-mini")
        )
        task = Task(
            description=f"Summarize: {item}",
            expected_output="A short summary",
            agent=researcher
        )
        crew = Crew(
            agents=[researcher, writer],
            tasks=[task],
            process=Process.sequential
        )
        result = crew.kickoff()
        print(result)
```
```python
# fixed.py
import time

from crewai import Agent, Task, Crew, Process
from langchain_openai import ChatOpenAI

# Shared LLM client, created once
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

researcher = Agent(
    role="Researcher",
    goal="Research the topic",
    backstory="Senior analyst",
    llm=llm
)
writer = Agent(
    role="Writer",
    goal="Write the summary",
    backstory="Technical writer",
    llm=llm
)

def run_job(user_inputs):
    crew = Crew(
        agents=[researcher, writer],
        tasks=[],
        process=Process.sequential
    )
    for item in user_inputs:
        task = Task(
            description=f"Summarize: {item}",
            expected_output="A short summary",
            agent=researcher
        )
        crew.tasks = [task]
        result = crew.kickoff()
        print(result)
        time.sleep(2)  # basic throttle during dev
```
The broken version creates fresh objects and fires requests as fast as Python can loop. The fixed version reuses the LLM and adds a small delay so you stop hammering the provider during local testing.
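The fixed `time.sleep(2)` throttle is the bluntest option. A slightly smarter dev-time pattern is exponential backoff with jitter around the call. The sketch below is framework-agnostic: `kickoff_with_backoff` is a hypothetical helper name, not part of the CrewAI API, and it takes any zero-arg callable (for example `lambda: crew.kickoff()`).

```python
import random
import time

def kickoff_with_backoff(kickoff, max_attempts=4, base_delay=1.0):
    """Call a zero-arg callable, retrying failures with exponential
    backoff plus jitter so repeated runs don't hammer the provider."""
    for attempt in range(max_attempts):
        try:
            return kickoff()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # waits roughly base_delay * 1, 2, 4, ... plus jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

With the defaults, a persistently rate-limited call waits about 1s, 2s, then 4s before giving up, instead of retrying instantly.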
Other Possible Causes
1) Nested agent calls causing hidden fan-out
One task may trigger another task through tools or delegation. In CrewAI, that can multiply requests without looking obvious in your code.
```python
# too much fan-out: the manager model adds calls on top of each task
crew = Crew(
    agents=[researcher, writer],
    tasks=[task1, task2, task3],
    process=Process.hierarchical,
    manager_llm=ChatOpenAI(model="gpt-4o-mini")
)
```
If you do not need delegation during development, switch to `Process.sequential`:

```python
crew = Crew(
    agents=[researcher, writer],
    tasks=[task1],
    process=Process.sequential
)
```
2) Tool loops that keep calling the model
A tool that returns incomplete data can cause the agent to call it again and again. This often happens with web search, scraping, or database lookup tools.
```python
from crewai.tools import tool  # import path may vary by CrewAI version

@tool("search_docs")
def search_docs(query: str) -> str:
    return ""  # an empty result pushes the agent into repeated retries and tool calls
```
Return bounded, non-empty results with a clear stop condition instead:

```python
@tool("search_docs")
def search_docs(query: str) -> str:
    # Bounded output gives the agent something concrete to stop on
    return "Top 3 relevant snippets only"
```
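Beyond bounding the output, you can cache tool results during development so a retrying agent cannot re-trigger the same expensive lookup. This is a generic sketch using the standard library's `functools.lru_cache`; `cached_search` and the call counter are illustrative names, not CrewAI APIs.

```python
from functools import lru_cache

backend_calls = {"n": 0}  # counts real lookups, for demonstration

@lru_cache(maxsize=128)
def cached_search(query: str) -> str:
    backend_calls["n"] += 1
    # In a real tool this would hit a search API or database
    return f"Top 3 relevant snippets for: {query}"
```

Repeated identical queries now hit the cache instead of the backend, so a looping agent stops burning requests on lookups it has already made.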
3) No rate limiting around parallel execution
If you run multiple crews with asyncio.gather() or threads, you can exceed provider quotas quickly.
```python
# risky during dev: ten unthrottled crews at once
results = await asyncio.gather(*[crew.kickoff_async() for _ in range(10)])
```
Throttle concurrency with a semaphore:

```python
sem = asyncio.Semaphore(2)  # at most two crews in flight at once

async def limited_run():
    async with sem:
        return await crew.kickoff_async()

results = await asyncio.gather(*[limited_run() for _ in range(10)])
```
4) Retry policy is too aggressive
Some SDKs retry immediately on 429 Too Many Requests, which just turns one failed request into several rapid-fire failures.
```python
llm = ChatOpenAI(
    model="gpt-4o-mini",
    max_retries=10  # too high for local debugging
)
```
Use smaller retries and exponential backoff if your client supports it.
```python
llm = ChatOpenAI(
    model="gpt-4o-mini",
    max_retries=2,  # fail fast; back off in your own code instead
)
```
How to Debug It
- Count actual model calls. Add logging around every `crew.kickoff()` and every tool invocation. If one user action triggers 5–20 LLM calls, you have found your multiplier.
- Disable delegation first. Set `process=Process.sequential`, then remove manager agents and nested crews until the error disappears.
- Test with one task and one agent. Reduce to a single `Agent`, a single `Task`, and a single `kickoff()`. If it still fails, the issue is likely provider limits or retries rather than crew structure.
- Inspect provider response codes. Look for `429 Too Many Requests`, `RateLimitError`, or messages like `openai.RateLimitError: Error code: 429` and `anthropic.RateLimitError: Rate limit exceeded`. If you see those directly from the SDK, CrewAI is just surfacing the upstream limit.
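To put numbers behind the call counting, you can wrap `kickoff` and your tool functions in a small counting decorator. `count_calls` is a hypothetical debugging helper, not something CrewAI ships:

```python
import functools

def count_calls(fn, counter):
    """Wrap any callable and tally its invocations in `counter`."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        counter[fn.__name__] = counter.get(fn.__name__, 0) + 1
        return fn(*args, **kwargs)
    return wrapper
```

Wrap the suspects (for example `crew.kickoff = count_calls(crew.kickoff, counter)`), trigger one user action, then print the counter: anything in the 5–20 range per action is your multiplier.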
Prevention
- Reuse `Agent`, `Task`, and LLM instances instead of rebuilding them inside loops.
- Keep development crews small: one task at a time until behavior is stable.
- Add throttling for async runs and limit retries on your LLM client.
- Use cheaper models like `gpt-4o-mini` for local iteration before moving to heavier models.
By Cyprian Aarons, AI Consultant at Topiax.