How to Fix 'rate limit exceeded' in CrewAI (Python)
What the error means
rate limit exceeded in CrewAI usually means one of the upstream APIs your agents are calling is rejecting requests because you’re sending too many tokens, too many requests, or both. In practice, this shows up when a crew runs multiple agents/tasks quickly, retries aggressively, or uses a model with tight provider limits.
You’ll often see it during task execution, especially with Agent, Task, Crew, and LLM configured against OpenAI, Anthropic, or another hosted provider.
The Most Common Cause
The #1 cause is too much parallel or repeated LLM usage without throttling. In CrewAI, it’s easy to create a crew that fans out several agents at once, each making multiple calls per task. If you also have retries enabled upstream, you hit the provider limit fast.
Here’s the broken pattern:
# broken.py
from crewai import Agent, Task, Crew, Process
from crewai.llm import LLM

llm = LLM(model="gpt-4o")

researcher = Agent(
    role="Researcher",
    goal="Research the topic",
    backstory="You are good at finding facts.",
    llm=llm,
)
writer = Agent(
    role="Writer",
    goal="Write a report",
    backstory="You write clean summaries.",
    llm=llm,
)

tasks = [
    Task(
        description="Research customer complaints for product A",
        expected_output="A list of recurring complaints",
        agent=researcher,
        async_execution=True,  # fans out concurrently -- bad if your provider limit is low
    ),
    Task(
        description="Summarize findings into a report",
        expected_output="A short report",
        agent=writer,
    ),
]

crew = Crew(
    agents=[researcher, writer],
    tasks=tasks,
    process=Process.sequential,
)

result = crew.kickoff()
print(result)
And here’s the fixed pattern:
# fixed.py
from crewai import Agent, Task, Crew, Process
from crewai.llm import LLM

llm = LLM(model="gpt-4o")

researcher = Agent(
    role="Researcher",
    goal="Research the topic",
    backstory="You are good at finding facts.",
    llm=llm,
)
writer = Agent(
    role="Writer",
    goal="Write a report",
    backstory="You write clean summaries.",
    llm=llm,
)

research_task = Task(
    description="Research customer complaints for product A",
    expected_output="A list of recurring complaints",
    agent=researcher,
)
write_task = Task(
    description="Summarize findings into a report using the research output",
    expected_output="A short report",
    agent=writer,
    context=[research_task],  # reuse the research output instead of re-querying
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.sequential,  # one task at a time, no burst
)

result = crew.kickoff()
print(result)
The key changes are simple:
- avoid parallel execution unless you've measured capacity
- chain tasks with context instead of running them independently
- reduce duplicate calls by reusing outputs instead of re-querying the model
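If you do need some concurrency, cap the request rate explicitly instead of hoping the provider keeps up. CrewAI's docs describe a max_rpm option on Agent and Crew that is worth checking in your installed version; if you would rather not rely on it, a minimal in-process gate like the sketch below (the class and method names are illustrative, not CrewAI APIs) enforces a requests-per-minute ceiling:

```python
import threading
import time

class RpmGate:
    """Illustrative requests-per-minute gate: call wait() before each LLM call."""

    def __init__(self, rpm: int):
        self.interval = 60.0 / rpm   # minimum spacing between calls
        self._lock = threading.Lock()
        self._next_slot = 0.0

    def wait(self) -> None:
        # reserve the next time slot under the lock, sleep outside it
        with self._lock:
            now = time.monotonic()
            sleep_for = max(0.0, self._next_slot - now)
            self._next_slot = max(now, self._next_slot) + self.interval
        if sleep_for:
            time.sleep(sleep_for)
```

Calling wait() in whatever wrapper actually issues the request spaces calls evenly, which most providers tolerate far better than bursts.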
If you're seeing errors like:
- openai.RateLimitError: Error code: 429
- anthropic.RateLimitError: rate_limit_error
- litellm.RateLimitError: Rate limit exceeded
then this is usually the first place to look.
Other Possible Causes
1) Your model choice has lower limits than you expect
Some models have much tighter request-per-minute and token-per-minute caps.
# risky
llm = LLM(model="gpt-4o-mini")  # may still rate-limit under bursty workloads

If your crew is chatty or recursive, check your account's per-model request and token caps, then move to a model or usage tier with more headroom, or reduce call volume.

# safer, if your account has more headroom on this model
llm = LLM(model="gpt-4o")
2) You are retrying too aggressively
Retries can multiply traffic if every task fails fast and immediately retries.
# example of hidden retry pressure from a wrapper/client config
import os
os.environ["OPENAI_MAX_RETRIES"] = "5"  # only matters if your client actually reads this setting
If you control retries in your app, keep them conservative and add backoff.
import time

# retry kickoff with exponential backoff on provider rate-limit errors
for attempt in range(3):
    try:
        result = crew.kickoff()
        break
    except Exception as e:
        if "RateLimitError" in type(e).__name__:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s
        else:
            raise
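For anything beyond a one-off script, it helps to factor that loop into a reusable helper with jitter, so multiple workers don't all retry in lockstep. A sketch (the helper name and defaults are mine, not from CrewAI or any provider SDK):

```python
import random
import time

def call_with_backoff(fn, max_attempts=4, base_delay=1.0, max_delay=30.0):
    """Retry fn on rate-limit errors with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as e:
            # matches openai/anthropic/litellm RateLimitError without importing them
            if "RateLimitError" not in type(e).__name__ or attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.5))  # jitter de-synchronizes workers
```

Usage is then a one-liner: result = call_with_backoff(crew.kickoff).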
3) Multiple crews share the same API key
If you run workers, cron jobs, or concurrent scripts with one key, they all count against the same quota.
# two processes using the same key at the same time
os.environ["OPENAI_API_KEY"] = "sk-..."
Fix it by serializing jobs or splitting traffic across environments/keys where policy allows.
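On a single host, serializing jobs can be as simple as an atomic lock directory, with no extra dependencies. This is a sketch under that assumption; the lock path is a placeholder you would pick yourself:

```python
import os
import time

LOCK_DIR = "crew_job.lock"  # placeholder path shared by workers on this host

def run_serialized(job, poll_seconds=0.5):
    """Run job() only while holding the lock; other processes wait their turn."""
    while True:
        try:
            os.mkdir(LOCK_DIR)  # mkdir is atomic: exactly one process succeeds
            break
        except FileExistsError:
            time.sleep(poll_seconds)  # another process is running; wait
    try:
        return job()
    finally:
        os.rmdir(LOCK_DIR)  # always release, even if job() raises
```

For workers spread across machines you would need a shared lock (a database row, Redis key, or queue) instead, but the idea is the same: one crew run at a time per API key.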
4) Your prompts are too large
Large prompts burn through token limits quickly. In CrewAI this happens when you pass huge context blobs into tasks.
task = Task(
    description=f"Analyze this document: {very_large_text_blob}",
    agent=analyst,
)

Trim input before sending it to an agent:

task = Task(
    description=f"Analyze this excerpt: {very_large_text_blob[:4000]}",
    agent=analyst,
)
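A hard character slice can cut mid-word. A slightly smarter trim targets an approximate token budget instead; the 4-characters-per-token ratio below is a rough heuristic for English text, not a real tokenizer:

```python
def trim_to_tokens(text: str, max_tokens: int = 1000, chars_per_token: int = 4) -> str:
    """Rough trim to a token budget; ~4 chars/token is a heuristic for English."""
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    # cut at the last space so we don't end mid-word, then mark the elision
    return text[:max_chars].rsplit(" ", 1)[0] + " ..."
```

Then the task becomes description=f"Analyze this excerpt: {trim_to_tokens(very_large_text_blob)}".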
How to Debug It
- Check which provider threw the error
  - Look at the exception type and stack trace.
  - If you see openai.RateLimitError, anthropic.RateLimitError, or litellm.RateLimitError, CrewAI is not the root cause; the model provider is.
- Turn off parallelism
  - Remove async_execution=True from your tasks and run with Process.sequential.
  - If the error disappears, your workload is bursting past quota.
- Log task count and prompt size
  - Print every task description length and context size.
  - Large context often looks harmless until it multiplies across agents.

  for t in tasks:
      print(len(t.description))

- Run one agent/task at a time
  - Execute only one Task with one Agent.
  - If that works but full crews fail, you've confirmed concurrency or cumulative token usage as the problem.
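The logging step above can be bundled into a small pre-flight audit you run before kickoff. The helper below is a sketch of my own, not a CrewAI API; it only needs each task to have a description attribute, and the token counts are char-based estimates:

```python
def audit_tasks(tasks, chars_per_token=4):
    """Print a rough per-task size audit before kickoff (token counts are estimates)."""
    total_chars = 0
    for i, task in enumerate(tasks):
        desc = getattr(task, "description", "") or ""
        total_chars += len(desc)
        print(f"task {i}: {len(desc)} chars (~{len(desc) // chars_per_token} tokens)")
    print(f"total: {total_chars} chars (~{total_chars // chars_per_token} tokens)")
    return total_chars
```

Run it right before crew.kickoff() and you will catch an oversized prompt before it burns your quota.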
Prevention
- Use Process.sequential by default unless you've measured provider headroom.
- Keep task prompts short and pass outputs through context instead of re-sending raw source text.
- Add exponential backoff for transient 429 errors instead of hammering the API with immediate retries.
- Monitor request rate and token usage per crew run so you can spot bursts before they hit production.
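Monitoring can start small: wrap whatever function actually hits the API in a counter so each run reports its own call volume. A sketch with illustrative names:

```python
import time

class CallMeter:
    """Wrap a callable; counts invocations and total elapsed time per run."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0
        self.seconds = 0.0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        start = time.perf_counter()
        try:
            return self.fn(*args, **kwargs)
        finally:
            self.seconds += time.perf_counter() - start
```

Wrap a request function you control, kick off the crew, then log meter.calls and meter.seconds; a sudden jump between runs is your early warning.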
If you’re building production agents for regulated environments like banking or insurance, treat rate limits as capacity planning work, not just an exception handler problem. The fix is usually in execution design: fewer redundant calls, smaller prompts, and controlled concurrency.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.