How to Fix 'rate limit exceeded' in CrewAI (Python)

By Cyprian Aarons · Updated 2026-04-21

Tags: rate-limit-exceeded, crewai, python

What the error means

rate limit exceeded in CrewAI usually means one of the upstream APIs your agents are calling is rejecting requests because you’re sending too many tokens, too many requests, or both. In practice, this shows up when a crew runs multiple agents/tasks quickly, retries aggressively, or uses a model with tight provider limits.

You’ll often see it during task execution, especially with Agent, Task, Crew, and LLM configured against OpenAI, Anthropic, or another hosted provider.

The Most Common Cause

The #1 cause is too much parallel or repeated LLM usage without throttling. In CrewAI, it’s easy to create a crew that fans out several agents at once, each making multiple calls per task. If you also have retries enabled upstream, you hit the provider limit fast.

Here’s the broken pattern:

# broken.py
from crewai import Agent, Task, Crew, Process
from crewai.llm import LLM

llm = LLM(model="gpt-4o")

researcher = Agent(
    role="Researcher",
    goal="Research the topic",
    backstory="You are good at finding facts.",
    llm=llm,
)

writer = Agent(
    role="Writer",
    goal="Write a report",
    backstory="You write clean summaries.",
    llm=llm,
)

tasks = [
    Task(
        description="Research customer complaints for product A",
        expected_output="A bullet list of findings",
        agent=researcher,
        async_execution=True,  # fires in the background -- bad if your provider limit is low
    ),
    Task(
        # starts while the research task may still be running
        description="Summarize findings into a report",
        expected_output="A short report",
        agent=writer,
    ),
]

crew = Crew(
    agents=[researcher, writer],
    tasks=tasks,
    process=Process.sequential,  # CrewAI has no Process.parallel; concurrency comes from async_execution
)

result = crew.kickoff()
print(result)

And here’s the fixed pattern:

# fixed.py
from crewai import Agent, Task, Crew, Process
from crewai.llm import LLM

llm = LLM(model="gpt-4o")

researcher = Agent(
    role="Researcher",
    goal="Research the topic",
    backstory="You are good at finding facts.",
    llm=llm,
)

writer = Agent(
    role="Writer",
    goal="Write a report",
    backstory="You write clean summaries.",
    llm=llm,
)

research_task = Task(
    description="Research customer complaints for product A",
    expected_output="A bullet list of findings",
    agent=researcher,
)

write_task = Task(
    description="Summarize findings into a report using the research output",
    expected_output="A short report",
    agent=writer,
    context=[research_task],
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.sequential,
)

result = crew.kickoff()
print(result)

The key changes are simple:

  • avoid parallel execution unless you’ve measured capacity
  • chain tasks with context
  • reduce duplicate calls by reusing outputs instead of re-querying the model
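If you do keep some concurrency, throttle it client-side. Recent CrewAI versions expose a `max_rpm` option on `Agent` and `Crew` for exactly this (check your version's docs); the sketch below is a dependency-free equivalent. `MinIntervalThrottle` and its `calls_per_minute` figure are illustrative, not CrewAI API:

```python
import threading
import time


class MinIntervalThrottle:
    """Client-side throttle: enforce a minimum gap between LLM calls.

    A minimal sketch -- the interval is a stand-in for whatever your
    provider's requests-per-minute budget works out to.
    """

    def __init__(self, calls_per_minute: float):
        self._interval = 60.0 / calls_per_minute
        self._lock = threading.Lock()
        self._last_call = 0.0

    def wait(self) -> None:
        # Block until at least `interval` seconds have passed since the previous call.
        with self._lock:
            now = time.monotonic()
            sleep_for = self._last_call + self._interval - now
            if sleep_for > 0:
                time.sleep(sleep_for)
            self._last_call = time.monotonic()


throttle = MinIntervalThrottle(calls_per_minute=30)  # ~2s between calls
```

Call `throttle.wait()` before each kickoff, or inside any custom tool that hits the model, so bursts get spread out instead of rejected.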

If you’re seeing errors like:

  • openai.RateLimitError: Error code: 429
  • anthropic.RateLimitError: rate_limit_error
  • litellm.RateLimitError: Rate limit exceeded

then this is usually the first place to look.

Other Possible Causes

1) Your model choice has lower limits than you expect

Requests-per-minute (RPM) and tokens-per-minute (TPM) caps vary by model and by your account tier, sometimes by an order of magnitude. Check your provider's dashboard for the numbers that apply to your account.

# risky
llm = LLM(model="gpt-4o-mini")  # may still rate-limit under bursty workloads

If your crew is chatty or recursive, switch to a model or account tier with more headroom, or reduce call volume.

# safer (verify the limits for your account first)
llm = LLM(model="gpt-4o")

2) You are retrying too aggressively

Retries can multiply traffic if every task fails fast and immediately retries.

# example of hidden retry pressure from a wrapper/client config
# (whether this env var is honored depends on your SDK/wrapper version)
import os

os.environ["OPENAI_MAX_RETRIES"] = "5"

If you control retries in your app, keep them conservative and add backoff.

import time

result = None
for attempt in range(3):
    try:
        result = crew.kickoff()
        break
    except Exception as e:
        # match provider rate-limit exceptions by class name without importing each SDK
        if "RateLimitError" in type(e).__name__:
            time.sleep(2 ** attempt)  # back off 1s, 2s, 4s
        else:
            raise
else:
    raise RuntimeError("rate limited on every attempt")
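That inline loop generalizes into a small helper. This is a sketch, not CrewAI API: `with_backoff` is a made-up name, and it matches rate-limit exceptions by class name so it works across openai/anthropic/litellm without importing each SDK. Adding jitter avoids synchronized retry stampedes when several workers back off at once:

```python
import random
import time


def with_backoff(fn, max_attempts: int = 4, base_delay: float = 1.0):
    """Call fn(), retrying rate-limit errors with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as e:
            # only retry rate-limit errors; everything else propagates immediately
            if "RateLimitError" not in type(e).__name__:
                raise
            if attempt == max_attempts - 1:
                raise
            # full jitter: sleep a random slice of the exponential window
            time.sleep(random.uniform(0, base_delay * (2 ** attempt)))


# result = with_backoff(crew.kickoff)
```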

3) Multiple crews share the same API key

If you run workers, cron jobs, or concurrent scripts with one key, they all count against the same quota.

# two processes using the same key at the same time
import os

os.environ["OPENAI_API_KEY"] = "sk-..."

Fix it by serializing jobs or splitting traffic across environments/keys where policy allows.

4) Your prompts are too large

Large prompts burn through token limits quickly. In CrewAI this happens when you pass huge context blobs into tasks.

task = Task(
    description=f"Analyze this document: {very_large_text_blob}",
    agent=analyst,
)

Trim input before sending it to an agent:

task = Task(
    description=f"Analyze this excerpt: {very_large_text_blob[:4000]}",
    agent=analyst,
)
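A rough way to keep inputs inside a token budget before they ever reach an agent. The ~4-characters-per-token ratio is a common heuristic for English text, not an exact count; use your provider's tokenizer (e.g. tiktoken for OpenAI models) when precision matters:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return int(len(text) / chars_per_token)


def trim_to_budget(text: str, max_tokens: int, chars_per_token: float = 4.0) -> str:
    """Cut text so its estimated token count stays under max_tokens."""
    max_chars = int(max_tokens * chars_per_token)
    return text[:max_chars]


# excerpt = trim_to_budget(very_large_text_blob, max_tokens=1000)  # ~4000 chars
```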

How to Debug It

  1. Check which provider threw the error

    • Look at the exception type and stack trace.
    • If you see openai.RateLimitError, anthropic.RateLimitError, or litellm.RateLimitError, CrewAI is not the root cause; the model provider is.
  2. Turn off parallelism

    • Use Process.sequential and drop async_execution=True from your tasks.
    • If the error disappears, your workload is bursting past quota.
  3. Log task count and prompt size

    • Print every task description length and context size.
    • Large context often looks harmless until it multiplies across agents.
for t in tasks:
    print(len(t.description))
  4. Run one agent/task at a time
    • Execute only one Task with one Agent.
    • If that works but full crews fail, you’ve confirmed concurrency or cumulative token usage as the problem.

Prevention

  • Use Process.sequential by default unless you’ve measured provider headroom.
  • Keep task prompts short and pass outputs through context instead of re-sending raw source text.
  • Add exponential backoff for transient 429 errors instead of hammering the API with immediate retries.
  • Monitor request rate and token usage per crew run so you can spot bursts before they hit production.
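The monitoring bullet can start very simply. Below is a sketch of a per-run meter; `CallMeter` is a made-up name, not CrewAI API, and you would tick it from whatever callback or wrapper sits around your LLM calls. The token figure reuses the rough ~4-characters-per-token heuristic:

```python
import time


class CallMeter:
    """Track request count and estimated token usage for one crew run."""

    def __init__(self):
        self.started = time.monotonic()
        self.requests = 0
        self.est_tokens = 0

    def record(self, prompt: str, completion: str = "") -> None:
        self.requests += 1
        # rough heuristic: ~4 characters per token
        self.est_tokens += (len(prompt) + len(completion)) // 4

    def summary(self) -> dict:
        minutes = max((time.monotonic() - self.started) / 60.0, 1e-9)
        return {
            "requests": self.requests,
            "est_tokens": self.est_tokens,
            "rpm": self.requests / minutes,
        }
```

Log `summary()` after each kickoff; a sudden jump in `rpm` or `est_tokens` between runs is usually the burst that trips the provider limit.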

If you’re building production agents for regulated environments like banking or insurance, treat rate limits as capacity planning work, not just an exception handler problem. The fix is usually in execution design: fewer redundant calls, smaller prompts, and controlled concurrency.



By Cyprian Aarons, AI Consultant at Topiax.
