CrewAI Tutorial (Python): rate limiting API calls for intermediate developers
This tutorial shows you how to add rate limiting to CrewAI-powered API calls in Python without breaking your agent workflow. You’ll use a small token-bucket limiter around a tool so your crew can keep working while staying under provider limits.
What You'll Need
- Python 3.10+
- crewai
- requests
- An API key for the service you want to call
- A basic CrewAI setup: Agent, Task, and Crew
- A target API endpoint that enforces or benefits from client-side throttling
Install the packages:
pip install crewai requests
Step-by-Step
- Start with a simple rate limiter that works in-process.
This version is enough for a single Python process running one crew. It uses a token bucket so bursts are allowed, but long-term throughput stays capped.
import time
from threading import Lock

class TokenBucketLimiter:
    """Thread-safe token bucket: allows short bursts, caps sustained throughput."""

    def __init__(self, rate_per_second: float, capacity: int):
        self.rate = rate_per_second   # tokens refilled per second
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity        # start with a full bucket
        self.updated_at = time.monotonic()
        self.lock = Lock()

    def acquire(self) -> None:
        """Block until a token is available, then consume it."""
        while True:
            with self.lock:
                # Refill tokens based on elapsed time, capped at capacity.
                now = time.monotonic()
                elapsed = now - self.updated_at
                self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
                self.updated_at = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait_time = (1 - self.tokens) / self.rate
            # Sleep outside the lock so other threads aren't blocked while we wait.
            time.sleep(wait_time)
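Before wiring the limiter into CrewAI, it's worth sanity-checking it on its own. A quick standalone check (the numbers here are only illustrative):

demo = TokenBucketLimiter(rate_per_second=2, capacity=4)
start = time.monotonic()
for i in range(6):
    demo.acquire()
    print(f"acquired token {i + 1} at t={time.monotonic() - start:.2f}s")

The first four acquisitions should go through immediately, with the remaining ones spaced roughly half a second apart.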
- Wrap your external API call in a CrewAI tool.
CrewAI tools are the cleanest place to enforce limits because every agent call goes through them. This example uses requests and applies the limiter before each outbound HTTP request.
import requests
from crewai.tools import tool

# One shared limiter for every call made through this tool:
# bursts of up to 4 requests, 2 requests/second sustained.
limiter = TokenBucketLimiter(rate_per_second=2, capacity=4)

@tool("fetch_api_data")
def fetch_api_data(url: str) -> str:
    """Fetch JSON or text from an API endpoint with client-side rate limiting."""
    limiter.acquire()  # blocks here until a token is available
    response = requests.get(url, timeout=15)
    response.raise_for_status()
    try:
        return repr(response.json())
    except ValueError:
        return response.text
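Because limiter lives at module level, every call to fetch_api_data shares one bucket. If you later add more tools that hit the same provider, point them at the same limiter instance rather than creating one per tool; otherwise their combined traffic can exceed the provider's limit even though each tool stays under it individually.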
- Plug the tool into an agent and task.
The agent doesn’t need to know anything about throttling. That’s the point: keep the policy in the tool layer so you can reuse it across crews and workflows.
from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="API Researcher",
    goal="Collect data from external APIs without exceeding request limits",
    backstory="You are careful with third-party APIs and always respect throttling.",
    tools=[fetch_api_data],
    verbose=True,
)

task = Task(
    description="Call https://httpbin.org/get five times and summarize the responses.",
    expected_output="A short summary of the fetched responses.",
    agent=researcher,
)
- Add repeated tool usage so you can see the limiter working.
A single task may not hit your limit hard enough to notice. This helper calls the same tool several times in sequence and prints per-call timings so you can see the spacing between requests.
def run_burst_test():
    urls = ["https://httpbin.org/get"] * 5
    results = []
    for i, url in enumerate(urls, start=1):
        start = time.time()
        # The @tool decorator wraps the function in a tool object,
        # so we invoke it through .run() rather than calling it directly.
        result = fetch_api_data.run(url)
        elapsed = time.time() - start
        print(f"Call {i} took {elapsed:.2f}s")
        results.append(result)
    return results

if __name__ == "__main__":
    run_burst_test()
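With rate_per_second=2 and capacity=4, you would expect the first four calls to complete about as fast as the network allows, and the fifth to stall for roughly half a second while the bucket refills, on top of normal request latency.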
- Run the crew as normal.
Once the tool is wrapped, your crew execution stays unchanged. If the agent decides to call the tool multiple times, each call will be throttled before leaving your process.
crew = Crew(
    agents=[researcher],
    tasks=[task],
    process=Process.sequential,
    verbose=True,
)

if __name__ == "__main__":
    result = crew.kickoff()
    print(result)
Testing It
Run the burst test first and watch the printed call durations. With rate_per_second=2 and capacity=4, you should see a short initial burst and then visible pauses once the bucket drains.
If you want a clearer signal, temporarily set rate_per_second=1 and capacity=1. The second through fifth calls should slow down to roughly one request per second.
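That stricter setting is a one-line change to the shared limiter:

limiter = TokenBucketLimiter(rate_per_second=1, capacity=1)  # one request/second, no burst allowance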
Also test failure paths by pointing the tool at an endpoint that returns 429 or times out. Your limiter prevents local overuse, but you still need normal HTTP error handling for upstream failures.
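As a starting point for that error handling, here is a minimal retry-with-backoff sketch around the same request pattern. fetch_with_backoff, the retry count, and the delays are illustrative rather than tuned values, and it assumes any Retry-After header is expressed in seconds:

def fetch_with_backoff(url: str, max_retries: int = 3) -> requests.Response:
    """Retry on HTTP 429 with exponential backoff; other errors raise immediately."""
    delay = 1.0
    for _ in range(max_retries):
        limiter.acquire()
        response = requests.get(url, timeout=15)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        # Honor the server's Retry-After hint when present (assumed to be seconds).
        retry_after = response.headers.get("Retry-After")
        time.sleep(float(retry_after) if retry_after else delay)
        delay *= 2
    # Out of retries: one final attempt, letting any error propagate.
    limiter.acquire()
    response = requests.get(url, timeout=15)
    response.raise_for_status()
    return response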
Next Steps
- Move the limiter into a shared utility module so multiple tools can reuse it.
- Add distributed rate limiting with Redis if you run multiple workers or containers (see the sketch after this list).
- Combine this with exponential backoff for 429 responses so your agent handles provider-side throttling cleanly.
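For the Redis bullet above, a minimal fixed-window sketch might look like the following. It assumes the redis-py package and a reachable Redis server; acquire_distributed, the key naming, and the retry interval are illustrative:

import time

import redis  # assumption: redis-py is installed

def acquire_distributed(client: redis.Redis, key: str,
                        limit_per_window: int, window_s: int = 1) -> None:
    """Block until this worker wins a slot in the current fixed window."""
    while True:
        window = int(time.time() // window_s)
        window_key = f"ratelimit:{key}:{window}"
        count = client.incr(window_key)  # atomic across all workers
        if count == 1:
            client.expire(window_key, window_s * 2)  # let old windows expire
        if count <= limit_per_window:
            return
        time.sleep(0.05)  # window budget exhausted; retry shortly

A fixed window is coarser than the token bucket above (it allows bursts at window boundaries), but it is simple and safe to share across processes.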
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.