CrewAI Tutorial (Python): rate limiting API calls for advanced developers
This tutorial shows how to add rate limiting around CrewAI-driven API calls in Python without breaking your agent flow. You need this when your agents hit third-party APIs with strict quotas, burst limits, or per-minute billing caps.
What You'll Need
- Python 3.10+
- crewai
- requests
- python-dotenv
- An API key for the service you're calling
- A CrewAI project with at least one agent and one task
- Basic familiarity with `Crew`, `Agent`, and `Task`
Step-by-Step
- Start by installing the packages and defining a small rate limiter. This example uses a sliding-window limiter that blocks until a request slot is available.

```shell
pip install crewai requests python-dotenv
```

```python
import threading
import time


class RateLimiter:
    """Allow at most max_calls per period_seconds; acquire() blocks until a slot frees up."""

    def __init__(self, max_calls: int, period_seconds: int):
        self.max_calls = max_calls
        self.period_seconds = period_seconds
        self.calls = []  # timestamps of calls inside the current window
        self.lock = threading.Lock()

    def acquire(self):
        while True:
            with self.lock:
                now = time.time()
                # Drop timestamps that have aged out of the window.
                self.calls = [t for t in self.calls if now - t < self.period_seconds]
                if len(self.calls) < self.max_calls:
                    self.calls.append(now)
                    return
                sleep_for = self.period_seconds - (now - self.calls[0])
            # Sleep outside the lock so other threads aren't blocked while we wait.
            time.sleep(max(0.01, sleep_for))
```
- Next, wrap the actual HTTP call in a function that calls the limiter before every request. In production, this is the layer you want between your CrewAI task output and the external API.

```python
import os

import requests
from dotenv import load_dotenv

load_dotenv()

API_URL = "https://api.example.com/v1/data"
API_KEY = os.getenv("EXAMPLE_API_KEY")

limiter = RateLimiter(max_calls=5, period_seconds=60)


def fetch_data(query: str) -> dict:
    limiter.acquire()  # blocks until a request slot is available
    response = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        params={"q": query},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```
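Before picking limits, it helps to estimate how long a burst of tool calls will take under a given quota. This small helper (a hypothetical name, not part of any library) computes a lower bound for a sliding-window limiter like the one above:

```python
def min_burst_duration(n_calls: int, max_calls: int, period_seconds: float) -> float:
    """Lower bound on the time a sliding-window limiter takes to admit n_calls.

    The first max_calls go through immediately; each further full batch of
    max_calls must wait for another period to elapse.
    """
    if n_calls <= max_calls:
        return 0.0
    return ((n_calls - 1) // max_calls) * period_seconds


# With the limiter above (5 calls per 60s), a 12-call burst spans
# at least two full windows:
print(min_burst_duration(12, 5, 60))
```

If your agent fires a dozen tool calls per task, that is at least two minutes of wall-clock time at this quota, which is worth knowing before you set task timeouts.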
- Now wire that function into a CrewAI agent through a tool. CrewAI will call the tool, and your limiter will control how often those outbound API requests happen. Note that the `tool` decorator is imported from `crewai.tools`, not from the `crewai` top-level package, and the decorated function needs a docstring so the agent knows what the tool does.

```python
from crewai import Agent, Crew, Process, Task
from crewai.tools import tool


@tool("fetch_data")
def fetch_data_tool(query: str) -> str:
    """Fetch external data for a query while respecting the API rate limit."""
    data = fetch_data(query)
    return str(data)


agent = Agent(
    role="Data Fetcher",
    goal="Retrieve external data without exceeding API limits",
    backstory="You manage API access carefully and respect quota constraints.",
    tools=[fetch_data_tool],
    verbose=True,
)

task = Task(
    description="Fetch data for 'insurance claims' using the available tool.",
    expected_output="A JSON-like string with the fetched result.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    process=Process.sequential,
)
```
- If you need multiple APIs or different quotas per endpoint, create separate limiters and route calls accordingly. That keeps one noisy integration from starving another.

```python
search_limiter = RateLimiter(max_calls=10, period_seconds=60)
billing_limiter = RateLimiter(max_calls=2, period_seconds=60)


def search_api(query: str) -> dict:
    search_limiter.acquire()
    response = requests.get(
        "https://api.example.com/v1/search",
        headers={"Authorization": f"Bearer {API_KEY}"},
        params={"q": query},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def billing_api(account_id: str) -> dict:
    billing_limiter.acquire()
    response = requests.get(
        "https://api.example.com/v1/billing",
        headers={"Authorization": f"Bearer {API_KEY}"},
        params={"account_id": account_id},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```
- Finally, run the crew and inspect the result. In real systems, you'd usually log each limiter wait so you can see when traffic is being throttled.

```python
if __name__ == "__main__":
    result = crew.kickoff()
    print(result)
```
Testing It
Run the script with a low limit like `max_calls=2` and trigger several tool invocations in quick succession. You should see later calls pause instead of failing immediately with HTTP 429s.
If your upstream API returns rate-limit headers like `Retry-After`, add logging around `limiter.acquire()` so you can confirm your wait times match the expected quotas. Also verify that your CrewAI task still completes successfully after waiting.
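You can check the pause behavior in isolation, without CrewAI or a real API. In this standalone sketch the limiter class from Step 1 is repeated so the snippet runs on its own, and the period is shrunk to one second so the wait is visible immediately:

```python
import threading
import time


class RateLimiter:
    """Same sliding-window limiter as in Step 1, repeated so this snippet runs standalone."""

    def __init__(self, max_calls: int, period_seconds: int):
        self.max_calls = max_calls
        self.period_seconds = period_seconds
        self.calls = []
        self.lock = threading.Lock()

    def acquire(self):
        while True:
            with self.lock:
                now = time.time()
                self.calls = [t for t in self.calls if now - t < self.period_seconds]
                if len(self.calls) < self.max_calls:
                    self.calls.append(now)
                    return
                sleep_for = self.period_seconds - (now - self.calls[0])
            time.sleep(max(0.01, sleep_for))


limiter = RateLimiter(max_calls=2, period_seconds=1)
waits = []
for i in range(3):
    start = time.monotonic()
    limiter.acquire()  # the third call should block for roughly one period
    waits.append(time.monotonic() - start)
    print(f"call {i}: waited {waits[-1]:.2f}s")
```

The first two calls return immediately; the third waits about a second. The same printout, scaled up to your real quota, is the logging you want around `limiter.acquire()` in production.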
For a stronger test, point the tool at a mock server that counts requests and returns 429 after a threshold. Your limiter should prevent those failures entirely once it is tuned correctly.
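A mock like that can be built with only the standard library. This sketch mirrors the limiter's own sliding window on the server side; the two-requests-per-second threshold is made up for the test, so tune `limit` and `window` to match whatever quota you are simulating:

```python
import http.server
import threading
import time


class MockQuotaHandler(http.server.BaseHTTPRequestHandler):
    """Answers 200 while under `limit` requests per `window` seconds, else 429."""

    limit = 2
    window = 1.0
    hits = []  # timestamps of recent requests, shared across handler instances

    def do_GET(self):
        cls = type(self)
        now = time.time()
        cls.hits = [t for t in cls.hits if now - t < cls.window]
        if len(cls.hits) < cls.limit:
            cls.hits.append(now)
            self.send_response(200)
        else:
            self.send_response(429)
            self.send_header("Retry-After", str(cls.window))
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence per-request logging during tests


def start_mock_server() -> http.server.HTTPServer:
    # Port 0 lets the OS pick a free port; serve in a daemon thread.
    server = http.server.HTTPServer(("127.0.0.1", 0), MockQuotaHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Point `API_URL` at `http://127.0.0.1:<port>/` and run your crew: with the limiter tuned at or below the mock's quota, every request should come back 200; loosen the limiter and the 429s reappear.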
Next Steps
- Add exponential backoff on top of rate limiting for transient 429s and 5xx responses.
- Move limiter state into Redis if you run multiple workers or multiple agent processes.
- Add structured logs and metrics so you can track request volume per tool and per tenant.
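The first item can be sketched as a small retry wrapper layered over the rate-limited call. `TransientError` here is a made-up exception standing in for whatever your HTTP layer raises on a 429 or 5xx; adapt the `except` clause to your client:

```python
import random
import time


class TransientError(Exception):
    """Placeholder for a 429/5xx failure surfaced by your HTTP client."""


def with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Call fn(), retrying on TransientError with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except TransientError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the failure to the caller
            # Delays of base, 2*base, 4*base, ... plus jitter to avoid
            # synchronized retry storms across workers.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

Usage would look like `with_backoff(lambda: fetch_data("insurance claims"))`: the limiter keeps you under quota proactively, while backoff absorbs the occasional 429 that slips through anyway.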
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.