CrewAI Tutorial (Python): rate limiting API calls for beginners
This tutorial shows you how to add rate limiting to CrewAI-powered API calls in Python without breaking your agent flow. You’ll use a small wrapper around an external API so your agents stop hammering endpoints, avoid 429s, and stay within provider limits.
What You'll Need
- Python 3.10+
- `crewai`
- `requests`
- An API key for the service you want to call
- A basic CrewAI setup with at least one agent and one task
- Optional: `python-dotenv` if you want to load secrets from a `.env` file
Install the packages:
```bash
pip install crewai requests python-dotenv
```
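If you went with the optional `python-dotenv`, load your `.env` file once at startup so `os.getenv` can see the key later:

```python
# Optional: pull secrets from a .env file into the process environment.
from dotenv import load_dotenv

load_dotenv()  # after this, os.getenv("FAKE_API_KEY") sees values from .env
```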
Step-by-Step
- First, create a simple rate-limited API client. This example enforces a minimum delay between requests, which is simpler than a true token bucket but predictable and good enough for most beginner use cases.
```python
import os
import threading
import time

import requests


class RateLimitedAPIClient:
    """Thread-safe client that enforces a minimum delay between requests."""

    def __init__(self, api_key: str, min_interval_seconds: float = 1.0):
        self.api_key = api_key
        self.min_interval_seconds = min_interval_seconds
        self._lock = threading.Lock()
        self._last_call = 0.0

    def get(self, url: str, params=None):
        with self._lock:
            # Sleep just long enough to preserve the minimum interval since
            # the last request. monotonic() is immune to system clock changes.
            elapsed = time.monotonic() - self._last_call
            sleep_for = max(0.0, self.min_interval_seconds - elapsed)
            if sleep_for > 0:
                time.sleep(sleep_for)
            self._last_call = time.monotonic()
            headers = {"Authorization": f"Bearer {self.api_key}"}
            response = requests.get(url, headers=headers, params=params, timeout=30)
            response.raise_for_status()
            return response.json()
```
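You can sanity-check the limiter before involving any agents. With a 2-second interval, the second call below should start roughly two seconds after the first (httpbin.org is just a convenient echo endpoint for testing, not part of the tutorial's API):

```python
client = RateLimitedAPIClient(api_key="demo-key", min_interval_seconds=2.0)
start = time.monotonic()
client.get("https://httpbin.org/get")
client.get("https://httpbin.org/get")  # the limiter pauses before this one fires
print(f"Two calls took {time.monotonic() - start:.1f}s")  # expect roughly 2s or more
```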
- Next, wire that client into a CrewAI tool. CrewAI tools are the cleanest way to expose controlled API access to an agent, because you can keep the rate limiting logic outside the prompt and inside code.
```python
from crewai import Agent, Crew, Process, Task
from crewai.tools import BaseTool


class WeatherLookupTool(BaseTool):
    # BaseTool is a pydantic model, so the client is declared as a typed
    # field (with arbitrary types allowed) instead of being set in __init__.
    model_config = {"arbitrary_types_allowed": True}

    name: str = "weather_lookup"
    description: str = "Fetches weather data from a public API with rate limiting."
    client: RateLimitedAPIClient

    def _run(self, city: str) -> str:
        url = "https://api.open-meteo.com/v1/forecast"
        # Coordinates are hardcoded to Berlin to keep the example short.
        data = self.client.get(url, params={
            "latitude": 52.52,
            "longitude": 13.41,
            "current_weather": True,
        })
        return f"Weather data for {city}: {data['current_weather']}"
```
- Now create an agent and task that use the tool. The agent does not need to know anything about rate limiting; it just calls the tool when needed.
```python
api_key = os.getenv("FAKE_API_KEY", "demo-key")
client = RateLimitedAPIClient(api_key=api_key, min_interval_seconds=2.0)
weather_tool = WeatherLookupTool(client=client)  # pydantic field, so pass by keyword

agent = Agent(
    role="API assistant",
    goal="Fetch external data responsibly",
    backstory="You call APIs through approved tools only.",
    tools=[weather_tool],
    verbose=True,
)

task = Task(
    description="Use the weather_lookup tool to get weather info for Berlin.",
    expected_output="A short summary of the weather data.",
    agent=agent,
)
```
- After that, run the crew and observe that repeated calls are spaced out by your limiter. If you trigger multiple tool calls in quick succession, the wrapper will pause before sending the next request.
```python
crew = Crew(
    agents=[agent],
    tasks=[task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff()
print(result)
```
- If you want stronger protection, add retry handling for `429 Too Many Requests`. This keeps your workflow stable when the upstream provider has its own burst limits.
```python
from requests import HTTPError


class RetryRateLimitedAPIClient(RateLimitedAPIClient):
    def get(self, url: str, params=None):
        for attempt in range(3):
            try:
                return super().get(url, params=params)
            except HTTPError as e:
                status_code = getattr(e.response, "status_code", None)
                if status_code == 429 and attempt < 2:
                    # Exponential backoff: wait 1s after the first 429, 2s after the second.
                    time.sleep(2 ** attempt)
                    continue
                raise
```
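To use it, build the retrying client instead of the basic one; the tool and agent code stays exactly the same:

```python
client = RetryRateLimitedAPIClient(api_key=api_key, min_interval_seconds=2.0)
weather_tool = WeatherLookupTool(client=client)
```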
Testing It
Run the script once and confirm it returns a result instead of failing on request throttling. Then lower `min_interval_seconds` to something like 0.1 and compare behavior against 2.0; you should see more frequent calls when the delay is smaller.
If your API provider returns rate-limit headers like Retry-After, extend the client to honor them before retrying. For production systems, log each request timestamp so you can prove your limiter is doing what you expect.
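As a minimal sketch of that idea, the hypothetical client below reads `Retry-After` on a 429 and falls back to exponential backoff when the header is absent. It assumes the header arrives as a number of seconds; real providers may also send an HTTP date, which this sketch does not handle:

```python
class RetryAfterAPIClient(RateLimitedAPIClient):
    def get(self, url: str, params=None):
        for attempt in range(3):
            try:
                return super().get(url, params=params)
            except HTTPError as e:
                response = e.response
                if response is not None and response.status_code == 429 and attempt < 2:
                    # Prefer the server's hint over our own backoff schedule.
                    retry_after = response.headers.get("Retry-After")
                    wait = float(retry_after) if retry_after else 2 ** attempt
                    time.sleep(wait)
                    continue
                raise
```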
Next Steps
- Add per-endpoint limits instead of one global delay.
- Move from fixed delays to a true token bucket or leaky bucket implementation (a starter sketch follows this list).
- Wrap this pattern around LLM tool calls too, not just REST APIs.
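If you want somewhere to start on the token bucket, here is a minimal, illustrative sketch (the class and parameter names are mine, not from any library): the bucket refills at `rate` tokens per second up to `capacity`, and each request consumes one token, so short bursts are allowed while the long-run rate stays bounded.

```python
class TokenBucket:
    """Minimal blocking token bucket for pacing outbound requests."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self._tokens = capacity
        self._updated = time.monotonic()
        self._lock = threading.Lock()

    def acquire(self):
        """Block until one token is available, then consume it."""
        with self._lock:
            now = time.monotonic()
            # Credit tokens accrued since the last call, capped at capacity.
            self._tokens = min(self.capacity, self._tokens + (now - self._updated) * self.rate)
            self._updated = now
            if self._tokens < 1.0:
                # Wait for the remaining fraction of a token to accrue.
                time.sleep((1.0 - self._tokens) / self.rate)
                self._updated = time.monotonic()
                self._tokens = 0.0
            else:
                self._tokens -= 1.0
```

To swap it in, call `bucket.acquire()` at the top of `RateLimitedAPIClient.get` instead of the fixed-interval sleep.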
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.