CrewAI Tutorial (Python): rate limiting API calls for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to add rate limiting to CrewAI-powered API calls in Python without breaking your agent flow. You’ll use a small wrapper around an external API so your agents stop hammering endpoints, avoid 429s, and stay within provider limits.

What You'll Need

  • Python 3.10+
  • crewai
  • requests
  • An API key for the service you want to call
  • A basic CrewAI setup with at least one agent and one task
  • Optional: python-dotenv if you want to load secrets from a .env file

Install the packages:

pip install crewai requests python-dotenv

Step-by-Step

  1. First, create a simple rate-limited API client. This example enforces a fixed minimum delay between requests (not a true token bucket; see Next Steps for that upgrade). It is simple, predictable, and good enough for most beginner use cases.
import os
import time
import threading
import requests


class RateLimitedAPIClient:
    """HTTP client that enforces a minimum delay between requests."""

    def __init__(self, api_key: str, min_interval_seconds: float = 1.0):
        self.api_key = api_key
        self.min_interval_seconds = min_interval_seconds
        self._lock = threading.Lock()  # serializes timing updates across threads
        self._last_call = 0.0

    def get(self, url: str, params=None):
        with self._lock:
            # monotonic() is immune to system clock changes, unlike time.time().
            elapsed = time.monotonic() - self._last_call
            sleep_for = max(0.0, self.min_interval_seconds - elapsed)
            if sleep_for > 0:
                time.sleep(sleep_for)
            self._last_call = time.monotonic()

        headers = {"Authorization": f"Bearer {self.api_key}"}
        response = requests.get(url, headers=headers, params=params, timeout=30)
        response.raise_for_status()
        return response.json()
  2. Next, wire that client into a CrewAI tool. CrewAI tools are the cleanest way to expose controlled API access to an agent, because you can keep the rate limiting logic outside the prompt and inside code.
from crewai import Agent, Crew, Process, Task
from crewai.tools import BaseTool


class WeatherLookupTool(BaseTool):
    name: str = "weather_lookup"
    description: str = "Fetches weather data from a public API with rate limiting."
    # BaseTool is a Pydantic model, so the client must be declared as a field
    # rather than assigned inside a custom __init__ (which would raise a
    # validation error for the undeclared attribute).
    client: RateLimitedAPIClient

    model_config = {"arbitrary_types_allowed": True}

    def _run(self, city: str) -> str:
        # Coordinates are hardcoded to Berlin for this demo; a real tool
        # would geocode the city argument first.
        url = "https://api.open-meteo.com/v1/forecast"
        data = self.client.get(url, params={
            "latitude": 52.52,
            "longitude": 13.41,
            "current_weather": True,
        })
        return f"Weather data for {city}: {data['current_weather']}"
  3. Now create an agent and task that use the tool. The agent does not need to know anything about rate limiting; it just calls the tool when needed.
# Open-Meteo needs no API key; this placeholder keeps the client reusable
# for providers that do require one.
api_key = os.getenv("FAKE_API_KEY", "demo-key")
client = RateLimitedAPIClient(api_key=api_key, min_interval_seconds=2.0)
weather_tool = WeatherLookupTool(client=client)

agent = Agent(
    role="API assistant",
    goal="Fetch external data responsibly",
    backstory="You call APIs through approved tools only.",
    tools=[weather_tool],
    verbose=True,
)

task = Task(
    description="Use the weather_lookup tool to get weather info for Berlin.",
    expected_output="A short summary of the weather data.",
    agent=agent,
)
  4. After that, run the crew and observe that repeated calls are spaced out by your limiter. If you trigger multiple tool calls in quick succession, the wrapper will pause before sending the next request.
crew = Crew(
    agents=[agent],
    tasks=[task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff()
print(result)
  5. If you want stronger protection, add retry handling for 429 Too Many Requests. This keeps your workflow stable when the upstream provider has its own burst limits.
from requests import HTTPError


class RetryRateLimitedAPIClient(RateLimitedAPIClient):
    def get(self, url: str, params=None):
        for attempt in range(3):
            try:
                return super().get(url, params=params)
            except HTTPError as e:
                status_code = getattr(e.response, "status_code", None)
                if status_code == 429 and attempt < 2:
                    # Exponential backoff: wait 1s after the first 429, 2s after the second.
                    time.sleep(2 ** attempt)
                    continue
                raise

Testing It

Run the script once and confirm it returns a result instead of failing on request throttling. Then lower min_interval_seconds to something like 0.1 and compare behavior against 2.0; you should see more frequent calls when the delay is smaller.
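To verify the spacing locally without hitting a real API, you can time the delay logic in isolation. This sketch re-implements just the minimum-interval wait from Step 1 (the class name here is illustrative, not part of CrewAI):

```python
import threading
import time


class MinIntervalLimiter:
    """Same delay logic as the Step 1 client, without the HTTP call."""

    def __init__(self, min_interval_seconds: float):
        self.min_interval_seconds = min_interval_seconds
        self._lock = threading.Lock()
        self._last_call = 0.0

    def wait(self) -> None:
        with self._lock:
            elapsed = time.monotonic() - self._last_call
            sleep_for = max(0.0, self.min_interval_seconds - elapsed)
            if sleep_for > 0:
                time.sleep(sleep_for)
            self._last_call = time.monotonic()


if __name__ == "__main__":
    limiter = MinIntervalLimiter(0.2)
    start = time.monotonic()
    for _ in range(3):
        limiter.wait()
    # First call is free; the next two each wait ~0.2s, so roughly 0.4s total.
    print(f"3 calls took {time.monotonic() - start:.2f}s")
```

If the printed total is close to `(calls - 1) * min_interval_seconds`, the limiter is working as intended.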

If your API provider returns rate-limit headers like Retry-After, extend the client to honor them before retrying. For production systems, log each request timestamp so you can prove your limiter is doing what you expect.
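One way to honor that header is a small helper that prefers the server's hint and falls back to exponential backoff. This is a sketch: it assumes Retry-After carries a number of seconds, though the HTTP spec also allows an HTTP date, which you would need to parse separately:

```python
def parse_retry_after(value, attempt: int) -> float:
    """Seconds to wait before retrying: honor a numeric Retry-After value,
    otherwise fall back to exponential backoff (1s, 2s, 4s, ...)."""
    if value and value.replace(".", "", 1).isdigit():
        return float(value)
    return float(2 ** attempt)


# In the retry client's except branch, replace time.sleep(2 ** attempt) with:
#     header = e.response.headers.get("Retry-After") if e.response is not None else None
#     time.sleep(parse_retry_after(header, attempt))
```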

Next Steps

  • Add per-endpoint limits instead of one global delay.
  • Move from fixed delays to a true token bucket or leaky bucket implementation.
  • Wrap this pattern around LLM tool calls too, not just REST APIs.
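As a starting point for the token-bucket upgrade mentioned above, here is a minimal sketch (class and method names are illustrative). Unlike the fixed delay, it allows short bursts up to a capacity before throttling:

```python
import threading
import time


class TokenBucket:
    """Allows bursts of up to `capacity` calls, refilling at `rate` tokens/sec."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self._tokens = float(capacity)
        self._last = time.monotonic()
        self._lock = threading.Lock()

    def acquire(self) -> None:
        """Block until a token is available, then consume it."""
        with self._lock:
            now = time.monotonic()
            # Refill proportionally to the time elapsed since the last call.
            self._tokens = min(self.capacity,
                               self._tokens + (now - self._last) * self.rate)
            self._last = now
            if self._tokens < 1.0:
                # Not enough budget: sleep until exactly one token has accrued.
                time.sleep((1.0 - self._tokens) / self.rate)
                self._tokens = 0.0
                self._last = time.monotonic()
            else:
                self._tokens -= 1.0
```

You could swap this into `RateLimitedAPIClient` by calling `bucket.acquire()` in place of the minimum-interval sleep.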
