CrewAI Tutorial (Python): rate limiting API calls for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to add rate limiting to CrewAI-powered API calls in Python without breaking your agent workflow. You’ll use a small token-bucket limiter around a tool so your crew can keep working while staying under provider limits.

What You'll Need

  • Python 3.10+
  • crewai
  • requests
  • An API key for the service you want to call
  • Basic CrewAI setup: Agent, Task, and Crew
  • A target API endpoint that enforces or benefits from client-side throttling

Install the packages:

pip install crewai requests

Step-by-Step

  1. Start with a simple rate limiter that works in-process.

This version is enough for a single Python process running one crew. It uses a token bucket so bursts are allowed, but long-term throughput stays capped.

import time
from threading import Lock

class TokenBucketLimiter:
    def __init__(self, rate_per_second: float, capacity: int):
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = capacity
        self.updated_at = time.monotonic()
        self.lock = Lock()

    def acquire(self) -> None:
        while True:
            with self.lock:
                now = time.monotonic()
                elapsed = now - self.updated_at
                self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
                self.updated_at = now

                if self.tokens >= 1:
                    self.tokens -= 1
                    return

                wait_time = (1 - self.tokens) / self.rate

            time.sleep(wait_time)

  2. Wrap your external API call in a CrewAI tool.

CrewAI tools are the cleanest place to enforce limits because every agent call goes through them. This example uses requests and applies the limiter before each outbound HTTP request.

import requests
from crewai.tools import tool

limiter = TokenBucketLimiter(rate_per_second=2, capacity=4)

@tool("fetch_api_data")
def fetch_api_data(url: str) -> str:
    """Fetch JSON or text from an API endpoint with client-side rate limiting."""
    limiter.acquire()
    response = requests.get(url, timeout=15)
    response.raise_for_status()

    try:
        return repr(response.json())
    except ValueError:
        return response.text
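
If several tools need the same policy, one option is to factor the acquire-then-call pattern into a small decorator. This is a sketch, not a CrewAI API: the `rate_limited` name is invented here, and any object exposing an `acquire()` method will work as the limiter.

```python
import functools

def rate_limited(limiter):
    """Run limiter.acquire() before every call to the wrapped function.

    `limiter` can be any object with an acquire() method, such as the
    TokenBucketLimiter above. (This helper is a sketch, not part of CrewAI.)
    """
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            limiter.acquire()          # block until a token is available
            return fn(*args, **kwargs)
        return wrapper
    return decorate
```

With this in place, each tool body can be decorated before `@tool` is applied, and every decorated tool drains the same bucket.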

  3. Plug the tool into an agent and task.

The agent doesn’t need to know anything about throttling. That’s the point: keep the policy in the tool layer so you can reuse it across crews and workflows.

from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="API Researcher",
    goal="Collect data from external APIs without exceeding request limits",
    backstory="You are careful with third-party APIs and always respect throttling.",
    tools=[fetch_api_data],
    verbose=True,
)

task = Task(
    description="Call https://httpbin.org/get five times and summarize the responses.",
    expected_output="A short summary of the fetched responses.",
    agent=researcher,
)

  4. Add repeated tool usage so you can see the limiter working.

A single task may not hit your limit hard enough to notice. This helper calls the same tool several times in sequence and prints each call's elapsed time so you can see the spacing between requests.

def run_burst_test():
    urls = ["https://httpbin.org/get"] * 5
    results = []

    for i, url in enumerate(urls, start=1):
        start = time.time()
        # The @tool decorator wraps the function in a Tool object,
        # so invoke it via .run() rather than calling it like a plain function.
        result = fetch_api_data.run(url)
        elapsed = time.time() - start
        print(f"Call {i} took {elapsed:.2f}s")
        results.append(result)

    return results

if __name__ == "__main__":
    run_burst_test()

  5. Run the crew as normal.

Once the tool is wrapped, your crew execution stays unchanged. If the agent decides to call the tool multiple times, each call will be throttled before leaving your process.

crew = Crew(
    agents=[researcher],
    tasks=[task],
    process=Process.sequential,
    verbose=True,
)

if __name__ == "__main__":
    result = crew.kickoff()
    print(result)

Testing It

Run the burst test first and watch the per-call timings. With rate_per_second=2 and capacity=4, you should see a small initial burst of fast calls and then visible pauses once the bucket drains.
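
If you want to sanity-check those numbers without touching the network, the expected spacing follows directly from the token-bucket math. The sketch below (the `burst_schedule` name is ours, not part of CrewAI) assumes each call is instantaneous, so the only delays come from the limiter.

```python
def burst_schedule(rate_per_second, capacity, n_calls):
    """Theoretical start times for n_calls against a token bucket that
    begins full: the first `capacity` calls fire immediately, after which
    calls are spaced 1/rate apart."""
    times = []
    tokens = float(capacity)
    t = 0.0
    for _ in range(n_calls):
        if tokens >= 1:
            times.append(t)    # token available: fire immediately
            tokens -= 1
        else:
            t += (1 - tokens) / rate_per_second  # wait for a full token
            tokens = 0.0       # the refilled token is spent at once
            times.append(t)
    return times
```

With rate_per_second=2 and capacity=4, `burst_schedule(2, 4, 6)` predicts start times of 0, 0, 0, 0, 0.5, and 1.0 seconds, which is the burst-then-pause shape you should see in the real run.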

If you want a clearer signal, temporarily set rate_per_second=1 and capacity=1. The second through fifth calls should slow down to roughly one request per second.

Also test failure paths by pointing the tool at an endpoint that returns 429 or times out. Your limiter prevents local overuse, but you still need normal HTTP error handling for upstream failures.

Next Steps

  • Move the limiter into a shared utility module so multiple tools can reuse it.
  • Add distributed rate limiting with Redis if you run multiple workers or containers.
  • Combine this with exponential backoff for 429 responses so your agent handles provider-side throttling cleanly.
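
As a starting point for that last item, here is one shape the retry policy can take. It is a sketch under assumptions: `call_with_backoff` and `TooManyRequests` are names invented here, and in a real tool you would catch `requests.HTTPError` and check `response.status_code == 429` instead of a custom exception.

```python
import random
import time

class TooManyRequests(Exception):
    """Stand-in for an HTTP 429 from the provider (assumption for this sketch)."""

def call_with_backoff(fn, max_retries=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on TooManyRequests with exponential backoff plus jitter.

    `sleep` is injectable so the policy can be tested without real waiting.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TooManyRequests:
            if attempt == max_retries:
                raise              # out of retries: surface the error
            # 0.5s, 1s, 2s, ... plus up to 100ms of jitter to avoid thundering herds
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
```

Inside the tool you would wrap the requests.get call with this; the limiter still runs before each attempt, so retries are throttled too.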

By Cyprian Aarons, AI Consultant at Topiax.
