LangGraph Tutorial (Python): rate limiting API calls for beginners

By Cyprian Aarons · Updated 2026-04-22

This tutorial shows you how to build a LangGraph workflow that rate limits outbound API calls in Python. You’d use this when your agent can fire off requests faster than a vendor allows and you need to stay under quotas, avoid HTTP 429 (Too Many Requests) errors, and keep costs predictable.

What You'll Need

  • Python 3.10+
  • langgraph
  • langchain-core
  • httpx
  • An API key for the service you want to call
  • Basic familiarity with LangGraph StateGraph, nodes, and edges

Install the packages:

pip install langgraph langchain-core httpx

Step-by-Step

  1. Start by defining a simple state object and a rate limiter.
    For beginners, a token bucket is the easiest pattern to reason about: each request consumes one token, and tokens refill over time.
import time
from typing import TypedDict

class GraphState(TypedDict):
    url: str            # endpoint to call
    response_text: str  # truncated body of the last successful response
    allowed: bool       # did the limiter grant a token?
    retry_after: float  # seconds to wait when blocked

class TokenBucket:
    def __init__(self, capacity: int, refill_per_second: float):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity
        self.updated_at = time.monotonic()

    def allow(self) -> tuple[bool, float]:
        # Refill based on elapsed time, capped at the bucket's capacity.
        now = time.monotonic()
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now

        # Spend one token if available.
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0

        # Otherwise report how long until the next token refills.
        missing = 1 - self.tokens
        wait_time = missing / self.refill_per_second
        return False, wait_time

rate_limiter = TokenBucket(capacity=3, refill_per_second=1.0)
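A quick sanity check of the bucket on its own, before wiring it into a graph. This uses a throwaway bucket so the shared rate_limiter above stays full for the demo:

# Throwaway bucket with the same settings as rate_limiter.
demo_bucket = TokenBucket(capacity=3, refill_per_second=1.0)
for _ in range(4):
    allowed, wait = demo_bucket.allow()
    print(f"allowed={allowed}, wait={wait:.2f}s")
# Expected: three True results, then False with a wait of roughly 1.00s,
# since tokens refill at one per second.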
  2. Add a node that checks the limit before making the API call.
    If the request is allowed, the graph continues to the HTTP node. If not, it returns a retry delay and stops cleanly.
def check_rate_limit(state: GraphState) -> GraphState:
    # Gate node: record whether a token was granted and, if not, how long to wait.
    allowed, retry_after = rate_limiter.allow()
    return {
        **state,
        "allowed": allowed,
        "retry_after": retry_after,
    }
  3. Add the actual HTTP request node.
    This example uses httpx so you can run it against any public endpoint or your own API.
import httpx

def call_api(state: GraphState) -> GraphState:
    # Defensive fallback: the conditional edge below already routes blocked
    # requests to END, so this only fires if the node is reused elsewhere.
    if not state["allowed"]:
        return {**state, "response_text": f"Rate limited. Retry after {state['retry_after']:.2f}s"}

    response = httpx.get(state["url"], timeout=10.0)
    response.raise_for_status()
    return {
        **state,
        "response_text": response.text[:200],
    }
  4. Wire the nodes into a LangGraph workflow.
    This graph is intentionally small: one node gates traffic, one node performs the call.
from langgraph.graph import StateGraph, START, END

builder = StateGraph(GraphState)

builder.add_node("check_rate_limit", check_rate_limit)
builder.add_node("call_api", call_api)

builder.add_edge(START, "check_rate_limit")
# Route to the HTTP node only when a token was granted; otherwise end the run.
builder.add_conditional_edges(
    "check_rate_limit",
    lambda state: "call_api" if state["allowed"] else END,
)
builder.add_edge("call_api", END)

graph = builder.compile()
  5. Run it multiple times to see the limiter in action.
    The first few requests should pass immediately. After that, you’ll see requests get blocked until tokens refill.
if __name__ == "__main__":
    initial_state: GraphState = {
        "url": "https://httpbin.org/get",
        "response_text": "",
        "allowed": False,
        "retry_after": 0.0,
    }

    # Six back-to-back invocations: with capacity=3, the first few pass and
    # later runs get blocked until tokens refill.
    for i in range(6):
        result = graph.invoke(initial_state)
        print(f"Run {i+1}: allowed={result['allowed']}, retry_after={result['retry_after']:.2f}")
        print(result["response_text"])
        time.sleep(0.2)

Testing It

Run the script and watch the first three calls succeed because the bucket starts full. Then keep invoking it quickly; once tokens are exhausted, allowed should become False and retry_after should show how long to wait.

If you want a stricter test, lower capacity to 1 and set refill_per_second=0.5. That makes rate limiting obvious because only one request is allowed every two seconds.
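That’s a one-line change to the limiter’s constructor:

# Stricter demo: at most one request every two seconds once the bucket is drained.
rate_limiter = TokenBucket(capacity=1, refill_per_second=0.5)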

For production validation, point url at a real internal or third-party endpoint with known quotas and log both allowed and retry_after. That gives you evidence that your graph is protecting downstream services instead of just printing messages.
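A minimal sketch of that kind of logging, using Python’s standard logging module; the URL here is a placeholder for your own endpoint:

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("rate-limit-demo")

result = graph.invoke({
    "url": "https://internal.example.com/health",  # placeholder endpoint
    "response_text": "",
    "allowed": False,
    "retry_after": 0.0,
})
# Log both fields so you can audit how often the limiter actually blocked traffic.
logger.info("allowed=%s retry_after=%.2f", result["allowed"], result["retry_after"])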

Next Steps

  • Add persistence with a shared store so rate limits work across multiple processes or workers.
  • Replace the simple token bucket with per-user or per-tenant limits keyed off request metadata.
  • Combine this with retries and exponential backoff so your agent handles 429s gracefully instead of failing hard (see the sketch below).
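As a starting point for that last item, here is a minimal sketch of exponential backoff wrapped around graph.invoke; the invoke_with_backoff name and max_attempts default are illustrative choices, not part of the tutorial’s code:

def invoke_with_backoff(state: GraphState, max_attempts: int = 4) -> GraphState:
    # Illustrative helper: retry blocked runs, waiting longer each time (1s, 2s, 4s, ...).
    for attempt in range(max_attempts):
        result = graph.invoke(state)
        if result["allowed"]:
            return result
        # Wait at least as long as the limiter asked for, doubling each attempt.
        time.sleep(max(result["retry_after"], 2 ** attempt))
    return result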

By Cyprian Aarons, AI Consultant at Topiax.