LangGraph Tutorial (Python): rate limiting API calls for beginners
This tutorial shows you how to build a LangGraph workflow that rate limits outbound API calls in Python. You’d use this when your agent can trigger too many requests too quickly and you need to stay under vendor quotas, avoid 429s, and keep costs predictable.
What You'll Need
- Python 3.10+
- `langgraph`
- `langchain-core`
- `httpx`
- An API key for the service you want to call
- Basic familiarity with LangGraph: `StateGraph`, nodes, and edges
Install the packages:
```bash
pip install langgraph langchain-core httpx
```
Step-by-Step
1. Start by defining a simple state object and a rate limiter.
For beginners, a token bucket is the easiest pattern to reason about: each request consumes one token, and tokens refill over time.
```python
import time
from typing import TypedDict


class GraphState(TypedDict):
    url: str
    response_text: str
    allowed: bool
    retry_after: float


class TokenBucket:
    def __init__(self, capacity: int, refill_per_second: float):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity
        self.updated_at = time.monotonic()

    def allow(self) -> tuple[bool, float]:
        # Refill based on how much time has passed since the last check.
        now = time.monotonic()
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        # Not enough tokens: report how long until one becomes available.
        missing = 1 - self.tokens
        wait_time = missing / self.refill_per_second
        return False, wait_time


rate_limiter = TokenBucket(capacity=3, refill_per_second=1.0)
```
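If you want to see the bucket behave on its own before wiring the graph, a quick sanity check works; the throwaway `demo_bucket` below is just for illustration, so the graph's `rate_limiter` stays full:

```python
# A separate bucket for the demo so the graph's limiter is untouched.
demo_bucket = TokenBucket(capacity=3, refill_per_second=1.0)
for i in range(4):
    allowed, wait = demo_bucket.allow()
    print(f"request {i + 1}: allowed={allowed}, wait={wait:.2f}s")
# Expected: the first three pass immediately, the fourth reports a ~1s wait.
```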
2. Add a node that checks the limit before making the API call.
If the request is allowed, the graph continues to the HTTP node. If not, it returns a retry delay and stops cleanly.
```python
def check_rate_limit(state: GraphState) -> GraphState:
    allowed, retry_after = rate_limiter.allow()
    return {
        **state,
        "allowed": allowed,
        "retry_after": retry_after,
    }
```
3. Add the actual HTTP request node.
This example uses `httpx` so you can run it against any public endpoint or your own API.
```python
import httpx


def call_api(state: GraphState) -> GraphState:
    # The conditional edge added below skips this node when the call is
    # blocked, but the guard keeps the node safe if invoked directly.
    if not state["allowed"]:
        return {
            **state,
            "response_text": f"Rate limited. Retry after {state['retry_after']:.2f}s",
        }
    response = httpx.get(state["url"], timeout=10.0)
    response.raise_for_status()
    return {
        **state,
        "response_text": response.text[:200],
    }
```
4. Wire the nodes into a LangGraph workflow.
This graph is intentionally small: one node gates traffic, one node performs the call.
```python
from langgraph.graph import StateGraph, START, END

builder = StateGraph(GraphState)
builder.add_node("check_rate_limit", check_rate_limit)
builder.add_node("call_api", call_api)

builder.add_edge(START, "check_rate_limit")
# Continue to the HTTP node only when a token was available; otherwise end.
builder.add_conditional_edges(
    "check_rate_limit",
    lambda state: "call_api" if state["allowed"] else END,
)
builder.add_edge("call_api", END)

graph = builder.compile()
```
5. Run it multiple times to see the limiter in action.
The first few requests should pass immediately. After that, you’ll see requests get blocked until tokens refill.
```python
if __name__ == "__main__":
    initial_state: GraphState = {
        "url": "https://httpbin.org/get",
        "response_text": "",
        "allowed": False,
        "retry_after": 0.0,
    }
    for i in range(6):
        result = graph.invoke(initial_state)
        print(f"Run {i + 1}: allowed={result['allowed']}, retry_after={result['retry_after']:.2f}")
        print(result["response_text"])
        time.sleep(0.2)
```
Testing It
Run the script and watch the first three calls succeed because the bucket starts full. Then keep invoking it quickly; once tokens are exhausted, `allowed` should become `False` and `retry_after` should show how long to wait.
If you want a stricter test, lower `capacity` to 1 and set `refill_per_second=0.5`. That makes rate limiting obvious because only one request is allowed every two seconds.
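That stricter configuration is a one-line change:

```python
# One token, refilled at half a token per second: at most one request
# can pass in any two-second window.
rate_limiter = TokenBucket(capacity=1, refill_per_second=0.5)
```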
For production validation, point `url` at a real internal or third-party endpoint with known quotas and log both `allowed` and `retry_after`. That gives you evidence that your graph is protecting downstream services instead of just printing messages.
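One minimal way to capture that evidence, sketched with Python's standard `logging` module (the logger name `rate_limit` is just an example), is to log the decision inside the gating node:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("rate_limit")


def check_rate_limit(state: GraphState) -> GraphState:
    allowed, retry_after = rate_limiter.allow()
    # Record every decision so limiter behavior is auditable later.
    logger.info("rate_limit_decision allowed=%s retry_after=%.2f", allowed, retry_after)
    return {**state, "allowed": allowed, "retry_after": retry_after}
```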
Next Steps
- Add persistence with a shared store so rate limits work across multiple processes or workers.
- Replace the simple token bucket with per-user or per-tenant limits keyed off request metadata.
- Combine this with retries and exponential backoff so your agent handles 429s gracefully instead of failing hard (see the sketch below).
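For that last item, here is a minimal backoff sketch; the `call_with_backoff` helper, its retry count, and its starting delay are illustrative choices, not part of the graph above:

```python
import time

import httpx


def call_with_backoff(url: str, max_retries: int = 4) -> httpx.Response:
    delay = 1.0
    for attempt in range(max_retries):
        response = httpx.get(url, timeout=10.0)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        # Prefer the server's Retry-After header (assumed to be in seconds)
        # and fall back to doubling the wait on each attempt.
        retry_after = response.headers.get("Retry-After")
        time.sleep(float(retry_after) if retry_after else delay)
        delay *= 2
    raise RuntimeError(f"Still rate limited after {max_retries} attempts")
```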
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit