LangGraph Tutorial (Python): rate limiting API calls for intermediate developers
This tutorial shows how to build a LangGraph workflow in Python that rate-limits outbound API calls before they hit a third-party service. You need this when your graph fans out, retries, or processes batches and you want to avoid 429s, quota burn, and noisy incident pages.
What You'll Need
- Python 3.10+
- `langgraph`
- `langchain-core`
- `httpx`
- An API key for the service you want to call
- Basic familiarity with LangGraph nodes, edges, and state
- A terminal for running the script
Install the packages:
```bash
pip install langgraph langchain-core httpx
```
Step-by-Step
- Start with a simple graph state that tracks the work item, the API response, and the last time a request was sent. The rate limiter will use this state to decide whether to wait before making the next call.
```python
from typing import TypedDict, Optional
import time

from langgraph.graph import StateGraph, START, END


class GraphState(TypedDict):
    prompt: str
    response: Optional[str]
    last_call_ts: float  # epoch seconds of the most recent outbound request
```
- Add a rate-limit helper that enforces a minimum interval between calls. This is the simplest reliable pattern for intermediate workflows: store the timestamp in state, sleep if needed, then update the timestamp after the request.
```python
def enforce_rate_limit(state: GraphState) -> None:
    """Sleep just long enough to keep at least min_interval_seconds between calls."""
    min_interval_seconds = 2.0
    now = time.time()
    elapsed = now - state["last_call_ts"]
    if elapsed < min_interval_seconds:
        time.sleep(min_interval_seconds - elapsed)
```
- Build the node that performs the actual API call. For this tutorial, I’m using `httpx` against https://httpbin.org/post so the code runs as-is without needing a real vendor key; swap this out for your own API endpoint later.
```python
import httpx


def call_api(state: GraphState) -> GraphState:
    enforce_rate_limit(state)
    payload = {"prompt": state["prompt"]}
    with httpx.Client(timeout=10.0) as client:
        r = client.post("https://httpbin.org/post", json=payload)
        r.raise_for_status()
    return {
        "prompt": state["prompt"],
        # httpbin echoes the JSON body back under the "json" key
        "response": r.json()["json"]["prompt"],
        "last_call_ts": time.time(),  # record when this request went out
    }
```
- Wire the node into a LangGraph workflow. This graph is intentionally small: one node does the throttled request, then we end.
```python
workflow = StateGraph(GraphState)
workflow.add_node("call_api", call_api)
workflow.add_edge(START, "call_api")
workflow.add_edge("call_api", END)
app = workflow.compile()
```
- Run it twice in a row to see the limiter working. The second invocation should pause if it happens too soon after the first one.
```python
if __name__ == "__main__":
    initial_state: GraphState = {
        "prompt": "hello from langgraph",
        "response": None,
        "last_call_ts": 0.0,
    }
    result1 = app.invoke(initial_state)
    print("First:", result1["response"])
    # Feeding result1 back in carries last_call_ts forward, so this call waits.
    result2 = app.invoke(result1)
    print("Second:", result2["response"])
```
- If you want to use this pattern in a real agent, move the limiter into its own node or wrap it around every external-service node. That keeps your graph readable and lets you reuse one policy across multiple APIs.
```python
def rate_limited_node(state: GraphState) -> GraphState:
    # Note: call_api above already enforces the limit itself. If you adopt this
    # wrapper, remove enforce_rate_limit from call_api so the policy lives in
    # exactly one place.
    enforce_rate_limit(state)
    return call_api(state)


workflow = StateGraph(GraphState)
workflow.add_node("rate_limited_call", rate_limited_node)
workflow.add_edge(START, "rate_limited_call")
workflow.add_edge("rate_limited_call", END)
app = workflow.compile()
```
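To share one policy across several external-service nodes, you can wrap each node function in a small helper. This is a sketch, assuming enforce_rate_limit has been factored out of call_api as noted in the comment above; call_other_api is a hypothetical second node, not something defined in this tutorial.

```python
from typing import Callable


def with_rate_limit(node_fn: Callable[[GraphState], GraphState]) -> Callable[[GraphState], GraphState]:
    # Apply one throttling policy to any node that talks to an external service.
    def wrapped(state: GraphState) -> GraphState:
        enforce_rate_limit(state)
        return node_fn(state)
    return wrapped


workflow = StateGraph(GraphState)
workflow.add_node("call_api", with_rate_limit(call_api))
# A second external call would reuse the same policy, e.g.:
# workflow.add_node("call_other_api", with_rate_limit(call_other_api))
workflow.add_edge(START, "call_api")
workflow.add_edge("call_api", END)
app = workflow.compile()
```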
Testing It
Run the script and watch the gap between the first and second calls. With a 2-second minimum interval configured, two back-to-back invocations should leave at least roughly 2 seconds between outbound requests.
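If you want numbers rather than a stopwatch feel, a minimal timing check (assuming the script above has already defined app and initial_state) looks like this:

```python
start = time.time()
result1 = app.invoke(initial_state)
mid = time.time()
result2 = app.invoke(result1)
end = time.time()

# With a 2-second minimum interval, most of the second duration is the limiter sleeping.
print(f"first call: {mid - start:.2f}s, second call: {end - mid:.2f}s")
```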
To verify correctness under load, loop over several inputs and confirm you never get HTTP 429s from your target service. If you do see throttling errors, your minimum interval is too low for that provider’s policy or your graph has parallel paths making extra requests.
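Here is a minimal load-check sketch; it assumes you have swapped httpbin for a real endpoint that can actually return 429s, and it reuses the returned state so last_call_ts carries forward between iterations:

```python
prompts = ["first", "second", "third", "fourth"]
state: GraphState = {"prompt": prompts[0], "response": None, "last_call_ts": 0.0}

for p in prompts:
    state["prompt"] = p
    try:
        state = app.invoke(state)  # returned state keeps last_call_ts up to date
        print(p, "->", state["response"])
    except httpx.HTTPStatusError as exc:
        if exc.response.status_code == 429:
            # The provider throttled us anyway: the minimum interval is too low.
            print("throttled on", p)
        raise
```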
If you’re integrating with a real vendor API, log timestamps before and after each node execution. That gives you proof that throttling is happening where you expect it to happen.
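One lightweight way to capture those timestamps, sketched here as a plain wrapper around the node function rather than any LangGraph-specific hook, is standard-library logging:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("throttle")


def logged_call_api(state: GraphState) -> GraphState:
    # Log wall-clock timestamps around the throttled request so the spacing
    # between outbound calls shows up in your logs.
    logger.info("call_api start ts=%.3f", time.time())
    result = call_api(state)
    logger.info("call_api end ts=%.3f", time.time())
    return result
```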
Next Steps
- Add exponential backoff for transient failures like 429s and 503s (a retry sketch follows this list).
- Replace fixed-delay throttling with token-bucket or leaky-bucket logic for higher throughput (also sketched below).
- Move request accounting into shared persistent storage if multiple workers run the same graph concurrently.
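As a starting point for the backoff item, here is a minimal retry sketch. The attempt count and base delay are arbitrary choices for illustration, not guidance from any particular provider:

```python
import random


def call_api_with_backoff(state: GraphState, max_attempts: int = 4) -> GraphState:
    for attempt in range(max_attempts):
        try:
            return call_api(state)
        except httpx.HTTPStatusError as exc:
            if exc.response.status_code not in (429, 503) or attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter: roughly 1s, 2s, 4s, ...
            time.sleep((2 ** attempt) + random.uniform(0, 0.5))
    raise RuntimeError("unreachable: the loop either returns or re-raises")
```

And for the token-bucket item, this single-process sketch allows short bursts while still capping the average rate; moving its state into shared storage is what the last bullet is about:

```python
class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: int) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.time()

    def acquire(self) -> None:
        # Refill based on elapsed time, then block until a token is available.
        while True:
            now = time.time()
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)


# ~1 request every 2 seconds on average, with bursts of up to 3 allowed.
bucket = TokenBucket(rate_per_sec=0.5, capacity=3)
```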
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.