How to Fix 'rate limit exceeded' in LangGraph (Python)

By Cyprian Aarons · Updated 2026-04-21
Tags: rate-limit-exceeded · langgraph · python

What the error means

rate limit exceeded in LangGraph usually means one of the underlying model calls hit a provider quota, not that LangGraph itself is broken. You’ll see it when your graph makes too many LLM requests too quickly, or when retries, loops, or concurrent runs multiply the number of calls.

The exact exception often comes from the provider SDK underneath LangGraph, for example openai.RateLimitError, anthropic.RateLimitError, or a generic HTTPStatusError: 429. LangGraph just surfaces it while executing a node.
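
To confirm which layer raised the error, catch the provider exceptions directly around a graph run. A minimal sketch, assuming a compiled graph app like the ones below and both provider SDKs installed:

import anthropic
import openai

try:
    result = app.invoke({"messages": [("user", "hello")]})
except (openai.RateLimitError, anthropic.RateLimitError) as e:
    # The exception type tells you which provider SDK rejected the call
    print(f"{type(e).__name__}: {e}")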

The Most Common Cause

The #1 cause is an accidental loop or repeated node execution that keeps calling the model on every step. In LangGraph, this usually happens when your conditional edge keeps routing back to the same LLM node without a hard stop.

Here’s the broken pattern:

from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

class State(TypedDict):
    messages: list

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def call_model(state: State):
    response = llm.invoke(state["messages"])
    return {"messages": state["messages"] + [response]}

def should_continue(state: State):
    # Bug: always returns "continue"
    return "continue"

graph = StateGraph(State)
graph.add_node("agent", call_model)
graph.add_conditional_edges("agent", should_continue, {
    "continue": "agent",
    "end": END,
})
graph.set_entry_point("agent")
app = graph.compile()

And here’s the fixed version:

from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

class State(TypedDict):
    messages: list
    iterations: int

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def call_model(state: State):
    response = llm.invoke(state["messages"])
    return {
        "messages": state["messages"] + [response],
        "iterations": state.get("iterations", 0) + 1,
    }

def should_continue(state: State):
    # Hard stop: route to "end" once the agent has run three times
    if state["iterations"] >= 3:
        return "end"
    return "continue"

graph = StateGraph(State)
graph.add_node("agent", call_model)
graph.add_conditional_edges("agent", should_continue, {
    "continue": "agent",
    "end": END,
})
graph.set_entry_point("agent")
app = graph.compile()

The key difference is simple: you need an explicit termination condition. Without it, one user request can become 10, 20, or 100 model calls and trip provider limits fast.
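
As a second line of defense, LangGraph also enforces a per-run recursion limit that you can lower, so a runaway loop fails fast instead of burning quota. A sketch using the compiled app above; the limit of 10 is illustrative:

from langgraph.errors import GraphRecursionError

try:
    result = app.invoke(
        {"messages": [("user", "hello")], "iterations": 0},
        config={"recursion_limit": 10},  # hard ceiling on graph steps per run
    )
except GraphRecursionError:
    print("Graph hit the step ceiling before reaching END")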

Other Possible Causes

1. Too much concurrency

If you run many graph executions at once, each node may call the model independently. That can blow through per-minute request limits even if each individual graph is correct.

# Problematic: firing many runs at once
results = app.batch(inputs)   # may trigger burst traffic

Use throttling or lower concurrency:

results = app.batch(inputs, config={"max_concurrency": 2})
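
If you need finer control than max_concurrency, LangChain chat models can also take a client-side rate limiter. A minimal sketch using InMemoryRateLimiter; the throughput numbers here are illustrative, so tune them to your provider tier:

from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_openai import ChatOpenAI

# Client-side throttle: roughly one request every two seconds, small burst bucket
limiter = InMemoryRateLimiter(
    requests_per_second=0.5,
    check_every_n_seconds=0.1,
    max_bucket_size=5,
)

llm = ChatOpenAI(model="gpt-4o-mini", rate_limiter=limiter)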

2. Retries multiplying requests

LangChain and provider clients often retry failed calls automatically. If your graph already retries at the node level, you can end up with duplicate retry layers.

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    max_retries=6,   # can amplify traffic under load
)

Try lowering retries and handling failures at one layer only:

llm = ChatOpenAI(
    model="gpt-4o-mini",
    max_retries=2,
)
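
One way to keep failure handling at a single layer is to turn off SDK retries and own the backoff yourself. Here is a sketch using the tenacity library; the attempt count and wait bounds are illustrative:

from langchain_openai import ChatOpenAI
from tenacity import retry, stop_after_attempt, wait_exponential

# SDK retries off, so tenacity is the only layer that retries
llm = ChatOpenAI(model="gpt-4o-mini", max_retries=0)

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=30))
def call_model(state):
    response = llm.invoke(state["messages"])
    return {"messages": state["messages"] + [response]}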

3. Replaying full message history on every node

If each node sends the entire conversation back to the model, token usage and request volume both rise quickly. This gets expensive fast in multi-node graphs.

def call_model(state):
    # Sends the entire history to the model on every call
    response = llm.invoke(state["messages"])
    return {"messages": state["messages"] + [response]}

Trim context before invoking:

def call_model(state):
    recent_messages = state["messages"][-6:]  # keep only the last six messages
    response = llm.invoke(recent_messages)
    return {"messages": state["messages"] + [response]}
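
A fixed slice can cut a conversation at an awkward point, so you may prefer LangChain's token-aware trim_messages helper. A sketch, with an illustrative 1,000-token budget:

from langchain_core.messages import trim_messages

def call_model(state):
    # Keep only the most recent messages that fit in ~1000 tokens
    trimmed = trim_messages(
        state["messages"],
        max_tokens=1000,
        token_counter=llm,
        strategy="last",
    )
    response = llm.invoke(trimmed)
    return {"messages": state["messages"] + [response]}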

4. Provider-side quota or tier limits

Sometimes the code is fine and your API key just hit its plan limit. The error may look like this:

openai.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit exceeded'}}

Or for Anthropic:

anthropic.RateLimitError: Error code: 429 - {'type': 'rate_limit_error'}

In that case, check your dashboard for:

  • requests per minute
  • tokens per minute
  • daily spend caps
  • org-level quotas
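
If you really have hit a plan limit, backing off is the only code-level fix. Many 429 responses carry a Retry-After header you can honor; a rough sketch assuming the openai SDK, whose rate-limit errors wrap an httpx response:

import time
import openai

try:
    result = app.invoke(input_state)
except openai.RateLimitError as e:
    # Honor the provider's suggested wait, falling back to 30 seconds
    wait_seconds = int(e.response.headers.get("retry-after", 30))
    time.sleep(wait_seconds)
    result = app.invoke(input_state)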

How to Debug It

  1. Count how many times each node runs. Add logging inside every node (or use the streaming sketch after this list):

    def call_model(state):
        print(f"agent called with iterations={state.get('iterations', 0)}")
        ...
    

    If one input triggers multiple unexpected calls, you have a graph flow problem.

  2. Inspect the stack trace. If you see openai.RateLimitError, anthropic.RateLimitError, or HTTPStatusError: 429, the provider rejected the request. If you see repeated LangGraph node executions before that error, your graph logic is probably looping.

  3. Disable concurrency temporarily. Run one request at a time:

    result = app.invoke(input_state)
    

    If the error disappears, your issue is burst traffic or batch fan-out.

  4. Reduce retries and context. Set max_retries=0 or 1 and trim the message history. If failures stop after this change, you were amplifying requests through retries or oversized prompts.
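
As a complement to step 1, you can count node executions without touching node code by streaming updates. A sketch, assuming the compiled app and an input_state from the earlier examples; with stream_mode="updates", each yielded chunk is keyed by the node that just ran:

from collections import Counter

node_calls = Counter()
for update in app.stream(input_state, stream_mode="updates"):
    node_calls.update(update.keys())  # one key per node that just executed

print(node_calls)  # e.g. Counter({'agent': 4}) exposes an unexpected loop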

Prevention

  • Put a hard stop in every looped LangGraph flow.
    • Use counters like iterations, tool_calls, or depth.
  • Limit concurrency on batch jobs.
    • Start with max_concurrency=1 or 2 and scale up after measuring provider limits.
  • Keep retries centralized.
    • Don’t stack LangGraph retries on top of SDK retries on top of custom retry decorators.

If you’re still seeing rate limit exceeded after fixing loops and concurrency, assume it’s either quota exhaustion or prompt bloat until proven otherwise. In production systems, those are usually the real causes.
