How to Fix 'rate limit exceeded' in LangGraph (Python)
What the Error Means
A "rate limit exceeded" error in LangGraph usually means one of the underlying model calls hit a provider quota, not that LangGraph itself is broken. You’ll see it when your graph makes too many LLM requests too quickly, or when retries, loops, or concurrent runs multiply the number of calls.
The exact exception often comes from the provider SDK underneath LangGraph, for example openai.RateLimitError, anthropic.RateLimitError, or a generic HTTPStatusError: 429. LangGraph just surfaces it while executing a node.
The Most Common Cause
The #1 cause is an accidental loop or repeated node execution that keeps calling the model on every step. In LangGraph, this usually happens when your conditional edge keeps routing back to the same LLM node without a hard stop.
Here’s the broken pattern:
```python
from typing import TypedDict

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

class State(TypedDict):
    messages: list

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def call_model(state: State):
    response = llm.invoke(state["messages"])
    return {"messages": state["messages"] + [response]}

def should_continue(state: State):
    # Bug: always returns "continue"
    return "continue"

graph = StateGraph(State)
graph.add_node("agent", call_model)
graph.add_conditional_edges("agent", should_continue, {
    "continue": "agent",
    "end": END,
})
graph.set_entry_point("agent")
app = graph.compile()
```
And here’s the fixed version:
```python
from typing import TypedDict

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

class State(TypedDict):
    messages: list
    iterations: int

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def call_model(state: State):
    response = llm.invoke(state["messages"])
    return {
        "messages": state["messages"] + [response],
        "iterations": state.get("iterations", 0) + 1,
    }

def should_continue(state: State):
    if state["iterations"] >= 3:
        return "end"
    return "continue"

graph = StateGraph(State)
graph.add_node("agent", call_model)
graph.add_conditional_edges("agent", should_continue, {
    "continue": "agent",
    "end": END,
})
graph.set_entry_point("agent")
app = graph.compile()
```
The key difference is simple: you need an explicit termination condition. Without it, one user request can become 10, 20, or 100 model calls and trip provider limits fast.
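As a second line of defense, LangGraph also enforces a recursion limit per run, so a looping graph fails fast with a `GraphRecursionError` instead of burning through your quota. A minimal sketch, reusing the `app` and state from the example above (the limit value here is illustrative; pick one that matches your longest legitimate loop):

```python
from langgraph.errors import GraphRecursionError

try:
    # Cap the number of supersteps this run may take.
    result = app.invoke(input_state, config={"recursion_limit": 10})
except GraphRecursionError:
    # The graph looped past the cap -- treat this as a flow bug, not a provider issue.
    result = None
```

Hitting this limit in testing is a strong signal that your conditional edges are routing back to an LLM node more often than you intended.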
Other Possible Causes
1. Too much concurrency
If you run many graph executions at once, each node may call the model independently. That can blow through per-minute request limits even if each individual graph is correct.
```python
# Problematic: firing many runs at once
results = app.batch(inputs)  # may trigger burst traffic
```
Use throttling or lower concurrency:
```python
results = app.batch(inputs, config={"max_concurrency": 2})
```
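You can also throttle at the model-client level instead of the batch level. `langchain_core` ships an `InMemoryRateLimiter` that chat models accept via the `rate_limiter` parameter; a sketch with illustrative numbers (tune them to your provider tier):

```python
from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_openai import ChatOpenAI

# Roughly one request every 2 seconds, with a small burst allowance.
rate_limiter = InMemoryRateLimiter(
    requests_per_second=0.5,
    check_every_n_seconds=0.1,
    max_bucket_size=5,
)

llm = ChatOpenAI(model="gpt-4o-mini", rate_limiter=rate_limiter)
```

This throttles every call through that client, so it protects you even when multiple graphs share the same `llm` instance.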
2. Retries multiplying requests
LangChain and provider clients often retry failed calls automatically. If your graph already retries at the node level, you can end up with duplicate retry layers.
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    max_retries=6,  # can amplify traffic under load
)
```
Try lowering retries and handling failures at one layer only:
```python
llm = ChatOpenAI(
    model="gpt-4o-mini",
    max_retries=2,
)
```
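If you want the retry layer in your own code rather than the SDK, a small exponential-backoff wrapper is enough. A minimal sketch with a generic `RateLimitError` standing in for the provider exception (the function names here are illustrative, and the demo disables real sleeping):

```python
import time

class RateLimitError(Exception):
    """Stand-in for a provider error such as openai.RateLimitError."""

def with_backoff(fn, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn, retrying on RateLimitError with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == retries:
                raise  # out of retries: let the caller see the error
            sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...

# Demo with a flaky stand-in for an LLM call.
calls = {"n": 0}

def flaky_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429")
    return "ok"

result = with_backoff(flaky_call, sleep=lambda s: None)  # succeeds on the 3rd try
```

If you use a wrapper like this, set `max_retries=0` on the client so the two layers don't multiply each other.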
3. Replaying full message history on every node
If each node sends the entire conversation back to the model, token usage and request volume both rise quickly. This gets expensive fast in multi-node graphs.
```python
def call_model(state):
    return llm.invoke(state["messages"])  # entire history every time
```
Trim context before invoking:
```python
def call_model(state):
    recent_messages = state["messages"][-6:]
    return llm.invoke(recent_messages)
```
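A blind slice can drop the system message, so in practice you usually keep it plus the most recent turns. A hand-rolled sketch over plain dict messages (`langchain_core` also ships a `trim_messages` helper; the function name here is illustrative):

```python
def trim_history(messages, keep_last=6):
    """Keep the system message (if any) plus the last `keep_last` messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system[:1] + rest[-keep_last:]

# Demo: a system prompt followed by 20 user turns.
history = [{"role": "system", "content": "You are helpful."}] + [
    {"role": "user", "content": f"msg {i}"} for i in range(20)
]
trimmed = trim_history(history, keep_last=4)  # system message + 4 most recent turns
```

Trimming lowers tokens per request rather than requests per minute, but token-per-minute quotas are often the limit you actually hit.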
4. Provider-side quota or tier limits
Sometimes the code is fine and your API key just hit its plan limit. The error may look like this:
```
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit exceeded'}}
```
Or for Anthropic:
```
anthropic.RateLimitError: Error code: 429 - {'type': 'rate_limit_error'}
```
In that case, check your dashboard for:
- requests per minute
- tokens per minute
- daily spend caps
- org-level quotas
How to Debug It
- Count how many times each node runs. Add logging inside every node:

  ```python
  def call_model(state):
      print(f"agent called with iterations={state.get('iterations', 0)}")
      ...
  ```

  If one input triggers multiple unexpected calls, you have a graph flow problem.

- Inspect the stack trace. If you see `openai.RateLimitError`, `anthropic.RateLimitError`, or `HTTPStatusError: 429`, the provider rejected the request. If you see repeated LangGraph node execution before that error, your graph logic is probably looping.

- Disable concurrency temporarily. Run one request at a time:

  ```python
  result = app.invoke(input_state)
  ```

  If the error disappears, your issue is burst traffic or batch fan-out.

- Reduce retries and context. Set `max_retries=0` or `1` and trim message history. If failures stop after this change, you were amplifying requests through retries or oversized prompts.
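If print statements get noisy, a tiny wrapper can count node executions for you instead. A minimal sketch (the wrapper and node names are illustrative; the fake node stands in for a real LLM node):

```python
from collections import Counter

node_calls = Counter()

def counted(name, fn):
    """Wrap a node function so each execution increments a counter."""
    def wrapper(state):
        node_calls[name] += 1
        return fn(state)
    return wrapper

# In a real graph: graph.add_node("agent", counted("agent", call_model))
def fake_node(state):
    return state

wrapped = counted("agent", fake_node)
for _ in range(3):
    wrapped({})
# node_calls["agent"] is now 3
```

After a single test request, `node_calls` tells you exactly which node is multiplying your model traffic.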
Prevention
- Put a hard stop in every looped LangGraph flow. Use counters like `iterations`, `tool_calls`, or `depth`.
- Limit concurrency on batch jobs. Start with `max_concurrency=1` or `2` and scale up after measuring provider limits.
- Keep retries centralized. Don’t stack LangGraph retries on top of SDK retries on top of custom retry decorators.
If you’re still seeing rate limit exceeded after fixing loops and concurrency, assume it’s either quota exhaustion or prompt bloat until proven otherwise. In production systems, those are usually the real causes.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.