How to Fix 'rate limit exceeded during development' in LangGraph (Python)
What this error means
A "rate limit exceeded" error during development usually means your LangGraph app is calling an LLM provider too often, too fast, or in a loop. In practice, this shows up when you’re testing a graph with recursion, retries, multiple nodes hitting the same model, or a control flow that keeps re-invoking the same step.
The actual provider error is often something like:
- `openai.RateLimitError: Error code: 429`
- `anthropic.RateLimitError: 429 Too Many Requests`
- `google.api_core.exceptions.ResourceExhausted: 429 Resource has been exhausted`
LangGraph does not create the rate limit by itself. It just makes it easier to accidentally trigger one because graphs can fan out, retry, and recurse.
The Most Common Cause
The #1 cause is an accidental loop in your graph logic. A node returns a state that routes back to itself or to another node that immediately calls the model again, so one user action becomes dozens of LLM requests.
Here’s the broken pattern I see most often:
| Broken | Fixed |
|---|---|
| The graph keeps routing back into the same model node | The graph stops at a terminal state or conditionally exits |
| No guard on iteration count | Explicit max-steps / done flag |
| Model call happens on every pass | Model call happens once per turn |
```python
# BROKEN
from typing import TypedDict

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

class State(TypedDict):
    messages: list
    should_continue: bool

def agent_node(state: State):
    response = llm.invoke(state["messages"])
    return {
        "messages": state["messages"] + [response],
        "should_continue": True,  # always True = infinite loop risk
    }

def route(state: State):
    return "agent" if state["should_continue"] else END

graph = StateGraph(State)
graph.add_node("agent", agent_node)
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", route)
app = graph.compile()
```
```python
# FIXED
from typing import TypedDict

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

class State(TypedDict):
    messages: list
    steps: int

MAX_STEPS = 3

def agent_node(state: State):
    response = llm.invoke(state["messages"])
    return {
        "messages": state["messages"] + [response],
        "steps": state["steps"] + 1,  # count every pass through the node
    }

def route(state: State):
    if state["steps"] >= MAX_STEPS:
        return END  # explicit exit condition
    return "agent"

graph = StateGraph(State)
graph.add_node("agent", agent_node)
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", route)
app = graph.compile()
```
If you are using StateGraph, MessageGraph, or a custom router, check whether your edge conditions ever allow the graph to terminate. A lot of “rate limit” bugs are really “my graph never stops” bugs.
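As a safety net on top of your own routing logic, LangGraph can also cap the total number of super-steps per run via the `recursion_limit` config key; when the cap is hit, the run raises `GraphRecursionError` instead of silently burning API calls. A minimal config sketch, assuming the compiled `app` and input state from the FIXED example above:

```python
from langgraph.errors import GraphRecursionError

try:
    # Stop the run after at most 8 super-steps instead of looping forever.
    result = app.invoke(
        {"messages": messages, "steps": 0},
        config={"recursion_limit": 8},
    )
except GraphRecursionError:
    print("Graph hit the recursion limit - check your routing conditions.")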
Other Possible Causes
1. Multiple nodes call the same model in one run
If you have parallel branches or sequential nodes that all invoke ChatOpenAI, one input can become 5–10 API calls immediately.
```python
def summarize(state):
    return {"summary": llm.invoke(state["messages"])}

def classify(state):
    return {"label": llm.invoke(state["messages"])}

# Both nodes hit the provider in the same execution path.
```
Fix by caching shared outputs in state and reusing them instead of calling the model twice.
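A minimal sketch of that caching pattern, using a stand-in `fake_llm` so it runs without an API key (the node and state-field names here are illustrative, not LangGraph APIs):

```python
call_count = 0

def fake_llm(messages):
    """Stand-in for llm.invoke so the pattern runs without an API key."""
    global call_count
    call_count += 1
    return f"analysis of {messages[-1]}"

def analyze(state):
    # Call the model once and cache the result in state.
    if "analysis" not in state:
        state = {**state, "analysis": fake_llm(state["messages"])}
    return state

def summarize(state):
    # Reuse the cached analysis instead of invoking the model again.
    return {**state, "summary": "summary: " + state["analysis"]}

def classify(state):
    # Same: read from state, don't re-invoke the model.
    return {**state, "label": "label: " + state["analysis"]}

state = classify(summarize(analyze({"messages": ["hello"]})))
print(call_count)  # 1 - one provider call instead of three
```

The same idea applies inside a real graph: have one node populate the shared field, and make downstream nodes read it from state.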
2. Retry settings are too aggressive
Some wrappers retry on 429s automatically. If your app retries instantly without backoff, you can burn through limits faster.
```python
llm = ChatOpenAI(
    model="gpt-4o-mini",
    max_retries=6,
)
```
If you already have graph-level retries plus provider-level retries, reduce one layer. Double retry stacks are common in development.
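If you do keep retries, make them back off instead of firing instantly. A generic exponential-backoff sketch in pure Python (`RuntimeError` stands in for the provider's `RateLimitError` so the example runs offline):

```python
import random
import time

def invoke_with_backoff(call, max_retries=3, base_delay=0.5):
    """Retry `call` with exponential backoff plus jitter on rate-limit errors."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RuntimeError:  # stand-in for openai.RateLimitError
            if attempt == max_retries:
                raise
            # 0.5s, 1s, 2s, ... plus jitter so parallel callers don't sync up.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Demo: fails twice with a 429-style error, then succeeds.
attempts = {"n": 0}

def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

print(invoke_with_backoff(flaky_call, base_delay=0.01))  # ok
```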
3. Streaming or UI refresh is triggering repeated invocations
A frontend rerender can accidentally re-run the backend endpoint that compiles and invokes the graph.
```python
# Example smell:
# Every page refresh calls app.invoke(...)
result = app.invoke({"messages": messages})
```
Make sure invocation happens on user submit, not on render. In FastAPI/Streamlit/Next.js setups, this is a very common hidden source of duplicate requests.
4. Your prompt is causing tool-call churn
If the model keeps asking for tools because your tool output is incomplete or ambiguous, LangGraph may keep cycling through tool nodes and model nodes.
```python
# Bad tool output example:
return {"result": "ok"}  # too vague for the agent to conclude anything
```

Return structured outputs with enough signal for the agent to stop:

```python
return {"status": "done", "data": {...}}
```
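In the routing function, that status field gives the agent/tool cycle a concrete stop signal. A sketch with illustrative names (in a real LangGraph graph the terminal return would be `END`):

```python
def route_after_tool(state):
    result = state.get("tool_result", {})
    if result.get("status") == "done":
        return "end"    # in a real graph: return END
    return "agent"      # otherwise loop back to the model node

print(route_after_tool({"tool_result": {"status": "done"}}))  # end
print(route_after_tool({"tool_result": {"result": "ok"}}))    # agent
```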
How to Debug It
- **Count how many times each node runs.** Add logging inside every node. If `agent_node` prints 20 times for one request, you’ve found a loop or fan-out problem.

  ```python
  def agent_node(state):
      print(f"agent_node steps={state['steps']}")
      ...
  ```

- **Inspect your routing function.** Look at every conditional edge and confirm there is a valid exit path. If all branches point back into model nodes, you will hit rate limits quickly.
- **Temporarily disable retries.** Set `max_retries=0` or lower it. If the error changes from repeated 429s to a single failure, retries were amplifying the issue.
- **Run with a hard step cap.** Add `steps` to state and stop after 2–3 iterations. If rate limits disappear, your graph logic was looping more than expected.
Prevention
- Add an explicit termination condition to every cyclic LangGraph workflow.
- Track per-run call counts in state so you can fail fast before hitting provider limits.
- Keep one layer of retries only: either wrapper-level retries or application-level retries, not both.
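That per-run call budget can live in a thin wrapper around the model client, so a runaway graph fails fast with a clear error instead of a 429. A sketch (the `CountingLLM` wrapper and budget value are illustrative, not a LangChain API):

```python
class CountingLLM:
    """Wraps a model client, counts calls, and fails fast past a budget."""

    def __init__(self, llm, budget=10):
        self.llm = llm
        self.budget = budget
        self.calls = 0

    def invoke(self, messages):
        self.calls += 1
        if self.calls > self.budget:
            raise RuntimeError(
                f"LLM call budget of {self.budget} exceeded: "
                "the graph is probably looping"
            )
        return self.llm.invoke(messages)

# Demo with a stub client instead of ChatOpenAI.
class StubLLM:
    def invoke(self, messages):
        return "reply"

llm = CountingLLM(StubLLM(), budget=3)
for _ in range(3):
    llm.invoke(["hi"])

try:
    llm.invoke(["hi"])  # fourth call blows the budget
except RuntimeError as e:
    print(e)
```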
If you want a quick rule: when LangGraph throws rate-limit errors during development, assume your graph is making more calls than you think until proven otherwise.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.