How to Fix 'agent infinite loop in production' in LangGraph (Python)
What This Error Actually Means
If you’re seeing agent infinite loop in production, your LangGraph agent is cycling through nodes without ever reaching a terminal state. In practice, this usually means your graph keeps routing back to the same node, or your assistant keeps calling tools without producing a final answer.
The failure usually surfaces as a GraphRecursionError: the run keeps executing steps until it reaches the recursion limit (25 by default), at which point LangGraph aborts it.
The Most Common Cause
The #1 cause is a bad conditional edge that always routes back to the agent node. In LangGraph, this usually happens when your router never returns an end condition like END, or when your tool execution path never updates state in a way that changes the next route.
Here’s the broken pattern:
| Broken | Fixed |
|---|---|
| Always routes back to agent | Routes to tools only when tool calls exist |
| No explicit stop condition | Returns END when the assistant is done |
| State doesn’t change meaningfully | State is updated with tool results and final messages |
    # BROKEN
    from typing import Annotated, TypedDict

    from langchain_core.messages import AnyMessage
    from langgraph.graph import StateGraph, END
    from langgraph.graph.message import add_messages

    class AgentState(TypedDict):
        # add_messages appends new messages instead of overwriting the list
        messages: Annotated[list[AnyMessage], add_messages]

    def route(state: AgentState):
        # Always loops back to agent
        return "agent"

    # agent_node and tool_node are assumed to be defined elsewhere
    builder = StateGraph(AgentState)
    builder.add_node("agent", agent_node)
    builder.add_node("tools", tool_node)
    builder.set_entry_point("agent")
    builder.add_conditional_edges("agent", route, {
        "agent": "agent",
        "tools": "tools",
        "end": END,
    })
    graph = builder.compile()
    # FIXED
    from typing import Annotated, TypedDict

    from langchain_core.messages import AnyMessage
    from langgraph.graph import StateGraph, END
    from langgraph.graph.message import add_messages

    class AgentState(TypedDict):
        # add_messages appends new messages instead of overwriting the list
        messages: Annotated[list[AnyMessage], add_messages]

    def route(state: AgentState):
        last_msg = state["messages"][-1]
        # If model requested tools, go there
        if getattr(last_msg, "tool_calls", None):
            return "tools"
        # Otherwise stop
        return "end"

    # agent_node and tool_node are assumed to be defined elsewhere
    builder = StateGraph(AgentState)
    builder.add_node("agent", agent_node)
    builder.add_node("tools", tool_node)
    builder.set_entry_point("agent")
    builder.add_conditional_edges("agent", route, {
        "tools": "tools",
        "end": END,
    })
    # Tool results go back to the agent so it can produce a final answer
    builder.add_edge("tools", "agent")
    graph = builder.compile()
The key difference is simple: the graph must have a real exit path. If your router can only ever return another internal node, LangGraph will keep executing until it hits something like:
langgraph.errors.GraphRecursionError: Recursion limit of 25 reached without hitting a stop condition.
Other Possible Causes
1) Your agent keeps producing tool calls forever
This happens when the model is prompted badly or the tool result doesn’t satisfy the model, so it asks for the same tool again.
    # BAD: model can keep asking for the same tool
    assistant = llm.bind_tools([search_tool])

    # GOOD: add a final-answer instruction to the system prompt
    # (and consider capping tool retries in state as a backstop)
    system_prompt = """
    Use tools only when needed.
    After you have enough information, answer directly.
    Do not call tools again once you have sufficient evidence.
    """
2) Tool node does not append results back into state
If the tool output never gets added to messages, the model sees no progress and repeats itself.
    # BROKEN
    def tool_node(state):
        result = run_tool(state["messages"][-1])
        return {}  # nothing written back

    # FIXED
    from langchain_core.messages import ToolMessage

    def tool_node(state):
        # run_tool is a placeholder for your own tool execution helper
        last = state["messages"][-1]
        result = run_tool(last)
        # Note: this answers only the first tool call in the message
        return {
            "messages": [ToolMessage(content=str(result), tool_call_id=last.tool_calls[0]["id"])]
        }
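The fixed node above answers only the first tool call. If the model requests several tools in one turn, each call generally needs its own result carrying the matching tool_call_id. The following is a minimal sketch under stated assumptions: run_tool(name, args) is a hypothetical dispatcher (its signature differs from the helper used above), and plain dicts stand in for ToolMessage so the example runs without LangChain installed.

```python
from types import SimpleNamespace


def run_tool(name: str, args: dict) -> str:
    """Hypothetical dispatcher; replace with your real tool lookup."""
    return f"{name} called with {sorted(args.items())}"


def tool_node(state: dict) -> dict:
    """Return one tool result per requested tool call."""
    last = state["messages"][-1]
    results = []
    for call in getattr(last, "tool_calls", None) or []:
        output = run_tool(call["name"], call["args"])
        # In a real graph you would likely build
        # ToolMessage(content=..., tool_call_id=...) here instead of a dict.
        results.append(
            {"role": "tool", "content": str(output), "tool_call_id": call["id"]}
        )
    return {"messages": results}


# Usage: a stand-in assistant message that requests two tools at once
request = SimpleNamespace(
    tool_calls=[
        {"name": "search", "args": {"q": "policy"}, "id": "call-1"},
        {"name": "lookup", "args": {"id": 7}, "id": "call-2"},
    ]
)
update = tool_node({"messages": [request]})
print([m["tool_call_id"] for m in update["messages"]])
```

Every requested call gets a result, so the model sees progress for each tool it asked for instead of repeating the request.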
3) Your conditional edge checks the wrong field
A common bug is checking state["messages"][-1].content instead of tool_calls. That makes routing brittle and often wrong.
    # BAD
    def route(state):
        if "tool" in state["messages"][-1].content.lower():
            return "tools"
        return "agent"

    # GOOD
    def route(state):
        last = state["messages"][-1]
        if getattr(last, "tool_calls", None):
            return "tools"
        return "end"
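To see why text matching misroutes, it can help to run both routers against the same two messages. This is an illustrative sketch, not LangGraph code: SimpleNamespace objects stand in for assistant messages, and the message contents are made up. It assumes that a tool-requesting message often has empty text, which is common but depends on the model.

```python
from types import SimpleNamespace


def route_by_text(state):
    # Brittle: looks for the word "tool" in the message text
    if "tool" in state["messages"][-1].content.lower():
        return "tools"
    return "agent"


def route_by_tool_calls(state):
    # Structured: looks at the tool_calls field only
    last = state["messages"][-1]
    if getattr(last, "tool_calls", None):
        return "tools"
    return "end"


# A final answer that happens to mention the word "tool"
final_answer = SimpleNamespace(
    content="The search tool found the policy. Coverage ends in 2025.",
    tool_calls=[],
)
# A real tool request with empty text
tool_request = SimpleNamespace(
    content="",
    tool_calls=[{"name": "search", "args": {"q": "policy"}, "id": "call-1"}],
)

for label, message in [("final answer", final_answer), ("tool request", tool_request)]:
    state = {"messages": [message]}
    print(label, "| text:", route_by_text(state), "| structured:", route_by_tool_calls(state))
```

The text-based router sends the final answer to the tools node and sends the real tool request back to the agent, so it is wrong in both directions. The structured router ends on the final answer and runs tools only for the actual request.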
4) You set recursion limits too high and hide the bug
This does not cause the loop, but it makes production failures harder to spot. The graph runs longer before failing.
    graph.invoke(
        {"messages": [("user", "Find policy details")]},
        config={"recursion_limit": 100}
    )

Lower it during debugging so you catch bad cycles quickly:

    graph.invoke(
        {"messages": [("user", "Find policy details")]},
        config={"recursion_limit": 10}
    )
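One way to avoid hard-coding the limit is to read it from the environment, so a debugging session can lower it without a code change. This is a sketch only: the variable name AGENT_RECURSION_LIMIT and the helper invoke_config are made up for illustration, and the fallback of 25 matches the limit shown in the error message earlier.

```python
import os

DEFAULT_RECURSION_LIMIT = 25  # the limit reported in the error message above


def invoke_config(env=os.environ) -> dict:
    """Build the config dict for graph.invoke from an environment mapping."""
    raw = env.get("AGENT_RECURSION_LIMIT")  # hypothetical variable name
    limit = int(raw) if raw else DEFAULT_RECURSION_LIMIT
    if limit < 1:
        raise ValueError("recursion limit must be a positive integer")
    return {"recursion_limit": limit}


# Usage: pass the result as config=... when invoking the graph
print(invoke_config({"AGENT_RECURSION_LIMIT": "5"}))
```

With this in place, the call becomes graph.invoke(inputs, config=invoke_config()), and the limit can be lowered per environment while you chase a bad cycle.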
How to Debug It
- Inspect the last message at every node
  - Print state["messages"][-1]
  - Confirm whether it contains tool_calls
  - Check whether each node actually changes state
- Trace routing decisions
  - Log what your router returns on every hop
  - If you see agent -> tools -> agent -> tools repeating, your stop condition is missing
- Run with a low recursion limit
  - Use config={"recursion_limit": 5}
  - A fast failure is better than waiting for a long loop in production logs
- Test each node independently
  - Call your agent node once and inspect output
  - Call your tool node once and verify it appends a ToolMessage
  - Verify that after one tool round-trip, the next assistant message can terminate
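The routing trace described above can be sketched as a small wrapper around the router. This is one possible approach, not a LangGraph feature: log_route and the logger name are made up for illustration, and it uses only the standard library.

```python
import functools
import logging

logger = logging.getLogger("agent.router")  # hypothetical logger name


def log_route(router):
    """Wrap a router so every routing decision is logged."""

    @functools.wraps(router)
    def wrapper(state):
        decision = router(state)
        last = state["messages"][-1]
        tool_calls = getattr(last, "tool_calls", None) or []
        logger.info(
            "messages=%d last=%s tool_calls=%d -> %s",
            len(state["messages"]),
            type(last).__name__,
            len(tool_calls),
            decision,
        )
        return decision

    return wrapper


def route(state):
    last = state["messages"][-1]
    if getattr(last, "tool_calls", None):
        return "tools"
    return "end"


# Usage: pass the wrapped router wherever the plain router was used
logged_route = log_route(route)
```

You would then pass logged_route in place of route when calling add_conditional_edges. A log that alternates between the same two decisions with a growing message count is the signature of a missing stop condition.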
Prevention
- Always design graphs with an explicit terminal path using END.
- Make routing depend on structured fields like tool_calls, not string matching on message text.
- Add unit tests that simulate:
  - one normal completion path
  - one tool call path
  - one repeated-tool-call path that must terminate
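The three test paths above can be sketched without a model or a compiled graph. Everything here is a stand-in: run is a plain-Python loop that imitates the agent -> tools cycle rather than calling graph.invoke, msg builds fake messages with only the fields the router reads, and MAX_TOOL_ROUNDS is a hypothetical cap added so the repeated-tool-call path has to terminate.

```python
from types import SimpleNamespace

MAX_TOOL_ROUNDS = 3  # hypothetical cap on tool round-trips


def msg(type_, content="", tool_calls=None, **extra):
    """Tiny stand-in for a LangChain message (only the fields used here)."""
    return SimpleNamespace(type=type_, content=content, tool_calls=tool_calls or [], **extra)


def route(state):
    last = state["messages"][-1]
    rounds = sum(1 for m in state["messages"] if m.type == "tool")
    if last.tool_calls and rounds < MAX_TOOL_ROUNDS:
        return "tools"
    return "end"


def make_agent(tool_rounds_wanted):
    """Fake agent node: requests a tool N times, then gives a final answer."""
    def agent(state):
        used = sum(1 for m in state["messages"] if m.type == "tool")
        if used < tool_rounds_wanted:
            return msg("ai", tool_calls=[{"name": "search", "args": {}, "id": f"call-{used}"}])
        return msg("ai", content="done")
    return agent


def fake_tool(state):
    call = state["messages"][-1].tool_calls[0]
    return msg("tool", content="result", tool_call_id=call["id"])


def run(agent, max_steps=10):
    """Drive agent -> route -> tools until the router says "end"."""
    state = {"messages": [msg("human", content="hi")]}
    for _ in range(max_steps):
        state["messages"].append(agent(state))
        if route(state) == "end":
            return state["messages"]
        state["messages"].append(fake_tool(state))
    raise RuntimeError(f"no stop condition within {max_steps} steps")


def test_normal_completion():
    assert [m.type for m in run(make_agent(0))] == ["human", "ai"]


def test_single_tool_call():
    assert [m.type for m in run(make_agent(1))] == ["human", "ai", "tool", "ai"]


def test_repeated_tool_calls_terminate():
    # This agent would ask for tools forever; the cap must stop it.
    messages = run(make_agent(10**6))
    assert sum(m.type == "tool" for m in messages) == MAX_TOOL_ROUNDS
```

The test functions follow the pytest naming convention, so they can be collected as-is. Note that when the cap fires, the last message is still a tool request rather than an answer, so a real graph may want a fallback node that produces a final reply before reaching END.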
If you want one rule to remember: every loop in LangGraph needs a guaranteed exit. If your agent can only bounce between nodes and never produces a final answer or reaches END, you will get a GraphRecursionError sooner or later.
By Cyprian Aarons, AI Consultant at Topiax.