How to Fix 'agent infinite loop when scaling' in LangGraph (Python)

By Cyprian Aarons. Updated 2026-04-21

What this error usually means

If you’re seeing "agent infinite loop when scaling" in LangGraph, your graph is almost always re-entering the same node or cycle without a valid stop condition. In practice, this shows up when a graph that worked for one request starts looping under larger inputs, longer conversations, or multi-step tool use.

The symptom is usually one of these:

  • The agent keeps calling the same tool
  • StateGraph never reaches END
  • You hit GraphRecursionError: Recursion limit of 25 reached without hitting a stop condition
  • A supervisor/router keeps routing back to the same worker

The Most Common Cause — missing or broken termination logic

The #1 cause is a conditional edge that always routes back into the loop, or an agent node that never returns a state transition that can end the graph.

This happens a lot when people scale from a toy demo to a real workflow with tools, retries, or multiple agents. The graph looks fine on small inputs, then starts cycling forever once the model decides to keep “thinking.”

Broken vs fixed pattern

Broken pattern                                      | Fixed pattern
Router always returns "agent"                       | Router can return "end"
Agent node never sets a finish condition            | Agent sets next_step / done flag
Tool result gets appended, but no exit path exists  | Graph checks whether work is complete before looping

# BROKEN
from langgraph.graph import StateGraph, END

def route(state):
    # Always loops back to agent
    return "agent"

def agent_node(state):
    response = llm.invoke(state["messages"])
    state["messages"].append(response)
    return state

graph = StateGraph(dict)
graph.add_node("agent", agent_node)
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", route, {
    "agent": "agent",
})
app = graph.compile()

# FIXED
from typing import TypedDict, Literal
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    messages: list
    done: bool

def route(state: AgentState) -> Literal["agent", "end"]:
    # .get() avoids a KeyError when the caller did not initialize "done"
    if state.get("done", False):
        return "end"
    return "agent"

def agent_node(state: AgentState):
    # llm is assumed to be a chat model defined elsewhere
    response = llm.invoke(state["messages"])

    # Set a real stop condition
    done = "final answer" in response.content.lower()

    # Return an update instead of mutating the incoming state in place
    return {"messages": state["messages"] + [response], "done": done}

graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", route, {
    "agent": "agent",
    "end": END,
})
app = graph.compile()

If you are using create_react_agent, the same issue often shows up as repeated tool calls with no final assistant message. The model keeps selecting tools because your prompt or tool schema never gives it a reason to stop.
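
One way to confirm this symptom is to count identical tool calls in the run history. The sketch below is illustrative only: `repeated_tool_calls` is a hypothetical helper, not part of LangGraph, and it assumes each tool call is available as a dict with `name` and `args` keys (the shape LangChain uses for entries in an AI message's `tool_calls`).

```python
import json
from collections import Counter

def repeated_tool_calls(tool_calls, max_repeats=2):
    """Return the (name, args) keys that occur more than max_repeats times."""
    counts = Counter(
        # Serialize args with sorted keys so equal dicts produce equal keys
        (call["name"], json.dumps(call["args"], sort_keys=True))
        for call in tool_calls
    )
    return [key for key, n in counts.items() if n > max_repeats]

# Example history: the same search is issued three times
history = [
    {"name": "search", "args": {"q": "langgraph recursion limit"}},
    {"name": "search", "args": {"q": "langgraph recursion limit"}},
    {"name": "search", "args": {"q": "langgraph recursion limit"}},
    {"name": "lookup", "args": {"id": 7}},
]
stuck = repeated_tool_calls(history)
```

If `stuck` is non-empty, the agent is probably re-issuing a call whose result it already has, which points at the prompt or tool schema rather than the graph wiring.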

Other Possible Causes

1) Your recursion limit is too low for the task

Sometimes the graph is valid, but the workflow legitimately needs more than the default number of steps. LangGraph will raise:

  • langgraph.errors.GraphRecursionError: Recursion limit of 25 reached without hitting a stop condition

Raise the limit per call through the config:

app.invoke(
    {"messages": [...]},
    config={"recursion_limit": 50},
)

Use this only after confirming the graph actually terminates. Raising the limit on a broken loop just delays failure.
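
A complementary option is to carry an explicit step budget in state, so the router ends the run cleanly instead of relying on the recursion error. This is a minimal sketch under stated assumptions: `MAX_STEPS`, `BudgetState`, `count_step`, and `route_with_budget` are hypothetical names, and the loop at the bottom only simulates what the graph would do.

```python
from typing import TypedDict

MAX_STEPS = 10  # assumed budget; tune per workflow

class BudgetState(TypedDict):
    steps: int
    done: bool

def count_step(state: BudgetState) -> dict:
    """Node-side update: increment the step counter once per loop pass."""
    return {"steps": state.get("steps", 0) + 1}

def route_with_budget(state: BudgetState) -> str:
    """Router: end when the work is done or the step budget is spent."""
    if state.get("done") or state.get("steps", 0) >= MAX_STEPS:
        return "end"
    return "agent"

# Simulate the loop without a graph: done never flips, so only the budget stops it
state: BudgetState = {"steps": 0, "done": False}
while route_with_budget(state) == "agent":
    state.update(count_step(state))
```

In a real graph, `route_with_budget` would be the function passed to `add_conditional_edges`, with "end" mapped to END as in the fixed pattern earlier.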

2) A tool node writes back into the wrong part of state

If your tool output mutates messages incorrectly or rewrites routing fields, your conditional edge may never see completion.

# BROKEN: overwrites routing metadata
state["next"] = "agent"
state["messages"] = tool_result

# FIXED: preserve routing fields and append messages properly
state["messages"].append(tool_result_message)
state["done"] = tool_result.get("done", False)

This is common when mixing plain dict state with TypedDict or Pydantic models and not keeping field updates disciplined.
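
One way to keep updates disciplined is to have the tool node return only the fields it owns rather than mutating the whole state. The sketch below assumes a state schema with an `operator.add` reducer on `messages`; `ToolState`, `run_tool`, and `tool_node` are hypothetical names, and `run_tool` is a stub standing in for a real tool call.

```python
import operator
from typing import Annotated, TypedDict

class ToolState(TypedDict):
    # In LangGraph, an Annotated reducer such as operator.add tells the graph
    # to append returned messages instead of replacing the list
    messages: Annotated[list, operator.add]
    next: str
    done: bool

def run_tool(state: ToolState) -> dict:
    # Hypothetical stand-in for the real tool call
    return {"output": "42 rows found", "done": True}

def tool_node(state: ToolState) -> dict:
    tool_result = run_tool(state)
    # Return only the fields this node owns; "next" is left untouched
    return {
        "messages": [{"role": "tool", "content": tool_result["output"]}],
        "done": tool_result.get("done", False),
    }

update = tool_node({"messages": [], "next": "agent", "done": False})
```

Because the returned update never contains `next`, the tool node cannot accidentally overwrite the routing decision.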

3) Supervisor/router logic always picks the same worker

In multi-agent graphs, a supervisor can accidentally keep sending control to one worker forever.

def supervisor(state):
    return {"next": "researcher"}  # always routes here

Fix it by making routing depend on actual progress:

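def supervisor(state):
    # draft_complete is an assumed flag that the writer sets when it finishes
    if state.get("draft_complete"):
        return {"next": "__end__"}
    if state.get("needs_tool"):
        return {"next": "tool_worker"}
    if state.get("research_complete"):
        return {"next": "writer"}
    return {"next": "researcher"}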

If you’re using an enum-style router, make sure END is reachable from every branch.
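
Reachability can be checked mechanically before the graph ever runs. The following is a rough sketch, not a LangGraph feature: `nodes_that_cannot_reach_end` is a hypothetical helper, and it assumes you can write down the routing as a plain dict that maps each node to the nodes it may route to.

```python
from collections import deque

END = "__end__"  # same sentinel string used in the supervisor example above

def nodes_that_cannot_reach_end(edges):
    """edges maps each node to the set of nodes it can route to."""
    # Build reverse edges, then walk backwards from END
    reverse = {}
    for src, targets in edges.items():
        for dst in targets:
            reverse.setdefault(dst, set()).add(src)
    reachable, queue = {END}, deque([END])
    while queue:
        node = queue.popleft()
        for prev in reverse.get(node, ()):
            if prev not in reachable:
                reachable.add(prev)
                queue.append(prev)
    # Any node not visited has no path to END
    return sorted(set(edges) - reachable)

edges = {
    "supervisor": {"researcher", "writer", END},
    "researcher": {"supervisor"},
    "writer": {"supervisor"},
    "tool_worker": {"tool_worker"},  # deliberately broken: only loops to itself
}
stuck = nodes_that_cannot_reach_end(edges)
```

This only proves that a path to END exists from each node. It does not prove the router will ever choose that path, so it complements the progress-based routing above rather than replacing it.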

4) Your prompt encourages endless reflection

This is easy to miss. If your system prompt says things like “keep analyzing until certain” without defining what “certain” means, the model may never emit a final answer.

system_prompt = """
Think carefully and continue reasoning until fully confident.
"""

Prefer explicit completion criteria:

system_prompt = """
Answer once you have enough information.
If all required fields are present, provide the final response and stop.
"""

For tool agents, explicitly instruct the model to:

  • call tools only when needed
  • stop after producing the final user-facing answer
  • do not repeat tool calls unless new evidence appears

How to Debug It

  1. Print every node transition. Log node entry and exit so you can see where control repeats.

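    def debug_node(name, fn):
        # Wrap a node function so each entry and exit is printed
        def wrapper(state):
            print(f"ENTER {name}")
            result = fn(state)
            print(f"EXIT {name}")
            return result
        return wrapper

    # Usage: graph.add_node("agent", debug_node("agent", agent_node))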
    
  2. Inspect the routing decision. Your bug is often in the conditional edge function, not the node itself.

    def route(state):
        decision = compute_next_step(state)
        print("ROUTE:", decision)
        return decision
    
  3. Check whether your stop flag ever changes. If you rely on done, finished, or final, verify it flips during execution.

    print("done before:", state.get("done"))
    print("done after:", agent_node(state).get("done"))
    
  4. Temporarily lower complexity. Remove tools, reduce agents to one node, and test whether END becomes reachable. If it does, add pieces back one at a time until the loop returns.

Prevention

  • Use an explicit terminal field in state like done: bool or status: Literal["running", "complete"].
  • Make every router branch able to reach END, even error branches.
  • Add integration tests that assert max step count and successful termination for representative inputs.
  • Keep prompts specific about stopping behavior when using ReAct-style agents or supervisors.
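
The integration-test bullet above could look roughly like this. It is a sketch under assumptions: `assert_terminates` is a hypothetical helper that accepts any iterable of step updates, and `fake_run` is a stand-in generator. In a real test you would pass the output of the compiled graph's `stream` method, for example `app.stream(inputs)`.

```python
def assert_terminates(steps, max_steps):
    """Consume an iterable of step updates and fail if it exceeds max_steps."""
    count = 0
    for _ in steps:
        count += 1
        if count > max_steps:
            raise AssertionError(f"run exceeded {max_steps} steps")
    return count

def fake_run(n):
    # Stand-in for a graph run that finishes after n steps
    for i in range(n):
        yield {"agent": {"step": i}}

steps_used = assert_terminates(fake_run(4), max_steps=10)
```

Asserting on the step count, not just on the absence of an error, catches graphs that terminate but take far more passes than expected.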


By Cyprian Aarons, AI Consultant at Topiax.
