How to Fix 'agent infinite loop when scaling' in LangChain (Python)
If you’re seeing an agent stuck in an infinite loop when scaling a LangChain Python app, it usually means your agent keeps re-entering the same tool/LLM cycle until it hits a recursion or iteration limit. In practice, this shows up when the agent cannot produce a valid final answer, keeps calling the same tool, or your orchestration layer is re-running the same request across workers.
This is common when you move from a single local test to production traffic, where retries, longer contexts, or bad tool outputs expose a control-flow bug that was hidden before.
The Most Common Cause
The #1 cause is a tool loop: the agent calls a tool, gets back output that doesn’t help it finish, then calls the same tool again. In LangChain terms, you’ll often see errors like:
- `GraphRecursionError: Recursion limit of 25 reached without hitting a stop condition`
- `AgentExecutor` repeatedly logging the same tool call
- `OutputParserException` when the model returns malformed agent output
Here’s the broken pattern:
| Broken | Fixed |
|---|---|
| Tool returns unstructured text that the agent can’t use | Tool returns concise, structured output |
| Agent has no clear stop condition | Agent has explicit `max_iterations` / recursion guard |
| Prompt encourages “keep trying” behavior | Prompt forces final answer after one tool pass |
```python
# BROKEN
from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, AgentType
from langchain.tools import tool

@tool
def lookup_policy(policy_id: str) -> str:
    # Bad: vague, noisy output
    return f"Policy {policy_id} exists. Status may be active. Check again if needed."

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
agent = initialize_agent(
    tools=[lookup_policy],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

result = agent.invoke({"input": "Is policy 123 active?"})
print(result)
```
```python
# FIXED
from pydantic import BaseModel
from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, AgentType
from langchain.tools import tool

class PolicyLookupResult(BaseModel):
    policy_id: str
    active: bool
    status: str

@tool
def lookup_policy(policy_id: str) -> str:
    # Good: deterministic, structured enough for the LLM to stop
    return PolicyLookupResult(
        policy_id=policy_id,
        active=True,
        status="active",
    ).model_dump_json()

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
agent = initialize_agent(
    tools=[lookup_policy],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    max_iterations=3,
)

result = agent.invoke({"input": "Is policy 123 active?"})
print(result)
```
The important change is not just `max_iterations`. The real fix is making your tools return data the model can terminate on. If your tool output says “check again,” “maybe,” or “try another lookup,” you’ve built a loop trigger.
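A cheap guard along these lines is to lint tool output for loop-trigger phrasing before handing it back to the agent. This is a plain-Python sketch, not a LangChain API; `LOOP_TRIGGERS` and `is_loop_trigger` are illustrative names you would adapt to your own tools:

```python
# Hypothetical guard: scan tool output for phrases that invite another
# identical call before the agent ever sees them.
LOOP_TRIGGERS = ("check again", "try again", "maybe", "unknown")

def is_loop_trigger(output: str) -> bool:
    text = output.lower()
    return any(phrase in text for phrase in LOOP_TRIGGERS)

print(is_loop_trigger("Policy 123 exists. Check again if needed."))  # True
print(is_loop_trigger('{"policy_id": "123", "active": true}'))       # False
```

Failing fast on these outputs during testing surfaces loop triggers before they reach production traffic.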
Other Possible Causes
1) Recursive graph/state updates in LangGraph
If you’re using LangGraph under LangChain, a node that routes back to itself without a proper exit condition will hit recursion limits fast.
```python
# BAD: unconditional self-loop
def route(state):
    return "agent"
```

```python
# GOOD: explicit exit path
def route(state):
    if state["done"]:
        return "__end__"
    return "agent"
```
Typical symptom:
- `GraphRecursionError: Recursion limit of 25 reached without hitting a stop condition`
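The exit-condition idea can be simulated without LangGraph installed. The sketch below is a toy driver loop standing in for a graph executor, with `max_steps` mirroring the default recursion limit of 25; `agent_node`, `route`, and `run` are all illustrative names, not LangGraph APIs:

```python
def agent_node(state):
    # Hypothetical node: finishes after collecting three observations
    obs = state.get("observations", 0) + 1
    return {"observations": obs, "done": obs >= 3}

def route(state):
    # Explicit exit path: stop once the state is marked done
    return "__end__" if state.get("done") else "agent"

def run(state, max_steps=25):
    # Driver loop standing in for the graph executor
    for _ in range(max_steps):
        if route(state) == "__end__":
            return state
        state = agent_node(state)
    raise RuntimeError(
        f"Recursion limit of {max_steps} reached without hitting a stop condition"
    )

print(run({}))  # {'observations': 3, 'done': True}
```

If `agent_node` never sets `done`, the router returns `"agent"` forever and the cap is the only thing that stops it, which is exactly the failure mode behind `GraphRecursionError`.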
2) Tool side effects that mutate shared state
If multiple requests share the same memory object, cache entry, or session state, one request can keep re-triggering another. This happens a lot when scaling with threads or async workers.
```python
# BAD: shared mutable state across requests
session_memory = []

def save_turn(msg):
    session_memory.append(msg)
```
Fix it by scoping memory per conversation:
```python
# GOOD: per-session state, keyed by session_id
_sessions: dict = {}

def get_memory(session_id: str) -> dict:
    # Each session reuses its own state object; nothing is shared across sessions
    return _sessions.setdefault(
        session_id, {"session_id": session_id, "messages": []}
    )
```
3) Poorly constrained prompt instructions
Prompts like “keep searching until you are certain” sound harmless but often cause endless tool usage. The model keeps trying because it thinks certainty requires more calls.
```python
prompt = """
Use tools as needed.
Keep trying until you are completely certain.
"""
```
Use tighter instructions instead:
```python
prompt = """
Use at most one tool call per question unless new information is required.
If the first result is sufficient, provide the final answer immediately.
"""
```
4) Retry middleware amplifying failures
At scale, retries can turn one bad agent step into repeated identical executions. If your app retries on every `OutputParserException`, you may be replaying the same malformed prompt forever.
```python
# BAD: retrying all agent failures blindly
def run_agent(query):
    for _ in range(5):
        try:
            return agent.invoke({"input": query})
        except Exception:
            continue
```
Prefer targeted retries only for transient transport errors:
```python
# GOOD: retry only network/LLM transport issues
def run_agent(query):
    try:
        return agent.invoke({"input": query})
    except TimeoutError:
        # retry once with backoff
        ...
```
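A fuller targeted-retry wrapper might look like this; `invoke_with_retry` and `TRANSIENT` are illustrative names, and the exception tuple should match whatever transport errors your client actually raises:

```python
import time

# Retry only transient transport errors; agent-logic failures surface
# immediately instead of replaying the same malformed prompt.
TRANSIENT = (TimeoutError, ConnectionError)

def invoke_with_retry(call, retries=2, base_delay=0.5):
    for attempt in range(retries + 1):
        try:
            return call()
        except TRANSIENT:
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
```

Call it as `invoke_with_retry(lambda: agent.invoke({"input": query}))`, so parsing or control-flow errors bubble up on the first attempt rather than fanning out into identical re-executions.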
How to Debug It
- Turn on verbose tracing
  - Set `verbose=True` on `AgentExecutor`.
  - If you use LangSmith, inspect whether the same tool call repeats with identical inputs.
  - You’re looking for repeated `Action:`/`Observation:` cycles.
- Check for recursion or iteration limits
  - Search logs for `GraphRecursionError`, `recursion limit`, `max_iterations`, and `OutputParserException`.
  - If you hit these consistently at exactly the same depth, it’s usually control flow, not model quality.
- Inspect tool outputs
  - Print raw tool responses before they go back to the agent.
  - Look for vague text like “unknown”, “please try again”, or duplicated records.
  - Tools should return stable data structures or tightly scoped text.
- Remove concurrency from the equation
  - Run one request locally with no queue workers and no shared memory.
  - If the loop disappears, your bug is in state isolation or retry fan-out.
  - If it persists, it’s likely prompt/tool design.
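The repeated-cycle check can also be automated against a captured trace. The trace format below is an assumption for illustration, not a LangChain structure; adapt the keys to whatever your tracing layer records:

```python
from collections import Counter

def repeated_calls(trace, threshold=3):
    # Flag any (tool, input) pair the agent issued `threshold`+ times,
    # which is the signature of a tool loop
    counts = Counter((step["tool"], step["input"]) for step in trace)
    return [pair for pair, n in counts.items() if n >= threshold]

trace = [
    {"tool": "lookup_policy", "input": "123"},
    {"tool": "lookup_policy", "input": "123"},
    {"tool": "lookup_policy", "input": "123"},
]
print(repeated_calls(trace))  # [('lookup_policy', '123')]
```

Running this over production traces turns “it feels loopy under load” into a concrete list of offending tool/input pairs.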
Prevention
- Keep tools deterministic and structured.
  - Return JSON or compact strings with clear fields.
  - Don’t make tools ask follow-up questions back to the agent.
- Set hard execution limits.
  - Use `max_iterations` for classic agents.
  - Use recursion guards in LangGraph workflows.
- Isolate state per request.
  - No shared global memory for conversation history.
  - No cross-request caches unless they’re keyed by session and validated.
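At the tool boundary, “deterministic and structured” can be enforced with a stdlib-only validator that fails closed instead of returning vague text. This is a sketch; the required field names here match the policy example earlier, and `safe_tool_output` is an illustrative helper, not a LangChain API:

```python
import json

REQUIRED_FIELDS = {"policy_id", "active", "status"}

def safe_tool_output(raw: str) -> str:
    # Pass through only well-formed results; anything else becomes a
    # terminal, non-retryable error instead of loop-triggering prose
    try:
        data = json.loads(raw)
        if isinstance(data, dict) and REQUIRED_FIELDS <= data.keys():
            return raw
    except json.JSONDecodeError:
        pass
    return json.dumps({"error": "lookup_failed", "retryable": False})
```

Wrapping every tool return in a validator like this gives the agent exactly two outcomes to terminate on: a valid record or an explicit, non-retryable failure.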
If you want a quick rule of thumb: when an agent loops under load, assume your termination condition is weaker than your retry path. Fix both before tuning prompts or swapping models.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.