# How to Fix 'agent infinite loop when scaling' in LlamaIndex (Python)
When you see agent infinite loop when scaling in a LlamaIndex Python app, it usually means the agent keeps selecting tools or re-entering the same reasoning path without ever reaching a terminal answer. In practice, this shows up after you add more tools, more retrieval steps, or recursive workflows, and the agent starts bouncing between AgentRunner, ReActAgent, or tool calls until it hits a max-iteration guard.
The fix is almost always in your control flow: bad tool design, missing stop conditions, or an agent that can call itself indirectly.
## The Most Common Cause
The #1 cause is a tool that returns something the agent interprets as “keep going” instead of “done”.
Typical pattern:
- The tool wraps an agent.
- The agent calls the tool.
- The tool calls the same agent again.
- You get repeated logs like `AgentRunner.step()` and `ReActAgent.take_step()`, then `Reached max_iterations`, and sometimes `RuntimeError: Agent hit recursion limit`.
Here’s the broken pattern next to the fixed one.
**Broken:**

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool

# BAD: the tool calls back into the same agent
def search_and_answer(query: str) -> str:
    return agent.chat(query).response

tool = FunctionTool.from_defaults(fn=search_and_answer)
agent = ReActAgent.from_tools([tool], verbose=True)
response = agent.chat("Find policy details")
```

**Fixed:**

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool

# GOOD: the tool does one job and returns final data
def search_docs(query: str) -> str:
    # call your retriever / DB / API here
    return "Policy details: coverage starts after 30 days."

tool = FunctionTool.from_defaults(fn=search_docs)
agent = ReActAgent.from_tools([tool], verbose=True)
response = agent.chat("Find policy details")
```
The rule is simple: **a tool should not call the same agent that owns it**. If you need multi-step orchestration, put that logic in a separate workflow layer, not inside a tool callback.
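If you do need multi-step orchestration, the workflow layer can be as simple as a bounded loop around the agent. A minimal sketch, assuming `agent_chat` is any callable that returns a string (e.g. a thin wrapper around `agent.chat`) and using a made-up `INSUFFICIENT` sentinel; none of this is LlamaIndex API:

```python
def orchestrate(agent_chat, query: str, max_rounds: int = 3) -> str:
    """Run a bounded multi-step flow instead of letting a tool re-enter the agent."""
    answer = ""
    for round_num in range(max_rounds):
        answer = agent_chat(f"Round {round_num + 1}: {query}")
        if "INSUFFICIENT" not in answer:
            break  # terminal answer reached; stop instead of looping
    return answer
```

The loop owns the retry policy, so no tool ever needs a reference back to the agent.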
## Other Possible Causes
### 1. Tool output is too vague
If your tool returns text like `"done"`, `"ok"`, or a partial answer, the LLM often treats it as incomplete and keeps querying.
```python
# BAD: vague output the LLM may treat as incomplete
def lookup_customer(customer_id: str) -> str:
    return "found"
```
Fix it by returning structured, specific output:

```python
# GOOD: structured output the LLM can act on
def lookup_customer(customer_id: str) -> str:
    return '{"customer_id":"123","status":"active","plan":"premium"}'
```
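Hand-written JSON strings invite quoting mistakes; building the payload with the standard library's `json` module is safer. A sketch with illustrative field values:

```python
import json

def lookup_customer(customer_id: str) -> str:
    # Build the payload as a dict, then serialize; avoids the quoting
    # mistakes that hand-written JSON strings invite.
    record = {"customer_id": customer_id, "status": "active", "plan": "premium"}
    return json.dumps(record)
```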
### 2. No max-iteration or step limit
If you don’t cap iterations, a bad prompt/tool combo can loop forever until downstream code fails.
```python
from llama_index.core.agent import ReActAgent

agent = ReActAgent.from_tools(
    tools,
    max_iterations=20,
    verbose=True,
)
```
If you’re using a runner or workflow class, look for equivalents like:
- `max_iterations`
- `max_steps`
- `timeout`
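When the API you are using exposes no step cap, a wall-clock timeout can serve as the outermost guard. A standard-library sketch; `chat_fn` stands in for something like `agent.chat`, and note that the worker thread is abandoned, not cancelled, when the deadline passes:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def chat_with_timeout(chat_fn, query: str, seconds: float = 30.0) -> str:
    # Run the (possibly looping) agent call in a worker thread and stop
    # waiting after `seconds`. This is a guard, not a cancellation: a hung
    # call keeps running in the background until the process exits.
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(chat_fn, query)
    try:
        return future.result(timeout=seconds)
    except FutureTimeout:
        return "TIMEOUT: agent did not finish; check for a tool loop."
    finally:
        pool.shutdown(wait=False)
```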
### 3. Prompt tells the model to keep exploring
This happens when your system prompt encourages exhaustive reasoning without a stopping rule.
Bad prompt:
```python
system_prompt = """
Keep searching until you are completely certain.
Never stop early.
"""
```
Better:
```python
system_prompt = """
Use at most 3 tool calls.
If enough evidence exists, answer directly.
If not enough evidence exists, say what is missing.
"""
```
### 4. Recursive retrieval chain
A retriever can trigger another retriever through a query engine wrapper, especially if you nest QueryEngineTool objects badly.
```python
from llama_index.core.tools import QueryEngineTool

# BAD: engine A uses engine B, which routes back to A
tool_a = QueryEngineTool.from_defaults(query_engine=engine_a)
tool_b = QueryEngineTool.from_defaults(query_engine=engine_b)
```
Keep retrieval layers flat:
- retriever → response synthesizer → final answer
- avoid circular references between query engines
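One way to catch circular wiring early is to model the delegation graph explicitly and check it for cycles before constructing the tools. A minimal, LlamaIndex-free sketch; the engine names are hypothetical:

```python
def find_cycle(deps: dict) -> bool:
    # deps maps an engine name to the engines it delegates to, e.g.
    # {"engine_a": ["engine_b"], "engine_b": ["engine_a"]} is circular.
    visiting, done = set(), set()

    def dfs(node: str) -> bool:
        if node in visiting:
            return True  # back edge: cycle found
        if node in done:
            return False
        visiting.add(node)
        for nxt in deps.get(node, []):
            if dfs(nxt):
                return True
        visiting.remove(node)
        done.add(node)
        return False

    return any(dfs(n) for n in deps)
```

Running this over your intended wiring in a unit test is cheap insurance against a loop that only shows up under load.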
## How to Debug It
1. **Turn on verbose logging.** Set `verbose=True` on the agent and watch for repeated sequences like `Thought` → `Action` → the same `Action` again with no new information.
2. **Inspect each tool output.** Print raw outputs before they go back to the agent. Empty strings, generic acknowledgements, or repeated payloads are your loop source.
3. **Remove tools one by one.** Start with only one tool; if the loop disappears, add tools back until it returns. The last tool added is usually calling something recursively or returning ambiguous output.
4. **Check iteration guards.** Search for `max_iterations`, `max_steps`, recursion limits, and timeout settings. If they are missing or too high, add them while fixing the root cause.
A useful debugging pattern is to log every call boundary:
```python
def wrapped_tool(query: str) -> str:
    print(f"[tool] query={query}")
    result = real_tool(query)
    print(f"[tool] result={result[:200]}")
    return result
```
If you see the exact same query/result pair repeating, you’ve found the loop.
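The same wrapper idea can be turned into a guard: the sketch below (a hypothetical helper, not a LlamaIndex API) raises as soon as an identical query/result pair repeats too often, so the loop fails fast instead of burning iterations:

```python
def make_loop_guard(tool_fn, max_repeats: int = 2):
    # Count identical (query, result) pairs and raise once a pair occurs
    # more than `max_repeats` times, turning a silent loop into a loud error.
    seen = {}

    def guarded(query: str) -> str:
        result = tool_fn(query)
        key = (query, result)
        seen[key] = seen.get(key, 0) + 1
        if seen[key] > max_repeats:
            raise RuntimeError(f"Loop detected: repeated call {key!r}")
        return result

    return guarded
```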
## Prevention
- **Keep tools pure and single-purpose.** A tool should fetch data, transform data, or call an external API; it should not orchestrate an entire reasoning cycle.
- **Add hard stop conditions everywhere.** Use `max_iterations` on agents, add timeouts around network calls, and return explicit "insufficient data" responses when needed.
- **Test for recursion before production.** Run a smoke test with verbose logging enabled, and use synthetic prompts that force edge cases like ambiguous queries and empty retrieval results.
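A recursion smoke test does not need real models; counting calls on a stubbed tool is often enough. A sketch with illustrative names:

```python
def count_calls(fn):
    # Wrap a tool (or a chat callable) so a smoke test can assert how many
    # times it actually ran. `fn` is any callable.
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

# Smoke test: feed edge-case prompts and assert the call count stays bounded.
search = count_calls(lambda q: "no results")
for prompt in ["", "totally ambiguous", "unknown entity"]:
    search(prompt)
assert search.calls <= 5, "tool ran more often than expected; check for a loop"
```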
If you’re scaling LlamaIndex agents in production, treat infinite loops as an architecture bug, not just a model quirk. Once you remove recursive tool calls and tighten stop conditions, this error usually disappears fast.
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.