How to Fix 'agent infinite loop during development' in LlamaIndex (Python)
When you see agent infinite loop during development in LlamaIndex, it usually means the agent keeps calling tools or re-entering its own reasoning loop without ever producing a final answer. In practice, this shows up during local testing when your tool output keeps nudging the agent back into the same decision path.
The failure is almost always caused by a bad tool contract, missing stop condition, or an agent that can call itself indirectly through another workflow.
The Most Common Cause
The #1 cause is a tool that returns something the agent interprets as another action request instead of a final result. In LlamaIndex, this often happens with FunctionTool, QueryEngineTool, or custom tools that return verbose text like “I found X, should I continue?” instead of a clean answer payload.
Here’s the broken pattern:
| Broken | Fixed |
|---|---|
| Tool output invites another loop | Tool output is deterministic and final |
| Agent sees ambiguous text | Agent gets a concrete result |
```python
# BROKEN: tool output causes the agent to keep thinking
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool

def lookup_customer_status(customer_id: str) -> str:
    # Bad: this reads like a prompt to continue reasoning
    return f"Customer {customer_id} is active. Would you like me to check billing next?"

status_tool = FunctionTool.from_defaults(fn=lookup_customer_status)
agent = ReActAgent.from_tools([status_tool], verbose=True)
response = agent.chat("Check customer 1234")
print(response)
```
```python
# FIXED: tool returns a final, machine-readable result
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool

def lookup_customer_status(customer_id: str) -> str:
    # Good: direct result, no follow-up question embedded in tool output
    return "active"

status_tool = FunctionTool.from_defaults(fn=lookup_customer_status)
agent = ReActAgent.from_tools([status_tool], verbose=True)
response = agent.chat("Check customer 1234")
print(response)
```
If you’re using ReActAgent, the model will keep selecting tools as long as the tool outputs look incomplete or suggest more work. Keep tool responses narrow and deterministic.
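One practical way to keep tool outputs deterministic is to return a compact, structured payload instead of conversational prose. A minimal sketch in plain Python (the record contents are hypothetical; swap in your real data source):

```python
import json

def lookup_customer_status(customer_id: str) -> str:
    # Hypothetical record; replace with your real lookup.
    record = {"customer_id": customer_id, "status": "active"}
    # A compact JSON string is a concrete, final observation: nothing in it
    # reads like an invitation for the agent to keep reasoning.
    return json.dumps(record)

print(lookup_customer_status("1234"))
```

Structured output also makes the observation easy to assert on in tests, which is how you catch loop-magnet phrasing before the agent ever sees it.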
Other Possible Causes
1. No max-iteration limit
If you don’t cap iterations, a bad prompt or noisy tool can spin forever.
```python
from llama_index.core.agent import ReActAgent

agent = ReActAgent.from_tools(
    tools,
    max_iterations=5,
    verbose=True,
)
```
Without max_iterations, debugging becomes harder because the loop has no hard stop.
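For intuition, the cap works like a hard stop in a plain reasoning loop. An illustrative pure-Python sketch (this is not LlamaIndex internals, just the shape of the behavior):

```python
def run_agent_loop(step_fn, max_iterations: int = 5):
    """Run step_fn until it returns a final answer or the cap is hit."""
    for i in range(max_iterations):
        result = step_fn(i)
        if result is not None:  # a final answer ends the loop
            return result
    # Hard stop: surface a clear error instead of spinning forever.
    raise RuntimeError(f"Reached max_iterations={max_iterations} without a final answer")
```

With a cap in place, a misbehaving tool produces a loud, debuggable error instead of an endless trace.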
2. A tool calls the same agent again
This is common when wrapping an agent inside a helper function and then exposing that helper as a tool.
```python
# BROKEN: recursive agent invocation
def ask_agent(question: str) -> str:
    return agent.chat(question).response  # agent calls itself through the tool

tool = FunctionTool.from_defaults(fn=ask_agent)
```
Fix it by separating orchestration from execution:
```python
# FIXED: tool calls only a lower-level service, not the same agent
def ask_billing_service(question: str) -> str:
    return billing_client.search(question)

tool = FunctionTool.from_defaults(fn=ask_billing_service)
```
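If you genuinely need agent-as-tool composition, guard the call depth explicitly so a cycle fails fast instead of spinning. A minimal sketch (the decorator name and depth limit are hypothetical, not a LlamaIndex feature):

```python
import functools

def recursion_guard(max_depth: int = 2):
    """Refuse to re-enter the wrapped callable past max_depth nested calls."""
    def decorator(fn):
        depth = 0

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            nonlocal depth
            if depth >= max_depth:
                raise RuntimeError(f"{fn.__name__}: recursion depth {max_depth} exceeded")
            depth += 1
            try:
                return fn(*args, **kwargs)
            finally:
                depth -= 1
        return wrapper
    return decorator

@recursion_guard(max_depth=2)
def ask_agent(question: str) -> str:
    # Hypothetical nested call for demonstration.
    return "answer: " + question
```

Note this counter is per-process and not thread-safe; in a concurrent server you would track depth per request instead.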
3. Prompt instructions encourage endless reflection
A system prompt like “keep checking until you are certain” sounds harmless but can trigger repeated tool use.
system_prompt = """
You must keep using tools until you are completely certain.
Never answer unless all possibilities are checked.
"""
Use bounded instructions instead:
system_prompt = """
Use at most one relevant tool call per sub-question.
If the result is sufficient, answer directly.
"""
4. Query engine returns partial context that triggers another retrieval pass
This happens with QueryEngineTool when your retriever returns fragments that look incomplete.
```python
from llama_index.core.tools import QueryEngineTool

tool = QueryEngineTool.from_defaults(
    query_engine=my_query_engine,
    name="policy_search",
    description="Search policy docs and return concise answers.",
)
```
Make sure your query engine is configured to return synthesized final answers, not raw chunk dumps. If needed, use a more aggressive response mode (for example, `response_mode="compact"`) or lower `similarity_top_k` to cut retrieval noise.
How to Debug It
- Turn on verbose tracing
  - Use `verbose=True` on your agent.
  - Watch for repeated sequences like `Thought -> Action -> Observation` with the same tool over and over.
- Inspect each tool response
  - Print raw outputs from `FunctionTool` and `QueryEngineTool`.
  - Look for phrases like "Should I continue?", "I need more information", and "Next step"; those are loop magnets.
- Set hard limits
  - Add `max_iterations=3` or `5`.
  - If the error disappears and you get a partial answer, your issue is iteration control, not model failure.
- Remove tools one by one
  - Start with one tool.
  - If the loop stops when a specific tool is removed, that tool's output contract is the problem.
  - This is faster than guessing from prompts alone.
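The tool-response inspection step can be automated with a small linter over raw tool outputs. A sketch with a hypothetical phrase list (extend it with whatever loop magnets show up in your traces):

```python
LOOP_MAGNETS = (
    "should i continue",
    "i need more information",
    "next step",
)

def flag_loop_magnets(tool_output: str) -> list[str]:
    """Return any phrases in a tool output that tend to re-trigger reasoning."""
    lowered = tool_output.lower()
    return [phrase for phrase in LOOP_MAGNETS if phrase in lowered]
```

Running this over every tool's output in a unit test catches the loop before an agent ever runs.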
Prevention
- Keep tool outputs short, factual, and final.
- Add explicit iteration limits to every production agent.
- Never expose an agent as a tool to itself unless you've built strict recursion guards.
- Test prompts with adversarial inputs that force edge-case behavior before shipping.
- Log every `Action`/`Observation` pair so loops are visible in dev before they hit staging.
If you want a simple rule: LlamaIndex agents should decide; tools should answer. The moment a tool starts sounding like an assistant, you’re one step away from an infinite loop.
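Logging every `Action`/`Observation` pair can be as simple as wrapping each tool function before registering it. A sketch (the logger name and decorator are illustrative, not a built-in LlamaIndex helper):

```python
import functools
import logging

logger = logging.getLogger("agent.trace")

def traced(fn):
    """Log each tool call (Action) and its return value (Observation)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        logger.info("Action: %s args=%s kwargs=%s", fn.__name__, args, kwargs)
        result = fn(*args, **kwargs)
        logger.info("Observation: %r", result)
        return result
    return wrapper

@traced
def lookup_customer_status(customer_id: str) -> str:
    return "active"
```

Because the wrapper preserves the function's name and signature metadata via `functools.wraps`, it can still be passed to `FunctionTool.from_defaults` as usual.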
By Cyprian Aarons, AI Consultant at Topiax.