How to Fix 'agent infinite loop during development' in LlamaIndex (Python)

By Cyprian Aarons · Updated 2026-04-21

When you see agent infinite loop during development in LlamaIndex, it usually means the agent keeps calling tools or re-entering its own reasoning loop without ever producing a final answer. In practice, this shows up during local testing when your tool output keeps nudging the agent back into the same decision path.

The failure is almost always caused by a bad tool contract, a missing stop condition, or an agent that can call itself indirectly through another workflow.

The Most Common Cause

The #1 cause is a tool that returns something the agent interprets as another action request instead of a final result. In LlamaIndex, this often happens with FunctionTool, QueryEngineTool, or custom tools that return verbose text like “I found X, should I continue?” instead of a clean answer payload.

Here’s the broken pattern:

| Broken | Fixed |
| --- | --- |
| Tool output invites another loop | Tool output is deterministic and final |
| Agent sees ambiguous text | Agent gets a concrete result |

# BROKEN: tool output causes the agent to keep thinking
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool

def lookup_customer_status(customer_id: str) -> str:
    # Bad: this reads like a prompt to continue reasoning
    return f"Customer {customer_id} is active. Would you like me to check billing next?"

status_tool = FunctionTool.from_defaults(fn=lookup_customer_status)

agent = ReActAgent.from_tools([status_tool], verbose=True)
response = agent.chat("Check customer 1234")
print(response)

# FIXED: tool returns a final, machine-readable result
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool

def lookup_customer_status(customer_id: str) -> str:
    # Good: direct result, no follow-up question embedded in tool output
    return "active"

status_tool = FunctionTool.from_defaults(fn=lookup_customer_status)

agent = ReActAgent.from_tools([status_tool], verbose=True)
response = agent.chat("Check customer 1234")
print(response)

If you’re using ReActAgent, the model will keep selecting tools as long as the tool outputs look incomplete or suggest more work. Keep tool responses narrow and deterministic.

Other Possible Causes

1. No max-iteration limit

If you don’t cap iterations, a bad prompt or noisy tool can spin forever.

from llama_index.core.agent import ReActAgent

agent = ReActAgent.from_tools(
    tools,
    max_iterations=5,
    verbose=True,
)

Without max_iterations, debugging becomes harder because the loop has no hard stop.
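To see why the cap matters, here is a toy ReAct-style loop in plain Python (no LlamaIndex needed; `run_react_loop` and `decide_next_step` are illustrative names, not library APIs). A policy that always wants one more tool call would spin forever without the hard stop:

```python
# Toy model of a ReAct loop: keep choosing actions until a final answer
# appears, or the iteration cap converts an infinite loop into an error.

def run_react_loop(decide_next_step, max_iterations=5):
    """decide_next_step(step) returns ("answer", text) or ("tool", name)."""
    for step in range(max_iterations):
        kind, payload = decide_next_step(step)
        if kind == "answer":
            return payload
        # a real agent would execute the tool here and feed back the observation
    raise RuntimeError(f"Reached max iterations ({max_iterations})")

# A policy that never produces a final answer -> the cap is the only exit.
looping_policy = lambda step: ("tool", "lookup_customer_status")

try:
    run_react_loop(looping_policy, max_iterations=5)
except RuntimeError as err:
    print(err)  # prints "Reached max iterations (5)"
```

The same principle applies inside LlamaIndex: the cap does not fix the loop, it turns a hang into a diagnosable failure.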

2. A tool calls the same agent again

This is common when wrapping an agent inside a helper function and then exposing that helper as a tool.

# BROKEN: recursive agent invocation
def ask_agent(question: str) -> str:
    return agent.chat(question).response  # agent calls itself through the tool

tool = FunctionTool.from_defaults(fn=ask_agent)

Fix it by separating orchestration from execution:

# FIXED: tool calls only a lower-level service, not the same agent
def ask_billing_service(question: str) -> str:
    return billing_client.search(question)

tool = FunctionTool.from_defaults(fn=ask_billing_service)
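If you genuinely must expose an agent-like callable as a tool, add a re-entrancy guard. A minimal plain-Python sketch (the `agent_chat` function and its behavior are hypothetical stand-ins, not LlamaIndex APIs):

```python
import threading

# Thread-local flag: set while an agent call is in flight on this thread.
_in_agent_call = threading.local()

def guarded(fn):
    """Refuse to re-enter the wrapped callable from inside its own call."""
    def wrapper(*args, **kwargs):
        if getattr(_in_agent_call, "active", False):
            raise RecursionError("Agent re-entered through one of its own tools")
        _in_agent_call.active = True
        try:
            return fn(*args, **kwargs)
        finally:
            _in_agent_call.active = False
    return wrapper

@guarded
def agent_chat(question: str) -> str:
    # Simulates a tool that (incorrectly) calls the agent again.
    if question == "recurse":
        return agent_chat("recurse")
    return "final answer"
```

The guard fails fast with a clear `RecursionError` instead of an unbounded loop, and resets itself afterward so normal calls keep working.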

3. Prompt instructions encourage endless reflection

A system prompt like “keep checking until you are certain” sounds harmless but can trigger repeated tool use.

system_prompt = """
You must keep using tools until you are completely certain.
Never answer unless all possibilities are checked.
"""

Use bounded instructions instead:

system_prompt = """
Use at most one relevant tool call per sub-question.
If the result is sufficient, answer directly.
"""

4. Query engine returns partial context that triggers another retrieval pass

This happens with QueryEngineTool when your retriever returns fragments that look incomplete.

from llama_index.core.tools import QueryEngineTool

tool = QueryEngineTool.from_defaults(
    query_engine=my_query_engine,
    name="policy_search",
    description="Search policy docs and return concise answers."
)

Make sure your query engine is configured to return final answers, not raw chunk dumps. If needed, set a stronger synthesizer or reduce top-k noise.
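A configuration sketch of that idea, assuming an existing `VectorStoreIndex` named `index` (both keyword arguments are standard `as_query_engine` options, but verify them against your installed LlamaIndex version):

```python
# Tighter retrieval plus a compact synthesizer returns one final answer
# instead of raw chunk dumps that tempt the agent into another pass.
query_engine = index.as_query_engine(
    similarity_top_k=2,       # fewer, higher-signal chunks
    response_mode="compact",  # synthesize one concise answer
)
```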

How to Debug It

  1. Turn on verbose tracing

    • Use verbose=True on your agent.
    • Watch for repeated sequences like Thought -> Action -> Observation with the same tool over and over.
  2. Inspect each tool response

    • Print raw outputs from FunctionTool and QueryEngineTool.
    • Look for phrases like:
      • “Should I continue?”
      • “I need more information”
      • “Next step”
    • Those are loop magnets.
  3. Set hard limits

    • Add max_iterations=3 or 5.
    • If the error disappears and you get a partial answer, your issue is iteration control, not model failure.
  4. Remove tools one by one

    • Start with one tool.
    • If the loop stops when a specific tool is removed, that tool’s output contract is the problem.
    • This is faster than guessing from prompts alone.
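The phrase check in step 2 is easy to automate with a small helper (plain Python; the phrase list is illustrative, so extend it for your own tools):

```python
# Flag tool outputs that read like requests for more work ("loop magnets").
LOOP_MAGNETS = ("should i continue", "i need more information", "next step")

def is_loop_magnet(tool_output: str) -> bool:
    text = tool_output.lower()
    return text.strip().endswith("?") or any(p in text for p in LOOP_MAGNETS)

print(is_loop_magnet("Customer 1234 is active. Check billing next?"))  # True
print(is_loop_magnet("active"))                                        # False
```

Run every tool's raw output through a check like this during debugging; any hit is a candidate cause of the loop.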

Prevention

  • Keep tool outputs short, factual, and final.
  • Add explicit iteration limits to every production agent.
  • Never expose an agent as a tool to itself unless you’ve built strict recursion guards.
  • Test prompts with adversarial inputs that force edge-case behavior before shipping.
  • Log every Action/Observation pair so loops are visible in dev before they hit staging.
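The last point can be as simple as a logging wrapper applied to each tool function before it is handed to `FunctionTool.from_defaults`. A minimal sketch (swap `print` for your logger; the wrapper is plain Python, not a LlamaIndex feature):

```python
import functools

def logged_tool(fn):
    """Record every call so repeated identical Action/Observation pairs stand out."""
    calls = []

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        calls.append((fn.__name__, args, kwargs, result))
        print(f"[tool] {fn.__name__}{args} -> {result!r}")
        return result

    wrapper.calls = calls
    return wrapper

@logged_tool
def lookup_customer_status(customer_id: str) -> str:
    return "active"
```

If `lookup_customer_status.calls` shows the same arguments and result many times in a row, the agent is looping on that tool.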

If you want a simple rule: LlamaIndex agents should decide; tools should answer. The moment a tool starts sounding like an assistant, you’re one step away from an infinite loop.


By Cyprian Aarons, AI Consultant at Topiax.