LlamaIndex Tutorial (Python): debugging agent loops for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to instrument a LlamaIndex agent so you can see why it keeps looping, where tool calls are repeating, and how to stop it before it burns tokens. You need this when an agent looks “stuck” in production: same query, same tool, same answer path, over and over.

What You'll Need

  • Python 3.10+
  • llama-index
  • llama-index-llms-openai
  • llama-index-tools-tavily-search (optional, if you want to test against a real external tool)
  • OpenAI API key in OPENAI_API_KEY
  • Optional: TAVILY_API_KEY for search tool testing
  • A terminal with basic Python packaging tools

Install the packages:

pip install llama-index llama-index-llms-openai llama-index-tools-tavily-search

Set your environment variables:

export OPENAI_API_KEY="your-openai-key"
export TAVILY_API_KEY="your-tavily-key"

Step-by-Step

  1. Start by building a small agent with one tool and explicit debug logging. The point is not to make it smart; the point is to make every decision observable.
import logging

from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

logging.basicConfig(level=logging.INFO)

def get_policy_status(policy_id: str) -> str:
    return f"Policy {policy_id} is active and paid through 2026-01-01."

tool = FunctionTool.from_defaults(
    fn=get_policy_status,
    name="get_policy_status",
    description="Get the current status of an insurance policy by policy ID.",
)

llm = OpenAI(model="gpt-4o-mini", temperature=0)

agent = AgentWorkflow.from_tools_or_functions(
    [tool],
    llm=llm,
    system_prompt="Use tools only when needed. Do not repeat the same tool call twice.",
)
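
Before wrapping anything yourself, it is worth raising the library's own log levels. A minimal sketch, assuming the default module-based logger names (llama_index for the library, openai and httpx for the HTTP client):

import logging

# Surface LlamaIndex's internal step logs, but silence raw HTTP chatter
# so the tool-call lines added in the next step stay readable.
logging.getLogger("llama_index").setLevel(logging.DEBUG)
logging.getLogger("openai").setLevel(logging.WARNING)
logging.getLogger("httpx").setLevel(logging.WARNING)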
  2. Add a trace-friendly wrapper around your tool inputs and outputs. In real incidents, loop debugging usually starts with confirming whether the model is reusing the same arguments or failing to incorporate the result.
from typing import Callable

def traced_tool(fn: Callable[[str], str]) -> Callable[[str], str]:
    def wrapper(policy_id: str) -> str:
        print(f"[TOOL CALL] get_policy_status(policy_id={policy_id!r})")
        result = fn(policy_id)
        print(f"[TOOL RESULT] {result}")
        return result
    return wrapper

traced_get_policy_status = traced_tool(get_policy_status)

debug_tool = FunctionTool.from_defaults(
    fn=traced_get_policy_status,
    name="get_policy_status",
    description="Get the current status of an insurance policy by policy ID.",
)

debug_agent = AgentWorkflow.from_tools_or_functions(
    [debug_tool],
    llm=llm,
    system_prompt="Use tools only when needed. Never call the same tool twice for the same input.",
)
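
The wrapper above is hard-coded to a single string argument. A generic variant (plain Python, nothing LlamaIndex-specific) traces any signature and uses functools.wraps so the wrapped function keeps its name and docstring, which FunctionTool.from_defaults can otherwise infer for you:

import functools
from typing import Any, Callable

def traced(fn: Callable[..., Any]) -> Callable[..., Any]:
    @functools.wraps(fn)  # preserve __name__ and __doc__ for tool metadata inference
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        print(f"[TOOL CALL] {fn.__name__}(args={args!r}, kwargs={kwargs!r})")
        result = fn(*args, **kwargs)
        print(f"[TOOL RESULT] {result!r}")
        return result
    return wrapper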
  3. Run a controlled prompt that would normally trigger repeated reasoning if your agent is misconfigured. Use a single query and inspect whether the output stabilizes after one tool call.
import asyncio

async def main() -> None:
    response = await debug_agent.run(
        user_msg="What is the status of policy 12345? Give me a concise answer."
    )
    print("\n[FINAL RESPONSE]")
    print(response)

if __name__ == "__main__":
    asyncio.run(main())
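
You can also watch the loop from the outside via the workflow's event stream instead of instrumenting the tool. A sketch, assuming your installed version exposes the ToolCall and ToolCallResult event types from llama_index.core.agent.workflow (present in recent releases):

import asyncio

from llama_index.core.agent.workflow import ToolCall, ToolCallResult

async def run_with_events(prompt: str) -> None:
    handler = debug_agent.run(user_msg=prompt)
    async for event in handler.stream_events():
        if isinstance(event, ToolCall):
            print(f"[EVENT] tool call: {event.tool_name}({event.tool_kwargs})")
        elif isinstance(event, ToolCallResult):
            print(f"[EVENT] tool result: {event.tool_output}")
    print(await handler)  # the handler resolves to the final response

asyncio.run(run_with_events("What is the status of policy 12345?"))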
  4. If you suspect an actual loop, add a hard stop at the orchestration layer. This is the production pattern: bound runtime, log each step, and fail closed instead of letting token spend run away.
import asyncio

async def guarded_run(prompt: str) -> None:
    try:
        response = await asyncio.wait_for(
            debug_agent.run(user_msg=prompt),
            timeout=20,
        )
        print(response)
    except asyncio.TimeoutError:
        print("[TIMEOUT] Agent exceeded allowed runtime")

if __name__ == "__main__":
    asyncio.run(guarded_run("Check policy 12345 and explain why it may be inactive."))
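
A timeout bounds wall-clock time but not call volume; a fast model can repeat a cheap tool dozens of times inside 20 seconds. A complementary call-budget wrapper (plain Python; max_calls is an illustrative knob, not a LlamaIndex setting) fails closed once a tool exceeds its per-request budget:

from typing import Any, Callable

def budgeted(fn: Callable[..., Any], max_calls: int = 3) -> Callable[..., Any]:
    state = {"count": 0}  # closure state; rebuild the wrapper per request to reset

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        state["count"] += 1
        if state["count"] > max_calls:
            raise RuntimeError(f"{fn.__name__} exceeded its budget of {max_calls} calls")
        return fn(*args, **kwargs)

    return wrapper

budgeted_get_policy_status = budgeted(get_policy_status, max_calls=3)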
  5. Make loop detection explicit by checking for repeated tool arguments in your own logs. In practice, this catches cases where the model keeps asking the same question because it never trusts its prior observation.
seen_calls = set()

def monitored_get_policy_status(policy_id: str) -> str:
    key = ("get_policy_status", policy_id)
    if key in seen_calls:
        raise RuntimeError(f"Repeated tool call detected for {key}")
    seen_calls.add(key)
    return get_policy_status(policy_id)

monitored_tool = FunctionTool.from_defaults(
    fn=monitored_get_policy_status,
    name="get_policy_status",
    description="Get policy status and fail on repeated identical calls.",
)
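
Note that seen_calls is module-level state, so a duplicate from a previous request will trip the guard on the next one. A small class keeps the guard resettable per request; this sketch also wires the guarded tool into its own agent, reusing the llm defined earlier:

class RepeatGuard:
    """Raises on a repeated (tool, argument) pair; call reset() between requests."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str]] = set()

    def check(self, tool_name: str, arg: str) -> None:
        key = (tool_name, arg)
        if key in self._seen:
            raise RuntimeError(f"Repeated tool call detected for {key}")
        self._seen.add(key)

    def reset(self) -> None:
        self._seen.clear()

guard = RepeatGuard()

def guarded_get_policy_status(policy_id: str) -> str:
    guard.check("get_policy_status", policy_id)
    return get_policy_status(policy_id)

guarded_agent = AgentWorkflow.from_tools_or_functions(
    [FunctionTool.from_defaults(
        fn=guarded_get_policy_status,
        name="get_policy_status",
        description="Get the current status of an insurance policy by policy ID.",
    )],
    llm=llm,
    system_prompt="Use tools only when needed.",
)
# Call guard.reset() before each new request so state does not leak across runs.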

Testing It

Run the script once with a normal prompt like “What is the status of policy 12345?” You should see exactly one tool call and one final response that uses the returned status.

Then try a prompt that encourages overthinking, such as “Keep checking until you’re sure.” A healthy setup should still stop after one relevant lookup or fail fast if your repetition guard catches duplicate calls.

If you see repeated [TOOL CALL] lines with identical arguments, your agent is looping because it’s not grounding on tool output or your prompt is too permissive. Tighten the system prompt, reduce max iterations if your workflow exposes that control, and keep the repetition guard in place during debugging.
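
To make this check repeatable rather than eyeballed, count calls in a wrapper and assert on the count after the run. A minimal harness, reusing llm, FunctionTool, and AgentWorkflow from the steps above (the one-call threshold is this tutorial's expectation, not a library guarantee):

import asyncio

call_count = 0

def counted_get_policy_status(policy_id: str) -> str:
    global call_count
    call_count += 1
    return get_policy_status(policy_id)

async def test_single_lookup() -> None:
    global call_count
    call_count = 0
    agent = AgentWorkflow.from_tools_or_functions(
        [FunctionTool.from_defaults(
            fn=counted_get_policy_status,
            name="get_policy_status",
            description="Get the current status of an insurance policy by policy ID.",
        )],
        llm=llm,
        system_prompt="Use tools only when needed.",
    )
    await agent.run(user_msg="What is the status of policy 12345?")
    assert call_count == 1, f"expected exactly 1 tool call, saw {call_count}"

asyncio.run(test_single_lookup())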

Next Steps

  • Add structured tracing with OpenTelemetry so each agent step becomes searchable in your observability stack.
  • Test loop behavior against multiple models; some will over-call tools more aggressively than others.
  • Move from ad hoc print logging to JSON logs with request IDs, user IDs, and conversation IDs for production incident review; a minimal sketch follows this list.
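
A minimal JSON-logging sketch for that last point (log_tool_event and its field names are illustrative, not part of LlamaIndex):

import json
import logging
import time
import uuid

audit_logger = logging.getLogger("agent.audit")

def log_tool_event(event: str, request_id: str, **fields: object) -> None:
    # Emit one JSON object per line so a log pipeline can index each field.
    audit_logger.info(json.dumps({
        "ts": time.time(),
        "event": event,
        "request_id": request_id,
        **fields,
    }))

request_id = str(uuid.uuid4())
log_tool_event("tool_call", request_id, tool="get_policy_status", policy_id="12345")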
