LlamaIndex Tutorial (Python): testing agents locally for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to run and test a LlamaIndex agent locally in Python, without wiring it into a full app first. You need this when you want to debug tool calls, inspect intermediate reasoning, and verify your agent behaves correctly before you expose it to users.

What You'll Need

  • Python 3.10 or newer
  • A virtual environment
  • llama-index
  • An OpenAI API key
  • Basic familiarity with LlamaIndex agents and tools
  • A terminal for running the script locally

Install the package:

pip install llama-index

Set your API key in the shell:

export OPENAI_API_KEY="your-key-here"

Step-by-Step

  1. Start with a clean local script and define a simple tool the agent can call. Keep the tool deterministic so you can test whether the agent is choosing it correctly.
from llama_index.core.tools import FunctionTool

def get_policy_status(policy_id: str) -> str:
    policies = {
        "P1001": "Active",
        "P1002": "Pending renewal",
        "P1003": "Cancelled",
    }
    return policies.get(policy_id, "Policy not found")

policy_tool = FunctionTool.from_defaults(fn=get_policy_status)
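Because the tool is deterministic, you can sanity-check it with plain assertions before any LLM is involved. A minimal sketch (the function is repeated here so the snippet runs standalone):

```python
# Direct checks on the tool's underlying function — no agent, no API key.
# Redefined here so this snippet runs on its own; in your script, just
# call the get_policy_status you already defined.
def get_policy_status(policy_id: str) -> str:
    policies = {
        "P1001": "Active",
        "P1002": "Pending renewal",
        "P1003": "Cancelled",
    }
    return policies.get(policy_id, "Policy not found")

assert get_policy_status("P1001") == "Active"
assert get_policy_status("P9999") == "Policy not found"
print("tool function behaves as expected")
```

If these assertions fail, fix the tool before touching the agent; otherwise you will waste time blaming the model for a plain Python bug.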
  2. Build an agent that can use that tool. For local testing, keep the setup minimal and use a small model so responses are quick and easy to inspect.
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")
agent = ReActAgent.from_tools(
    [policy_tool],
    llm=llm,
    verbose=True,
)
  3. Run a few direct queries from Python. The point here is not just to get an answer, but to confirm the agent actually calls the tool when it should.
response_1 = agent.chat("What is the status of policy P1001?")
response_2 = agent.chat("Check policy P1003 for me.")
response_3 = agent.chat("Tell me the status of policy P9999.")

print("Response 1:", response_1)
print("Response 2:", response_2)
print("Response 3:", response_3)
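Beyond reading the verbose log, you can check tool usage programmatically. In LlamaIndex's legacy agent API, agent.chat() returns a response object whose .sources list records each tool call (attribute names may differ in your version — treat this as an assumption to verify). A sketch using a stand-in object so it runs without an API key:

```python
# Sketch: checking which tool the agent actually invoked.
# Assumes the legacy AgentChatResponse shape, where .sources holds
# ToolOutput entries with a .tool_name attribute. Demonstrated with a
# stand-in object so the snippet runs offline.
from types import SimpleNamespace

def tools_used(response) -> list[str]:
    """Return the names of tools recorded on an agent response."""
    return [s.tool_name for s in getattr(response, "sources", [])]

# Stand-in for a real response from agent.chat(...):
fake = SimpleNamespace(
    sources=[SimpleNamespace(tool_name="get_policy_status")]
)
print(tools_used(fake))  # ['get_policy_status']
```

With the real agent, `tools_used(response_1)` should include "get_policy_status"; an empty list means the model answered from its own guess, which is exactly the failure you are testing for.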
  4. Add a repeatable local test harness so you can validate behavior after every change. This is the step most developers skip, and it is why they end up debugging by hand later.
tests = [
    ("What is the status of policy P1001?", "Active"),
    ("Check policy P1002.", "Pending renewal"),
    ("What about policy P9999?", "Policy not found"),
]

for prompt, expected in tests:
    result = str(agent.chat(prompt))
    passed = expected in result
    print(f"PROMPT: {prompt}")
    print(f"EXPECTED: {expected}")
    print(f"RESULT: {result}")
    print(f"PASS: {passed}\n")
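The loop above can be made reusable by passing the "ask" step in as a callable, so the same checks run against the real agent or a cheap stub. A sketch (fake_ask is an illustrative stub, not part of LlamaIndex):

```python
# A reusable harness: any callable mapping prompt -> answer works,
# so you can exercise the checks offline before spending API calls.
def run_tests(ask, tests) -> bool:
    all_passed = True
    for prompt, expected in tests:
        result = str(ask(prompt))
        passed = expected in result
        all_passed = all_passed and passed
        print(f"{'PASS' if passed else 'FAIL'}: {prompt!r} -> {result!r}")
    return all_passed

# Stub standing in for agent.chat so this snippet runs without a key:
def fake_ask(prompt: str) -> str:
    return "Policy P1002 is Pending renewal" if "P1002" in prompt else "Active"

ok = run_tests(fake_ask, [("Check policy P1002.", "Pending renewal")])
```

In a real script, `run_tests(agent.chat, tests)` drives the live agent, and you can exit nonzero on failure (`sys.exit(0 if ok else 1)`) so the harness works in CI.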
  5. If you want cleaner local debugging, wrap the agent call in a function and log inputs and outputs. That gives you a stable surface for unit tests or future integration into FastAPI, Celery, or whatever runtime you use next.
def run_agent_query(query: str) -> str:
    print(f"[QUERY] {query}")
    result = agent.chat(query)
    output = str(result)
    print(f"[OUTPUT] {output}")
    return output

if __name__ == "__main__":
    run_agent_query("What is the status of policy P1001?")

Testing It

Run the script from your terminal with python your_script.py. You should see verbose agent output showing tool selection, followed by final answers that match your test cases.

With verbose=True, you'll see whether the agent decided to call get_policy_status or answered by guessing. That matters because local testing is mostly about catching bad tool usage before it ships.

If you get authentication errors, check that OPENAI_API_KEY is set in the same shell session where you run Python. If answers look inconsistent, lower complexity first: keep tools deterministic and avoid adding multiple tools until this one behaves correctly.
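To surface the missing-key case early, you can fail fast at startup instead of hitting a deep stack trace from the OpenAI client mid-run. A minimal sketch (require_api_key is an illustrative helper, not a LlamaIndex API):

```python
# Fail fast with a clear message if the key is missing, rather than a
# cryptic authentication error later in the agent run.
import os

def require_api_key(var: str = "OPENAI_API_KEY") -> str:
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(
            f"{var} is not set in this shell. "
            'Run: export OPENAI_API_KEY="your-key-here"'
        )
    return key
```

Call `require_api_key()` at the top of your script, before constructing the LLM, so the error message points at the real cause.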

Next Steps

  • Add pytest tests around run_agent_query() so regressions fail fast.
  • Replace the mock policy dictionary with a real internal service client.
  • Explore streaming responses and callback handlers for better observability during local debugging.
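The first bullet can be sketched as a small pytest file. One assumption worth flagging: run_agent_query above closes over a module-level agent, so the testable version below takes the agent as a parameter, and FakeAgent is an illustrative stub, not a LlamaIndex class:

```python
# test_agent.py — pytest sketch. Injecting the agent makes the wrapper
# testable with a stub, so these tests run with no network access.
def run_agent_query(agent, query: str) -> str:
    return str(agent.chat(query))

class FakeAgent:
    """Stub that answers from the policy table; no LLM involved."""
    def chat(self, query: str) -> str:
        return "Active" if "P1001" in query else "Policy not found"

def test_known_policy():
    assert run_agent_query(FakeAgent(), "Status of P1001?") == "Active"

def test_unknown_policy():
    assert run_agent_query(FakeAgent(), "Status of P9999?") == "Policy not found"
```

Run it with `pytest test_agent.py`; once the stub-backed tests pass, a slower, optionally-skipped test against the real agent can cover end-to-end behavior.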


By Cyprian Aarons, AI Consultant at Topiax.
