LlamaIndex Tutorial (Python): testing agents locally for beginners
This tutorial shows you how to run and test a LlamaIndex agent locally in Python without wiring it into a web app first. You need this when you want to debug tool calls, inspect intermediate behavior, and validate prompts before exposing the agent to users.
What You'll Need
- Python 3.10+
- A virtual environment
- `llama-index`
- `llama-index-llms-openai`
- `openai`
- An OpenAI API key set as an environment variable
- Basic familiarity with the LlamaIndex `QueryEngineTool` and `ReActAgent` classes
Install the packages:

```shell
pip install llama-index llama-index-llms-openai openai
```
Set your OpenAI key:

```shell
export OPENAI_API_KEY="your-key-here"
```
Step-by-Step
1. Create a small local knowledge base and index it.

For local testing, keep the data tiny and deterministic. A few short documents are enough to verify retrieval, tool selection, and response formatting.

```python
from llama_index.core import Document, VectorStoreIndex

docs = [
    Document(text="LlamaIndex is a framework for connecting LLMs to data."),
    Document(text="Agents can call tools like query engines to answer questions."),
    Document(text="Local testing helps catch prompt and tool routing issues early."),
]

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
```
2. Wrap the query engine as a tool.

Agents do not talk to indexes directly. They call tools, so expose the query engine with a clear name and a description that tells the agent when to use it.

```python
from llama_index.core.tools import QueryEngineTool, ToolMetadata

kb_tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="knowledge_base",
        description="Use this for questions about LlamaIndex, agents, or local testing.",
    ),
)
```
3. Build a local agent that can use the tool.

Use a standard OpenAI-backed LLM and the ReAct agent pattern. This is enough for local validation because you can observe whether the agent chooses the right tool and produces a grounded answer.

```python
import os

from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])

agent = ReActAgent.from_tools(
    tools=[kb_tool],
    llm=llm,
    verbose=True,
)
```
4. Run a few local test queries.

Start with questions that should clearly hit your indexed content. Then add one question that checks whether the agent avoids hallucinating when the answer is not directly in the documents.

```python
questions = [
    "What is LlamaIndex?",
    "Why would someone test agents locally?",
    "What does an agent use to answer questions from data?",
    # Not covered by the three documents -- the agent should say it doesn't know.
    "What is LlamaIndex's release date?",
]

for question in questions:
    response = agent.chat(question)
    print(f"\nQ: {question}")
    print(f"A: {response}\n")
```
5. Add assertions so testing becomes repeatable.

Manual inspection is fine once. After that, turn it into a small regression test so you can rerun it after prompt changes or package upgrades.

```python
def ask(q: str) -> str:
    return str(agent.chat(q))

answer = ask("What does an agent use to answer questions from data?")
assert "tool" in answer.lower() or "query engine" in answer.lower()

answer2 = ask("What is local testing useful for?")
assert "debug" in answer2.lower() or "testing" in answer2.lower()

print("Local agent checks passed.")
```
Testing It
Run the script from your terminal and watch the verbose output from `ReActAgent`. You should see the agent decide whether to call `knowledge_base`, then return an answer grounded in your three documents.
If it answers without using the tool for questions that should be grounded in your docs, your tool description is too weak or your prompt is too loose. If it calls the tool but still gives vague answers, tighten the document text, keep test prompts simple, or reduce model randomness by lowering the LLM's temperature.
For repeatability, keep this script under version control and run it after every change to prompts, tools, or model settings.
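One lightweight way to make those reruns meaningful is to snapshot the agent's answers to a file and flag any question whose answer changed since the last run. This is a minimal, standard-library-only sketch; the `ask` function below is a deterministic stand-in that you would replace with the `ask` wrapper from step 5:

```python
# regression_snapshot.py -- a minimal sketch of snapshot-style regression
# checking for agent answers. `ask` is a stand-in; in your project, replace
# its body with str(agent.chat(question)).
import json
from pathlib import Path

SNAPSHOT = Path("agent_answers.json")


def ask(question: str) -> str:
    # Stand-in answer so the structure runs without an API key.
    return "Agents call tools like query engines."


def check_against_snapshot(questions: list[str]) -> list[str]:
    """Return the questions whose answers changed since the last run."""
    previous = json.loads(SNAPSHOT.read_text()) if SNAPSHOT.exists() else {}
    current = {q: ask(q) for q in questions}
    changed = [q for q in questions if q in previous and previous[q] != current[q]]
    # Save the current answers as the new baseline for the next run.
    SNAPSHOT.write_text(json.dumps(current, indent=2))
    return changed


changed = check_against_snapshot(["What is LlamaIndex?"])
print("Changed answers:", changed)
```

A changed answer is not automatically a failure, but it tells you exactly which prompt or model change to review before shipping.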
Next Steps
- Add a second tool, such as a calculator or date parser, and verify tool selection across multiple tools.
- Replace `VectorStoreIndex` with a persistent vector store once local behavior is stable.
- Move these assertions into `pytest` so you can run them in CI before deploying agents.
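Moving the checks into `pytest` might look like the sketch below. The `ask` body here is a placeholder so the structure runs without an API key; in your project, swap it for the `str(agent.chat(q))` wrapper from step 5:

```python
# test_agent_checks.py -- a minimal pytest sketch for the keyword assertions.
# pytest discovers and runs the test_* functions automatically.


def ask(q: str) -> str:
    # Placeholder answer; replace with: return str(agent.chat(q))
    return "Agents call tools such as a query engine to answer from data."


def test_agent_mentions_tooling():
    answer = ask("What does an agent use to answer questions from data?").lower()
    assert "tool" in answer or "query engine" in answer


def test_agent_answers_are_strings():
    # Guards against the wrapper returning a response object instead of text.
    assert isinstance(ask("What is LlamaIndex?"), str)
```

Run it with `pytest test_agent_checks.py -q`; in CI, a failing keyword check blocks the deploy before users see a regression.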
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.