LlamaIndex Tutorial (Python): adding observability for beginners
This tutorial shows you how to add observability to a basic LlamaIndex Python app so you can see what your LLM calls are doing, how long they take, and where failures happen. You need this when a demo turns into a production workflow and you can’t debug agent behavior from print statements anymore.
What You'll Need
- Python 3.10+
- A virtual environment
- llama-index
- llama-index-core
- llama-index-llms-openai
- llama-index-callbacks-arize-phoenix (or another callback/telemetry backend if you want external tracing)
- An OpenAI API key set as OPENAI_API_KEY
- Optional: Phoenix running locally if you want trace visualization
Install the packages:
pip install llama-index llama-index-core llama-index-llms-openai llama-index-callbacks-arize-phoenix
Set your API key:
export OPENAI_API_KEY="your-key-here"
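If you want to confirm the key is actually visible to Python before running anything else, a quick standard-library check like this catches a missing or empty variable early:
import os
# Fail fast with a clear message instead of a vaguer OpenAI auth error later
if not os.environ.get("OPENAI_API_KEY"):
    raise SystemExit("OPENAI_API_KEY is not set; export it before running the examples")
print("OPENAI_API_KEY found")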
Step-by-Step
1. Start with a minimal LlamaIndex query engine. This gives you a baseline before adding any tracing, so you can compare behavior once observability is enabled.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(
    llm=OpenAI(model="gpt-4o-mini")
)
response = query_engine.query("What is this dataset about?")
print(response)
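If you don't already have a ./data folder, the snippet below creates one with a single small text file so SimpleDirectoryReader has something to load; the file name and contents here are just placeholders:
from pathlib import Path
# Create ./data with one sample document so the examples have something to index
data_dir = Path("data")
data_dir.mkdir(exist_ok=True)
(data_dir / "sample.txt").write_text(
    "LlamaIndex is a framework for connecting LLMs to your own data."
)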
2. Add callback instrumentation through LlamaIndex’s global settings. This is the cleanest beginner path: every index and query call in your process emits events to the handlers you register, and the built-in LlamaDebugHandler prints a trace of each query as it completes.
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
documents = SimpleDirectoryReader("./data").load_data()
Settings.llm = OpenAI(model="gpt-4o-mini")
# Register a debug handler so every LlamaIndex event (retrieval, LLM calls, synthesis)
# is recorded and a trace is printed when each query finishes
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
Settings.callback_manager = CallbackManager([llama_debug])
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("Summarize the main themes.")
print(response)
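Once a query has run, the debug handler keeps the recorded events in memory, so you can pull them back out in the same process. A minimal sketch, assuming the llama_debug handler from the snippet above (exact payload fields can differ between LlamaIndex versions):
from llama_index.core.callbacks import CBEventType
# Paired start/end events for every LLM call made so far
llm_events = llama_debug.get_llm_inputs_outputs()
print(f"LLM calls recorded: {len(llm_events)}")
# Retrieval events show which step fetched nodes for each query
retrieval_events = llama_debug.get_event_pairs(CBEventType.RETRIEVE)
print(f"retrieval steps recorded: {len(retrieval_events)}")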
3. Wire in a real observability backend. This tutorial uses the Arize Phoenix integration, which lets you inspect traces, latency, retrieval steps, and prompt/response payloads without modifying your application logic. The first snippet below reads spans back out of a running Phoenix instance (useful for confirming data is arriving); the one after it registers the handler that actually sends spans from LlamaIndex.
import phoenix as px
from phoenix.trace.dsl import SpanQuery
# Connect to the locally running Phoenix instance
client = px.Client(endpoint="http://localhost:6006")
# query_spans returns a pandas DataFrame of recorded spans
# (it may be empty or None until the handler below has sent some traces)
spans_df = client.query_spans(SpanQuery())
if spans_df is not None:
    print(spans_df.head())
If you want the LlamaIndex side to send spans to Phoenix, register the Phoenix callback handler:
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader, set_global_handler
from llama_index.llms.openai import OpenAI
# Routes every LlamaIndex event to Phoenix
# (provided by the llama-index-callbacks-arize-phoenix package)
set_global_handler("arize_phoenix")
Settings.llm = OpenAI(model="gpt-4o-mini")
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What are the key entities mentioned?")
print(response)
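If you would rather not run Phoenix as a separate service, it can also be launched from the same script (assuming the arize-phoenix package is installed); the session object exposes the UI URL:
import phoenix as px
# Start a local Phoenix server in the background and print its UI address
session = px.launch_app()
print(session.url)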
4. Add explicit event logging around your own application code. This matters because observability is not just about model calls; you also want to know which user request triggered which retrieval path and whether post-processing failed.
import logging
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")
Settings.llm = OpenAI(model="gpt-4o-mini")
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
question = "Give me a short summary."
logger.info("starting query", extra={"question": question})
response = query_engine.query(question)
logger.info("finished query", extra={"answer_length": len(str(response))})
print(response)
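Latency is usually the first number you will want in those logs. A small addition that times the query with the standard library (reusing query_engine, question, and logger from the block above):
import time
start = time.perf_counter()
response = query_engine.query(question)
elapsed = time.perf_counter() - start
# Log latency in milliseconds alongside the answer size
logger.info("query took %.0f ms, answer length %d", elapsed * 1000, len(str(response)))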
5. Capture failures so they show up in your logs and tracing backend. Production observability only helps if exceptions are visible with enough context to reproduce them.
import logging
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")
try:
    Settings.llm = OpenAI(model="gpt-4o-mini")
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    response = index.as_query_engine().query("Explain the document structure.")
    print(response)
except Exception as exc:
    # logger.exception records the full traceback; include the error type in the message
    logger.exception("query failed (%s)", type(exc).__name__)
    raise
Testing It
Run the script against a small local ./data folder with at least one text file. If observability is wired correctly, you should see normal application output plus trace entries or spans in your backend dashboard.
If you’re using Phoenix locally, open the UI and verify that each query creates spans for document loading, indexing, retrieval, and completion generation. Also check that latency and error information appear when you intentionally break the API key or point ./data at an empty folder.
A good smoke test is to run the same query twice and compare traces. You should be able to tell whether changes came from retrieval differences or from model output variation.
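A concrete version of that smoke test, reusing the query_engine from earlier: run the same question twice and print the retrieved node IDs plus the answer length for each run. If the node IDs match between runs, any difference in the answers is model output variation rather than a retrieval change.
question = "What is this dataset about?"
for run in (1, 2):
    response = query_engine.query(question)
    # source_nodes lists the chunks retrieval handed to the LLM for this answer
    node_ids = [node.node.node_id for node in response.source_nodes]
    print(f"run {run}: retrieved {node_ids}, answer length {len(str(response))}")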
Next Steps
- Add per-request metadata like user_id, tenant_id, and session_id to every trace (a minimal logging-side sketch follows this list).
- Instrument tool calls and agent steps if you move beyond simple query engines.
- Export traces to your existing monitoring stack so LlamaIndex events sit next to app logs and infrastructure metrics.
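On the plain-logging side, one low-effort way to stamp that kind of metadata onto every record is logging.LoggerAdapter; the field names and values below are just examples, and tracing backends typically have their own metadata mechanism you would use instead:
import logging
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
class ContextAdapter(logging.LoggerAdapter):
    # Prefix each message with the per-request context supplied at construction time
    def process(self, msg, kwargs):
        context = " ".join(f"{key}={value}" for key, value in self.extra.items())
        return f"[{context}] {msg}", kwargs
logger = ContextAdapter(logging.getLogger("app"), {"user_id": "u-123", "session_id": "s-456"})
logger.info("starting query")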
Keep learning
- The complete AI Agents Roadmap - my full 8-step breakdown
- Free: The AI Agent Starter Kit - PDF checklist + starter code
- Work with me - I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.