LlamaIndex Tutorial (Python): adding observability for beginners
This tutorial shows you how to add observability to a basic LlamaIndex Python app so you can see what your LLM calls are doing, how long they take, and where failures happen. You need this when a demo turns into a production workflow and you can’t debug agent behavior from print statements anymore.
What You'll Need
- Python 3.10+
- A virtual environment
- llama-index
- llama-index-core
- llama-index-llms-openai
- llama-index-callbacks-arize-phoenix (or another callback/telemetry backend if you want external tracing)
- An OpenAI API key set as OPENAI_API_KEY
- Optional: Phoenix running locally if you want trace visualization
Install the packages:
pip install llama-index llama-index-core llama-index-llms-openai llama-index-callbacks-arize-phoenix
Set your API key:
export OPENAI_API_KEY="your-key-here"
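If you want to confirm the key is actually visible to Python before running anything else, a quick standard-library check like this catches a missing or empty variable early:
import os
# Fail fast with a clear message instead of a vaguer OpenAI auth error later
if not os.environ.get("OPENAI_API_KEY"):
    raise SystemExit("OPENAI_API_KEY is not set; export it before running the examples")
print("OPENAI_API_KEY found")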
Step-by-Step
1. Start with a minimal LlamaIndex query engine. This gives you a baseline before adding any tracing, so you can compare behavior once observability is enabled.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(
    llm=OpenAI(model="gpt-4o-mini")
)
response = query_engine.query("What is this dataset about?")
print(response)
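If you don't already have a ./data folder, the snippet below creates one with a single small text file so SimpleDirectoryReader has something to load; the file name and contents here are just placeholders:
from pathlib import Path
# Create ./data with one sample document so the examples have something to index
data_dir = Path("data")
data_dir.mkdir(exist_ok=True)
(data_dir / "sample.txt").write_text(
    "LlamaIndex is a framework for connecting LLMs to your own data."
)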
2. Add callback instrumentation through LlamaIndex’s global settings. This is the cleanest beginner path: every index and query call in your process emits events to the handlers you register, and the built-in LlamaDebugHandler prints a trace of each query as it completes.
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
documents = SimpleDirectoryReader("./data").load_data()
Settings.llm = OpenAI(model="gpt-4o-mini")
# Register a debug handler so every LlamaIndex event (retrieval, LLM calls, synthesis)
# is recorded and a trace is printed when each query finishes
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
Settings.callback_manager = CallbackManager([llama_debug])
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("Summarize the main themes.")
print(response)
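Once a query has run, the debug handler keeps the recorded events in memory, so you can pull them back out in the same process. A minimal sketch, assuming the llama_debug handler from the snippet above (exact payload fields can differ between LlamaIndex versions):
from llama_index.core.callbacks import CBEventType
# Paired start/end events for every LLM call made so far
llm_events = llama_debug.get_llm_inputs_outputs()
print(f"LLM calls recorded: {len(llm_events)}")
# Retrieval events show which step fetched nodes for each query
retrieval_events = llama_debug.get_event_pairs(CBEventType.RETRIEVE)
print(f"retrieval steps recorded: {len(retrieval_events)}")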
3. Wire in a real observability backend. This tutorial uses the Arize Phoenix integration, which lets you inspect traces, latency, retrieval steps, and prompt/response payloads without modifying your application logic. The first snippet below reads spans back out of a running Phoenix instance (useful for confirming data is arriving); the one after it registers the handler that actually sends spans from LlamaIndex.
import phoenix as px
from phoenix.trace.dsl import SpanQuery
# Connect to the locally running Phoenix instance
client = px.Client(endpoint="http://localhost:6006")
# query_spans returns a pandas DataFrame of recorded spans
# (it may be empty or None until the handler below has sent some traces)
spans_df = client.query_spans(SpanQuery())
if spans_df is not None:
    print(spans_df.head())
If you want the LlamaIndex side to send spans to Phoenix, register the Phoenix callback handler:
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader, set_global_handler
from llama_index.llms.openai import OpenAI
# Routes every LlamaIndex event to Phoenix
# (provided by the llama-index-callbacks-arize-phoenix package)
set_global_handler("arize_phoenix")
Settings.llm = OpenAI(model="gpt-4o-mini")
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What are the key entities mentioned?")
print(response)
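If you would rather not run Phoenix as a separate service, it can also be launched from the same script (assuming the arize-phoenix package is installed); the session object exposes the UI URL:
import phoenix as px
# Start a local Phoenix server in the background and print its UI address
session = px.launch_app()
print(session.url)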
4. Add explicit event logging around your own application code. This matters because observability is not just about model calls; you also want to know which user request triggered which retrieval path and whether post-processing failed.
import logging
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")
Settings.llm = OpenAI(model="gpt-4o-mini")
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
question = "Give me a short summary."
logger.info("starting query", extra={"question": question})
response = query_engine.query(question)
logger.info("finished query", extra={"answer_length": len(str(response))})
print(response)
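Latency is usually the first number you will want in those logs. A small addition that times the query with the standard library (reusing query_engine, question, and logger from the block above):
import time
start = time.perf_counter()
response = query_engine.query(question)
elapsed = time.perf_counter() - start
# Log latency in milliseconds alongside the answer size
logger.info("query took %.0f ms, answer length %d", elapsed * 1000, len(str(response)))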
5. Capture failures so they show up in your logs and tracing backend. Production observability only helps if exceptions are visible with enough context to reproduce them.
import logging
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")
try:
    Settings.llm = OpenAI(model="gpt-4o-mini")
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    response = index.as_query_engine().query("Explain the document structure.")
    print(response)
except Exception as exc:
    # logger.exception records the full traceback; include the error type in the message
    logger.exception("query failed (%s)", type(exc).__name__)
    raise
Testing It
Run the script against a small local ./data folder with at least one text file. If observability is wired correctly, you should see normal application output plus trace entries or spans in your backend dashboard.
If you’re using Phoenix locally, open the UI and verify that each query creates spans for document loading, indexing, retrieval, and completion generation. Also check that latency and error information appear when you intentionally break the API key or point ./data at an empty folder.
A good smoke test is to run the same query twice and compare traces. You should be able to tell whether changes came from retrieval differences or from model output variation.
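A concrete version of that smoke test, reusing the query_engine from earlier: run the same question twice and print the retrieved node IDs plus the answer length for each run. If the node IDs match between runs, any difference in the answers is model output variation rather than a retrieval change.
question = "What is this dataset about?"
for run in (1, 2):
    response = query_engine.query(question)
    # source_nodes lists the chunks retrieval handed to the LLM for this answer
    node_ids = [node.node.node_id for node in response.source_nodes]
    print(f"run {run}: retrieved {node_ids}, answer length {len(str(response))}")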
Next Steps
- Add per-request metadata like user_id, tenant_id, and session_id to every trace (a minimal logging-side sketch follows this list).
- Instrument tool calls and agent steps if you move beyond simple query engines.
- Export traces to your existing monitoring stack so LlamaIndex events sit next to app logs and infrastructure metrics.
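On the plain-logging side, one low-effort way to stamp that kind of metadata onto every record is logging.LoggerAdapter; the field names and values below are just examples, and tracing backends typically have their own metadata mechanism you would use instead:
import logging
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
class ContextAdapter(logging.LoggerAdapter):
    # Prefix each message with the per-request context supplied at construction time
    def process(self, msg, kwargs):
        context = " ".join(f"{key}={value}" for key, value in self.extra.items())
        return f"[{context}] {msg}", kwargs
logger = ContextAdapter(logging.getLogger("app"), {"user_id": "u-123", "session_id": "s-456"})
logger.info("starting query")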
Keep learning
- The complete AI Agents Roadmap - my full 8-step breakdown
- Free: The AI Agent Starter Kit - PDF checklist + starter code
- Work with me - I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.