LlamaIndex Tutorial (Python): adding observability for intermediate developers
This tutorial shows you how to wire observability into a LlamaIndex Python app so you can inspect retrieval, prompts, tool calls, and LLM latency instead of guessing why an answer looks wrong. You need this when your prototype starts failing in ways logs can’t explain, especially once multiple nodes, retrievers, and models are involved.
What You'll Need
- Python 3.10+
- A working LlamaIndex install
- An OpenAI API key
- Optional: a Phoenix account if you want richer traces in a UI
- Basic familiarity with VectorStoreIndex, QueryEngine, and LlamaIndex callbacks
Install the packages first:
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai arize-phoenix openinference-instrumentation-llama-index
Set your API key:
export OPENAI_API_KEY="your-openai-key"
Step-by-Step
- Start with a small index so you can see every stage clearly. The point here is not scale; it is making the chain of events observable end to end.
from llama_index.core import VectorStoreIndex, Document, Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure the LLM and embedding model globally so every component uses them
Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

docs = [
    Document(text="LlamaIndex is used to build retrieval-augmented generation apps."),
    Document(text="Observability helps debug retrieval quality, prompt construction, and latency."),
]

# Build a tiny in-memory index and run one query end to end
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
response = query_engine.query("Why do I need observability?")
print(response)
- Add callback-based tracing so LlamaIndex emits structured events for each query. This gives you a local view of what happened without changing your app logic.
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

# Register a debug handler globally; it records an event for every stage of the query
debug_handler = LlamaDebugHandler(print_trace_on_end=True)
Settings.callback_manager = CallbackManager([debug_handler])

# Recreate the query engine so it picks up the new callback manager
query_engine = index.as_query_engine()
response = query_engine.query("What does observability help debug?")
print("\nFinal response:\n", response)
- Instrument the app with Phoenix so traces are exported into a proper observability backend. This is the step that turns internal callback events into something you can inspect across requests.
import phoenix as px
from phoenix.otel import register
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

# Launch the local Phoenix app, then point an OpenTelemetry tracer provider at it.
# phoenix.otel.register is the pattern for recent arize-phoenix releases; older
# versions wire up the exporter differently, so check your installed version.
session = px.launch_app()
tracer_provider = register()
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)

query_engine = index.as_query_engine()
response = query_engine.query("Explain why retrieval quality matters.")
print(response)
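px.launch_app() returns a session object, and printing its URL tells you where to open the Phoenix UI (typically http://localhost:6006). A small sketch, assuming the url attribute that recent Phoenix versions expose:
# Open this address in your browser to watch traces arrive
print("Phoenix UI:", session.url)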
- If you want richer trace metadata, keep the callback handler and Phoenix instrumentation together. That way you get both console output for quick debugging and trace storage for deeper inspection.
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

# Keep the handler quiet on the console; Phoenix still receives the traces
debug_handler = LlamaDebugHandler(print_trace_on_end=False)
Settings.callback_manager = CallbackManager([debug_handler])

query_engine = index.as_query_engine(similarity_top_k=2)

questions = [
    "What is LlamaIndex?",
    "How does observability help with debugging?",
]

for question in questions:
    result = query_engine.query(question)
    print(f"\nQ: {question}\nA: {result}")
- Add one more layer: log the exact input and output around your query boundary. In production, this is where you correlate application logs with traces when users report bad answers.
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("rag-app")
question = "Why do developers add observability?"
logger.info("query.start question=%s", question)
result = query_engine.query(question)
logger.info("query.end answer=%s", str(result))
print(result)
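To make that correlation concrete, attach a request ID to both log lines so you can find the matching trace later. This is a minimal sketch using only the standard library; the answer helper and the request_id field name are conventions for this example, not something LlamaIndex or Phoenix requires.
import logging
import uuid

logger = logging.getLogger("rag-app")

def answer(question: str) -> str:
    # One ID per request, logged on both sides of the query boundary
    request_id = uuid.uuid4().hex
    logger.info("query.start request_id=%s question=%s", request_id, question)
    result = query_engine.query(question)
    logger.info("query.end request_id=%s answer=%s", request_id, str(result))
    return str(result)

print(answer("Why do developers add observability?"))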
Testing It
Run the script and make sure you see three things: the final answer printed to stdout, callback trace output from LlamaDebugHandler, and a Phoenix session launching if you enabled it. If the app fails before querying, check that OPENAI_API_KEY is set and that both the embedding model and LLM packages are installed.
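A quick preflight check catches those two failure modes before any query runs. This sketch uses only the standard library and checks the top-level package names installed earlier:
import importlib.util
import os

# Fail fast if the key or any required package is missing
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
for package in ("llama_index", "phoenix", "openinference"):
    assert importlib.util.find_spec(package) is not None, f"missing package: {package}"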
For a real test, ask two different questions and compare traces. You should see differences in retrieval behavior when the question changes, which tells you observability is capturing useful execution detail instead of just logging static text.
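You can run the same comparison in code by looking at the retrieved source nodes on each response. source_nodes is part of the LlamaIndex response object; the score formatting below assumes the default vector retriever, which attaches a similarity score to each node.
for question in ["What is LlamaIndex?", "How does observability help with debugging?"]:
    response = query_engine.query(question)
    print(f"\nQ: {question}")
    for node_with_score in response.source_nodes:
        # Score and a snippet of each retrieved chunk -- these should differ per question
        print(f"  score={node_with_score.score:.3f}  text={node_with_score.node.get_content()[:60]!r}")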
If Phoenix is running, open the UI and inspect spans for the query call. Look for request timing, retrieved nodes, prompt content, and model outputs; those are the fields that matter when debugging production RAG systems.
Next Steps
- Add span attributes for tenant ID, user ID, or request ID so traces map cleanly back to your app logs (see the sketch after this list).
- Instrument your FastAPI or Flask service around the query boundary so every request gets a trace.
- Move from console tracing to centralized monitoring by exporting Phoenix data into your preferred backend.
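For the span-attribute idea, one approach is to open a parent span around the query and set your IDs on it; the child spans created by the instrumentor then nest under it. This is a sketch using the OpenTelemetry API directly, assuming the tracer_provider returned by phoenix.otel.register in the instrumentation step; the attribute names (tenant.id, user.id, request.id) are examples, not a required schema.
# tracer_provider comes from the phoenix.otel.register() call made earlier
tracer = tracer_provider.get_tracer("rag-app")

def traced_query(question: str, tenant_id: str, user_id: str, request_id: str) -> str:
    with tracer.start_as_current_span("rag.query") as span:
        # Example attribute names -- match them to whatever your logs already use
        span.set_attribute("tenant.id", tenant_id)
        span.set_attribute("user.id", user_id)
        span.set_attribute("request.id", request_id)
        return str(query_engine.query(question))

print(traced_query("What is LlamaIndex?", tenant_id="acme", user_id="u-42", request_id="r-123"))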
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.