LlamaIndex Tutorial (Python): adding audit logs for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to add structured audit logs around a LlamaIndex pipeline in Python, so you can trace what the pipeline retrieved, what it answered, and when each step happened. You need this when you’re building systems for regulated environments, debugging bad answers, or proving which documents influenced a response.

What You'll Need

  • Python 3.10+
  • llama-index
  • openai
  • An OpenAI API key set as OPENAI_API_KEY
  • A small local document corpus for indexing
  • Basic familiarity with LlamaIndex VectorStoreIndex and query engines

Install the packages:

pip install llama-index openai

Set your API key:

export OPENAI_API_KEY="your-key-here"

Step-by-Step

  1. Start by creating a tiny document set and building an index. For auditability, keep the source documents explicit and stable so your logs can reference them later.
from llama_index.core import Document, VectorStoreIndex

docs = [
    Document(
        text="ACME insurance policy claims must be filed within 30 days.",
        metadata={"doc_id": "policy-001", "source": "handbook"}
    ),
    Document(
        text="Fraud alerts are escalated to compliance within 1 business day.",
        metadata={"doc_id": "policy-002", "source": "handbook"}
    ),
]

index = VectorStoreIndex.from_documents(docs)
  2. Add a small audit logger that writes JSON lines to disk. JSONL is easy to grep, ingest into Splunk, or ship to CloudWatch without custom parsing.
import json
from datetime import datetime, timezone
from pathlib import Path

class AuditLogger:
    def __init__(self, path: str = "audit.log"):
        self.path = Path(path)

    def log(self, event_type: str, **payload):
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "event_type": event_type,
            **payload,
        }
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

audit = AuditLogger()
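
To confirm the logger works before wiring it into the pipeline, emit a test event and inspect the file. The event name and field below are illustrative, not required by anything in LlamaIndex:

audit.log("logger_smoke_test", component="demo")
# Appends a line to audit.log similar to (timestamp will differ):
# {"ts": "2026-04-21T12:00:00.000000+00:00", "event_type": "logger_smoke_test", "component": "demo"}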
  3. Wrap the query flow so you log the user question before execution and the final answer after execution. This is the main pattern: treat the LlamaIndex call as a black box and log inputs/outputs around it.
query_engine = index.as_query_engine()

def audited_query(question: str):
    audit.log("query_started", question=question)

    response = query_engine.query(question)

    audit.log(
        "query_finished",
        question=question,
        answer=str(response),
    )
    return response

result = audited_query("What is the claims filing deadline?")
print(result)
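
One gap in this wrapper: if query_engine.query raises, the log gets a query_started with no terminal event and no record of why. A variant using the same engine and logger closes that gap:

def audited_query_safe(question: str):
    audit.log("query_started", question=question)
    try:
        response = query_engine.query(question)
    except Exception as exc:
        # Record the failure so every query_started has a terminal event.
        audit.log("query_failed", question=question, error=str(exc))
        raise
    audit.log("query_finished", question=question, answer=str(response))
    return response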
  4. Log retrieval details separately using QueryBundle and direct retriever calls. This gives you intermediate visibility into which nodes were fetched before generation.
from llama_index.core.schema import QueryBundle

retriever = index.as_retriever()

def audited_retrieve(question: str):
    audit.log("retrieval_started", question=question)

    bundle = QueryBundle(query_str=question)
    nodes = retriever.retrieve(bundle)

    audit.log(
        "retrieval_finished",
        question=question,
        node_count=len(nodes),
        node_ids=[n.node.node_id for n in nodes],
        scores=[getattr(n, "score", None) for n in nodes],
    )
    return nodes

nodes = audited_retrieve("How fast are fraud alerts escalated?")
for node in nodes:
    print(node.get_content())
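
Because the retrieved nodes carry the metadata attached in step 1, you can also put doc_id values into the retrieval log, which is what lets an audit trail prove which documents influenced an answer. A small extension of the same wrapper:

def audited_retrieve_with_sources(question: str):
    nodes = retriever.retrieve(QueryBundle(query_str=question))
    audit.log(
        "retrieval_finished",
        question=question,
        # doc_id comes from the metadata set on each Document in step 1
        sources=[n.node.metadata.get("doc_id") for n in nodes],
    )
    return nodes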
  5. If you want deeper traceability, build your own response from retrieved context and log each stage explicitly. This is useful when you need to prove exactly what context was passed into the model.
from llama_index.core import Settings
from llama_index.core.llms import ChatMessage

def audited_rag(question: str):
    audit.log("rag_started", question=question)

    nodes = audited_retrieve(question)
    context = "\n\n".join(n.get_content() for n in nodes)

    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

    audit.log("prompt_built", question=question, context_chars=len(context))

    # Use the globally configured LLM rather than the engine's private _llm attribute.
    response = Settings.llm.chat([ChatMessage(role="user", content=prompt)])
    answer = response.message.content

    audit.log("rag_finished", question=question, answer=answer)
    return answer

print(audited_rag("What is the fraud escalation timeline?"))

Testing It

Run the script and make two or three different queries. You should see normal answers printed to stdout and matching JSON records appended to audit.log.

Open the log file and confirm that every request has a query_started event followed by a query_finished event, plus retrieval events if you used the retriever wrapper. The timestamps should be UTC ISO 8601 strings, which makes correlation with app logs straightforward.
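
To make that check repeatable, a short script along these lines (assuming the default audit.log path used above) can count paired events:

import json
from pathlib import Path

events = [json.loads(line) for line in Path("audit.log").read_text(encoding="utf-8").splitlines() if line]
started = sum(1 for e in events if e["event_type"] == "query_started")
finished = sum(1 for e in events if e["event_type"] == "query_finished")
print(f"query_started={started}, query_finished={finished}")
assert started == finished, "every query_started should have a matching query_finished"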

If you’re using this in production, check that sensitive data is not being written verbatim into logs unless your policy allows it. A common pattern is to hash user IDs and truncate long prompts before writing them.
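
Here is a minimal sketch of that pattern; the unsalted SHA-256 hash, the 500-character cap, and the user-42 ID are all illustrative choices, not requirements:

import hashlib

MAX_LOGGED_CHARS = 500  # illustrative cap, tune to your policy

def hash_user_id(user_id: str) -> str:
    # One-way hash: logs stay correlatable per user without storing the raw ID.
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:16]

def truncate(text: str, limit: int = MAX_LOGGED_CHARS) -> str:
    return text if len(text) <= limit else text[:limit] + "...[truncated]"

audit.log(
    "query_started",
    user=hash_user_id("user-42"),  # hypothetical user ID
    question=truncate("What is the claims filing deadline?"),
)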

Next Steps

  • Add request IDs and session IDs so you can correlate multiple tool calls in one trace (see the sketch after this list).
  • Replace file logging with structured logging to stdout or a centralized sink like Datadog or OpenTelemetry.
  • Add redaction rules for PII before writing any prompt or retrieved content to disk.
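
For the first item, a minimal sketch: mint a request_id per call with uuid4 and thread it through every event, so all records for one trace group together in the log. The session-abc value is a hypothetical session identifier:

import uuid

def audited_query_with_ids(question: str, session_id: str):
    request_id = str(uuid.uuid4())  # unique per query; session_id groups a whole conversation
    audit.log("query_started", request_id=request_id, session_id=session_id, question=question)
    response = query_engine.query(question)
    audit.log("query_finished", request_id=request_id, session_id=session_id, answer=str(response))
    return response

result = audited_query_with_ids("What is the claims filing deadline?", session_id="session-abc")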

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
