LlamaIndex Tutorial (Python): adding audit logs for intermediate developers
This tutorial shows you how to add structured audit logs around a LlamaIndex pipeline in Python, so you can trace what the agent retrieved, what it answered, and when each step happened. You need this when you’re building systems for regulated environments, debugging bad answers, or proving which documents influenced a response.
What You'll Need

- Python 3.10+
- `llama-index`
- `openai`
- An OpenAI API key set as `OPENAI_API_KEY`
- A small local document corpus for indexing
- Basic familiarity with LlamaIndex `VectorStoreIndex` and query engines

Install the packages:

```bash
pip install llama-index openai
```

Set your API key:

```bash
export OPENAI_API_KEY="your-key-here"
```
Step-by-Step
- Start by creating a tiny document set and building an index. For auditability, keep the source documents explicit and stable so your logs can reference them later.

```python
from llama_index.core import Document, VectorStoreIndex

docs = [
    Document(
        text="ACME insurance policy claims must be filed within 30 days.",
        metadata={"doc_id": "policy-001", "source": "handbook"},
    ),
    Document(
        text="Fraud alerts are escalated to compliance within 1 business day.",
        metadata={"doc_id": "policy-002", "source": "handbook"},
    ),
]

index = VectorStoreIndex.from_documents(docs)
```
- Add a small audit logger that writes JSON Lines to disk. JSONL is easy to grep, ingest into Splunk, or ship to CloudWatch without custom parsing.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

class AuditLogger:
    def __init__(self, path: str = "audit.log"):
        self.path = Path(path)

    def log(self, event_type: str, **payload):
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "event_type": event_type,
            **payload,
        }
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

audit = AuditLogger()
```
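As a quick sanity check, you can write one event and read it back. The snippet below includes a minimal copy of the logger so it runs standalone; the `audit_demo.log` filename is just for this demo.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Minimal copy of the AuditLogger above so this snippet is self-contained.
class AuditLogger:
    def __init__(self, path: str = "audit_demo.log"):
        self.path = Path(path)

    def log(self, event_type: str, **payload):
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "event_type": event_type,
            **payload,
        }
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

audit = AuditLogger()
audit.log("query_started", question="What is the claims filing deadline?")

# Each line in the file is one standalone JSON object.
line = Path("audit_demo.log").read_text(encoding="utf-8").splitlines()[-1]
record = json.loads(line)
print(record["event_type"])  # query_started
```

Because every line parses independently, `grep query_started audit_demo.log` or a line-by-line `json.loads` loop is all the tooling you need.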
- Wrap the query flow so you log the user question before execution and the final answer after it. This is the main pattern: treat the LlamaIndex call as a black box and log inputs and outputs around it.

```python
query_engine = index.as_query_engine()

def audited_query(question: str):
    audit.log("query_started", question=question)
    response = query_engine.query(question)
    audit.log(
        "query_finished",
        question=question,
        answer=str(response),
    )
    return response

result = audited_query("What is the claims filing deadline?")
print(result)
```
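In production you usually also want a record when a query fails, not just when it succeeds. Here is a sketch of that variant; the `query_failed` event name is my own convention, not anything LlamaIndex defines, and the in-memory logger and failing engine are stand-ins so the example runs without an API key.

```python
class ListAudit:
    """In-memory stand-in for the file-based AuditLogger."""
    def __init__(self):
        self.records = []

    def log(self, event_type, **payload):
        self.records.append({"event_type": event_type, **payload})

class FailingEngine:
    """Stand-in for a LlamaIndex query engine that always errors."""
    def query(self, question):
        raise RuntimeError("upstream timeout")

def audited_query_safe(question, query_engine, audit):
    audit.log("query_started", question=question)
    try:
        response = query_engine.query(question)
    except Exception as exc:
        # Record the failure before re-raising so the trace is complete.
        audit.log("query_failed", question=question, error=repr(exc))
        raise
    audit.log("query_finished", question=question, answer=str(response))
    return response

audit = ListAudit()
try:
    audited_query_safe("What is the claims filing deadline?", FailingEngine(), audit)
except RuntimeError:
    pass

print([r["event_type"] for r in audit.records])  # ['query_started', 'query_failed']
```

Re-raising after logging keeps the caller's error handling intact while guaranteeing the audit trail never shows a `query_started` with no terminal event.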
- Log retrieval details separately using `QueryBundle` and direct retriever calls. This gives you intermediate visibility into which nodes were fetched before generation.

```python
from llama_index.core.schema import QueryBundle

retriever = index.as_retriever()

def audited_retrieve(question: str):
    audit.log("retrieval_started", question=question)
    bundle = QueryBundle(query_str=question)
    nodes = retriever.retrieve(bundle)
    audit.log(
        "retrieval_finished",
        question=question,
        node_count=len(nodes),
        node_ids=[n.node.node_id for n in nodes],
        scores=[getattr(n, "score", None) for n in nodes],
    )
    return nodes

nodes = audited_retrieve("How fast are fraud alerts escalated?")
for node in nodes:
    print(node.get_content())
```
- If you want deeper traceability, build your own response from retrieved context and log each stage explicitly. This is useful when you need to prove exactly which context was passed to the model. Note that we call the configured LLM through `Settings` rather than reaching into the query engine's private `_llm` attribute, which is not a stable API.

```python
from llama_index.core import Settings
from llama_index.core.llms import ChatMessage

def audited_rag(question: str):
    audit.log("rag_started", question=question)
    nodes = audited_retrieve(question)
    context = "\n\n".join(n.get_content() for n in nodes)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    audit.log("prompt_built", question=question, context_chars=len(context))
    # Use the globally configured LLM instead of a private attribute.
    response = Settings.llm.chat([ChatMessage(role="user", content=prompt)])
    answer = response.message.content
    audit.log("rag_finished", question=question, answer=answer)
    return answer

print(audited_rag("What is the fraud escalation timeline?"))
```
Testing It
Run the script and make two or three different queries. You should see normal answers printed to stdout and matching JSON records appended to audit.log.
Open the log file and confirm that every request has a query_started event followed by a query_finished event, plus retrieval events if you used the retriever wrapper. The timestamps should be UTC ISO 8601 strings, which makes correlation with app logs straightforward.
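That pairing check is easy to automate. The sketch below assumes the JSONL format written by `AuditLogger` above; the sample content and the counting logic are illustrative, not part of LlamaIndex.

```python
import json
from collections import Counter
from io import StringIO

# Sample log content in the format AuditLogger writes (one JSON object per line).
sample = StringIO(
    '{"ts": "2024-01-01T00:00:00+00:00", "event_type": "query_started", "question": "q1"}\n'
    '{"ts": "2024-01-01T00:00:01+00:00", "event_type": "query_finished", "question": "q1"}\n'
)

counts = Counter()
for line in sample:
    record = json.loads(line)
    counts[record["event_type"]] += 1

# Every started event should have a matching finished event.
assert counts["query_started"] == counts["query_finished"]
print(dict(counts))
```

In a real audit you would open `audit.log` instead of the `StringIO` sample and group by a request ID rather than raw counts, but the structure is the same.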
If you’re using this in production, check that sensitive data is not being written verbatim into logs unless your policy allows it. A common pattern is to hash user IDs and truncate long prompts before writing them.
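That hash-and-truncate pattern can be sketched as follows. The field names (`user_id`, `question`, `answer`) and the 500-character limit are illustrative choices, not a standard; tune them to your own logging policy.

```python
import hashlib

MAX_TEXT_CHARS = 500  # illustrative limit, adjust to your policy

def redact(payload: dict) -> dict:
    """Hash user identifiers and truncate long text before logging."""
    safe = dict(payload)
    if "user_id" in safe:
        # Store a short digest instead of the raw identifier.
        digest = hashlib.sha256(safe["user_id"].encode("utf-8")).hexdigest()
        safe["user_id"] = digest[:16]
    for key in ("question", "answer"):
        value = safe.get(key)
        if isinstance(value, str) and len(value) > MAX_TEXT_CHARS:
            safe[key] = value[:MAX_TEXT_CHARS] + "...[truncated]"
    return safe

record = redact({"user_id": "alice", "question": "short question"})
print(record["user_id"])  # a 16-character hex digest, not the raw ID
```

You would call `redact(payload)` inside `AuditLogger.log` just before `json.dumps`, so nothing sensitive ever reaches disk.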
Next Steps

- Add request IDs and session IDs so you can correlate multiple tool calls in one trace.
- Replace file logging with structured logging to stdout or a centralized sink such as Datadog or OpenTelemetry.
- Add redaction rules for PII before writing any prompt or retrieved content to disk.
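The first of those steps can be as simple as generating a UUID per request and stamping it onto every event in that trace. A sketch, using an in-memory logger stand-in; `request_id` is my own field name, not a LlamaIndex convention.

```python
import uuid

class ListAudit:
    """In-memory stand-in for the file-based AuditLogger."""
    def __init__(self):
        self.records = []

    def log(self, event_type, **payload):
        self.records.append({"event_type": event_type, **payload})

audit = ListAudit()

def audited_step(request_id: str, event_type: str, **payload):
    # Attach the same request_id to every event belonging to one trace.
    audit.log(event_type, request_id=request_id, **payload)

rid = str(uuid.uuid4())
audited_step(rid, "query_started", question="claims deadline?")
audited_step(rid, "query_finished", answer="30 days")

# Both events now share one correlation ID.
print({r["request_id"] for r in audit.records} == {rid})  # True
```

With that in place, `grep <request_id> audit.log` reconstructs the full trace of a single request across retrieval, prompting, and generation events.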
Keep learning

- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.