How to Build a Policy Q&A Agent Using LlamaIndex in Python for Fintech
A policy Q&A agent answers employee or customer questions against internal policy documents, then returns a grounded response with citations. In fintech, that matters because policy drift, inconsistent answers, and missing audit trails create compliance risk fast.
Architecture
- Document ingestion layer
  - Pulls policy PDFs, Markdown, DOCX, or HTML from approved storage.
  - Enforces data residency by keeping ingestion and indexing inside the required region.
- Text extraction and chunking
  - Uses LlamaIndex loaders and splitters to turn policy docs into retrievable chunks.
  - Keeps chunks small enough for precise retrieval, but large enough to preserve policy context.
- Vector index
  - Stores embeddings for semantic retrieval over policies, procedures, and controls.
  - Uses `VectorStoreIndex` for the main Q&A path.
- Retriever + response synthesizer
  - Fetches the most relevant chunks and generates an answer with citations.
  - Constrains the answer to source material only.
- Audit and logging layer
  - Records question, retrieved document IDs, answer, timestamps, and user identity.
  - Required for compliance review and incident investigation.
- Guardrails / policy filter
  - Blocks unsupported questions such as legal advice or requests to override policy.
  - Routes sensitive cases to a human reviewer.
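The chunking tradeoff above can be illustrated with a minimal sliding-window splitter. This is a simplified stand-in for what LlamaIndex's splitters do, not its actual implementation; `chunk_size` and `chunk_overlap` here count characters, and the values are illustrative:

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> list[str]:
    """Split text into overlapping windows so policy context survives chunk boundaries."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

policy = "Customer statements must not be shared over unencrypted email. " * 20
chunks = chunk_text(policy, chunk_size=200, chunk_overlap=40)
# The tail of each chunk reappears at the head of the next (prints True),
# so a policy clause cut at a boundary is still retrievable intact.
print(chunks[0][-40:] == chunks[1][:40])
```

Smaller chunks sharpen retrieval precision; the overlap is what keeps a sentence that straddles a boundary from being lost to both chunks.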
Implementation
1) Install and load your policy documents
Use a controlled document source. For fintech, that usually means an internal S3 bucket, SharePoint export, or a locked-down file share.
```shell
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai pypdf
```
```python
from llama_index.core import SimpleDirectoryReader

# Example: local directory synced from approved internal storage
documents = SimpleDirectoryReader(
    input_dir="./policy_docs",
    recursive=True,
).load_data()

print(f"Loaded {len(documents)} documents")
```
2) Build the index with LlamaIndex
This is the core pattern: embed your policies into a vector index and keep the response grounded in retrieved context.
```python
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)

index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine(
    similarity_top_k=3,
    response_mode="compact",
)

question = "Can we share customer statements over email?"
response = query_engine.query(question)
print(response)
```
A few details matter here:
- `temperature=0` keeps answers stable for compliance use cases.
- `similarity_top_k=3` reduces noise while still giving enough context.
- `response_mode="compact"` keeps synthesis tight and easier to audit.
3) Add citations and inspect retrieved sources
For fintech workflows, you need to know exactly which policy chunks supported the answer. LlamaIndex exposes source nodes on the response object.
```python
response = query_engine.query("What is the retention period for KYC records?")

print("Answer:")
print(response.response)

print("\nSources:")
for idx, node in enumerate(response.source_nodes, start=1):
    print(f"{idx}. Score: {node.score:.4f}")
    print(f"   File: {node.node.metadata.get('file_name')}")
    print(f"   Text: {node.node.get_content()[:250]}")
```
If your internal reviewers need stronger traceability, persist metadata during ingestion:
```python
from llama_index.core.schema import Document

docs = [
    Document(
        text=d.text,
        metadata={
            "file_name": d.metadata.get("file_name"),
            "department": "compliance",
            "jurisdiction": "UK",
        },
    )
    for d in documents
]

index = VectorStoreIndex.from_documents(docs)
```
4) Wrap it in a simple service layer with guardrails
Do not expose raw retrieval directly to users. Put a thin service in front of the query engine so you can reject unsafe prompts and log every request.
```python
from datetime import datetime

BLOCKLIST = [
    "ignore policy",
    "override compliance",
    "legal advice",
    "bypass controls",
]

def is_allowed(question: str) -> bool:
    q = question.lower()
    return not any(term in q for term in BLOCKLIST)

def answer_policy_question(question: str, user_id: str):
    if not is_allowed(question):
        return {
            "answer": "This request requires compliance review.",
            "status": "blocked",
            "user_id": user_id,
            "timestamp": datetime.utcnow().isoformat(),
        }

    result = query_engine.query(question)
    return {
        "answer": str(result.response),
        "status": "ok",
        "user_id": user_id,
        "timestamp": datetime.utcnow().isoformat(),
        "sources": [
            {
                "score": sn.score,
                "file_name": sn.node.metadata.get("file_name"),
            }
            for sn in result.source_nodes
        ],
    }

print(answer_policy_question(
    "What is our procedure for customer complaint escalation?",
    user_id="u12345",
))
```
That pattern gives you a clean seam for:
- authz checks
- PII redaction before logging
- jurisdiction-based routing
- human escalation when confidence is low
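The PII-redaction seam, for example, can start as a small scrubber applied to the question before it reaches the audit log. The patterns below are illustrative, not exhaustive; production redaction needs broader coverage (IBANs, sort codes, names) and validation such as a Luhn check on candidate card numbers:

```python
import re

# Illustrative patterns only; real coverage must match your regulatory scope.
REDACTIONS = [
    (re.compile(r"\b\d(?:[ -]?\d){12,15}\b"), "[REDACTED_PAN]"),      # card numbers
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),         # US SSNs
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[REDACTED_EMAIL]"), # email addresses
]

def redact(text: str) -> str:
    """Scrub regulated identifiers from free text before it is logged."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text

print(redact("Card 4111 1111 1111 1111 was used by jo@example.com"))
# → Card [REDACTED_PAN] was used by [REDACTED_EMAIL]
```

Run `redact` on the question and the answer in `answer_policy_question` before they are persisted, never after.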
Production Considerations
- Deployment
  - Keep embeddings, vector store, and inference inside approved cloud regions.
  - For regulated workloads, separate dev/test/prod indexes so test data never contaminates production retrieval.
- Monitoring
  - Track retrieval quality: top-k hit rate, empty-result rate, and source overlap across repeated questions.
  - Log question text carefully; redact account numbers, SSNs, card PANs, and other regulated identifiers before storage.
- Guardrails
  - Add confidence thresholds. If retrieval scores are weak or sources conflict, return “needs review” instead of guessing.
  - Block requests asking for legal interpretation beyond published policy. Fintech policies often sit close to legal language; don’t let the model improvise.
- Auditability
  - Store question, answer, source document IDs, timestamps, model version, and embedding version.
  - If regulators ask six months later why an answer was given, you need reproducible traces.
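The confidence-threshold guardrail can be a small gate in front of synthesis. A sketch, assuming retrieval scores are similarities in [0, 1]; the 0.75 cutoff is illustrative and should be tuned against your own evaluation set:

```python
NEEDS_REVIEW = "This question needs human compliance review."

def gate_on_confidence(scored_sources: list[tuple[str, float]],
                       min_score: float = 0.75,
                       min_sources: int = 1) -> tuple[bool, list[str]]:
    """Fail closed: only answer when enough sources clear the score threshold."""
    strong = [doc_id for doc_id, score in scored_sources if score >= min_score]
    return (len(strong) >= min_sources, strong)

ok, sources = gate_on_confidence([("kyc_policy_v3", 0.82), ("aml_manual", 0.41)])
print(ok, sources)  # the weak source is dropped; the strong one survives

ok, _ = gate_on_confidence([("old_faq", 0.52)])
print("answer normally" if ok else NEEDS_REVIEW)
```

Wire this between retrieval and generation: if the gate fails, return the review message without ever calling the LLM.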
Common Pitfalls
- Using generic web-style RAG without access controls
  - Mistake: indexing everything together across departments or jurisdictions.
  - Fix: partition indexes by business unit, region, or sensitivity level. A UK payments policy should not retrieve US card operations content unless that’s explicitly allowed.
- Letting the model answer without citations
  - Mistake: returning fluent answers with no source trail.
  - Fix: always surface `source_nodes` or equivalent citation metadata. If there are no good sources, fail closed and escalate.
- Ignoring document lifecycle
  - Mistake: old policies stay indexed after being superseded.
  - Fix: version documents at ingestion time and deprecate retired content. In fintech, stale retention or KYC guidance is a real compliance issue.
- Skipping prompt boundaries
  - Mistake: asking the LLM to “answer as helpfully as possible.”
  - Fix: constrain it to “answer only from retrieved policy text.” Keep temperature low and reject unsupported questions before generation starts.
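A minimal sketch of the lifecycle fix, assuming each document carries an `effective_date` and an optional `superseded_by` field (the field names and records are illustrative; in practice this metadata would be attached to LlamaIndex `Document` objects at ingestion time, as in step 3):

```python
from datetime import date

# Illustrative policy records with version lineage.
policies = [
    {"id": "retention_v1", "effective_date": date(2021, 1, 1), "superseded_by": "retention_v2"},
    {"id": "retention_v2", "effective_date": date(2024, 3, 1), "superseded_by": None},
    {"id": "kyc_v5", "effective_date": date(2023, 6, 1), "superseded_by": None},
]

def current_policies(docs: list[dict]) -> list[dict]:
    """Drop superseded versions so stale guidance never reaches the index."""
    return [d for d in docs if d["superseded_by"] is None]

print([d["id"] for d in current_policies(policies)])
# → ['retention_v2', 'kyc_v5']
```

Run this filter before every reindex, and keep the superseded records in the audit store so old answers remain explainable.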
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.