How to Build a Policy Q&A Agent Using LlamaIndex in Python for Investment Banking
A policy Q&A agent for investment banking answers questions like “Can I share this pitch deck externally?”, “What’s the retention rule for trade communications?”, or “Is this counterparty list restricted?” It matters because bankers need fast answers, but the answers must be grounded in approved policy, versioned, auditable, and safe to use in regulated workflows.
Architecture
- Policy corpus ingestion
  - Pull PDFs, DOCX, HTML, and internal wiki pages from approved repositories.
  - Normalize them into text and preserve source metadata such as document title, version, owner, effective date, and jurisdiction.
- Index layer
  - Use a `VectorStoreIndex` for semantic retrieval over policy chunks.
  - Keep chunk metadata attached so the agent can cite the exact policy section.
- Retriever + response synthesizer
  - Use LlamaIndex retrieval to fetch only relevant chunks.
  - Force answer generation to stay grounded in the retrieved context.
- Audit logging
  - Store the user question, retrieved document IDs, answer text, and timestamp.
  - This is non-negotiable for compliance review and post-incident analysis.
- Guardrails
  - Reject questions that ask for legal advice outside policy scope.
  - Detect missing context and return "I don't know" instead of hallucinating.
- Access control / residency
  - Restrict indexes by business unit or region.
  - Keep embeddings and source documents in approved infrastructure if data residency applies.
Implementation
1) Load policy documents with metadata
Start with a small set of approved policies. In banking, metadata is not optional; you need document provenance for every answer.
```python
from llama_index.core import SimpleDirectoryReader

docs = SimpleDirectoryReader(
    input_dir="./policies",
    recursive=True,
    required_exts=[".pdf", ".txt", ".md"],
).load_data()

for doc in docs:
    doc.metadata.update({
        "source_system": "policy_repo",
        "business_line": "investment_banking",
        "jurisdiction": "US",
        "approved_for_use": True,
    })

print(f"Loaded {len(docs)} documents")
```
2) Build a vector index over the policy corpus
Use `VectorStoreIndex` for semantic search. For production you would back this with a managed vector store; for local development you can keep it simple.
```python
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    docs,
    show_progress=True,
)

index.storage_context.persist(persist_dir="./storage/policy_index")
```
This gives you a persistent index you can reload later. In a real deployment, persist the storage context in an environment that matches your data residency requirements.
3) Create a constrained Q&A engine
Use `as_query_engine()` with a strict prompt style: answer only from retrieved policy content, cite sources, and refuse unsupported claims.
```python
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.core.prompts import PromptTemplate

storage_context = StorageContext.from_defaults(persist_dir="./storage/policy_index")
loaded_index = load_index_from_storage(storage_context)

qa_prompt = PromptTemplate(
    """You are a policy assistant for investment banking.
Answer using only the provided context.
If the context does not contain the answer, say: "I don't know based on the current policy set."

Context:
{context_str}

Question: {query_str}

Answer with:
- direct answer
- cited source snippets
"""
)

query_engine = loaded_index.as_query_engine(
    similarity_top_k=4,
    text_qa_template=qa_prompt,
)

response = query_engine.query(
    "Can I send a draft pitch book to an external client before compliance review?"
)
print(response)
```
The important part is not just retrieval. It is forcing the model to stay inside policy boundaries. If your policies are split by desk or geography, build separate indexes or filters per scope instead of one global blob.
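The scope-partitioning idea can be sketched in plain Python, independent of any specific LlamaIndex filter API: group documents by a metadata key before indexing, then build one index per bucket. The `partition_by_scope` helper and the sample documents below are illustrative, not part of the library.

```python
def partition_by_scope(docs, key):
    """Group documents into per-scope buckets (e.g. by jurisdiction),
    so each scope can feed its own index instead of one global blob."""
    buckets = {}
    for doc in docs:
        scope = doc.get("metadata", {}).get(key, "unscoped")
        buckets.setdefault(scope, []).append(doc)
    return buckets

# Illustrative documents; in practice these come from your ingestion step.
docs = [
    {"text": "US retention policy ...", "metadata": {"jurisdiction": "US"}},
    {"text": "UK retention policy ...", "metadata": {"jurisdiction": "UK"}},
    {"text": "Global code of conduct ...", "metadata": {}},
]

buckets = partition_by_scope(docs, "jurisdiction")
# Each bucket would then get its own VectorStoreIndex.from_documents(...) call.
```

Documents with no scope tag land in an explicit `unscoped` bucket, which is easier to audit than silently mixing them into every index.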
4) Add an audit trail around every query
For banking use cases, log what was asked and what evidence was used. That gives compliance teams something they can actually review.
```python
import json
from datetime import datetime, timezone

def ask_policy(question: str):
    result = query_engine.query(question)
    audit_event = {
        # Timezone-aware timestamp; datetime.utcnow() is deprecated.
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "answer": str(result),
        "retrieved_sources": [
            {
                "doc_id": node.node.metadata.get("file_name"),
                "score": node.score,
                "jurisdiction": node.node.metadata.get("jurisdiction"),
            }
            for node in result.source_nodes
        ],
    }
    with open("./audit_log.jsonl", "a") as f:
        f.write(json.dumps(audit_event) + "\n")
    return result

print(ask_policy("What is the retention period for chat messages used in deal discussions?"))
```
If you need stricter control, wrap ask_policy() behind authentication and authorization checks before it ever hits the retriever.
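A minimal sketch of that authorization gate, assuming a role lookup you would in practice source from your identity provider (the `USER_ROLES` map and role names here are hypothetical):

```python
# Hypothetical entitlements; in production this comes from your IdP / entitlement system.
USER_ROLES = {"a.desk.analyst": {"analyst"}, "j.compliance": {"analyst", "compliance"}}

def authorized(user_id: str, required_role: str) -> bool:
    """Check whether the user holds the role required to query this index."""
    return required_role in USER_ROLES.get(user_id, set())

def ask_policy_secure(user_id: str, question: str, ask_fn):
    """Gate the retriever behind an authorization check.

    ask_fn stands in for ask_policy so the gate stays decoupled from the engine.
    """
    if not authorized(user_id, "analyst"):
        raise PermissionError(f"{user_id} is not entitled to query the policy index")
    return ask_fn(question)
```

Raising instead of returning a soft error keeps denied access visible in your application logs, which is usually what compliance wants.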
Production Considerations
- Deployment
  - Separate indexes by region or business line when policies differ materially.
  - Keep source docs and embeddings in approved cloud regions if your firm has residency constraints.
  - Pin package versions; LlamaIndex changes quickly, and you do not want silent behavior drift in a regulated tool.
- Monitoring
  - Track retrieval hit rate, unanswered questions, and top failing queries.
  - Log source coverage so you can see when users ask about policies that are missing or stale.
  - Review low-confidence answers manually before exposing them broadly.
- Guardrails
  - Block prompts requesting confidential deal info, MNPI handling advice outside published policy, or anything that looks like legal interpretation beyond internal guidance.
  - Return "not found" instead of generating an answer from general model knowledge.
  - Add role-based filters so analysts do not see policies reserved for compliance or legal teams.
- Change management
  - Version policies explicitly and reindex on approval events.
  - Keep old versions searchable for audit reconstruction if your retention rules require it.
  - Record which policy version was active when an answer was generated.
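Several of the monitoring metrics above fall straight out of the JSONL audit log written in step 4. A minimal sketch, assuming that audit-event schema; the 0.5 score threshold is an illustrative assumption you would tune to your embedding model:

```python
import json

def retrieval_stats(audit_log_lines, min_score=0.5):
    """Count answered vs. unanswered queries from audit events.

    An event counts as 'unanswered' when no retrieved source clears min_score,
    i.e. the engine had nothing strong to ground an answer on.
    """
    answered = unanswered = 0
    for line in audit_log_lines:
        event = json.loads(line)
        scores = [s["score"] for s in event.get("retrieved_sources", [])
                  if s.get("score") is not None]
        if scores and max(scores) >= min_score:
            answered += 1
        else:
            unanswered += 1
    return {"answered": answered, "unanswered": unanswered}

# Two illustrative audit events: one well-grounded, one with no evidence.
sample = [
    json.dumps({"retrieved_sources": [{"score": 0.82}]}),
    json.dumps({"retrieved_sources": []}),
]
stats = retrieval_stats(sample)
```

In practice you would run this over `audit_log.jsonl` on a schedule and alert when the unanswered rate climbs, since that usually means missing or stale policies.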
Common Pitfalls
- Using raw PDFs without metadata
  - If you strip titles, dates, owners, or jurisdiction tags, auditability weakens fast.
  - Fix it by attaching metadata during ingestion and preserving it through indexing.
- Letting the model answer from memory
  - A generic LLM will confidently invent policy details if retrieval is weak.
  - Fix it by using grounded prompts and refusing to answer when supporting context is missing.
- One index for everything
  - Mixing global policies, regional rules, HR docs, and trading procedures creates bad retrieval and bad answers.
  - Fix it by partitioning indexes by audience, jurisdiction, or document class.
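The "answer from memory" pitfall can also be caught mechanically after generation: refuse whenever retrieval returned nothing strong enough to ground the answer. A sketch, reading source scores off the response as in step 4; the thresholds are illustrative assumptions, not LlamaIndex defaults:

```python
FALLBACK = "I don't know based on the current policy set."

def grounded_answer(answer_text, source_scores, min_score=0.5, min_sources=1):
    """Return the model's answer only if enough retrieved evidence supports it.

    source_scores would come from [n.score for n in result.source_nodes];
    min_score and min_sources are tuning knobs, set here for illustration.
    """
    strong = [s for s in source_scores if s is not None and s >= min_score]
    if len(strong) < min_sources:
        return FALLBACK
    return answer_text
```

This belt-and-braces check backs up the prompt-level refusal: even if the model ignores its instructions, an answer without supporting evidence never reaches the user.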
A good banking policy agent is boring in the right way: deterministic enough to trust, logged enough to audit, and narrow enough to stay inside compliance boundaries. Build it that way from day one.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.