How to Build a Compliance-Checking Agent for Retail Banking Using LlamaIndex in Python
A compliance-checking agent for retail banking reads customer-facing content such as product scripts, emails, chat transcripts, and policy drafts, and flags whether it violates internal policy or regulatory rules. This matters because small wording mistakes can create mis-selling risk, disclosure failures, audit findings, and expensive remediation across branches, call centers, and digital channels.
Architecture
- Policy corpus
  - Store AML/KYC policy docs, product T&Cs, fair lending rules, complaint-handling procedures, and approved disclosures.
  - Keep versions tagged by effective date so the agent can answer against the right policy set.
- Document ingestion layer
  - Parse PDFs, DOCX, HTML, and internal wiki pages into `Document` objects.
  - Preserve metadata like `source`, `jurisdiction`, `effective_date`, `document_type`, and `owner`.
- Indexing and retrieval
  - Use `VectorStoreIndex` for semantic retrieval over policy text.
  - Use metadata filters so a UK retail banking script is checked against UK policy only.
- Compliance reasoning layer
  - Use an LLM through LlamaIndex query engines to compare the user's text against retrieved policy chunks.
  - Force structured output: violation status, rule references, severity, and recommended remediation.
- Audit logging
  - Persist every input, retrieved evidence, model response, and final decision.
  - This is non-negotiable for retail banking audits and model risk reviews.
- Guardrails
  - Add deterministic checks before the LLM: prohibited phrases, missing disclosures, unsupported claims.
  - Use the LLM for interpretation; use rules for hard failures.
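Taken together, these layers compose into a single check flow: deterministic rules run first, the LLM judge only runs when rules pass, and every outcome is logged. A minimal sketch with stubbed components (the function names `rules`, `retrieve`, and `judge` are placeholders for this sketch, not LlamaIndex APIs):

```python
def check(text, rules, retrieve, judge, audit_log):
    """Run deterministic rules first; call the LLM judge only when they pass."""
    hits = rules(text)
    if hits:
        # Hard failure: no model call needed.
        finding = {
            "compliant": False,
            "severity": "high",
            "issues": [f"Prohibited phrase: {h}" for h in hits],
        }
    else:
        evidence = retrieve(text)
        finding = judge(text, evidence)
    # Every decision is persisted, pass or fail.
    audit_log.append({"input": text, "finding": finding})
    return finding
```

The sections below fill in each stub with real LlamaIndex components.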
Implementation
1) Install dependencies and load your policy corpus
Use LlamaIndex's core APIs to ingest policy files (install with `pip install llama-index`). In production, your documents should come from a controlled repository with versioning and an approval workflow.
```python
from pathlib import Path

from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter


def load_policy_documents(policy_dir: str) -> list[Document]:
    """Load plain-text policy files and tag each with compliance metadata."""
    docs = []
    for file_path in Path(policy_dir).glob("*.txt"):
        text = file_path.read_text(encoding="utf-8")
        docs.append(
            Document(
                text=text,
                metadata={
                    "source": file_path.name,
                    "jurisdiction": "UK",
                    "document_type": "retail_banking_policy",
                    "effective_date": "2026-01-01",
                },
            )
        )
    return docs


policy_docs = load_policy_documents("./policies")
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
index = VectorStoreIndex.from_documents(policy_docs, transformations=[splitter])
```
2) Build a retriever that only searches relevant banking policies
For retail banking, you do not want a mortgage script checked against credit card disclosure rules unless that is intentional. Metadata filtering keeps the retrieval scope tight.
```python
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

retriever = index.as_retriever(
    similarity_top_k=5,
    filters=MetadataFilters(filters=[
        ExactMatchFilter(key="jurisdiction", value="UK"),
        ExactMatchFilter(key="document_type", value="retail_banking_policy"),
    ]),
)
```
3) Create a compliance checker query engine with structured output
The pattern here is: retrieve policy evidence first, then ask the LLM to judge the user text against that evidence. The response should be machine-readable so downstream systems can route it to review queues or block publishing.
```python
from typing import List

from pydantic import BaseModel, Field

from llama_index.core import PromptTemplate
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.llms.openai import OpenAI


class ComplianceFinding(BaseModel):
    compliant: bool = Field(description="True if no material issue found")
    severity: str = Field(description="low|medium|high")
    rule_references: List[str] = Field(description="Policy sections cited")
    issues: List[str] = Field(description="Specific violations or concerns")
    remediation: List[str] = Field(description="Concrete fixes")


llm = OpenAI(model="gpt-4o-mini", temperature=0)

prompt = PromptTemplate(
    """You are a retail banking compliance checker.
Review the customer-facing text against the retrieved policy context.
Return only valid JSON with these fields:
compliant (bool), severity (string), rule_references (array of strings),
issues (array of strings), remediation (array of strings).

Policy context:
{context_str}

Text to check:
{query_str}
"""
)

query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    llm=llm,
    text_qa_template=prompt,
)

result = query_engine.query(
    "We can guarantee this credit card will approve you instantly with no checks."
)
print(result.response)
print(result.source_nodes[0].node.metadata)
```
If you want stronger control over output parsing in your application layer, wrap the response string with `ComplianceFinding.model_validate_json(...)` after enforcing JSON-only responses in the prompt.
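A minimal sketch of that parsing step, assuming the prompt above reliably yields JSON-only output (in practice models sometimes wrap JSON in markdown fences despite instructions, so the helper strips them defensively; `parse_finding` is an illustrative name, not a LlamaIndex API):

```python
from typing import List

from pydantic import BaseModel


class ComplianceFinding(BaseModel):
    compliant: bool
    severity: str
    rule_references: List[str]
    issues: List[str]
    remediation: List[str]


def parse_finding(raw: str) -> ComplianceFinding:
    """Validate the raw LLM response string into a typed finding."""
    # Strip markdown fences the model may add despite JSON-only instructions.
    cleaned = raw.strip().removeprefix("```json").removesuffix("```").strip()
    return ComplianceFinding.model_validate_json(cleaned)
```

A validation failure here raises `pydantic.ValidationError`, which you can route to a retry or a manual-review queue rather than publishing unchecked content.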
4) Add deterministic pre-checks before calling the LLM
Retail banking compliance work should not rely on probabilistic judgment alone. Catch obvious failures early so you reduce latency and avoid unnecessary model calls.
```python
PROHIBITED_PHRASES = [
    "guarantee approval",
    "no checks",
    "risk-free",
]


def hard_fail_checks(text: str) -> list[str]:
    """Return any prohibited phrases found in the text."""
    lowered = text.lower()
    return [phrase for phrase in PROHIBITED_PHRASES if phrase in lowered]


def check_content(text: str):
    hits = hard_fail_checks(text)
    if hits:
        # Deterministic failure: no model call needed.
        return {
            "compliant": False,
            "severity": "high",
            "rule_references": ["Internal marketing standards"],
            "issues": [f"Prohibited phrase found: {h}" for h in hits],
            "remediation": ["Remove absolute approval claims and unsupported guarantees."],
        }
    # Otherwise fall through to the LLM judgment; this branch returns the raw
    # response text, which your application layer should parse into ComplianceFinding.
    response = query_engine.query(text)
    return response.response


print(check_content("We can guarantee this credit card will approve you instantly with no checks."))
```
Production Considerations
- Deployment
  - Keep policy indexes isolated by jurisdiction and business line.
  - If data residency matters, run embedding storage and vector search inside the required region; do not send sensitive customer content across borders without legal approval.
- Monitoring
  - Log retrieved sources, prompt version, model version, latency, and final decision.
  - Track false positives by channel: branch scripts behave differently from email campaigns or chatbot responses.
- Guardrails
  - Block high-risk outputs when required disclosures are missing.
  - Add allowlists for approved product names and claims; retail banking teams often need exact wording for APRs, fees, eligibility criteria, and complaints language.
- Human review
  - Route medium/high severity findings to compliance analysts before publishing.
  - Store analyst overrides as labeled data for future evaluation and prompt tuning.
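The monitoring bullets above translate into one append-only record per check. A sketch of such a record, assuming a relational or document store downstream (field names like `input_sha256` and `prompt_version` are illustrative, not a LlamaIndex schema):

```python
import hashlib
from datetime import datetime, timezone


def build_audit_record(input_text: str, node_ids: list, prompt_version: str,
                       model_id: str, decision: dict) -> dict:
    """Assemble one immutable audit entry for a single compliance check."""
    return {
        # Hash rather than raw text if your log store must not hold PII.
        "input_sha256": hashlib.sha256(input_text.encode("utf-8")).hexdigest(),
        "retrieved_node_ids": node_ids,
        "prompt_version": prompt_version,
        "model_id": model_id,
        "decision": decision,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }
```

Storing the node IDs from `result.source_nodes` alongside the prompt and model versions lets an auditor replay exactly which policy text the decision was grounded in.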
Common Pitfalls
- Using one global index for every jurisdiction
  - This causes cross-contamination between UK, EU, US state-level rules, and local bank policies.
  - Fix it by partitioning indexes per jurisdiction and applying metadata filters at retrieval time.
- Treating the LLM as the source of truth
  - The model should interpret policies; it should not invent them.
  - Fix it by grounding every answer in retrieved policy chunks and logging citations from `source_nodes`.
- Skipping auditability
  - If you cannot show what text was checked against which policy version at what time, you do not have a bank-grade system.
  - Fix it by persisting input text hashes, retrieved node IDs, prompt templates, model identifiers, timestamps, and reviewer decisions.
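The first pitfall's fix can be enforced structurally, not just by convention: keep a registry keyed by jurisdiction and fail loudly when no matching index exists. A sketch (registry values would be per-jurisdiction LlamaIndex retrievers in practice; strings stand in here):

```python
def get_retriever(registry: dict, jurisdiction: str):
    """Select the retriever for one jurisdiction; never fall back silently."""
    if jurisdiction not in registry:
        # Refusing is safer than checking against the wrong rule set.
        raise LookupError(f"No policy index for jurisdiction {jurisdiction!r}")
    return registry[jurisdiction]
```

A hard failure here routes the content to manual review instead of letting a US script be silently judged against UK policy.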
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.