How to Build a Compliance-Checking Agent for Retail Banking Using LlamaIndex in Python
A compliance-checking agent for retail banking reads customer-facing content such as product scripts, emails, chat transcripts, and policy drafts, and flags whether it violates internal policy or regulatory rules. This matters because small wording mistakes can create mis-selling risk, disclosure failures, audit findings, and expensive remediation across branches, call centers, and digital channels.
Architecture
- Policy corpus
  - Store AML/KYC policy docs, product T&Cs, fair lending rules, complaint-handling procedures, and approved disclosures.
  - Keep versions tagged by effective date so the agent can answer against the right policy set.
- Document ingestion layer
  - Parse PDFs, DOCX, HTML, and internal wiki pages into `Document` objects.
  - Preserve metadata like `source`, `jurisdiction`, `effective_date`, `document_type`, and `owner`.
- Indexing and retrieval
  - Use `VectorStoreIndex` for semantic retrieval over policy text.
  - Use metadata filters so a UK retail banking script is checked against UK policy only.
- Compliance reasoning layer
  - Use an LLM through LlamaIndex query engines to compare the user's text against retrieved policy chunks.
  - Force structured output: violation status, rule references, severity, and recommended remediation.
- Audit logging
  - Persist every input, retrieved evidence, model response, and final decision.
  - This is non-negotiable for retail banking audits and model risk reviews.
- Guardrails
  - Add deterministic checks before the LLM: prohibited phrases, missing disclosures, unsupported claims.
  - Use the LLM for interpretation; use rules for hard failures.
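Taken together, these layers compose into a single check flow: deterministic rules run first, the LLM judge only runs when rules pass, and every outcome is logged. A minimal sketch with stubbed components (the function names `rules`, `retrieve`, and `judge` are placeholders for this sketch, not LlamaIndex APIs):

```python
def check(text, rules, retrieve, judge, audit_log):
    """Run deterministic rules first; call the LLM judge only when they pass."""
    hits = rules(text)
    if hits:
        # Hard failure: no model call needed.
        finding = {
            "compliant": False,
            "severity": "high",
            "issues": [f"Prohibited phrase: {h}" for h in hits],
        }
    else:
        evidence = retrieve(text)
        finding = judge(text, evidence)
    # Every decision is persisted, pass or fail.
    audit_log.append({"input": text, "finding": finding})
    return finding
```

The sections below fill in each stub with real LlamaIndex components.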
Implementation
1) Install dependencies and load your policy corpus
Use LlamaIndex's core APIs to ingest policy files (install with `pip install llama-index`). In production, your documents should come from a controlled repository with versioning and an approval workflow.
```python
from pathlib import Path

from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter


def load_policy_documents(policy_dir: str) -> list[Document]:
    """Load plain-text policy files and tag each with compliance metadata."""
    docs = []
    for file_path in Path(policy_dir).glob("*.txt"):
        text = file_path.read_text(encoding="utf-8")
        docs.append(
            Document(
                text=text,
                metadata={
                    "source": file_path.name,
                    "jurisdiction": "UK",
                    "document_type": "retail_banking_policy",
                    "effective_date": "2026-01-01",
                },
            )
        )
    return docs


policy_docs = load_policy_documents("./policies")
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
index = VectorStoreIndex.from_documents(policy_docs, transformations=[splitter])
```
2) Build a retriever that only searches relevant banking policies
For retail banking, you do not want a mortgage script checked against credit card disclosure rules unless that is intentional. Metadata filtering keeps the retrieval scope tight.
```python
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

retriever = index.as_retriever(
    similarity_top_k=5,
    filters=MetadataFilters(filters=[
        ExactMatchFilter(key="jurisdiction", value="UK"),
        ExactMatchFilter(key="document_type", value="retail_banking_policy"),
    ]),
)
```
3) Create a compliance checker query engine with structured output
The pattern here is: retrieve policy evidence first, then ask the LLM to judge the user text against that evidence. The response should be machine-readable so downstream systems can route it to review queues or block publishing.
```python
from typing import List

from pydantic import BaseModel, Field

from llama_index.core import PromptTemplate
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.llms.openai import OpenAI


class ComplianceFinding(BaseModel):
    compliant: bool = Field(description="True if no material issue found")
    severity: str = Field(description="low|medium|high")
    rule_references: List[str] = Field(description="Policy sections cited")
    issues: List[str] = Field(description="Specific violations or concerns")
    remediation: List[str] = Field(description="Concrete fixes")


llm = OpenAI(model="gpt-4o-mini", temperature=0)

prompt = PromptTemplate(
    """You are a retail banking compliance checker.
Review the customer-facing text against the retrieved policy context.
Return only valid JSON with these fields:
compliant (bool), severity (string), rule_references (array of strings),
issues (array of strings), remediation (array of strings).

Policy context:
{context_str}

Text to check:
{query_str}
"""
)

query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    llm=llm,
    text_qa_template=prompt,
)

result = query_engine.query(
    "We can guarantee this credit card will approve you instantly with no checks."
)
print(result.response)
print(result.source_nodes[0].node.metadata)
```
If you want stronger control over output parsing in your application layer, wrap the response string with `ComplianceFinding.model_validate_json(...)` after enforcing JSON-only responses in the prompt.
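A minimal sketch of that parsing step, assuming the prompt above reliably yields JSON-only output (in practice models sometimes wrap JSON in markdown fences despite instructions, so the helper strips them defensively; `parse_finding` is an illustrative name, not a LlamaIndex API):

```python
from typing import List

from pydantic import BaseModel


class ComplianceFinding(BaseModel):
    compliant: bool
    severity: str
    rule_references: List[str]
    issues: List[str]
    remediation: List[str]


def parse_finding(raw: str) -> ComplianceFinding:
    """Validate the raw LLM response string into a typed finding."""
    # Strip markdown fences the model may add despite JSON-only instructions.
    cleaned = raw.strip().removeprefix("```json").removesuffix("```").strip()
    return ComplianceFinding.model_validate_json(cleaned)
```

A validation failure here raises `pydantic.ValidationError`, which you can route to a retry or a manual-review queue rather than publishing unchecked content.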
4) Add deterministic pre-checks before calling the LLM
Retail banking compliance work should not rely on probabilistic judgment alone. Catch obvious failures early so you reduce latency and avoid unnecessary model calls.
```python
PROHIBITED_PHRASES = [
    "guarantee approval",
    "no checks",
    "risk-free",
]


def hard_fail_checks(text: str) -> list[str]:
    """Return any prohibited phrases found in the text."""
    lowered = text.lower()
    return [phrase for phrase in PROHIBITED_PHRASES if phrase in lowered]


def check_content(text: str):
    hits = hard_fail_checks(text)
    if hits:
        # Deterministic failure: no model call needed.
        return {
            "compliant": False,
            "severity": "high",
            "rule_references": ["Internal marketing standards"],
            "issues": [f"Prohibited phrase found: {h}" for h in hits],
            "remediation": ["Remove absolute approval claims and unsupported guarantees."],
        }
    # Otherwise fall through to the LLM judgment; this branch returns the raw
    # response text, which your application layer should parse into ComplianceFinding.
    response = query_engine.query(text)
    return response.response


print(check_content("We can guarantee this credit card will approve you instantly with no checks."))
```
Production Considerations
- Deployment
  - Keep policy indexes isolated by jurisdiction and business line.
  - If data residency matters, run embedding storage and vector search inside the required region; do not send sensitive customer content across borders without legal approval.
- Monitoring
  - Log retrieved sources, prompt version, model version, latency, and final decision.
  - Track false positives by channel: branch scripts behave differently from email campaigns or chatbot responses.
- Guardrails
  - Block high-risk outputs when required disclosures are missing.
  - Add allowlists for approved product names and claims; retail banking teams often need exact wording for APRs, fees, eligibility criteria, and complaints language.
- Human review
  - Route medium/high severity findings to compliance analysts before publishing.
  - Store analyst overrides as labeled data for future evaluation and prompt tuning.
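The monitoring bullets above translate into one append-only record per check. A sketch of such a record, assuming a relational or document store downstream (field names like `input_sha256` and `prompt_version` are illustrative, not a LlamaIndex schema):

```python
import hashlib
from datetime import datetime, timezone


def build_audit_record(input_text: str, node_ids: list, prompt_version: str,
                       model_id: str, decision: dict) -> dict:
    """Assemble one immutable audit entry for a single compliance check."""
    return {
        # Hash rather than raw text if your log store must not hold PII.
        "input_sha256": hashlib.sha256(input_text.encode("utf-8")).hexdigest(),
        "retrieved_node_ids": node_ids,
        "prompt_version": prompt_version,
        "model_id": model_id,
        "decision": decision,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }
```

Storing the node IDs from `result.source_nodes` alongside the prompt and model versions lets an auditor replay exactly which policy text the decision was grounded in.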
Common Pitfalls
- Using one global index for every jurisdiction
  - This causes cross-contamination between UK, EU, US state-level rules, and local bank policies.
  - Fix it by partitioning indexes per jurisdiction and applying metadata filters at retrieval time.
- Treating the LLM as the source of truth
  - The model should interpret policies; it should not invent them.
  - Fix it by grounding every answer in retrieved policy chunks and logging citations from `source_nodes`.
- Skipping auditability
  - If you cannot show what text was checked against which policy version at what time, you do not have a bank-grade system.
  - Fix it by persisting input text hashes, retrieved node IDs, prompt templates, model identifiers, timestamps, and reviewer decisions.
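The first pitfall's fix can be enforced structurally, not just by convention: keep a registry keyed by jurisdiction and fail loudly when no matching index exists. A sketch (registry values would be per-jurisdiction LlamaIndex retrievers in practice; strings stand in here):

```python
def get_retriever(registry: dict, jurisdiction: str):
    """Select the retriever for one jurisdiction; never fall back silently."""
    if jurisdiction not in registry:
        # Refusing is safer than checking against the wrong rule set.
        raise LookupError(f"No policy index for jurisdiction {jurisdiction!r}")
    return registry[jurisdiction]
```

A hard failure here routes the content to manual review instead of letting a US script be silently judged against UK policy.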
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.