How to Build a Compliance Checking Agent Using LlamaIndex in Python for Retail Banking

By Cyprian Aarons · Updated 2026-04-21
compliance-checking · llamaindex · python · retail-banking

A compliance checking agent for retail banking reads customer-facing content such as product scripts, emails, chat transcripts, and policy drafts, and flags whether it violates internal policy or regulatory rules. This matters because small wording mistakes can create mis-selling risk, disclosure failures, audit findings, and expensive remediation across branches, call centers, and digital channels.

Architecture

  • Policy corpus

    • Store AML/KYC policy docs, product T&Cs, fair lending rules, complaint handling procedures, and approved disclosures.
    • Keep versions tagged by effective date so the agent can answer against the right policy set.
  • Document ingestion layer

    • Parse PDFs, DOCX, HTML, and internal wiki pages into Document objects.
    • Preserve metadata like source, jurisdiction, effective_date, document_type, and owner.
  • Indexing and retrieval

    • Use VectorStoreIndex for semantic retrieval over policy text.
    • Use metadata filters so a UK retail banking script is checked against UK policy only.
  • Compliance reasoning layer

    • Use an LLM through LlamaIndex query engines to compare the user’s text against retrieved policy chunks.
    • Force structured output: violation status, rule references, severity, and recommended remediation.
  • Audit logging

    • Persist every input, retrieved evidence, model response, and final decision (a minimal record layout is sketched after this list).
    • This is non-negotiable for retail banking audits and model risk reviews.
  • Guardrails

    • Add deterministic checks before the LLM: prohibited phrases, missing disclosures, unsupported claims.
    • Use the LLM for interpretation; use rules for hard failures.
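
The audit logging component is easiest to pin down as a concrete record. A minimal sketch, assuming an append-only store behind it; the field names are illustrative, not a prescribed schema:

import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class AuditRecord:
    # SHA-256 of the checked text, so the exact content can be verified later
    input_sha256: str
    # IDs of the policy chunks retrieved as evidence (from result.source_nodes)
    retrieved_node_ids: List[str]
    # Everything needed to reproduce the decision
    prompt_version: str
    model_id: str
    policy_effective_date: str
    # Final machine decision plus any later analyst override
    decision: str
    reviewer_override: Optional[str] = None
    checked_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def hash_input(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()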

Implementation

1) Install dependencies and load your policy corpus

Use LlamaIndex’s core APIs to ingest policy files. In production, your documents should come from a controlled repository with versioning and an approval workflow.
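
If you are starting from a clean environment, a typical install for the code below is the llama-index meta-package, which bundles llama-index-core plus the OpenAI LLM and embedding integrations used later; pin exact versions per your own dependency policy.

pip install llama-index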

from pathlib import Path
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

def load_policy_documents(policy_dir: str):
    docs = []
    for file_path in Path(policy_dir).glob("*.txt"):
        text = file_path.read_text(encoding="utf-8")
        docs.append(
            Document(
                text=text,
                metadata={
                    "source": file_path.name,
                    "jurisdiction": "UK",
                    "document_type": "retail_banking_policy",
                    "effective_date": "2026-01-01",
                },
            )
        )
    return docs

policy_docs = load_policy_documents("./policies")
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
index = VectorStoreIndex.from_documents(policy_docs, transformations=[splitter])
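
If you want to reuse the index across runs instead of re-embedding the policy corpus every time, you can persist it to disk and reload it later; the directory name below is an assumption, chosen to encode the policy snapshot date.

from llama_index.core import StorageContext, load_index_from_storage

# Persist the built index so the same policy snapshot can be reloaded and audited later
index.storage_context.persist(persist_dir="./policy_index_2026-01-01")

# In a later run or another process, reload it without re-embedding
storage_context = StorageContext.from_defaults(persist_dir="./policy_index_2026-01-01")
index = load_index_from_storage(storage_context)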

2) Build a retriever that only searches relevant banking policies

For retail banking, you do not want a mortgage script checked against credit card disclosure rules unless that is intentional. Metadata filtering keeps the retrieval scope tight.

from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter

retriever = index.as_retriever(
    similarity_top_k=5,
    filters=MetadataFilters(filters=[
        ExactMatchFilter(key="jurisdiction", value="UK"),
        ExactMatchFilter(key="document_type", value="retail_banking_policy"),
    ]),
)

3) Create a compliance checker query engine with structured output

The pattern here is: retrieve policy evidence first, then ask the LLM to judge the user text against that evidence. The response should be machine-readable so downstream systems can route it to review queues or block publishing.

from pydantic import BaseModel, Field
from typing import List
from llama_index.core import PromptTemplate
from llama_index.llms.openai import OpenAI
from llama_index.core.query_engine import RetrieverQueryEngine

class ComplianceFinding(BaseModel):
    compliant: bool = Field(description="True if no material issue found")
    severity: str = Field(description="low|medium|high")
    rule_references: List[str] = Field(description="Policy sections cited")
    issues: List[str] = Field(description="Specific violations or concerns")
    remediation: List[str] = Field(description="Concrete fixes")

llm = OpenAI(model="gpt-4o-mini", temperature=0)

prompt = PromptTemplate(
    """You are a retail banking compliance checker.
Review the customer-facing text against the retrieved policy context.

Return only valid JSON with these fields:
compliant (bool), severity (string), rule_references (array of strings),
issues (array of strings), remediation (array of strings).

Policy context:
{context_str}

Text to check:
{query_str}
"""
)

query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    llm=llm,
    text_qa_template=prompt,
)

result = query_engine.query(
    "We can guarantee this credit card will approve you instantly with no checks."
)

print(result.response)
print(result.source_nodes[0].node.metadata)

If you want stronger control over output parsing in your application layer, wrap the response string with ComplianceFinding.model_validate_json(...) after enforcing JSON-only responses in the prompt.
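
For example, assuming the model returned strict JSON (in practice you may also want to strip accidental Markdown fences or retry on validation failures):

from pydantic import ValidationError

try:
    finding = ComplianceFinding.model_validate_json(str(result.response))
except ValidationError as exc:
    # Treat unparseable output as a failure and route the content to manual review
    finding = None
    print(f"Could not parse compliance output: {exc}")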

4) Add deterministic pre-checks before calling the LLM

Retail banking compliance work should not rely on probabilistic judgment alone. Catch obvious failures early so you reduce latency and avoid unnecessary model calls.

PROHIBITED_PHRASES = [
    "guarantee approval",
    "no checks",
    "risk-free",
]

def hard_fail_checks(text: str):
    lowered = text.lower()
    hits = [phrase for phrase in PROHIBITED_PHRASES if phrase in lowered]
    return hits

def check_content(text: str):
    hits = hard_fail_checks(text)
    if hits:
        return {
            "compliant": False,
            "severity": "high",
            "rule_references": ["Internal marketing standards"],
            "issues": [f"Prohibited phrase found: {h}" for h in hits],
            "remediation": ["Remove absolute approval claims and unsupported guarantees."],
        }

    # No hard failure: fall back to the LLM judgment. This path returns the
    # model's JSON string; parse it with ComplianceFinding.model_validate_json(...)
    # if downstream systems need a typed object.
    response = query_engine.query(text)
    return response.response

print(check_content("We can guarantee this credit card will approve you instantly with no checks."))
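
The same pre-check layer can also enforce required disclosures, the inverse of the prohibited-phrase check: flag content when approved wording is absent. The product key and phrases below are placeholders for your own approved disclosure library.

REQUIRED_DISCLOSURES = {
    "credit_card": ["representative apr", "subject to status"],
}

def missing_disclosures(text: str, product: str):
    lowered = text.lower()
    required = REQUIRED_DISCLOSURES.get(product, [])
    return [phrase for phrase in required if phrase not in lowered]

print(missing_disclosures("Apply today, subject to status.", "credit_card"))
# ['representative apr']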

Production Considerations

  • Deployment

    • Keep policy indexes isolated by jurisdiction and business line.
    • If data residency matters, run embedding generation, vector storage, and search inside the required region; do not send sensitive customer content across borders without legal approval.
  • Monitoring

    • Log retrieved sources, prompt version, model version, latency, and final decision.
    • Track false positives by channel: branch scripts behave differently from email campaigns or chatbot responses.
  • Guardrails

    • Block high-risk outputs when required disclosures are missing.
    • Add allowlists for approved product names and claims; retail banking teams often need exact wording for APRs, fees, eligibility criteria, and complaints language.
  • Human review

    • Route medium/high severity findings to compliance analysts before publishing; a simple routing sketch follows this list.
    • Store analyst overrides as labeled data for future evaluation and prompt tuning.
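
A minimal routing sketch, assuming a parsed ComplianceFinding like the model defined in step 3; enqueue_for_review is a placeholder for whatever case-management or review-queue integration you use:

def enqueue_for_review(content_id: str, finding: ComplianceFinding) -> None:
    # Placeholder: replace with your case-management or review-queue integration
    print(f"Queued {content_id} for analyst review: {finding.issues}")

def route_finding(content_id: str, finding: ComplianceFinding) -> str:
    # Non-compliant or medium/high severity findings always go to a human analyst
    if not finding.compliant or finding.severity in {"medium", "high"}:
        enqueue_for_review(content_id, finding)
        return "held_for_review"
    # Compliant, low-severity content can be auto-published but should still be logged
    return "approved"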

Common Pitfalls

  • Using one global index for every jurisdiction

    • This causes cross-contamination between UK, EU, US state-level rules, and local bank policies.
    • Fix it by partitioning indexes per jurisdiction and applying metadata filters at retrieval time (see the sketch after this list).
  • Treating the LLM as the source of truth

    • The model should interpret policies; it should not invent them.
    • Fix it by grounding every answer in retrieved policy chunks and logging citations from source_nodes.
  • Skipping auditability

    • If you cannot show what text was checked against which policy version at what time, you do not have a bank-grade system.
    • Fix it by persisting input text hashes, retrieved node IDs, prompt templates, model identifiers, timestamps, and reviewer decisions.
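
One way to implement the partitioning, assuming load_policy_documents from step 1 is adapted to tag each document with its own jurisdiction and the policy files live in per-jurisdiction folders (the folder layout and jurisdiction codes here are assumptions):

JURISDICTIONS = ["UK", "EU", "US"]

# Build one index per jurisdiction instead of a single global index
indexes = {
    j: VectorStoreIndex.from_documents(
        load_policy_documents(f"./policies/{j}"),
        transformations=[splitter],
    )
    for j in JURISDICTIONS
}

def get_retriever(jurisdiction: str):
    # Fail loudly rather than silently falling back to another jurisdiction's policies
    if jurisdiction not in indexes:
        raise ValueError(f"No policy index for jurisdiction: {jurisdiction}")
    return indexes[jurisdiction].as_retriever(similarity_top_k=5)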

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

