How to Build a Compliance Checking Agent Using LlamaIndex in Python for Banking
A compliance checking agent for banking takes a policy question, searches the right internal sources, and returns a grounded answer with citations. It matters because the cost of a bad answer is not just a bug; it can become a regulatory issue, an audit finding, or a customer harm event.
Architecture
- Policy corpus ingestion
  - Load KYC, AML, sanctions, product policy, and procedure documents into LlamaIndex.
  - Keep source metadata such as document version, jurisdiction, owner, and effective date.
- Retrieval layer
  - Use `VectorStoreIndex` with metadata filtering so the agent only sees approved documents for the relevant region or business line.
  - For banking, this is where data residency constraints start to matter.
- Query engine
  - Use `RetrieverQueryEngine` to answer compliance questions with citations.
  - This keeps responses grounded in retrieved policy text instead of model memory.
- Response synthesis
  - Use LlamaIndex's response synthesizers to produce concise answers plus evidence.
  - For compliance workflows, prefer extractive or citation-heavy answers over free-form prose.
- Guardrail layer
  - Add rules for prohibited outputs: no legal-advice framing, no unsupported approval language, no leakage of customer-sensitive data.
  - Log every prompt, retrieval result, and final answer for auditability.
- Human review workflow
  - Escalate low-confidence cases to compliance analysts.
  - The agent should recommend next steps, not make final regulatory decisions.
Implementation
- Install dependencies and prepare your document set

Use LlamaIndex's core package and a local embedding model if you need tighter control over residency. In regulated environments, avoid sending policy documents to external services unless your legal and security teams have approved it.

```bash
pip install llama-index llama-index-embeddings-huggingface sentence-transformers
```
- Load compliance documents with metadata

The key pattern is to attach metadata that you can filter on later. For banking, include jurisdiction and document version so the retriever does not mix US retail policy with EU commercial policy.

```python
from llama_index.core import Document

docs = [
    Document(
        text="""
        Customer onboarding requires identity verification before account activation.
        Enhanced due diligence is required for high-risk customers and PEPs.
        """,
        metadata={
            "doc_type": "kyc_policy",
            "jurisdiction": "US",
            "version": "2024.10",
            "business_unit": "retail_banking",
        },
    ),
    Document(
        text="""
        Transactions above threshold values must be screened against sanctions lists.
        Alerts require analyst review before closure.
        """,
        metadata={
            "doc_type": "aml_policy",
            "jurisdiction": "US",
            "version": "2024.09",
            "business_unit": "retail_banking",
        },
    ),
]
```
- Build the index and query engine

This example uses `VectorStoreIndex`, `MetadataFilters`, and `ExactMatchFilter`; `as_query_engine` wires them into a `RetrieverQueryEngine` under the hood. That gives you controlled retrieval plus cited answers from the indexed policies.

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter

# If you want local embeddings for residency control:
# from llama_index.core import Settings
# from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# Settings.embed_model = HuggingFaceEmbedding(
#     model_name="sentence-transformers/all-MiniLM-L6-v2"
# )

index = VectorStoreIndex.from_documents(docs)

# Restrict retrieval to approved US retail-banking policies.
filters = MetadataFilters(filters=[
    ExactMatchFilter(key="jurisdiction", value="US"),
    ExactMatchFilter(key="business_unit", value="retail_banking"),
])

# A standalone retriever is useful when you want raw nodes without synthesis.
retriever = index.as_retriever(similarity_top_k=3, filters=filters)

# The query engine applies the same filters, then synthesizes a cited answer.
query_engine = index.as_query_engine(similarity_top_k=3, filters=filters)

question = "Do we need enhanced due diligence for a high-risk customer?"
response = query_engine.query(question)

print(str(response))
for source in response.source_nodes:
    print(source.node.metadata)
```
- Wrap the query engine with an explicit compliance decision format

For production banking workflows, don't return just prose. Return structured output that downstream systems can route into case management or analyst review.

```python
import json

def check_compliance(question: str) -> dict:
    response = query_engine.query(question)
    evidence = []
    for sn in response.source_nodes:
        evidence.append({
            "text": sn.node.get_text()[:300],
            "metadata": sn.node.metadata,
            "score": sn.score,
        })
    return {
        "question": question,
        "answer": str(response),
        "evidence": evidence,
        "decision": "grounded_response" if evidence else "needs_review",
    }

result = check_compliance("Can we onboard a customer without identity verification?")
print(json.dumps(result, indent=2))
```
Production Considerations
- Audit logging
  - Store the user question, retrieved node IDs, document versions, final answer, and timestamp.
  - Regulators care about traceability: which policy was used and why the agent said what it said.
- Data residency and access control
  - Keep embeddings and vector stores in-region if policy data cannot leave a jurisdiction.
  - Enforce RBAC so analysts in one business unit cannot query another unit's restricted policies.
- Guardrails on output
  - Block unsupported statements like "approved," "cleared," or "fully compliant" unless your workflow explicitly allows that decision state.
  - Force uncertainty handling: if retrieval confidence is weak or sources conflict, route to human review.
- Monitoring
  - Track retrieval hit rate, citation coverage, escalation rate, and false positive/false negative outcomes.
  - In banking, monitor drift when policies change; stale embeddings are a real operational risk.
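The guardrail and audit-logging points above can be sketched as one post-processing step over the `check_compliance()` result from the Implementation section. This is a minimal sketch: the prohibited-phrase list, the `MIN_SCORE` threshold, and the `apply_guardrails` name are illustrative assumptions, not a fixed API.

```python
import re
from datetime import datetime, timezone

# Hypothetical guardrail: phrases that imply a final compliance decision.
PROHIBITED = re.compile(r"\b(approved|cleared|fully compliant)\b", re.IGNORECASE)
MIN_SCORE = 0.5  # assumed escalation threshold; tune against your retriever

def apply_guardrails(result: dict) -> dict:
    """Post-process a check_compliance() result before it leaves the system."""
    scores = [e["score"] for e in result["evidence"] if e["score"] is not None]
    weak_retrieval = not scores or max(scores) < MIN_SCORE
    # Route to human review on unsupported decision language or weak retrieval.
    if PROHIBITED.search(result["answer"]) or weak_retrieval:
        result["decision"] = "needs_review"
    # Minimal audit record: enough to trace which policy versions were used.
    result["audit_entry"] = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": result["question"],
        "answer": result["answer"],
        "source_versions": [e["metadata"].get("version") for e in result["evidence"]],
        "decision": result["decision"],
    }
    return result
```

In practice the audit entry would go to an append-only store rather than back into the response dict; it is inlined here to keep the sketch self-contained.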
Common Pitfalls
- Using generic RAG without metadata filters
  - Mistake: mixing jurisdictions or product lines in retrieval.
  - Fix: filter by `jurisdiction`, `business_unit`, `doc_type`, and `version` every time.
- Treating the model as the decision-maker
  - Mistake: letting the LLM output final compliance judgments.
  - Fix: make it an evidence-backed assistant that recommends review or points to policy text.
- Ignoring document freshness
  - Mistake: indexing outdated procedures and assuming answers are current.
  - Fix: version your documents, reindex on policy updates, and expire old content from retrieval.
- Not preserving provenance
  - Mistake: returning answers without source references.
  - Fix: always surface `response.source_nodes` so auditors can trace the reasoning path.
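The freshness pitfall can be handled with a small gate before indexing. This is a sketch under stated assumptions: the `effective_date` and `expiry_date` metadata keys and the `is_current` helper are hypothetical names, assuming your pipeline stamps ISO dates on each policy document.

```python
from datetime import date
from typing import Optional

def is_current(metadata: dict, today: Optional[date] = None) -> bool:
    """Keep only documents whose effective window covers today.

    Field names (effective_date, expiry_date) are assumptions; use whatever
    your document pipeline actually stamps on each policy.
    """
    today = today or date.today()
    effective = metadata.get("effective_date")  # ISO date, e.g. "2024-10-01"
    expires = metadata.get("expiry_date")       # ISO date, or None for no expiry
    if effective and date.fromisoformat(effective) > today:
        return False  # not yet in force
    if expires and date.fromisoformat(expires) <= today:
        return False  # superseded or retired
    return True

# Apply before building the index so stale policy text never enters retrieval:
# fresh_docs = [d for d in docs if is_current(d.metadata)]
```

Running this gate at index-build time (rather than query time) keeps expired policy text out of the vector store entirely, which is easier to audit than per-query filtering.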
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.