How to Build a Compliance-Checking Agent Using LlamaIndex in Python for Investment Banking

By Cyprian Aarons · Updated 2026-04-21
compliance-checking · llamaindex · python · investment-banking

A compliance-checking agent in investment banking reviews documents, chats, emails, trade-related notes, or deal materials against internal policy and external regulations before they move forward. It matters because one missed restriction, one unapproved statement, or one data residency violation can create regulatory exposure, delayed deals, or a formal audit finding.

Architecture

  • Document ingestion layer

    • Pulls policies, procedures, regulatory guidance, and approved clause libraries from controlled sources.
    • Typical inputs: PDF policy manuals, Word playbooks, SharePoint exports, and internal wiki pages.
  • Indexing layer

    • Converts source material into searchable nodes using LlamaIndex.
    • Uses VectorStoreIndex for semantic retrieval over compliance content.
  • Policy retrieval layer

    • Retrieves the most relevant policy passages for a given message or document.
    • Uses QueryEngine with strict top-k retrieval and source citation support.
  • Compliance decision layer

    • Applies deterministic rules plus LLM reasoning to classify content as pass / review / reject (a rules-first sketch follows this list).
    • Produces structured output with reasons and cited policy references.
  • Audit logging layer

    • Stores input text, retrieved policy snippets, model output, timestamps, and reviewer overrides.
    • Required for traceability in regulated workflows.
  • Guardrail layer

    • Enforces redaction, jurisdiction checks, and escalation rules before any response is returned.
    • Prevents the agent from giving advice outside approved policy scope.
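
The decision layer is where the rules-first ordering matters most. As a minimal sketch, deterministic checks can short-circuit before any model call; the patterns and names below are illustrative, not part of LlamaIndex, and a real deployment would load its rule set from governed configuration:

import re

# Illustrative hard-block patterns. In production these come from a
# governed, versioned rule set rather than being hardcoded.
HARD_BLOCK_PATTERNS = [
    r"\bguarantee[sd]?\b.*\bregulator",  # promising regulatory outcomes
    r"\boff the record\b",
    r"\bdelete (this|the) (email|chat)\b",
]

def deterministic_precheck(text: str) -> str | None:
    """Return 'reject' when a hard rule fires; None defers to the LLM layer."""
    lowered = text.lower()
    for pattern in HARD_BLOCK_PATTERNS:
        if re.search(pattern, lowered):
            return "reject"
    return None  # no rule fired: the LLM layer classifies pass / review / reject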

Implementation

1. Install dependencies and load your policy corpus

For investment banking use cases, keep your policy corpus controlled and versioned. Do not index random shared drives or user-uploaded files without approval.

pip install llama-index llama-index-llms-openai pypdf
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

# Load approved compliance documents only
documents = SimpleDirectoryReader(
    input_dir="./compliance_docs",
    required_exts=[".pdf", ".txt", ".md"],
).load_data()
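
# Optional sketch: tag each approved document with version metadata before
# splitting, so the splitter copies it onto every node and retrieved chunks
# trace back to a specific policy release. These keys are an example
# convention, not a LlamaIndex requirement.
for doc in documents:
    doc.metadata["policy_version"] = "2026-Q1"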

splitter = SentenceSplitter(chunk_size=800, chunk_overlap=120)
nodes = splitter.get_nodes_from_documents(documents)

print(f"Loaded {len(documents)} documents into {len(nodes)} nodes")

2. Build the compliance index with citations

This pattern gives you semantic retrieval over policies while preserving source references. That matters when a reviewer asks why the agent flagged a sentence.

from llama_index.core import VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)

index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine(
    similarity_top_k=4,
    response_mode="compact"
)

question = """
Review this draft email for investment banking compliance issues:
"We can guarantee approval from the regulator and expect the client to close next week."
"""

response = query_engine.query(question)
print(response)
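
The response object also carries the retrieved nodes, so you can surface citations directly instead of trusting whatever the generated text claims. A short sketch:

# Each entry in source_nodes is a NodeWithScore. The file_name metadata
# comes from SimpleDirectoryReader, plus anything you attached at ingestion.
for src in response.source_nodes:
    print(src.node.metadata.get("file_name"), round(src.score or 0.0, 3))
    print(src.node.get_content()[:200])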

3. Add structured compliance output

In production you want more than free-text feedback. You want a machine-readable decision that downstream systems can route into approve/review/reject workflows.

from pydantic import BaseModel, Field
from typing import List
from llama_index.core.program import LLMTextCompletionProgram

class ComplianceResult(BaseModel):
    decision: str = Field(description="approve, review, or reject")
    risk_level: str = Field(description="low, medium, high")
    reasons: List[str]
    cited_policies: List[str]

prompt_template = """
You are a compliance reviewer for an investment bank.
Use only the provided context to assess the text below.

Context:
{context_str}

Text:
{text}

Return a strict JSON object with:
decision, risk_level, reasons, cited_policies
"""

program = LLMTextCompletionProgram.from_defaults(
    output_cls=ComplianceResult,
    prompt_template_str=prompt_template,
    llm=Settings.llm,
)

result = program(
    context_str=str(response),
    text="We can guarantee approval from the regulator and expect the client to close next week."
)

print(result.model_dump())
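
Downstream routing then becomes a plain dispatch on the structured fields. The queue names below are placeholders for whatever workflow system you actually use:

# Hypothetical routing; replace the queue names with your own workflow targets.
def route(verdict: ComplianceResult) -> str:
    if verdict.decision == "approve" and verdict.risk_level == "low":
        return "auto-approve"
    if verdict.decision == "reject":
        return "blocked-queue"
    return "human-review-queue"  # anything ambiguous goes to a reviewer

print(route(result))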

4. Wrap it in an audit-friendly service function

Keep the final decision path deterministic enough for audit review. Log the input text hash, retrieved sources, model version, and final outcome.

import hashlib
import json
from datetime import datetime, timezone

def check_compliance(text: str):
    query_response = query_engine.query(text)
    verdict = program(context_str=str(query_response), text=text)

    record = {
        "timestamp": datetime.utcnow().isoformat(),
        "input_hash": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "retrieved_context": str(query_response),
        "verdict": verdict.model_dump(),
        "model": "gpt-4o-mini",
    }

    with open("./audit_log.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

    return verdict

print(check_compliance("Please send the pitch deck to the client before legal sign-off."))

Production Considerations

  • Deploy in-region

    • For banking data residency requirements, keep ingestion, vector storage, and inference in approved regions.
    • If your policies or client data cannot leave a jurisdiction, do not route them through unmanaged third-party endpoints.
  • Log everything needed for audit

    • Store prompt version, retrieved nodes, policy document IDs, model name, timestamp, and human override decisions.
    • Auditors care about reproducibility more than clever prompts.
  • Add hard guardrails before generation

    • Block PII leakage, MNPI exposure, sanctions-related terms that lack an approved escalation flow, and any request touching restricted lists.
    • If the input mentions deal terms tied to blackout periods or wall-crossing controls, escalate instead of answering directly.
  • Use human-in-the-loop thresholds

    • Auto-approve only low-risk cases with high confidence and strong policy matches.
    • Route ambiguous language like “should be fine,” “likely approved,” or “off the record” to compliance reviewers (a minimal pre-generation escalation sketch follows this list).
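
One way to wire the last two points together is a thin escalation wrapper in front of check_compliance. The phrase list and hardcoded verdict below are illustrative; in a real deployment both would come from governed configuration:

# Illustrative only: phrases that force human review before any model call.
ESCALATION_PHRASES = ["should be fine", "likely approved", "off the record"]

def check_with_guardrails(text: str) -> ComplianceResult:
    lowered = text.lower()
    if any(phrase in lowered for phrase in ESCALATION_PHRASES):
        # Hard stop pre-generation: route straight to a human reviewer.
        return ComplianceResult(
            decision="review",
            risk_level="high",
            reasons=["Ambiguous or restricted phrasing; escalated before generation."],
            cited_policies=[],
        )
    return check_compliance(text)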

Common Pitfalls

  • Using generic web data instead of approved internal policies

    • Compliance agents should answer from governed sources only.
    • Fix it by indexing versioned internal documents and excluding uncontrolled content sources.
  • Returning answers without citations

    • A yes/no answer with no source trail is weak in an audit or review meeting.
    • Fix it by requiring retrieved policy references in every result payload.
  • Letting the LLM make final decisions on high-risk cases

    • Investment banking workflows need deterministic escalation paths for borderline scenarios.
    • Fix it by using rules first for obvious violations and reserving LLM judgment for classification plus explanation.

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
