How to Build a Compliance-Checking Agent Using LlamaIndex in Python for Pension Funds

By Cyprian Aarons · Updated 2026-04-21
Tags: compliance-checking, llamaindex, python, pension-funds

A compliance-checking agent for pension funds reviews member communications, investment proposals, policy documents, and operational workflows against internal rules and regulatory obligations. This matters because pension funds operate under strict fiduciary duties, data residency constraints, and audit requirements, so a bad answer is not merely inaccurate: it can create regulatory exposure.

Architecture

  • Document ingestion layer

    • Pulls in policy PDFs, scheme rules, investment guidelines, and regulator circulars.
    • Uses SimpleDirectoryReader or a custom loader for controlled document sources.
  • Indexing layer

    • Converts source documents into searchable nodes with VectorStoreIndex.
    • Stores embeddings in a backend that fits your residency and retention requirements.
  • Compliance rules layer

    • Encodes fund-specific policies like contribution limits, disclosure requirements, transfer restrictions, and escalation triggers.
    • Keeps hard rules separate from the LLM so you can audit them.
  • Agent orchestration layer

    • Uses ReActAgent or a query engine wrapper to route questions to retrieval plus rule checks.
    • Produces structured answers with citations.
  • Audit and logging layer

    • Captures input, retrieved sources, final decision, and timestamp.
    • Needed for internal audit and regulator review.
  • Guardrails layer

    • Blocks unsupported advice, missing evidence, and out-of-scope requests.
    • Forces “insufficient evidence” responses when the corpus does not support a conclusion.
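The audit and logging layer above can be sketched as an append-only JSON Lines trail. This is a minimal illustration, not a fixed schema: the field names (`request`, `source_ids`, `decision`) and the `audit_log.jsonl` path are assumptions you would adapt to your own audit requirements.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
from typing import List

@dataclass
class AuditRecord:
    request: str           # the question or draft under review
    source_ids: List[str]  # IDs of the retrieved policy chunks
    decision: str          # final answer, or a blocked-response reason
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def append_audit_record(record: AuditRecord, path: str = "audit_log.jsonl") -> None:
    # Append-only JSON Lines keeps the trail cheap to write and easy to replay.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")

record = AuditRecord(
    request="Review transfer notice v3",
    source_ids=["policy-012", "circular-2025-04"],
    decision="blocked: missing cooling-off disclosure",
)
append_audit_record(record)
```

An append-only log also makes it straightforward to replay every decision against a later version of the corpus during an internal audit.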

Implementation

1) Install dependencies and load your compliance corpus

Use local files first. For pension funds, that usually means scheme rules, trustee policies, investment committee minutes, and approved regulatory guidance stored in an approved region.

pip install llama-index llama-index-llms-openai llama-index-embeddings-openai
from llama_index.core import SimpleDirectoryReader

docs = SimpleDirectoryReader(
    input_dir="./pension_compliance_docs",
    required_exts=[".pdf", ".md", ".txt"],
).load_data()

print(f"Loaded {len(docs)} documents")

2) Build a retrieval index with citations

This is the core pattern. The agent should not answer from memory; it should retrieve from your approved corpus and cite the source chunks it used.

from llama_index.core import VectorStoreIndex
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

llm = OpenAI(model="gpt-4o-mini", temperature=0)
embed_model = OpenAIEmbedding(model="text-embedding-3-small")

index = VectorStoreIndex.from_documents(
    docs,
    embed_model=embed_model,
)

query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=5,
    response_mode="compact",
)

response = query_engine.query(
    "Does this proposed member communication comply with disclosure rules for early retirement transfers?"
)

print(response)
for source in response.source_nodes:
    print(source.node.get_content()[:300])

That gives you retrieval-backed answers with traceability. For pension funds, this is the minimum bar: every claim should map back to a policy or regulation snippet.
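One way to enforce that bar is a small gate that rejects answers with missing or weak evidence before they reach the user. The sketch below assumes objects shaped like LlamaIndex's `NodeWithScore` (a `.score` float that may be `None` for some retrievers); the 0.70 threshold is illustrative and should be tuned against your own corpus.

```python
from typing import Optional, Sequence

MIN_SCORE = 0.70  # illustrative threshold; calibrate on known-good queries

def require_citations(source_nodes: Sequence, min_score: float = MIN_SCORE) -> Optional[str]:
    """Return a rejection reason, or None if the evidence is acceptable."""
    if not source_nodes:
        return "insufficient evidence: no sources retrieved"
    scores = [s.score for s in source_nodes if s.score is not None]
    if scores and max(scores) < min_score:
        return f"insufficient evidence: best similarity {max(scores):.2f} below {min_score}"
    return None

# Usage against the query engine response from above:
# reason = require_citations(response.source_nodes)
# if reason is not None:
#     final_answer = reason  # block instead of letting the model guess
```

Blocking on weak evidence is what turns "retrieval-backed" from a description into a guarantee.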

3) Add deterministic compliance checks before the LLM answers

Do not let the model decide everything. Use explicit rule checks for known pension fund constraints such as missing disclosures, prohibited advice language, or transfer timing violations.

from dataclasses import dataclass
from typing import List

@dataclass
class ComplianceResult:
    passed: bool
    issues: List[str]

def check_basic_pension_rules(text: str) -> ComplianceResult:
    issues = []

    if "guaranteed return" in text.lower():
        issues.append("Potentially misleading performance language.")

    if "financial advice" in text.lower() and "not advice" not in text.lower():
        issues.append("Advice disclaimer missing.")

    if "transfer" in text.lower() and "cooling-off" not in text.lower():
        issues.append("Transfer communication may be missing cooling-off disclosure.")

    return ComplianceResult(passed=len(issues) == 0, issues=issues)

draft = """
You should consider transferring your pension immediately.
This is guaranteed to improve your retirement outcome.
"""

result = check_basic_pension_rules(draft)
print(result.passed)
print(result.issues)

Use these checks as a gate. If they fail, return a blocked response or route to human review before the LLM generates any final wording.
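That gating step can be sketched as a thin wrapper that runs the deterministic checks first and only calls the LLM when they pass. A trimmed copy of the rule check is repeated here so the sketch runs standalone; `gated_review` and its blocked-response wording are illustrative, not a prescribed interface.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ComplianceResult:
    passed: bool
    issues: List[str]

def check_basic_pension_rules(text: str) -> ComplianceResult:
    # Trimmed copy of the step 3 rule check, repeated so this sketch is self-contained.
    issues = []
    if "guaranteed return" in text.lower():
        issues.append("Potentially misleading performance language.")
    return ComplianceResult(passed=not issues, issues=issues)

def gated_review(text: str, answer_fn: Callable[[str], str]) -> str:
    """Run deterministic checks first; only invoke the LLM when they pass."""
    result = check_basic_pension_rules(text)
    if not result.passed:
        # Blocked drafts never reach the model; route them to human review.
        return "BLOCKED for human review: " + "; ".join(result.issues)
    return answer_fn(text)

print(gated_review("This fund offers a guaranteed return.", lambda t: "LLM summary"))
```

The point of the wrapper is ordering: hard rules run before any generation, so a failed check costs nothing in tokens and can never be talked around by the model.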

4) Wrap retrieval into an agent workflow

For more than simple Q&A, use an agent that can inspect retrieved evidence and then produce a decision summary. In practice, I keep this narrow: one tool for retrieval, one tool for rule checks.

from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core.agent import ReActAgent

compliance_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="pension_compliance_search",
    description="Search approved pension fund compliance documents.",
)

# Expose the deterministic checks from step 3 as the second, narrow tool.
rule_check_tool = FunctionTool.from_defaults(
    fn=check_basic_pension_rules,
    name="pension_rule_check",
    description="Run deterministic pension compliance checks on draft text.",
)

agent = ReActAgent.from_tools(
    tools=[compliance_tool, rule_check_tool],
    llm=llm,
    verbose=True,
)

answer = agent.chat(
    "Review this member transfer notice for compliance risks and cite the supporting policy language."
)

print(answer)

This pattern works well when you need an analyst-style interface. The key is still the same: retrieval first, reasoning second, no free-form invention.
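The architecture calls for structured answers with citations, and it helps to pin that down as a small output schema rather than free text. The field names and verdict strings below are assumptions for illustration; the useful property is that the verdict is derived mechanically from the evidence and the rule-check results, not chosen by the model.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class ComplianceDecision:
    verdict: str          # "compliant" | "issues_found" | "insufficient_evidence"
    issues: List[str]     # deterministic rule-check failures
    citations: List[str]  # IDs or excerpts of the supporting policy chunks

def summarize_decision(issues: List[str], citations: List[str]) -> ComplianceDecision:
    # No evidence always wins: without citations the agent must not conclude anything.
    if not citations:
        verdict = "insufficient_evidence"
    elif issues:
        verdict = "issues_found"
    else:
        verdict = "compliant"
    return ComplianceDecision(verdict=verdict, issues=issues, citations=citations)

decision = summarize_decision(
    issues=["Transfer communication may be missing cooling-off disclosure."],
    citations=["scheme-rules §4.2"],
)
print(json.dumps(asdict(decision), indent=2))
```

A fixed schema like this also gives the audit layer something stable to log and regression-test against.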

Production Considerations

  • Data residency

    • Keep embeddings, vector store, logs, and raw documents in the approved jurisdiction.
    • If your pension fund operates under regional data rules, do not send member data to unmanaged external services without legal approval.
  • Auditability

    • Store every request with retrieved source IDs, final output, model version, prompt version, and timestamp.
    • Regulators will care about why a decision was made as much as the decision itself.
  • Human escalation

    • Route high-risk cases to compliance officers: benefit transfers, hardship withdrawals, complaints handling, trustee decisions.
    • The agent should flag risk; it should not replace sign-off on regulated actions.
  • Monitoring

    • Track citation coverage rate, blocked-response rate, false negative rate on known test cases, and drift in retrieval quality.
    • Re-run golden compliance scenarios after every prompt or index change.
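A golden-scenario harness can be sketched in a few lines. `run_golden_cases` and `toy_check` are illustrative names, and `toy_check` stands in for whatever rule check or end-to-end pipeline you actually run; each case pairs an input with whether it should be flagged, and a false negative is a should-flag case that comes back clean.

```python
from typing import Callable, Dict, List, Tuple

def run_golden_cases(
    check: Callable[[str], List[str]],
    cases: List[Tuple[str, bool]],
) -> Dict[str, float]:
    """Replay known scenarios and report the false-negative rate."""
    false_negatives = sum(
        1 for text, should_flag in cases if should_flag and not check(text)
    )
    flagged_total = sum(1 for _, should_flag in cases if should_flag)
    return {
        "cases": float(len(cases)),
        "false_negative_rate": false_negatives / flagged_total if flagged_total else 0.0,
    }

def toy_check(text: str) -> List[str]:
    # Stand-in for check_basic_pension_rules; returns the list of issues found.
    return ["misleading"] if "guaranteed return" in text.lower() else []

golden = [
    ("This delivers a guaranteed return.", True),
    ("Your annual statement is attached.", False),
]
print(run_golden_cases(toy_check, golden))
```

Wiring this into CI so it runs on every prompt or index change is what makes "re-run golden scenarios" a habit rather than an intention.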

Common Pitfalls

  1. Letting the model answer without evidence

    • Fix it by requiring response.source_nodes and rejecting answers with weak or missing citations.
    • For pension funds this is non-negotiable because unsupported guidance can become a compliance incident.
  2. Mixing policy interpretation with factual extraction

    • Keep rule logic deterministic where possible.
    • Use LlamaIndex for retrieval and summarization; use Python code for hard checks like disclosure presence or prohibited phrases.
  3. Ignoring document versioning

    • Pension schemes change rules often: trustees update communications templates; regulators issue new guidance; employers amend contribution policies.
    • Version your corpus by effective date so the agent does not cite outdated policy language.
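Effective-date versioning can be sketched as a pre-index filter. The sketch assumes each document carries `policy_id` and `effective_date` metadata; adapt those keys to however your loader tags documents.

```python
from datetime import date
from typing import Dict, List

def latest_effective(docs: List[Dict], as_of: date) -> List[Dict]:
    """Keep, per policy_id, only the newest version effective on or before as_of."""
    best: Dict[str, Dict] = {}
    for doc in docs:
        if doc["effective_date"] > as_of:
            continue  # not yet in force on the review date
        current = best.get(doc["policy_id"])
        if current is None or doc["effective_date"] > current["effective_date"]:
            best[doc["policy_id"]] = doc
    return list(best.values())

corpus = [
    {"policy_id": "transfers", "effective_date": date(2024, 1, 1), "text": "v1"},
    {"policy_id": "transfers", "effective_date": date(2025, 6, 1), "text": "v2"},
]
print(latest_effective(corpus, as_of=date(2025, 12, 31)))
```

Filtering by effective date before indexing means an outdated policy version can never appear in retrieval results, rather than relying on the model to notice it is stale.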

A good compliance agent for pension funds is not “smart” in the general sense. It is constrained: grounded in approved documents, explicit about uncertainty, auditable end-to-end, and boring enough to survive scrutiny from compliance teams.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.
