How to Build a Compliance Checking Agent Using LlamaIndex in Python for Insurance

By Cyprian Aarons · Updated 2026-04-21

Tags: compliance-checking, llamaindex, python, insurance

A compliance checking agent for insurance reviews policy documents, underwriting notes, claims correspondence, and customer communications against internal rules and regulatory requirements. The goal is simple: catch risky language, missing disclosures, and process violations before they reach a regulator, a customer, or an audit trail.

Architecture

  • Document ingestion layer

    • Pulls PDFs, DOCX files, emails, and ticket exports into a normalized text format.
    • In insurance, this usually means policy wording, endorsements, claims letters, complaint responses, and underwriting guidelines.
  • Compliance knowledge base

    • Stores regulations, internal SOPs, product rules, and approved clause libraries.
    • Use VectorStoreIndex with metadata like jurisdiction, product line, effective date, and document version.
  • Retrieval layer

    • Uses LlamaIndex retrievers to fetch the exact policy clauses or regulatory passages relevant to the user’s document.
    • This is what keeps the agent grounded instead of guessing.
  • LLM reasoning layer

    • Takes the retrieved evidence and produces a structured compliance assessment.
    • The output should include pass/fail status, cited sources, and remediation steps.
  • Audit logging layer

    • Persists input hashes, retrieved chunks, model outputs, timestamps, and reviewer actions.
    • Insurance teams need traceability for regulators and internal audit.
  • Human review workflow

    • Routes borderline cases to compliance officers.
    • Do not auto-approve anything that affects customer rights or claim outcomes without review.
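The reasoning, audit, and review layers above all pass around the same thing: a structured assessment record. A minimal sketch of that contract (the field names are illustrative, not a LlamaIndex type):

```python
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class ComplianceAssessment:
    # PASS, FAIL, or NEEDS_REVIEW for borderline cases routed to a human
    status: str
    # Violations or missing items found in the checked document
    issues: List[str] = field(default_factory=list)
    # Node/document IDs of the retrieved evidence backing the decision
    evidence_ids: List[str] = field(default_factory=list)
    # Concrete fixes required before the document can be sent
    remediation: List[str] = field(default_factory=list)

assessment = ComplianceAssessment(
    status="FAIL",
    issues=["Missing complaint disclosure language"],
    evidence_ids=["fca-disp-1.2#chunk-7"],
    remediation=["Insert approved complaints paragraph before signature block"],
)
print(asdict(assessment)["status"])  # FAIL
```

Agreeing on this shape early keeps the LLM layer, the audit log, and the review queue from each inventing their own format.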

Implementation

1) Load your compliance corpus into LlamaIndex

Start with your source documents. For insurance use cases, keep regulations separate from internal policy docs so you can filter by jurisdiction and product line.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

# Load documents from a local directory
documents = SimpleDirectoryReader(
    input_dir="./compliance_docs",
    recursive=True
).load_data()

# Split into smaller chunks for retrieval
parser = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = parser.get_nodes_from_documents(documents)

# Build the index
index = VectorStoreIndex(nodes)

This gives you a searchable knowledge base over your compliance content. In practice, your directory might contain:

  • FCA conduct rules
  • state-specific insurance bulletins
  • claims handling procedures
  • approved disclaimer templates

2) Create a retriever that filters by insurance context

You do not want a general answer from all documents. You want the exact rule set for the line of business and jurisdiction being checked.

from llama_index.core.vector_stores.types import MetadataFilters, ExactMatchFilter

retriever = index.as_retriever(
    similarity_top_k=5,
    filters=MetadataFilters(filters=[
        ExactMatchFilter(key="jurisdiction", value="UK"),
        ExactMatchFilter(key="line_of_business", value="motor")
    ])
)

query = "Check whether this claims email includes all required complaint disclosure language."
results = retriever.retrieve(query)

for node_with_score in results:
    print(node_with_score.score)
    print(node_with_score.node.text[:400])
    print("---")

If you are ingesting metadata correctly at load time, this becomes a clean way to scope results to one market or product. That matters because insurance compliance is rarely one-size-fits-all.
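For those filters to match anything, the metadata has to be attached when documents are loaded. One way, assuming your directory layout encodes the scope (the layout and field names here are illustrative; `file_metadata` is a real SimpleDirectoryReader parameter):

```python
from pathlib import Path

# Assumption: the directory layout encodes jurisdiction and line of
# business, e.g. ./compliance_docs/UK/motor/claims_handling.pdf.
# Pass this callable as SimpleDirectoryReader(file_metadata=...) so
# every loaded Document carries these fields into the index.
def file_metadata(path_str: str) -> dict:
    path = Path(path_str)
    return {
        "jurisdiction": path.parts[-3],
        "line_of_business": path.parts[-2],
        "source_file": path.name,
    }

meta = file_metadata("compliance_docs/UK/motor/claims_handling.pdf")
print(meta)  # {'jurisdiction': 'UK', 'line_of_business': 'motor', 'source_file': 'claims_handling.pdf'}
```

Deriving metadata from a disciplined folder structure is cheap and auditable; the alternative is tagging documents by hand, which drifts.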

3) Wrap retrieval in an agent workflow with structured output

For production checks, you want more than free-form prose. Use a response synthesizer or query engine pattern that forces the model to return a decision plus evidence.

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import get_response_synthesizer

Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)

response_synthesizer = get_response_synthesizer(response_mode="compact")

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

prompt = """
Review this insurance claims email for compliance issues.
Return:
1. status: PASS or FAIL
2. issues: bullet list of violations or missing items
3. evidence: cite the relevant policy/regulatory points from retrieved context
4. remediation: exact fix needed before sending

Email:
We have reviewed your claim and will get back to you soon.
"""

response = query_engine.query(prompt)
print(str(response))

The key here is deterministic settings and retrieval grounding. temperature=0 reduces variance when the same email gets checked twice.
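If you want to enforce that contract rather than trust free-form prose, validate the response before anything downstream consumes it. A minimal sketch, assuming you amend the prompt to request JSON with these exact keys (the keys mirror the prompt above; the parser is illustrative, not a LlamaIndex feature):

```python
import json

REQUIRED_KEYS = {"status", "issues", "evidence", "remediation"}

def parse_assessment(raw: str) -> dict:
    """Validate the model's JSON response against the prompt contract.

    Raises ValueError so a malformed response gets routed to human
    review instead of silently passing downstream.
    """
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"response missing keys: {sorted(missing)}")
    if data["status"] not in ("PASS", "FAIL"):
        raise ValueError(f"invalid status: {data['status']!r}")
    return data

raw = ('{"status": "FAIL", "issues": ["no complaint disclosure"], '
       '"evidence": ["FCA DISP 1.2"], "remediation": ["add approved complaints paragraph"]}')
result = parse_assessment(raw)
print(result["status"])  # FAIL
```

A response that fails validation is itself a signal: log it and route the document to a human rather than retrying blindly.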

4) Add an audit trail around every decision

In insurance workflows, every check needs an explanation that survives review months later. Store the original input hash, retrieved node IDs, model response, and reviewer override if one happens.

import hashlib
import json
from datetime import datetime, timezone

def hash_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def log_compliance_check(input_text: str, response_text: str):
    record = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "input_hash": hash_text(input_text),
        "response": response_text,
        "model": "gpt-4o-mini",
        "workflow": "insurance_compliance_check"
    }

    with open("audit_log.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log_compliance_check(prompt, str(response))

If you need stronger controls later, replace the flat file with immutable storage or your SIEM pipeline.
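Short of immutable storage, one incremental hardening step is to chain each record to the previous record's hash, so any edit to an earlier line breaks every later hash. A sketch (the record fields are illustrative):

```python
import hashlib
import json

def append_chained(path: str, record: dict, prev_hash: str) -> str:
    """Append an audit record that commits to the previous record's hash.

    Tampering with any earlier line invalidates every later record_hash,
    which a periodic verification job can detect.
    """
    record = dict(record, prev_hash=prev_hash)
    payload = json.dumps(record, sort_keys=True)
    record_hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"record": record, "record_hash": record_hash}) + "\n")
    return record_hash

h1 = append_chained("audit_log.jsonl", {"input_hash": "abc", "status": "FAIL"},
                    prev_hash="genesis")
h2 = append_chained("audit_log.jsonl", {"input_hash": "def", "status": "PASS"},
                    prev_hash=h1)
```

This does not replace a SIEM, but it makes silent after-the-fact edits detectable, which is usually the first question internal audit asks.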

Production Considerations

  • Keep data residency in scope

    • Insurance data often cannot leave specific regions.
    • Make sure your vector store and LLM endpoint are deployed in-region if you handle customer PII or regulated claims data.
  • Log retrieval evidence separately from raw customer content

    • Store node IDs, document versions, and hashes.
    • Limit raw text retention unless legal/compliance explicitly requires it.
  • Add hard guardrails for high-risk outputs

    • If the agent detects missing mandatory disclosures or potential unfair treatment language, force human review.
    • Never let it auto-send customer communications on its own.
  • Monitor drift in regulations and templates

    • Regulations change by market and line of business.
    • Re-index when policy wording changes; otherwise the agent will validate against stale rules.
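The high-risk guardrail above should be a hard-coded gate, not another model decision. A sketch, assuming the parsed assessment dict from the query step (the trigger phrases are illustrative; your compliance team maintains the real list):

```python
# Assumption: phrases your compliance team flags as never-auto-send.
HIGH_RISK_FLAGS = ("missing mandatory disclosure", "unfair treatment")

def route(assessment: dict) -> str:
    """Return the next workflow step; never auto-send on FAIL or high risk."""
    if assessment["status"] == "FAIL":
        return "human_review"
    issues = " ".join(assessment.get("issues", [])).lower()
    if any(flag in issues for flag in HIGH_RISK_FLAGS):
        return "human_review"
    return "ready_to_send"

print(route({"status": "PASS", "issues": []}))  # ready_to_send
print(route({"status": "PASS",
             "issues": ["Missing mandatory disclosure: complaints"]}))  # human_review
```

Because this runs after the model, it holds even when the LLM is wrong: a FAIL or a flagged phrase always lands in the review queue.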

Common Pitfalls

| Mistake | Why it breaks | How to avoid it |
| --- | --- | --- |
| Using one global index for every jurisdiction | UK motor rules will bleed into US life insurance checks | Partition indexes by region/product or use strict metadata filters |
| Letting the model answer without citations | You cannot defend an unsupported compliance decision in audit | Always return retrieved evidence IDs or quoted passages |
| Indexing stale policy versions | The agent approves old wording that no longer matches current rules | Version documents and re-index on every approved policy change |

The other failure mode is treating this like a chatbot instead of a control point. In insurance operations, this should behave like a deterministic reviewer with traceable evidence first and natural-language output second.


By Cyprian Aarons, AI Consultant at Topiax.