How to Build a Fraud Detection Agent Using LlamaIndex in Python for Insurance

By Cyprian Aarons · Updated 2026-04-21
Tags: fraud-detection, llamaindex, python, insurance

A fraud detection agent for insurance takes a claim, policy, and supporting evidence, then checks for inconsistencies, missing documentation, duplicate submissions, suspicious patterns, and policy violations. The point is not to auto-deny claims; it is to route high-risk cases to human investigators with a clear evidence trail, which matters because insurers need speed without losing compliance, auditability, or fairness.

Architecture

  • Claim intake layer

    • Accepts structured claim data: claimant identity, policy number, loss date, amount, incident type, adjuster notes.
    • Normalizes fields before the agent sees them.
  • Document retrieval layer

    • Pulls evidence from claim forms, police reports, invoices, photo metadata, prior claims history, and policy documents.
    • Uses LlamaIndex VectorStoreIndex plus metadata filters for fast retrieval.
  • Fraud reasoning layer

    • Uses an LLM-backed QueryEngine to compare the claim against policy terms and historical evidence.
    • Produces a risk score and a short rationale grounded in retrieved context.
  • Rules and guardrails layer

    • Enforces hard checks like duplicate claim IDs, lapsed policies, excluded perils, and missing mandatory documents.
    • Prevents the model from making unsupported denial decisions.
  • Audit trail layer

    • Stores prompt inputs, retrieved nodes, model output, timestamps, and investigator actions.
    • Required for compliance review and dispute handling.
  • Human review handoff

    • Routes medium/high-risk claims into an investigator queue with citations.
    • Keeps final adjudication with a human when the case is ambiguous or high impact.
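
Taken together, the layers form a simple linear pipeline: rules, then retrieval-backed reasoning, then a human handoff. A minimal sketch of that control flow (all names here are illustrative, not LlamaIndex APIs; `run_rules` and `score_with_llm` stand in for the layers built later in this guide):

```python
from dataclasses import dataclass, field

@dataclass
class TriageResult:
    """Outcome of one pass through the triage pipeline."""
    claim_id: str
    rule_flags: list = field(default_factory=list)
    risk_level: str = "UNSCORED"   # LOW | MEDIUM | HIGH
    route: str = "auto_close"      # auto_close | investigator_queue

def triage(claim, run_rules, score_with_llm):
    """Rules first, LLM second, human handoff for anything risky."""
    result = TriageResult(claim_id=claim["claim_id"])
    result.rule_flags = run_rules(claim)
    if result.rule_flags:
        # Hard-rule hit: skip the model and go straight to a human.
        result.route = "investigator_queue"
        return result
    result.risk_level = score_with_llm(claim)
    if result.risk_level in ("MEDIUM", "HIGH"):
        result.route = "investigator_queue"
    return result
```

The key property: a rule hit short-circuits the model entirely, and the model can only ever escalate, never close, a claim.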

Implementation

1) Install dependencies and define the document model

Use LlamaIndex’s core abstractions directly. For insurance workflows you want structured metadata on every document so you can filter by claim ID, jurisdiction, document type, and retention class.

pip install llama-index llama-index-llms-openai llama-index-embeddings-openai

from llama_index.core import Document

claim_doc = Document(
    text="""
    Claimant reports water damage in kitchen on 2025-01-14.
    Invoice from contractor totals $18,450.
    Prior claim filed for similar water damage at same address in 2023.
    """,
    metadata={
        "claim_id": "CLM-10482",
        "policy_id": "POL-77821",
        "jurisdiction": "NY",
        "doc_type": "claim_summary",
        "source_system": "claims_core",
    },
)

2) Build the retrieval index over claims and policy evidence

For a real system you would ingest multiple documents per claim: policy wording, incident photo metadata, repair invoices, adjuster notes, and prior claims. VectorStoreIndex.from_documents() gives you a clean baseline that works well for evidence retrieval.

from llama_index.core import VectorStoreIndex

docs = [claim_doc]

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine(similarity_top_k=3)

If you are running this against production data, replace the default storage with your approved vector database and ensure residency rules are respected. The pattern stays the same; only the backend changes.

3) Create a fraud scoring prompt that forces grounded answers

You want the model to classify risk using retrieved evidence only. A simple prompt template works better than free-form prompting because it makes review outputs predictable for investigators.

from llama_index.core import PromptTemplate

fraud_prompt = PromptTemplate(
    """You are an insurance fraud triage assistant.
Use only the provided context. Do not invent facts.

Context:
{context_str}

Claim:
{query_str}

Return:
1. Risk level: LOW | MEDIUM | HIGH
2. Fraud indicators observed
3. Missing or inconsistent evidence
4. Recommended next action
5. Short audit-friendly rationale"""
)

Then wire it in as the query engine's QA template so LlamaIndex fills {context_str} with retrieved evidence automatically. (Formatting the template yourself and passing the result to query() would bypass retrieval, since context_str would be hardcoded rather than pulled from the index.)

query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": fraud_prompt}
)

response = query_engine.query(
    "Assess whether claim CLM-10482 should be escalated for fraud review."
)

print(response)
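
Because the template fixes the output shape, downstream code can parse the risk level instead of guessing at free text. A minimal sketch (the "Risk level:" line format is an assumption based on the prompt above; real model output can vary, which is why the fallback matters):

```python
import re

def extract_risk_level(response_text: str) -> str:
    """Pull the LOW/MEDIUM/HIGH label out of the templated response.

    Falls back to MEDIUM (i.e. human review) when the label is missing,
    so a malformed model reply never silently closes a claim.
    """
    match = re.search(r"Risk level:\s*(LOW|MEDIUM|HIGH)",
                      response_text, re.IGNORECASE)
    return match.group(1).upper() if match else "MEDIUM"
```

Defaulting to MEDIUM on a parse failure is deliberate: an unreadable answer should cost you an investigator's time, not a missed fraud case.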

4) Wrap the agent in deterministic pre-checks before LLM reasoning

Do not let the model be your first line of defense. Hard rules should catch obvious issues before any LLM call happens.

from datetime import date

def rule_checks(claim):
    flags = []

    if claim["loss_date"] > date.today():
        flags.append("future_loss_date")

    if claim["amount"] > 10000 and not claim.get("invoice_attached"):
        flags.append("high_value_missing_invoice")

    if claim.get("prior_similar_claim") is True:
        flags.append("possible_duplicate_pattern")

    return flags


claim = {
    "claim_id": "CLM-10482",
    "loss_date": date(2025, 1, 14),
    "amount": 18450,
    "invoice_attached": True,
    "prior_similar_claim": True,
}

flags = rule_checks(claim)

if flags:
    print({"route": "investigator_queue", "flags": flags})
else:
    result = query_engine.query(
        f"Review claim {claim['claim_id']} for fraud risk using available evidence."
    )
    print(result)

This pattern keeps low-level policy violations out of the model path and makes your system easier to defend during audits.
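
The same defensive stance applies after the model runs: treat the LLM's risk label as one input to a deterministic routing decision, never as the decision itself. A hedged sketch (queue names like `siu_referral` are placeholders you would map to your own claims system; the SIU route mirrors the human review handoff described above):

```python
def route_claim(rule_flags, risk_level):
    """Deterministic post-LLM routing; the agent never auto-denies."""
    if rule_flags:
        return {"route": "investigator_queue",
                "reason": "hard_rule", "flags": rule_flags}
    if risk_level == "HIGH":
        return {"route": "siu_referral", "reason": "model_high_risk"}
    if risk_level == "MEDIUM":
        return {"route": "investigator_queue", "reason": "model_medium_risk"}
    # LOW risk still gets standard adjudication, not an auto-approval.
    return {"route": "standard_adjudication", "reason": "model_low_risk"}
```

Every branch returns a machine-readable reason, which makes the routing decision itself auditable alongside the model output.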

Production Considerations

  • Keep PHI/PII under control

    • Mask sensitive fields before indexing when possible.
    • Restrict access by role and log every retrieval event.
    • For regulated lines of business, make sure your storage region matches residency requirements.
  • Make outputs auditable

    • Persist the exact retrieved nodes returned by QueryEngine.
    • Store model version, prompt version, timestamps, and investigator disposition.
    • If a denial or escalation is challenged later, you need reproducible evidence.
  • Add guardrails around decisioning

    • The agent should recommend escalation or review status, not auto-deny claims.
    • Block unsupported conclusions like “fraud confirmed” unless backed by explicit rules plus human validation.
    • Use confidence thresholds to route uncertain cases to manual review.
  • Monitor drift in claims patterns

    • Fraud tactics change by line of business and geography.
    • Track false positives by jurisdiction and peril type.
    • Recalibrate prompts and retrieval filters when adjuster feedback shows systematic misses.
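
The audit points above reduce to one habit: persist a complete, tamper-evident record per decision. A minimal sketch using only the standard library (field names are illustrative; in practice the retrieved evidence would come from the response's source_nodes, passed here as plain strings to keep the record storage-agnostic):

```python
import hashlib
import json
from datetime import datetime, timezone

def build_audit_record(claim_id, prompt_text, retrieved_texts,
                       model_output, model_version, prompt_version):
    """Assemble a reproducible audit record for one triage decision."""
    record = {
        "claim_id": claim_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "prompt_version": prompt_version,
        "prompt_sha256": hashlib.sha256(prompt_text.encode()).hexdigest(),
        "retrieved_evidence": retrieved_texts,
        "model_output": model_output,
        "investigator_disposition": None,  # filled in after human review
    }
    # Content hash lets you show the record was not altered after the fact.
    record["record_sha256"] = hashlib.sha256(
        json.dumps(record, sort_keys=True, default=str).encode()
    ).hexdigest()
    return record
```

Hashing the prompt and the whole record means a disputed escalation months later can be replayed against the exact inputs that produced it.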

Common Pitfalls

  • Using only the LLM without hard rules

    • This leads to noisy triage and bad escalations.
    • Fix it by running deterministic checks first: duplicate IDs, date validity, coverage window checks, missing mandatory docs.
  • Indexing raw documents without metadata

    • You lose traceability fast.
    • Fix it by attaching claim_id, policy_id, jurisdiction, doc_type, and retention tags to every Document.
  • Letting the agent make final adjudication decisions

    • That creates compliance risk and weakens oversight.
    • Fix it by limiting the agent to risk scoring plus explanation; final action stays with claims staff or SIU investigators.

If you build this way—rules first, retrieval second, LLM third—you get an insurance fraud triage agent that is useful in production instead of just impressive in a demo.


By Cyprian Aarons, AI Consultant at Topiax.