How to Build a Fraud Detection Agent for Insurance Using AutoGen in Python

By Cyprian Aarons · Updated 2026-04-21
Tags: fraud-detection, autogen, python, insurance

A fraud detection agent for insurance reviews claims, policy data, and supporting documents to flag suspicious cases before they are paid out. The stakes are asymmetric: the cost of a false negative is a bad payout, while the cost of a false positive is wasted adjuster time and a frustrated customer.

Architecture

  • Claim intake service

    • Pulls claim payloads from your claims system, queue, or API.
    • Normalizes fields like claimant identity, loss date, amount, location, and document references.
  • Evidence retrieval layer

    • Fetches policy terms, prior claims history, adjuster notes, and uploaded documents.
    • Keeps the agent grounded in source data instead of guessing.
  • AutoGen agent team

    • One agent summarizes the case.
    • One agent checks for fraud signals.
    • One agent acts as a compliance reviewer for explainability and audit language.
  • Rules and scoring engine

    • Applies deterministic checks such as duplicate claims, impossible timelines, or high-risk geographies.
    • Produces a risk score that can be consumed by downstream workflows.
  • Case management output

    • Writes the result back to the fraud queue or SIU system.
    • Stores an explanation trail for auditors and adjusters.

Implementation

1) Install AutoGen and define your insurance case schema

Use a strict schema. Insurance workflows break when the agent receives half-baked JSON or free-form text with missing claim identifiers.

from dataclasses import dataclass
from typing import List, Dict, Any

@dataclass
class ClaimCase:
    claim_id: str
    policy_id: str
    claimant_name: str
    loss_type: str
    loss_date: str
    claim_amount: float
    jurisdiction: str
    prior_claims: List[Dict[str, Any]]
    documents: List[str]
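Before constructing a ClaimCase, it helps to reject half-baked payloads at the door. A minimal validation sketch (the required-field list and error messages are illustrative, not a complete insurance schema):

```python
from typing import Any, Dict, List

# Fields a ClaimCase cannot be built without (illustrative subset).
REQUIRED_FIELDS = [
    "claim_id", "policy_id", "claimant_name", "loss_type",
    "loss_date", "claim_amount", "jurisdiction",
]

def validate_claim_payload(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload is usable."""
    problems = [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if not payload.get(field)
    ]
    amount = payload.get("claim_amount")
    if amount is not None and not isinstance(amount, (int, float)):
        problems.append("claim_amount must be numeric")
    return problems
```

Run this at intake and route failures back to the claims system rather than sending a partial case into the agent team.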

2) Create specialized AutoGen agents

AutoGen’s AssistantAgent works well here because you can separate responsibilities instead of asking one model to do everything. Use one agent for fraud analysis and one for compliance review.

import os
from autogen import AssistantAgent, UserProxyAgent

llm_config = {
    "model": "gpt-4o-mini",
    "api_key": os.environ["OPENAI_API_KEY"],
}

fraud_analyst = AssistantAgent(
    name="fraud_analyst",
    llm_config=llm_config,
    system_message=(
        "You are an insurance fraud analyst. "
        "Identify suspicious patterns in claims using only provided facts. "
        "Return concise bullet points with a risk score from 0 to 100."
    ),
)

compliance_reviewer = AssistantAgent(
    name="compliance_reviewer",
    llm_config=llm_config,
    system_message=(
        "You are a compliance reviewer for insurance operations. "
        "Check that conclusions are explainable, auditable, and do not use protected attributes. "
        "Flag any missing evidence or regulatory concerns."
    ),
)

user_proxy = UserProxyAgent(
    name="case_runner",
    human_input_mode="NEVER",
    code_execution_config=False,  # this proxy only relays messages; never execute code
)

3) Run the case through AutoGen and collect an auditable result

This pattern uses initiate_chat() to drive the conversation. Keep the input structured so every run is reproducible and easy to store in your audit log.

import json

case = ClaimCase(
    claim_id="CLM-10422",
    policy_id="POL-7781",
    claimant_name="Jordan Lee",
    loss_type="water_damage",
    loss_date="2026-03-12",
    claim_amount=18450.00,
    jurisdiction="NY",
    prior_claims=[
        {"claim_id": "CLM-9911", "loss_type": "water_damage", "date": "2025-11-02", "amount": 17200}
    ],
    documents=[
        "Adjuster note: same vendor invoice appears in prior claim.",
        "Photo metadata indicates images were taken before reported loss date.",
        "Repair estimate exceeds policy sublimit."
    ],
)

prompt = f"""
Review this insurance claim for fraud risk.

Claim JSON:
{json.dumps(case.__dict__, indent=2)}

Output format:
1. Risk score (0-100)
2. Fraud indicators
3. Evidence gaps
4. Recommended next action
"""

analysis = user_proxy.initiate_chat(
    fraud_analyst,
    message=prompt,
    max_turns=1,  # one request, one response; prevents auto-reply loops
)

review_prompt = f"""
Review the fraud analysis below for compliance and auditability.

Analysis:
{analysis.summary}

Check:
- no protected-class inference
- no unsupported accusations
- clear evidence references
- suitable for SIU triage record
"""

review = user_proxy.initiate_chat(
    compliance_reviewer,
    message=review_prompt,
    max_turns=1,
)

print("FRAUD ANALYSIS:")
print(analysis.summary)
print("\nCOMPLIANCE REVIEW:")
print(review.summary)

4) Add deterministic guardrails before the model runs

For insurance, don’t let the LLM be your first line of defense. Use rules to catch obvious issues like duplicate claims or impossible dates before sending anything into AutoGen.

from datetime import date, datetime

def basic_fraud_rules(case: ClaimCase) -> list[str]:
    flags = []

    if case.prior_claims:
        flags.append("Prior claims exist; check repeat-loss pattern.")

    try:
        loss_date = datetime.strptime(case.loss_date, "%Y-%m-%d").date()
        if loss_date > date.today():
            flags.append("Loss date is in the future.")
    except ValueError:
        flags.append("Invalid loss date format.")

    if case.claim_amount > 15000:
        flags.append("High-value claim; prioritize manual review.")

    return flags

rule_flags = basic_fraud_rules(case)
print(rule_flags)

Production Considerations

  • Data residency

    • Keep claim payloads in-region if your insurer operates under local storage requirements.
    • If you use hosted LLM APIs, verify where prompts and logs are processed and retained.
  • Auditability

    • Store input JSON, model output, rule flags, timestamps, and model version per decision.
    • Claims teams need a traceable reason why a case was escalated.
  • Compliance guardrails

    • Block protected attributes from prompts unless there is a lawful basis and explicit policy approval.
    • Add prompt filters so adjuster notes containing sensitive personal data are redacted before inference.
  • Monitoring

    • Track false positive rate by line of business: auto, property, health-related supplemental coverages.
    • Watch for drift after new fraud patterns emerge or after policy wording changes.
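A decision record covering the auditability points above might look like this sketch. The field names are illustrative; adapt them to whatever your SIU system expects:

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass
class FraudDecisionRecord:
    claim_id: str
    model_version: str      # pin the exact model used for this decision
    rule_flags: list        # deterministic hits from basic_fraud_rules()
    model_summary: str      # the analyst agent's output, verbatim
    risk_score: int
    decided_at: str         # ISO 8601 UTC timestamp

def make_record(claim_id: str, model_version: str, rule_flags: list,
                model_summary: str, risk_score: int) -> FraudDecisionRecord:
    return FraudDecisionRecord(
        claim_id=claim_id,
        model_version=model_version,
        rule_flags=rule_flags,
        model_summary=model_summary,
        risk_score=risk_score,
        decided_at=datetime.now(timezone.utc).isoformat(),
    )

# Serialize with json.dumps(asdict(record)) and write it to your audit store.
```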

Common Pitfalls

  1. Letting the LLM make the final decision

    • The model should triage and explain, not approve or deny claims.
    • Keep final disposition with rules plus human review.
  2. Passing raw PII into prompts

    • Don’t dump full customer records into chat history.
    • Redact SSNs, bank details, medical notes, and unnecessary identifiers before calling initiate_chat().
  3. Skipping deterministic checks

    • If you rely only on model reasoning, you’ll miss obvious fraud signals like duplicate invoices or future-dated losses.
    • Run rule-based validation first, then use AutoGen for contextual analysis.
  4. No separation between fraud and compliance reasoning

    • One agent can surface suspicion while another checks whether that suspicion is defensible.
    • That separation matters when auditors ask how a referral was generated.
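For pitfall 2, a simple regex pass before calling initiate_chat() catches the most common identifiers. These patterns are illustrative and US-centric, not a complete redaction solution:

```python
import re

# Each entry: (pattern, replacement token). Order matters: redact the most
# specific shapes (SSN) before the generic long-number pattern.
REDACTION_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),       # SSN-shaped numbers
    (re.compile(r"\b\d{13,19}\b"), "[CARD_OR_ACCOUNT]"),   # card/account numbers
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),   # email addresses
]

def redact(text: str) -> str:
    """Replace common PII shapes with placeholder tokens before prompting."""
    for pattern, token in REDACTION_PATTERNS:
        text = pattern.sub(token, text)
    return text
```

Apply redact() to adjuster notes and document excerpts as they enter the prompt, and keep the unredacted originals only in the system of record.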

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

