How to Build a Fraud Detection Agent Using AutoGen in Python for Healthcare

By Cyprian Aarons · Updated 2026-04-21
fraud-detection · autogen · python · healthcare

A healthcare fraud detection agent watches claims, billing notes, and provider activity for patterns that look inconsistent with clinical reality or payer policy. It matters because false claims, upcoding, duplicate billing, and identity misuse drain margin, trigger audits, and create compliance exposure that can spill into legal and patient-care issues.

Architecture

  • Data ingestion layer

    • Pulls structured claim data, provider metadata, and case notes from approved internal systems.
    • Enforces PHI minimization before anything reaches the agent.
  • Fraud analysis agent

    • Uses AutoGen AssistantAgent to inspect claim patterns and generate a risk assessment.
    • Produces structured findings: risk score, rationale, evidence, and recommended action.
  • Rules + policy layer

    • Encodes deterministic checks for known fraud indicators.
    • Blocks obviously non-compliant actions before the LLM sees sensitive context.
  • Human review agent

    • Uses AutoGen UserProxyAgent to route borderline cases to a compliance analyst.
    • Ensures final disposition is human-approved for high-risk claims.
  • Audit logging layer

    • Persists prompts, tool calls, outputs, timestamps, and reviewer decisions.
    • Supports HIPAA auditability and internal investigations.
  • Secure tool layer

    • Exposes only narrow tools for querying claims or flagging cases.
    • Keeps the agent from free-form access to production systems.
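The PHI minimization step in the ingestion layer can be sketched as a simple allow-list filter. This is a minimal illustration, not a complete de-identification strategy; the field names and `ALLOWED_FIELDS` set are assumptions matching the example claim used later in this guide:

```python
# Minimal PHI-minimization sketch: keep only an explicit allow-list of
# claim fields before anything reaches the agent. A real pipeline would
# follow your compliance team's de-identification policy.
ALLOWED_FIELDS = {
    "claim_id", "provider_npi", "cpt_codes",
    "date_of_service", "units", "diagnosis_codes", "amount",
}

def minimize_claim(raw_claim: dict) -> dict:
    """Drop any field not on the allow-list (names, member IDs, free text)."""
    return {k: v for k, v in raw_claim.items() if k in ALLOWED_FIELDS}

raw = {"claim_id": "CLM-1", "member_name": "Jane Doe", "amount": 120.0}
print(minimize_claim(raw))  # member_name is stripped
```

An allow-list is preferable to a deny-list here: a new upstream field defaults to excluded rather than leaked.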

Implementation

1) Install AutoGen and define a strict message schema

Install the AutoGen package (`pip install pyautogen`) and keep the agent output structured. In healthcare, you want deterministic fields you can store in an audit trail and pass to downstream case management.

from pydantic import BaseModel, Field
from typing import List

class FraudFinding(BaseModel):
    claim_id: str
    risk_score: int = Field(ge=0, le=100)
    fraud_type: str
    rationale: str
    evidence: List[str]
    next_action: str

# Example claim payload after PHI minimization
claim = {
    "claim_id": "CLM-88421",
    "provider_npi": "1234567890",
    "cpt_codes": ["99213", "99213", "93000"],
    "date_of_service": "2025-03-18",
    "units": [1, 1, 3],
    "diagnosis_codes": ["E11.9"],
    "amount": 1840.00,
}

2) Create the agents with AutoGen

This pattern uses an AssistantAgent for analysis and a UserProxyAgent for human review. The key point is that the assistant never gets direct system access; it only reasons over sanitized inputs.

import os
import autogen

llm_config = {
    "config_list": [
        {
            "model": os.environ["OPENAI_MODEL"],
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ],
    "temperature": 0,
}

fraud_analyst = autogen.AssistantAgent(
    name="fraud_analyst",
    llm_config=llm_config,
    system_message=(
        "You are a healthcare fraud detection analyst. "
        "Assess claims for potential fraud, waste, or abuse. "
        "Return concise findings with risk_score, fraud_type, rationale, evidence, next_action. "
        "Do not invent facts. Do not request unnecessary PHI."
    ),
)

compliance_reviewer = autogen.UserProxyAgent(
    name="compliance_reviewer",
    human_input_mode="ALWAYS",
    max_consecutive_auto_reply=0,
    code_execution_config=False,  # reviewer approves findings; never executes code
)
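The rules + policy layer from the architecture can run as a cheap deterministic pass before any model call. A minimal sketch, with illustrative thresholds (the 10-unit and $10,000 cutoffs are placeholders, not payer policy):

```python
def deterministic_prechecks(claim: dict) -> list:
    """Cheap, auditable checks that run before the claim reaches the LLM.
    Rules and thresholds here are illustrative placeholders."""
    flags = []
    codes = claim.get("cpt_codes", [])
    if len(codes) != len(set(codes)):
        flags.append("duplicate_cpt_codes")
    if any(u > 10 for u in claim.get("units", [])):  # placeholder threshold
        flags.append("unit_inflation")
    if claim.get("amount", 0) > 10_000:  # placeholder threshold
        flags.append("high_dollar_claim")
    return flags

print(deterministic_prechecks({
    "cpt_codes": ["99213", "99213"], "units": [1, 12], "amount": 1840.0,
}))  # ['duplicate_cpt_codes', 'unit_inflation']
```

Claims that trip a hard rule can be routed straight to review without ever leaving your boundary, which is both cheaper and safer than an LLM round trip.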

3) Run a controlled analysis conversation

The prompt should contain only the minimum claim data needed to assess risk. For healthcare workloads, I prefer a short task prompt plus explicit instructions about what constitutes suspicious behavior.

task = f"""
Review this healthcare claim for fraud indicators.

Claim:
{claim}

Check for:
- duplicate billing patterns
- impossible code combinations
- suspicious unit inflation
- mismatch between diagnosis and procedure intensity
- unusually high reimbursement relative to peer patterns

Return a JSON-like summary with:
claim_id, risk_score (0-100), fraud_type,
rationale, evidence (list), next_action.
"""

result = compliance_reviewer.initiate_chat(
    fraud_analyst,
    message=task,
)

print(result.summary if hasattr(result, "summary") else result.chat_history[-1]["content"])

4) Wrap the output in validation and escalation logic

Do not trust raw model text in production. Parse it into a schema, validate it, then route high-risk cases to humans or downstream case management.

import json

def assess_claim(raw_output: str) -> FraudFinding:
    # Assumes the reply is bare JSON; strip any surrounding prose upstream.
    data = json.loads(raw_output)
    finding = FraudFinding(**data)  # raises ValidationError on bad fields

    if finding.risk_score >= 80:
        print(f"Escalate {finding.claim_id} to SIU/compliance")
    elif finding.risk_score >= 50:
        print(f"Queue {finding.claim_id} for analyst review")
    else:
        print(f"Low risk: {finding.claim_id}")

    return finding
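In practice, model replies rarely arrive as bare JSON; they often include explanatory prose around the object. A tolerant extractor in front of `json.loads` saves retries. This is a sketch that assumes the reply contains exactly one JSON object:

```python
import json
import re

def extract_json(raw_output: str) -> dict:
    """Pull the first {...} span out of a model reply that may include
    prose or formatting around the JSON object."""
    match = re.search(r"\{.*\}", raw_output, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

reply = 'Here is my assessment:\n{"claim_id": "CLM-88421", "risk_score": 72}\nLet me know if you need detail.'
print(extract_json(reply))  # {'claim_id': 'CLM-88421', 'risk_score': 72}
```

If the extracted span still fails to parse, treat it as a model error and retry or escalate; never guess at a repair for a compliance-relevant payload.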

Production Considerations

  • Data residency

    • Keep PHI inside approved regions and approved vendors only.
    • If your model endpoint crosses borders, you need a hard stop before transmission.
  • Auditability

    • Log prompt version, sanitized input hash, model response, reviewer decision, and timestamps.
    • Store immutable records so compliance can reconstruct why a claim was flagged.
  • Guardrails

    • Redact names, member IDs, addresses, and free-text notes unless they are strictly required.
    • Add policy checks before LLM calls so obviously invalid or unsafe requests never leave your boundary.
  • Monitoring

    • Track false positives by provider specialty and claim type.
    • Watch drift in risk scores after coding changes or payer policy updates.
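The auditability points above fit in a small append-only record. Hashing the sanitized input lets compliance verify exactly what the model saw without storing the payload twice; the field names below are illustrative assumptions:

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(prompt_version: str, sanitized_input: dict,
                 model_response: str, reviewer_decision: str) -> dict:
    """Build one audit entry: prompt version, input hash, response,
    reviewer decision, and a UTC timestamp."""
    payload = json.dumps(sanitized_input, sort_keys=True).encode()
    return {
        "prompt_version": prompt_version,
        "input_sha256": hashlib.sha256(payload).hexdigest(),
        "model_response": model_response,
        "reviewer_decision": reviewer_decision,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

rec = audit_record("v1", {"claim_id": "CLM-1"}, "risk_score=40", "approved")
print(rec["input_sha256"][:12])
```

Write these records to append-only storage (or a WORM-configured bucket) so a flagged claim can be reconstructed months later.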

Common Pitfalls

  1. Sending raw PHI to the model

    • Avoid this by preprocessing claims into minimal features.
    • Keep member identifiers out of prompts unless there is an explicit legal basis and approved environment.
  2. Using free-form outputs in downstream systems

    • Don’t pipe assistant text directly into case queues or denial workflows.
    • Validate against a schema like FraudFinding first.
  3. Treating the agent as the final decision-maker

    • Fraud detection in healthcare needs human review for material decisions.
    • Use the agent to prioritize work; let compliance analysts make the call on escalations.
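That prioritization can be as simple as ordering validated findings by risk score so analysts always work the riskiest claims first. A sketch with illustrative findings:

```python
findings = [
    {"claim_id": "CLM-1", "risk_score": 35},
    {"claim_id": "CLM-2", "risk_score": 91},
    {"claim_id": "CLM-3", "risk_score": 60},
]

# Analysts work the queue top-down; the agent only orders it,
# it never closes or denies a claim on its own.
work_queue = sorted(findings, key=lambda f: f["risk_score"], reverse=True)
print([f["claim_id"] for f in work_queue])  # ['CLM-2', 'CLM-3', 'CLM-1']
```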

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
