How to Build a Claims Processing Agent Using AutoGen in Python for Healthcare

By Cyprian Aarons · Updated 2026-04-21
claims-processing · autogen · python · healthcare

A claims processing agent for healthcare takes structured claim data, checks it against policy and billing rules, flags missing or inconsistent fields, drafts denial or approval notes, and routes edge cases to a human reviewer. It matters because claims are high-volume, rules-heavy, and expensive to process manually; if you get the workflow wrong, you create payment delays, compliance risk, and avoidable rework.

Architecture

  • Claim intake service

    • Accepts claim payloads from your EHR, clearinghouse, or internal queue.
    • Normalizes fields like member ID, CPT/HCPCS codes, diagnosis codes, dates of service, and provider identifiers.
  • Policy/rules retrieval layer

    • Pulls payer-specific policy text, benefit limits, prior auth requirements, and coding guidance.
    • Keep this separate from the model so you can update rules without changing prompts.
  • AutoGen agent team

    • A claims analyst agent reviews the claim.
    • A compliance agent checks HIPAA-safe handling, auditability, and policy constraints.
    • A supervisor/user proxy orchestrates the conversation and decides whether to escalate.
  • Decision and audit store

    • Persists the final outcome: approve, deny, pend for review.
    • Stores the exact reasoning inputs used at decision time for audit trails.
  • Human review queue

    • Handles low-confidence cases.
    • Required for medical necessity disputes, ambiguous coding, and policy exceptions.

Implementation

1) Install AutoGen and define your claim schema

Install the package first (pip install pyautogen), then define a strict schema so the agents work on structured inputs instead of free-form text. For healthcare claims, that means no raw PHI dumped into prompts unless you have explicit controls around access and retention.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Claim:
    claim_id: str
    member_id: str
    provider_npi: str
    cpt_codes: List[str]
    icd10_codes: List[str]
    date_of_service: str
    amount: float
    prior_auth_number: Optional[str] = None
    notes: Optional[str] = None
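Before any claim reaches an agent, it is worth rejecting malformed input deterministically. The sketch below shows one way to do that; the regex patterns are simplified assumptions (5-digit numeric CPT codes, 10-digit NPIs, a basic ICD-10 shape) and real payer validation is stricter:

```python
import re

# Simplified format checks -- assumptions for illustration, not payer-grade rules
CPT_RE = re.compile(r"^\d{5}$")                        # 5-digit numeric CPT
NPI_RE = re.compile(r"^\d{10}$")                       # 10-digit NPI
ICD10_RE = re.compile(r"^[A-TV-Z]\d[0-9A-Z](\.[0-9A-Z]{1,4})?$")

def validate_claim_fields(claim: dict) -> list:
    """Return a list of field-level errors; an empty list means structurally valid."""
    errors = []
    if not NPI_RE.match(claim.get("provider_npi", "")):
        errors.append("provider_npi must be 10 digits")
    for code in claim.get("cpt_codes", []):
        if not CPT_RE.match(code):
            errors.append(f"invalid CPT code: {code}")
    for code in claim.get("icd10_codes", []):
        if not ICD10_RE.match(code):
            errors.append(f"invalid ICD-10 code: {code}")
    if claim.get("amount", 0) <= 0:
        errors.append("amount must be positive")
    return errors
```

Running this before the agent loop means the model never spends tokens on claims that should have been bounced at intake.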

2) Create the agents with AssistantAgent and UserProxyAgent

This pattern uses AssistantAgent for analysis and compliance review, plus UserProxyAgent to kick off the workflow. The key is that the agents do not directly “decide” on their own; they produce structured recommendations that your application validates.

import os
from autogen import AssistantAgent, UserProxyAgent

llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ],
    "temperature": 0,
}

claims_analyst = AssistantAgent(
    name="claims_analyst",
    llm_config=llm_config,
    system_message=(
        "You review healthcare claims for completeness and policy alignment. "
        "Return concise findings with approve/deny/pend reasons. "
        "Do not invent missing clinical facts."
    ),
)

compliance_agent = AssistantAgent(
    name="compliance_agent",
    llm_config=llm_config,
    system_message=(
        "You check for HIPAA-safe handling, auditability, and escalation triggers. "
        "Flag missing authorization data, ambiguous medical necessity language, "
        "and any request requiring human review."
    ),
)

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,  # this proxy only relays messages; never runs code
)

3) Run a controlled multi-agent review loop

Here is the orchestration pattern. The UserProxyAgent sends the claim summary to the analyst first; then the compliance agent reviews the analyst output before you persist a final decision in your app layer.

claim_text = """
Claim ID: CLM-10021
Member ID: M-88421
Provider NPI: 1234567890
CPT Codes: 99213, 80053
ICD-10 Codes: E11.9
Date of Service: 2026-04-02
Amount: 184.50
Prior Auth Number: None
Notes: Follow-up visit for diabetes management.
"""

analyst_reply = user_proxy.initiate_chat(
    claims_analyst,
    message=(
        "Review this healthcare claim and return JSON with keys "
        "'decision', 'reason', 'missing_data', 'escalate'.\n\n"
        f"{claim_text}"
    ),
    max_turns=1,  # single request/response; no open-ended back-and-forth
)

compliance_reply = user_proxy.initiate_chat(
    compliance_agent,
    message=(
        "Review the previous claim analysis for compliance risks. "
        "Return JSON with keys 'risk_level', 'audit_notes', 'human_review_required'.\n\n"
        f"{analyst_reply.chat_history[-1]['content']}"
    ),
    max_turns=1,  # cap the exchange at one turn, same as the analyst call
)

print(analyst_reply.summary)
print(compliance_reply.summary)

4) Add deterministic post-processing before any downstream action

Do not let an LLM directly trigger payment or denial. Parse its output, validate against business rules, then route to automation or human review.

import json

def finalize_decision(analyst_json: str, compliance_json: str):
    try:
        analyst = json.loads(analyst_json)
        compliance = json.loads(compliance_json)
    except (json.JSONDecodeError, TypeError):
        # Unparseable model output is never auto-approved
        return {"status": "pend", "reason": "unparseable agent output"}

    if compliance.get("human_review_required"):
        return {"status": "pend", "reason": compliance.get("audit_notes", "")}

    if analyst.get("decision") == "approve" and compliance.get("risk_level") == "low":
        return {"status": "approve", "reason": analyst.get("reason", "")}

    return {"status": "pend", "reason": analyst.get("reason", "")}

# Example:
# final = finalize_decision(analyst_reply.summary, compliance_reply.summary)
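Even at temperature 0, models often wrap JSON in markdown code fences, so parsing the raw reply can fail. A small extraction helper handles that; this is an assumption-level sketch, and production code should also validate the parsed keys against a schema:

```python
import json
import re

def extract_json(text: str) -> dict:
    """Pull the first JSON object out of a model reply, tolerating code fences."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))
```

Run agent replies through extract_json before handing them to finalize_decision, so a stray fence or preamble sentence routes to an error path instead of crashing the pipeline.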

Production Considerations

  • Keep PHI out of prompts where possible

    • Tokenize member identifiers.
    • Redact free-text notes before sending them to the model.
    • If PHI must be processed, enforce access control and retention policies aligned with HIPAA.
  • Log every decision path

    • Store prompt version, model version, retrieved policy documents, agent outputs, and final action.
    • This is non-negotiable for audits and appeals.
  • Pin data residency early

    • Healthcare customers often require region-bound processing.
    • Make sure your model endpoint, vector store, logs, backups, and observability stack stay in approved regions.
  • Add hard guardrails around automation

    • Auto-approve only when confidence is high and rules are deterministic.
    • Anything involving medical necessity ambiguity or prior auth gaps should go to human review.
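The PHI points above can be sketched as simple pre-processing that runs before any prompt is built. This is illustrative only, not a substitute for a real de-identification pipeline; the salt handling and redaction patterns are assumptions you would replace with your own controls:

```python
import hashlib
import re

def tokenize_member_id(member_id: str, salt: str = "replace-with-secret-salt") -> str:
    # One-way token so prompts never carry the raw member identifier
    return "MBR-" + hashlib.sha256((salt + member_id).encode()).hexdigest()[:12]

def redact_notes(notes: str) -> str:
    # Crude pattern-based redaction; a production system should use a
    # dedicated PHI de-identification service instead
    notes = re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[SSN]", notes)   # SSN-shaped values
    notes = re.sub(r"\b\d{10}\b", "[ID]", notes)               # 10-digit identifiers
    notes = re.sub(r"\b\d{4}-\d{2}-\d{2}\b", "[DATE]", notes)  # ISO dates
    return notes
```

The original identifiers stay in your secure system of record, keyed by the token, so a human reviewer can still resolve the claim.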

Common Pitfalls

  1. Letting the model decide payment outcomes directly

    • Avoid this by using LLM output as advisory only.
    • Your application should enforce rule checks before approving or denying anything.
  2. Sending raw clinical notes into prompts

    • This creates unnecessary privacy exposure.
    • Redact or summarize notes first; keep original PHI in your secure system of record.
  3. Skipping audit metadata

    • If you cannot reconstruct why a claim was pended or denied, you will fail operational reviews.
    • Persist prompt templates, retrieved policies, timestamps, model IDs, and final action codes.
  4. Treating all denials as final

    • In healthcare claims workflows there are appeal paths.
    • Design your agent to produce denial rationale plus next-step evidence requirements so staff can resolve issues quickly.
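To make the audit-metadata pitfall concrete, here is one possible shape for a record persisted alongside each decision. The field names and the prompt-versioning scheme are illustrative assumptions; adapt them to your own schema:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import List

@dataclass
class AuditRecord:
    claim_id: str
    prompt_version: str
    model_id: str
    retrieved_policy_ids: List[str]
    analyst_output: str
    compliance_output: str
    final_status: str
    decided_at: str  # ISO-8601 UTC timestamp

def build_audit_record(claim_id: str, final: dict,
                       analyst_raw: str, compliance_raw: str) -> AuditRecord:
    # Capture the exact inputs and outputs that produced the decision
    return AuditRecord(
        claim_id=claim_id,
        prompt_version="claims-review-v1",  # assumed versioning convention
        model_id="gpt-4o-mini",
        retrieved_policy_ids=[],            # fill from your retrieval layer
        analyst_output=analyst_raw,
        compliance_output=compliance_raw,
        final_status=final["status"],
        decided_at=datetime.now(timezone.utc).isoformat(),
    )
```

Serializing with asdict() gives you a flat dict that drops straight into most audit stores, and keeping raw agent outputs verbatim means appeals can replay exactly what the system saw.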


By Cyprian Aarons, AI Consultant at Topiax.
