How to Build a Claims Processing Agent Using AutoGen in Python for Payments

By Cyprian Aarons · Updated 2026-04-21
Tags: claims-processing, autogen, python, payments

A claims processing agent for payments takes an incoming claim, checks the request against policy and transaction data, asks for missing evidence, and routes the case to approval or manual review. For payment teams, this matters because claims are expensive when they are slow, inconsistent, or impossible to audit.

Architecture

  • Claim intake layer

    • Receives claim payloads from API, queue, or case management system.
    • Normalizes fields like claimant ID, payment reference, amount, currency, and reason code.
  • Policy retrieval layer

    • Pulls the relevant payment policy, SLA rules, and eligibility criteria.
    • Keeps the agent grounded in current business rules instead of model memory.
  • AutoGen agent group

    • Uses AssistantAgent for reasoning and UserProxyAgent for execution and tool calls.
    • Coordinates document checks, transaction lookups, and decision drafting.
  • Validation tools

    • Verifies payment status, duplicate claims, chargeback history, KYC flags, and settlement state.
    • Exposes deterministic Python functions that the agent can call.
  • Decision and audit layer

    • Produces an approval/reject/manual-review outcome with structured reasons.
    • Stores every tool call and model response for compliance review.
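The intake layer above can be sketched as a small normalizer. This is a minimal sketch with illustrative field names (`id`, `claimant_id`, `payment_ref` are assumed inbound aliases, not part of any fixed spec): it maps loosely named payload fields onto one canonical shape before any policy or model logic runs.

```python
def normalize_claim_payload(raw: dict) -> dict:
    """Map loosely named inbound fields onto one canonical claim shape."""
    return {
        "claim_id": str(raw.get("claim_id") or raw.get("id", "")),
        "customer_id": str(raw.get("customer_id") or raw.get("claimant_id", "")),
        "payment_reference": str(raw.get("payment_reference") or raw.get("payment_ref", "")),
        "amount": float(raw.get("amount", 0)),
        "currency": str(raw.get("currency", "")).upper(),
        "reason_code": str(raw.get("reason_code", "unknown")).lower(),
    }

normalized = normalize_claim_payload({
    "id": "clm_7",
    "claimant_id": "cus_42",
    "payment_ref": "pay_1001",
    "amount": "19.99",
    "currency": "usd",
    "reason_code": "Merchant_Dispute",
})
```

Normalizing casing and types here keeps every downstream layer working with one predictable record.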

Implementation

1) Install AutoGen and define your claim schema

Use a strict payload shape. Claims processing breaks when teams let free-form text drift into the workflow.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    claim_id: str
    customer_id: str
    payment_reference: str
    amount: float  # consider Decimal or integer minor units in production
    currency: str
    reason_code: str
    evidence_url: Optional[str] = None
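The strictness pays off at intake: constructing a Claim from a payload with missing required fields raises TypeError, which the intake layer can surface as a validation error instead of letting a malformed claim drift downstream. A quick sketch:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    claim_id: str
    customer_id: str
    payment_reference: str
    amount: float
    currency: str
    reason_code: str
    evidence_url: Optional[str] = None

# A well-formed payload constructs cleanly; evidence_url stays optional.
good = Claim(claim_id="clm_1", customer_id="cus_1",
             payment_reference="pay_1001", amount=10.0,
             currency="USD", reason_code="merchant_dispute")

# A payload missing required fields fails fast with TypeError.
try:
    Claim(**{"claim_id": "clm_2", "amount": 10.0})
    rejected = False
except TypeError:
    rejected = True
```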

2) Implement deterministic payment checks

Keep business-critical checks outside the model. The LLM should reason over results, not invent them.

def check_payment_status(payment_reference: str) -> dict:
    # Replace with real DB/API lookup
    mock_db = {
        "pay_1001": {"status": "settled", "duplicate_claim": False, "kyc_flag": False},
        "pay_1002": {"status": "failed", "duplicate_claim": True, "kyc_flag": False},
        "pay_1003": {"status": "settled", "duplicate_claim": False, "kyc_flag": True},
    }
    return mock_db.get(payment_reference, {"status": "unknown", "duplicate_claim": False, "kyc_flag": False})

def validate_claim(claim: Claim) -> dict:
    payment = check_payment_status(claim.payment_reference)

    if claim.amount <= 0:
        return {"eligible": False, "decision": "reject", "reason": "invalid_amount"}

    if payment["status"] != "settled":
        return {"eligible": False, "decision": "manual_review", "reason": f"payment_status_{payment['status']}"}

    if payment["duplicate_claim"]:
        return {"eligible": False, "decision": "reject", "reason": "duplicate_claim"}

    if payment["kyc_flag"]:
        return {"eligible": False, "decision": "manual_review", "reason": "kyc_risk_flag"}

    return {"eligible": True, "decision": "approve", "reason": "policy_passed"}
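Before wiring any agents, it is worth exercising these checks directly against the mock lookups. A self-contained sketch (repeating the definitions above, with a small `_claim` helper that is illustrative only):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    claim_id: str
    customer_id: str
    payment_reference: str
    amount: float
    currency: str
    reason_code: str
    evidence_url: Optional[str] = None

def check_payment_status(payment_reference: str) -> dict:
    mock_db = {
        "pay_1001": {"status": "settled", "duplicate_claim": False, "kyc_flag": False},
        "pay_1002": {"status": "failed", "duplicate_claim": True, "kyc_flag": False},
        "pay_1003": {"status": "settled", "duplicate_claim": False, "kyc_flag": True},
    }
    return mock_db.get(payment_reference, {"status": "unknown", "duplicate_claim": False, "kyc_flag": False})

def validate_claim(claim: Claim) -> dict:
    payment = check_payment_status(claim.payment_reference)
    if claim.amount <= 0:
        return {"eligible": False, "decision": "reject", "reason": "invalid_amount"}
    if payment["status"] != "settled":
        return {"eligible": False, "decision": "manual_review", "reason": f"payment_status_{payment['status']}"}
    if payment["duplicate_claim"]:
        return {"eligible": False, "decision": "reject", "reason": "duplicate_claim"}
    if payment["kyc_flag"]:
        return {"eligible": False, "decision": "manual_review", "reason": "kyc_risk_flag"}
    return {"eligible": True, "decision": "approve", "reason": "policy_passed"}

def _claim(ref: str, amount: float = 50.0) -> Claim:
    return Claim("clm_t", "cus_t", ref, amount, "USD", "merchant_dispute")

decisions = {ref: validate_claim(_claim(ref))["decision"]
             for ref in ["pay_1001", "pay_1002", "pay_1003", "pay_9999"]}
zero_amount = validate_claim(_claim("pay_1001", amount=0))["decision"]
```

Note the ordering: a failed settlement routes to manual review before the duplicate flag is ever consulted, and unknown references never auto-approve.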

3) Wire up AutoGen agents and a tool-calling workflow

This pattern uses AssistantAgent to decide what to do with the validation output. UserProxyAgent acts as the orchestrator and can execute your Python function when the assistant requests it.

import os
from autogen import AssistantAgent, UserProxyAgent

llm_config = {
    "model": os.environ["OPENAI_MODEL"],
    "api_key": os.environ["OPENAI_API_KEY"],
}

assistant = AssistantAgent(
    name="claims_assistant",
    llm_config=llm_config,
    system_message=(
        "You process payment claims. "
        "Use only provided tool results. "
        "Return a concise decision with audit-ready reasons."
    ),
)

user_proxy = UserProxyAgent(
    name="claims_orchestrator",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3,
    code_execution_config=False,  # decisions come from validate_claim, not generated code
)

def run_claim(claim_dict: dict):
    claim = Claim(**claim_dict)
    result = validate_claim(claim)

    prompt = f"""
Claim:
{claim}

Validation result:
{result}

Task:
Draft a final decision for operations.
Return JSON with keys: decision, reason_code, next_action.
"""

    chat_result = user_proxy.initiate_chat(
        assistant,
        message=prompt,
        clear_history=True,
    )
    return chat_result

sample_claim = {
    "claim_id": "clm_9001",
    "customer_id": "cus_42",
    "payment_reference": "pay_1001",
    "amount": 125.50,
    "currency": "USD",
    "reason_code": "merchant_dispute",
}

run_claim(sample_claim)

4) Add structured output handling for downstream systems

Do not pass free-text decisions directly into payment workflows. Convert the final response into a strict record before writing it to your case system.

def build_audit_record(claim: Claim, validation: dict) -> dict:
    return {
        "claim_id": claim.claim_id,
        "customer_id": claim.customer_id,
        # Store only what you need; redact PII where possible.
        "_payment_reference_last4": claim.payment_reference[-4:],
        "decision": validation["decision"],
        "reason_code": validation["reason"],
        # Add timestamps and operator metadata in real deployments.
    }
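The model's reply also needs the same treatment. A hedged sketch of parsing the assistant's response into a validated record, falling back to manual review whenever the JSON is missing, malformed, or carries an unexpected decision value (the reply text below is hypothetical; in practice you would read the last message from the AutoGen chat result):

```python
import json

ALLOWED_DECISIONS = {"approve", "reject", "manual_review"}

def parse_decision(reply_text: str) -> dict:
    """Extract and validate the JSON decision; fail closed to manual review."""
    try:
        start = reply_text.index("{")
        end = reply_text.rindex("}") + 1
        record = json.loads(reply_text[start:end])
    except ValueError:  # no braces found, or invalid JSON
        return {"decision": "manual_review",
                "reason_code": "unparseable_response",
                "next_action": "route_to_ops"}
    if record.get("decision") not in ALLOWED_DECISIONS:
        return {"decision": "manual_review",
                "reason_code": "invalid_decision_value",
                "next_action": "route_to_ops"}
    return record

parsed = parse_decision(
    'Here is the result:\n'
    '{"decision": "approve", "reason_code": "policy_passed", "next_action": "notify_claimant"}'
)
fallback = parse_decision("Sorry, I could not process this claim.")
```

Failing closed matters here: an unparseable model response should create work for an operator, never an automatic payout.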

Production Considerations

  • Compliance

    • Log every decision path with timestamps, model version, prompt version, and tool outputs.
    • Keep approval thresholds explainable so auditors can trace why a claim was approved or escalated.
  • Data residency

    • Route claims data through region-specific infrastructure.
    • Avoid sending full PANs, bank account numbers, or unmasked identifiers to the model; tokenize before inference.
  • Guardrails

    • Hard-code rejection rules for duplicate claims, invalid amounts, sanctions hits, and failed settlement states.
    • Use the model only for classification support and explanation generation.
  • Monitoring

    • Track approval rate by reason code, manual-review rate, false positive rate on fraud flags, and average resolution time.
    • Alert on drift when one merchant segment suddenly spikes in manual reviews.
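The segment-drift alert above reduces to a simple aggregation over decision records. A minimal sketch, assuming records carry a `segment` field alongside the decision (both names are illustrative):

```python
from collections import defaultdict

def manual_review_rate(records: list) -> dict:
    """Share of decisions routed to manual review, per merchant segment."""
    counts = defaultdict(lambda: {"manual_review": 0, "total": 0})
    for record in records:
        seg = counts[record["segment"]]
        seg["total"] += 1
        if record["decision"] == "manual_review":
            seg["manual_review"] += 1
    return {s: c["manual_review"] / c["total"] for s, c in counts.items()}

rates = manual_review_rate([
    {"segment": "travel", "decision": "manual_review"},
    {"segment": "travel", "decision": "approve"},
    {"segment": "retail", "decision": "approve"},
])
```

Comparing these rates against a rolling baseline per segment is enough to catch the sudden spikes worth alerting on.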

Common Pitfalls

  • Letting the LLM make final financial decisions

    • Fix this by keeping deterministic policy checks in Python and using AutoGen only for orchestration and narrative output.
  • Passing raw sensitive data into prompts

    • Fix this by redacting customer identifiers, masking account data, and using tokenized references in messages.
  • Skipping audit trails

    • Fix this by persisting prompt inputs, tool outputs from validate_claim, final responses from AssistantAgent, and any human overrides.
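One lightweight way to persist that trail is an append-only JSON Lines file, one event per decision step. A hedged sketch (the file path and field names are illustrative; a real deployment would also record model and prompt versions, per the compliance notes above):

```python
import json
import tempfile
import time

def append_audit_event(path: str, event: dict) -> None:
    """Append one decision event as a JSON line with a timestamp."""
    record = {**event, "logged_at": time.time()}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")

# Demo against a temporary file: one event per stage of the same claim.
trail = tempfile.NamedTemporaryFile(mode="w", suffix=".jsonl", delete=False)
trail.close()
append_audit_event(trail.name, {"claim_id": "clm_9001",
                                "source": "validate_claim",
                                "decision": "approve"})
append_audit_event(trail.name, {"claim_id": "clm_9001",
                                "source": "claims_assistant",
                                "decision": "approve"})

with open(trail.name, encoding="utf-8") as f:
    events = [json.loads(line) for line in f]
```

Append-only storage keeps the trail tamper-evident and trivially replayable for auditors.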

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

