How to Build a Transaction Monitoring Agent Using AutoGen in Python for Healthcare

By Cyprian Aarons · Updated 2026-04-21
Tags: transaction-monitoring, autogen, python, healthcare

A transaction monitoring agent for healthcare watches claims, payments, refunds, eligibility checks, and provider billing events for suspicious patterns. It matters because fraud, waste, and abuse erode patient trust, cut into payer margins, and increase regulatory exposure, and you need a system that can triage anomalies without leaking protected health information.

Architecture

  • Event ingest layer

    • Pulls transactions from claims queues, payment rails, EHR-integrated billing feeds, or Kafka topics.
    • Normalizes each event into a strict schema before the agent sees it.
  • Policy and compliance layer

    • Enforces HIPAA minimum-necessary access.
    • Redacts PHI fields not required for the investigation.
    • Applies data residency rules before any model call.
  • AutoGen agent team

    • A monitoring agent classifies risk signals.
    • A review agent explains why a transaction is suspicious.
    • A compliance agent checks whether escalation is allowed under policy.
  • Rules engine

    • Handles deterministic checks like duplicate claim IDs, impossible service dates, out-of-network spikes, or refund loops.
    • Keeps obvious cases out of the LLM path.
  • Case store and audit log

    • Persists every decision, prompt summary, model output, and reviewer action.
    • Needed for audits, incident review, and model governance.

Implementation

1) Define the transaction schema and redact PHI early

Keep the payload small. The agent should see billing metadata, not raw clinical notes unless there is a documented reason.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Transaction:
    transaction_id: str
    patient_id: str
    provider_id: str
    amount: float
    currency: str
    transaction_type: str   # claim | refund | eligibility | payment
    service_date: str
    submission_date: str
    location_state: str
    diagnosis_code: Optional[str] = None
    procedure_code: Optional[str] = None
    notes: Optional[str] = None

def redact_transaction(tx: Transaction) -> dict:
    return {
        "transaction_id": tx.transaction_id,
        "provider_id": tx.provider_id,
        "amount": tx.amount,
        "currency": tx.currency,
        "transaction_type": tx.transaction_type,
        "service_date": tx.service_date,
        "submission_date": tx.submission_date,
        "location_state": tx.location_state,
        "diagnosis_code": tx.diagnosis_code,
        "procedure_code": tx.procedure_code,
        # Do not pass patient_id or notes unless policy allows it
    }
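The hard-coded field list above works, but many teams move the allowlist into reviewable configuration so compliance can audit it without reading code. A minimal sketch of that idea; the `ALLOWED_FIELDS` mapping and `redact_for_use_case` helper are illustrative names, not part of AutoGen:

```python
# Hypothetical allowlist-driven redaction: the fields permitted per use
# case live in data, not code, so a compliance reviewer can sign off on
# exactly what the model is allowed to see.
ALLOWED_FIELDS = {
    "fraud_triage": {
        "transaction_id", "provider_id", "amount", "currency",
        "transaction_type", "service_date", "submission_date",
        "location_state", "diagnosis_code", "procedure_code",
    },
}

def redact_for_use_case(record: dict, use_case: str) -> dict:
    """Keep only allowlisted fields; anything else is dropped by default."""
    allowed = ALLOWED_FIELDS[use_case]
    return {k: v for k, v in record.items() if k in allowed}
```

The deny-by-default shape matters: a new field added upstream (say, a free-text note) stays out of prompts until someone explicitly allowlists it.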

2) Build an AutoGen assistant that scores risk and explains why

Use AssistantAgent for analysis and UserProxyAgent to drive the workflow. This pattern works well when your backend service is the “user” that submits one transaction at a time.

import os
from autogen import AssistantAgent, UserProxyAgent

llm_config = {
    "config_list": [
        {
            "model": os.environ["OPENAI_MODEL"],
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ],
    "temperature": 0,
}

monitor_agent = AssistantAgent(
    name="monitor_agent",
    llm_config=llm_config,
    system_message=(
        "You review healthcare transactions for fraud/waste/abuse risk. "
        "Use only the provided fields. Do not request PHI. "
        "Return JSON with risk_level, reasons, and recommended_action."
    ),
)

runner = UserProxyAgent(
    name="runner",
    human_input_mode="NEVER",
    code_execution_config=False,  # this proxy only relays messages; never execute model output
)

def analyze_transaction(tx_payload: dict):
    prompt = f"""
Assess this healthcare transaction:

{tx_payload}

Return JSON with:
- risk_level: low | medium | high
- reasons: list of short strings
- recommended_action: approve | queue_for_review | escalate_compliance
"""
    result = runner.initiate_chat(
        monitor_agent,
        message=prompt,
        max_turns=1,
    )
    return result.chat_history[-1]["content"]

if __name__ == "__main__":
    sample = {
        "transaction_id": "txn_10021",
        "provider_id": "prov_77",
        "amount": 18950.00,
        "currency": "USD",
        "transaction_type": "claim",
        "service_date": "2026-04-10",
        "submission_date": "2026-04-20",
        "location_state": "CA",
        "diagnosis_code": "I10",
        "procedure_code": "99285",
    }
    print(analyze_transaction(sample))
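The agent is asked for JSON, but model replies can arrive wrapped in markdown fences or surrounded by prose. A defensive parsing sketch; the `parse_risk_result` helper and its fail-safe default are assumptions for illustration, not AutoGen API:

```python
import json
import re

def parse_risk_result(raw: str) -> dict:
    """Extract the first JSON object from a model reply.

    Grabbing the first {...} span skips markdown fences and any
    surrounding prose. On failure, fall back to a conservative
    default: unparseable output goes to a human, never auto-approve.
    """
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            result = json.loads(match.group(0))
            if {"risk_level", "reasons", "recommended_action"} <= result.keys():
                return result
        except json.JSONDecodeError:
            pass
    return {
        "risk_level": "medium",
        "reasons": ["unparseable_model_output"],
        "recommended_action": "queue_for_review",
    }
```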

3) Add a compliance checker as a second agent in the same chat flow

This is where AutoGen becomes useful beyond single-agent prompting. The first agent flags risk; the second validates whether escalation or retention rules are satisfied.

from autogen import ConversableAgent

compliance_agent = ConversableAgent(
    name="compliance_agent",
    llm_config=llm_config,
    human_input_mode="NEVER",
    system_message=(
        "You are a healthcare compliance reviewer. Validate monitoring "
        "outputs against minimum-necessary and PHI-handling policy. "
        "Return JSON only."
    ),
)

def compliance_review(risk_result: str):
    message = f"""
Review this monitoring output for healthcare compliance:

{risk_result}

Check:
1. Whether the recommendation respects minimum-necessary access.
2. Whether PHI exposure is avoided.
3. Whether escalation to human compliance review is warranted.
Return JSON only.
"""
    return runner.initiate_chat(
        compliance_agent,
        message=message,
        max_turns=1,
    ).chat_history[-1]["content"]
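Once both replies are parsed into dicts, the final routing decision should be deterministic code, not a third model call. A sketch of one such merge rule; the compliance field names `escalation_warranted` and `phi_exposure` are hypothetical, chosen to match the prompt above:

```python
def final_disposition(risk: dict, compliance: dict) -> str:
    """Merge the monitor's recommendation with the compliance check.

    Illustrative policy: compliance can veto an escalation, and a PHI
    exposure concern always blocks auto-approval, but nothing here can
    downgrade a human-review recommendation to an automatic approve.
    """
    action = risk.get("recommended_action", "queue_for_review")
    if action == "escalate_compliance" and not compliance.get("escalation_warranted", True):
        return "queue_for_review"
    if action == "approve" and compliance.get("phi_exposure", False):
        return "queue_for_review"
    return action
```

Keeping this merge in plain Python means the riskiest transitions (approve vs. escalate) are testable and auditable line by line.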

4) Route deterministic cases before the LLM

Do not spend tokens on obvious duplicates or policy violations. Put those in code first.

def rule_based_flags(tx: dict) -> list[str]:
    flags = []
    if tx["amount"] > 10000:
        flags.append("high_amount")
    if tx["service_date"] > tx["submission_date"]:
        flags.append("impossible_date_order")
    if tx.get("procedure_code") == "99285" and tx["location_state"] not in {"CA", "NY", "TX"}:
        flags.append("outlier_service_pattern")
    return flags

def monitor(tx: Transaction) -> dict:
    payload = redact_transaction(tx)
    flags = rule_based_flags(payload)

    if flags:
        # Deterministic hit: queue it directly, no LLM call needed.
        return {
            "decision": "queue_for_review",
            "rule_flags": flags,
        }

    # No rule hit: let the agent reason about subtler patterns.
    return {
        "decision": "agent_review",
        "rule_flags": [],
        "agent_output": analyze_transaction(payload),
    }
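The architecture section also lists duplicate claim IDs as a deterministic check. A minimal in-memory sketch of that idea; a production version would back the window with a shared store such as Redis, and the class name here is illustrative:

```python
from collections import deque

class DuplicateClaimDetector:
    """Flag repeats of (provider_id, procedure_code, service_date, amount)
    within a sliding window of recently seen transactions."""

    def __init__(self, window_size: int = 10_000):
        self._recent = deque(maxlen=window_size)  # insertion order, oldest first
        self._seen = set()                        # fast membership lookup

    def is_duplicate(self, tx: dict) -> bool:
        key = (tx["provider_id"], tx.get("procedure_code"),
               tx["service_date"], tx["amount"])
        if key in self._seen:
            return True
        if len(self._recent) == self._recent.maxlen:
            # Deque is full: the append below evicts the oldest key,
            # so drop it from the set too.
            self._seen.discard(self._recent[0])
        self._recent.append(key)
        self._seen.add(key)
        return False
```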

Production Considerations

  • Deploy inside your controlled boundary

    • Run AutoGen workers in your VPC or private cluster.
    • Keep logs and vector stores in-region if you have residency requirements.
  • Audit everything

    • Store input hash, redacted payload, model version, prompt template version, and final disposition.
    • Healthcare teams will ask who saw what and why it was escalated.
  • Add hard guardrails

    • Block raw PHI from reaching prompts unless explicitly approved by policy.
    • Enforce allowlists for fields per use case.
    • Reject free-form tool calls that could exfiltrate data.
  • Monitor drift and false positives

    • Track precision on known fraud cases and operational alert volume by provider type.
    • A noisy monitor gets ignored fast by revenue cycle teams.
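The audit bullet above can be sketched as a single record builder. Assumptions: records are written to append-only storage elsewhere, and the function name and field set are illustrative, not a fixed schema:

```python
import hashlib
import json
from datetime import datetime, timezone

def build_audit_record(
    raw_payload: dict,
    redacted_payload: dict,
    model_version: str,
    prompt_template_version: str,
    disposition: str,
) -> dict:
    """Assemble one audit entry for a monitoring decision.

    Hash the raw input instead of storing it, so the trail can prove
    which transaction was reviewed without retaining PHI.
    """
    input_hash = hashlib.sha256(
        json.dumps(raw_payload, sort_keys=True).encode()
    ).hexdigest()
    return {
        "input_sha256": input_hash,
        "redacted_payload": redacted_payload,
        "model_version": model_version,
        "prompt_template_version": prompt_template_version,
        "disposition": disposition,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```

Sorting the keys before hashing makes the hash stable across serializations, so the same transaction always maps to the same `input_sha256`.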

Common Pitfalls

  • Sending full clinical records to the model

    • Avoid this by redacting at ingest time and passing only billing metadata plus minimal context.
  • Using the LLM as the first line of defense

    • Put rules first for duplicates, impossible dates, threshold breaches, and known fraud signatures.
    • Use AutoGen for reasoning and explanation, not basic filtering.
  • Skipping auditability

    • If you cannot reconstruct why an alert fired, you cannot defend it during compliance review.
    • Persist prompts, outputs, rule hits, reviewer actions, and timestamps.

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
