How to Build a Compliance-Checking Agent Using AutoGen in Python for Insurance

By Cyprian Aarons · Updated 2026-04-21

Tags: compliance-checking, autogen, python, insurance

A compliance-checking agent for insurance reviews policy wording, claim notes, customer communications, and underwriting outputs against internal rules and regulatory constraints. This matters because a bad recommendation, a missing disclosure, or an unsupported decision can create regulatory exposure, customer harm, and expensive remediation.

Architecture

  • Input normalizer

    • Takes raw text from policy documents, emails, claim summaries, or underwriting notes.
    • Extracts the exact artifact the agent should review.
  • Compliance rule set

    • Encodes insurer-specific policies like disclosure requirements, prohibited language, retention rules, and jurisdiction-specific constraints.
    • Keep this separate from the model prompt so legal/compliance can update it without code changes.
  • AutoGen agent layer

    • Uses a primary AssistantAgent to inspect the content.
    • Optionally adds a second reviewer agent for escalation on ambiguous cases.
  • Audit logger

    • Stores the input, model output, rule version, timestamp, and reviewer decision.
    • This is non-negotiable in insurance.
  • Human escalation path

    • Routes uncertain or high-risk findings to a compliance analyst.
    • Prevents the system from making final determinations on borderline cases.
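The "keep the rule set separate from the prompt" point above can be sketched as a versioned rule bundle loaded from a file at runtime. The file path, schema, and rule IDs below are illustrative assumptions, not an AutoGen API:

```python
import json
from pathlib import Path

# Hypothetical rule bundle kept outside the codebase so legal/compliance can
# edit it without a deploy. File name and schema are illustrative.
RULES_PATH = Path("rules/insurance_rules.json")

def load_rule_bundle(path: Path = RULES_PATH) -> dict:
    """Load a versioned rule bundle, e.g.:
    {"version": "insurance-compliance-v1.3",
     "rules": [{"id": "DISC-001",
                "description": "Claims correspondence must include the complaints-process disclosure.",
                "severity": "high"}]}
    """
    return json.loads(path.read_text(encoding="utf-8"))

def rules_as_prompt_section(bundle: dict) -> str:
    # Render the rules into a prompt fragment so the agent can cite rule IDs
    # in its findings instead of inventing its own rule names.
    lines = [f"- [{r['id']}] ({r['severity']}) {r['description']}" for r in bundle["rules"]]
    return "Rule set " + bundle["version"] + ":\n" + "\n".join(lines)
```

Appending the rendered rule section to the system prompt at startup keeps the prompt and the rule bundle versioned independently.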

Implementation

1) Install AutoGen and define your compliance rules

Install AutoGen’s Python package (the `pyautogen` distribution provides the `autogen` module used below). For a basic single-agent pattern, you need an LLM config plus a strict system prompt that tells the agent to classify issues and cite the rule violated.

from autogen import AssistantAgent
import os
import json

llm_config = {
    "model": "gpt-4o-mini",
    "api_key": os.environ["OPENAI_API_KEY"],
    "temperature": 0,
}

COMPLIANCE_SYSTEM_PROMPT = """
You are an insurance compliance checker.
Review the provided text for:
- missing required disclosures
- prohibited promises or guarantees
- unfair or misleading language
- unsupported claim handling statements
Return JSON with keys:
status (PASS|FAIL|ESCALATE),
findings (list of objects with rule, evidence, severity),
summary (short string).
Do not rewrite the source text.
"""

agent = AssistantAgent(
    name="insurance_compliance_agent",
    system_message=COMPLIANCE_SYSTEM_PROMPT,
    llm_config=llm_config,
)

2) Send a document for review and parse the result

This pattern works well when you want deterministic output for downstream workflow engines. Keep temperature at zero and force structured JSON in the prompt.

def check_compliance(text: str) -> dict:
    message = f"""
Review this insurance artifact:

TEXT:
{text}

Return only valid JSON.
"""
    response = agent.generate_reply(messages=[{"role": "user", "content": message}])
    content = response if isinstance(response, str) else response.get("content", "")
    # Models sometimes wrap JSON in markdown fences despite instructions;
    # strip them before parsing so json.loads does not fail.
    content = content.strip().removeprefix("```json").removeprefix("```").removesuffix("```").strip()
    return json.loads(content)

sample_text = """
Your claim is approved and payment will be sent today.
No further documentation is needed.
"""

result = check_compliance(sample_text)
print(result["status"])
print(result["findings"])
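With structured JSON output, a downstream workflow engine can route each document with a simple dispatch on the status field. A minimal sketch; the queue names are hypothetical placeholders for whatever your workflow system uses:

```python
def route_result(result: dict) -> str:
    """Map the agent's status to a downstream workflow queue.
    Queue names are illustrative, not a real API."""
    status = result.get("status")
    if status == "PASS":
        return "auto-archive"            # no issues found; log and move on
    if status == "FAIL":
        return "compliance-remediation"  # clear violation; route for correction
    # ESCALATE, or anything malformed, defaults to a human analyst
    return "human-review"
```

Defaulting unknown statuses to human review keeps parsing glitches from silently passing documents through.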

3) Add a second agent for escalation on risky cases

In insurance, one agent should not be the final authority on ambiguous decisions. A reviewer agent can challenge borderline findings before you hand off to a human analyst.

from autogen import UserProxyAgent

reviewer = AssistantAgent(
    name="compliance_reviewer",
    system_message="""
You are a senior insurance compliance reviewer.
Challenge weak findings and confirm only if evidence is explicit.
Return JSON with status PASS|FAIL|ESCALATE and rationale.
""",
    llm_config=llm_config,
)

user_proxy = UserProxyAgent(
    name="orchestrator",
    human_input_mode="NEVER",
)

task = """
Assess whether this wording creates an unsupported promise:
'Your claim is approved and payment will be sent today.'
"""

chat_result = user_proxy.initiate_chat(
    recipient=agent,
    message=task,
    max_turns=1,
)

# Hand the primary agent's finding to the reviewer for a second opinion
# before anything reaches a human analyst or a downstream system.
review_result = user_proxy.initiate_chat(
    recipient=reviewer,
    message=f"Review this finding and confirm or challenge it:\n{chat_result.summary}",
    max_turns=1,
)

4) Persist audit data before any downstream action

For regulated workflows, log the exact input, model version, output JSON, and final disposition. That gives you traceability during audits and internal investigations.

import hashlib
from datetime import datetime, timezone

def audit_record(document_id: str, input_text: str, result: dict) -> dict:
    return {
        "document_id": document_id,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "model": llm_config["model"],
        "status": result.get("status"),
        "findings": result.get("findings", []),
        "summary": result.get("summary", ""),
        # Use a stable cryptographic hash: Python's built-in hash() is
        # salted per process, so its values are not reproducible in an audit.
        "input_hash": hashlib.sha256(input_text.encode("utf-8")).hexdigest(),
        "rule_version": "insurance-compliance-v1.3",
    }

audit_log = audit_record("CLAIM-10021", sample_text, result)
print(audit_log)
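One simple way to persist these records is an append-only JSONL file; in production you would likely use a write-once object store or append-only database table instead. The log path below is an illustrative assumption:

```python
import json
from pathlib import Path

AUDIT_LOG_PATH = Path("audit/compliance_audit.jsonl")  # illustrative location

def append_audit_record(record: dict, path: Path = AUDIT_LOG_PATH) -> None:
    """Append one JSON object per line. Existing lines are never rewritten,
    so past records stay immutable for audits and investigations."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
```

Sorting keys makes records byte-stable, which simplifies diffing and hashing the log itself later.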

Production Considerations

  • Deployment

    • Run the agent behind an internal API gateway with authn/authz tied to your claims or underwriting platform.
    • Keep jurisdiction-specific rule bundles separate if you operate across states or countries.
  • Monitoring

    • Track pass/fail/escalate rates by line of business and document type.
    • Watch for drift in escalation volume after policy changes or model updates.
  • Guardrails

    • Block direct auto-approval of claims or coverage decisions from the agent output alone.
    • Require human sign-off for anything involving denials, exclusions, adverse actions, or customer-facing legal language.
  • Data residency

    • Ensure documents stay in approved regions if you handle PII or sensitive claims data.
    • Redact policy numbers, medical details, or financial identifiers before sending text to external model endpoints when required by your controls.
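The redaction step above can be sketched with pattern-based substitution before text leaves your boundary. The two patterns here are made-up examples (the policy-number format is invented); a real control needs a reviewed PII/PHI inventory, not a pair of regexes:

```python
import re

# Illustrative patterns only. The policy-number format is a hypothetical
# example; replace these with patterns from your own data inventory.
REDACTION_PATTERNS = [
    (re.compile(r"\bPOL-\d{6,10}\b"), "[POLICY_NUMBER]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def redact(text: str) -> str:
    """Replace sensitive identifiers before sending text to an
    external model endpoint."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Keeping the mapping from redacted tokens back to originals inside your boundary lets analysts still resolve findings to real documents.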

Common Pitfalls

  • Treating the model output as final compliance judgment

    • Avoid this by making the agent produce findings only. Final approval should come from a rules engine plus human review for high-risk cases.
  • Using vague prompts instead of explicit rule language

    • “Check for compliance issues” is too loose.
    • Spell out what counts as a violation: missing disclosures, guaranteed outcomes, misleading timeframes, unfair claims handling statements.
  • Skipping auditability

    • If you cannot reproduce why the agent flagged something three months later, you do not have a production control.
    • Store input text versions, prompt versions, output JSON, reviewer decisions, and timestamps.
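The "findings only" principle can be enforced with a small deterministic gate in plain code, outside the model, so no prompt change can ever widen the auto-close path. The thresholds below are illustrative assumptions:

```python
def may_auto_close(result: dict) -> bool:
    """Deterministic gate: the agent's output alone can auto-close a
    document only when it passed with no high-severity findings.
    Everything else requires human sign-off. Thresholds are illustrative."""
    if result.get("status") != "PASS":
        return False
    return all(f.get("severity") != "high" for f in result.get("findings", []))
```

Because this gate lives in code rather than in the prompt, a model update or prompt edit cannot silently change what gets auto-closed.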

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
