How to Build an Underwriting Agent for Healthcare Using AutoGen in Python
An underwriting agent for healthcare takes in member or provider submissions, checks them against policy rules, summarizes risk signals, and drafts a recommendation for human review. In practice, that means faster eligibility decisions, more consistent policy enforcement, and a cleaner audit trail for compliance teams.
Architecture
- Input normalizer
  - Converts intake data from JSON, PDFs, or form submissions into a structured underwriting request.
- Policy retrieval layer
  - Pulls plan rules, exclusions, prior authorization logic, and regional compliance constraints from an internal knowledge base.
- Underwriting reasoning agent
  - Uses AutoGen to analyze the case, compare it to policy rules, and produce a recommendation with cited evidence.
- Compliance reviewer
  - Checks the draft output for PHI leakage, missing rationale, and prohibited language before anything is returned.
- Audit logger
  - Persists prompts, tool calls, model outputs, and final decisions for traceability.
- Human approval gate
  - Ensures a licensed reviewer signs off on adverse decisions or edge cases.
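The input normalizer's output is worth pinning down as a small schema early. A minimal sketch of the structured underwriting request it might emit (field names are illustrative and match the worked case later in this article):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class UnderwritingRequest:
    """Normalized intake record passed to the reasoning agent."""
    member_id: str
    policy_id: str
    age: int
    conditions: List[str] = field(default_factory=list)
    hba1c: Optional[float] = None          # lab value, if supplied
    hospitalizations_last_12m: int = 0

def normalize(raw: dict) -> UnderwritingRequest:
    """Map a raw intake payload onto the structured request,
    dropping any fields the underwriter does not need."""
    return UnderwritingRequest(
        member_id=raw["member_id"],
        policy_id=raw["policy_id"],
        age=int(raw["age"]),
        conditions=list(raw.get("conditions", [])),
        hba1c=raw.get("hba1c"),
        hospitalizations_last_12m=int(raw.get("hospitalizations_last_12m", 0)),
    )
```

Keeping the schema explicit also gives the audit logger a stable shape to persist.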
Implementation
1) Set up the AutoGen agents
For this pattern, use one assistant for analysis and one user proxy for orchestration. The assistant does the underwriting work; the proxy handles tool execution and conversation flow.
```python
import os

from autogen import AssistantAgent, UserProxyAgent

config_list = [
    {
        "model": "gpt-4o-mini",
        "api_key": os.environ["OPENAI_API_KEY"],
    }
]

underwriter = AssistantAgent(
    name="underwriter",
    llm_config={"config_list": config_list},
    system_message=(
        "You are a healthcare underwriting analyst. "
        "Use only provided policy data. "
        "Do not invent medical facts. "
        "Return a concise recommendation with rationale and risk flags."
    ),
)

proxy = UserProxyAgent(
    name="proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
)
```
This is the basic AutoGen pattern: AssistantAgent generates the reasoning, while UserProxyAgent coordinates the interaction. For healthcare workflows, keep the system message strict so the model stays inside policy boundaries.
2) Add a policy lookup tool
Underwriting should not depend on model memory. Use a real function that fetches policy text from your internal store, then expose it to the agent as a callable tool.
```python
from typing import Dict

POLICIES: Dict[str, str] = {
    "diabetes_standard": (
        "Applicants with Type 2 diabetes are eligible if HbA1c <= 8.5 "
        "and no hospitalization in last 12 months."
    ),
    "hypertension_standard": (
        "Applicants with controlled hypertension are eligible if BP < 140/90 "
        "and no end-organ damage."
    ),
}

def get_policy(policy_id: str) -> str:
    """Return the policy text for a given policy ID."""
    return POLICIES.get(policy_id, "Policy not found.")

# Register both sides of the tool: the assistant needs the schema
# so it can propose calls; the proxy needs the implementation
# so it can execute them.
underwriter.register_for_llm(
    name="get_policy",
    description="Fetch underwriting policy text by policy ID.",
)(get_policy)
proxy.register_for_execution(name="get_policy")(get_policy)
```
In production, get_policy should query your approved document store or vector index with access controls. Do not hardcode policies unless you are prototyping.
3) Run an underwriting case through AutoGen
Now send structured case data to the agent and require a recommendation format that downstream systems can parse.
```python
case = {
    "member_id": "M12345",
    "policy_id": "diabetes_standard",
    "age": 46,
    "conditions": ["Type 2 diabetes"],
    "hba1c": 8.1,
    "hospitalizations_last_12m": 0,
}

prompt = f"""
Evaluate this healthcare underwriting case using only the relevant policy.

Case:
{case}

Instructions:
1. Retrieve the applicable policy.
2. Determine eligibility.
3. Provide:
   - decision: approve / decline / refer
   - rationale: short explanation
   - risk_flags: list of concerns
   - compliance_notes: mention any missing data or audit issues
"""

result = proxy.initiate_chat(
    underwriter,
    message=prompt,
)

print(result.chat_history[-1]["content"])
```
The important part here is that the agent is forced to work from explicit case data and retrieve policy text instead of improvising. That reduces hallucinated underwriting logic and makes reviews easier.
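Downstream parsing is simpler if the prompt also asks the agent to return its recommendation as a JSON object. A sketch of a validator for that shape (the allowed decision values mirror the prompt above; the parsing approach is one option, not the only one):

```python
import json

ALLOWED_DECISIONS = {"approve", "decline", "refer"}

def parse_recommendation(text: str) -> dict:
    """Extract and validate the JSON recommendation from agent output."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("No JSON object found in agent output")
    rec = json.loads(text[start : end + 1])
    if rec.get("decision") not in ALLOWED_DECISIONS:
        raise ValueError(f"Unexpected decision: {rec.get('decision')!r}")
    if not rec.get("rationale"):
        raise ValueError("Recommendation is missing a rationale")
    return rec
```

Rejecting outputs that fail validation, rather than guessing, keeps malformed recommendations out of downstream systems.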
4) Add a compliance post-check
Healthcare needs guardrails after generation as well. A second pass can validate that no extra PHI leaked into the response and that every adverse outcome has an explanation.
```python
def compliance_check(text: str) -> bool:
    banned_terms = ["SSN", "full diagnosis history", "genetic test result"]
    return not any(term.lower() in text.lower() for term in banned_terms)

output = result.chat_history[-1]["content"]
if not compliance_check(output):
    raise ValueError("Compliance check failed")

print("Approved for human review")
```
This is not enough on its own, but it gives you a deterministic safety net before routing output to reviewers or downstream systems.
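Keyword matching alone misses formatted identifiers. A regex pass over common PHI-shaped patterns is a cheap addition; the two patterns below (SSN-formatted numbers and MRN-style IDs) are illustrative, not exhaustive:

```python
import re

PHI_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # SSN-formatted numbers
    re.compile(r"\bMRN[:\s]*\d{6,}\b", re.I),  # medical-record-number style IDs
]

def phi_scan(text: str) -> list:
    """Return all PHI-looking matches so reviewers can see
    exactly why a draft failed the check."""
    hits = []
    for pattern in PHI_PATTERNS:
        hits.extend(pattern.findall(text))
    return hits
```

Returning the matches, instead of just a boolean, gives the human reviewer something concrete to act on.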
Production Considerations
- Deploy inside your controlled environment
  - Keep inference in-region if your healthcare contracts require data residency.
  - If you use hosted models, verify where prompts and logs are stored.
- Log everything needed for audit
  - Persist input case IDs, retrieved policy version, model version, timestamps, tool calls, and final recommendation.
  - Avoid storing raw PHI unless your retention policy explicitly allows it.
- Add hard guardrails around adverse actions
  - Declines and referrals should always route to a licensed human reviewer.
  - The agent can draft recommendations; it should not be the final decision-maker.
- Monitor drift by segment
  - Track approval rates by plan type, age band, geography, and condition category.
  - If one segment starts getting unusual decline patterns, inspect prompt changes or policy retrieval failures.
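The segment monitoring above can be sketched as a decline-rate comparison against a recorded baseline (the segment names and the 10% tolerance are illustrative assumptions):

```python
from collections import defaultdict

def decline_rates(decisions):
    """Compute per-segment decline rates.
    decisions: iterable of (segment, decision) pairs."""
    totals, declines = defaultdict(int), defaultdict(int)
    for segment, decision in decisions:
        totals[segment] += 1
        if decision == "decline":
            declines[segment] += 1
    return {seg: declines[seg] / totals[seg] for seg in totals}

def flag_drift(rates, baseline, tolerance=0.10):
    """Flag segments whose decline rate moved more than
    `tolerance` from the recorded baseline."""
    return [
        seg for seg, rate in rates.items()
        if abs(rate - baseline.get(seg, rate)) > tolerance
    ]
```

Running a check like this on a schedule turns "inspect prompt changes or policy retrieval failures" from a vague intention into a triggered alert.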
Common Pitfalls
- Letting the model “reason” without current policy context
  - This causes stale or invented underwriting logic.
  - Fix it by forcing every case through a live retrieval step tied to versioned policies.
- Mixing PHI into free-form prompts
  - Developers often dump full clinical notes into chat history.
  - Fix it by minimizing input fields and redacting unnecessary identifiers before calling AutoGen.
- Skipping human review on edge cases
  - A borderline case may look confident but still violate internal guidelines or local regulations.
  - Fix it by defaulting uncertain cases to a refer decision when key fields are missing or thresholds are close.
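For the PHI pitfall, the simplest minimization step is a field allow-list applied before any prompt is built. A minimal sketch (the allowed fields here are illustrative and match the worked case earlier):

```python
ALLOWED_FIELDS = {
    "member_id", "policy_id", "age", "conditions",
    "hba1c", "hospitalizations_last_12m",
}

def minimize_case(raw_case: dict) -> dict:
    """Keep only fields the underwriting prompt actually needs;
    everything else (names, addresses, clinical notes) is dropped."""
    return {k: v for k, v in raw_case.items() if k in ALLOWED_FIELDS}
```

Because the filter is deterministic and runs before the model ever sees the data, it is easy to audit and cannot be talked out of its behavior by a prompt.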
If you build this pattern correctly, AutoGen becomes an orchestration layer for underwriting workflow automation rather than an uncontrolled decision engine. That’s the right shape for healthcare: deterministic where possible, auditable always, and human-reviewed where required.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.