How to Build a Compliance-Checking Agent Using AutoGen in Python for Healthcare
A compliance checking agent for healthcare reviews text, policies, and workflow steps against rules like HIPAA, internal security controls, and data residency constraints. It matters because healthcare teams move fast, but one wrong disclosure of PHI, one unapproved vendor call, or one cross-border data transfer can turn into a reportable incident.
Architecture
- Policy corpus: Store HIPAA policies, internal SOPs, retention rules, and regional residency requirements in a versioned document store.
- Rules engine: Keep deterministic checks outside the LLM where possible, including PHI detection, allowed destinations, required approvals, and audit logging.
- AutoGen agent layer: Use an AssistantAgent to interpret the request and a UserProxyAgent to execute tool calls and enforce human approval on risky actions.
- Compliance tools: Expose Python functions for policy lookup, PHI scanning, residency validation, and audit event creation.
- Audit trail: Persist every prompt, tool call, decision, and policy version used for traceability (a sample record is sketched after this list).
- Human escalation path: Route ambiguous or high-risk cases to a compliance reviewer instead of letting the model decide alone.
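To make the audit trail item concrete, one way an audit record could be shaped is sketched below. The new_audit_event helper and its field names are illustrative assumptions, not an AutoGen feature.

from datetime import datetime, timezone

def new_audit_event(action: str, policy_versions: dict, tool_calls: list, decision: str) -> dict:
    # Illustrative audit record: one entry per prompt, tool call, or decision,
    # capturing the exact policy versions that were in force at the time.
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,                    # e.g. "compliance_check"
        "policy_versions": policy_versions,  # e.g. {"hipaa_sop": "v12", "residency_policy": "v3"}
        "tool_calls": tool_calls,            # deterministic checks that ran
        "decision": decision,                # "allow", "deny", or "escalate"
    }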
Implementation
1) Install and configure AutoGen
Use the current AutoGen package and wire up a model client. For production healthcare systems, keep model configuration explicit so you can control region, vendor, and logging.
pip install pyautogen
import os
from autogen import AssistantAgent, UserProxyAgent
llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ],
    "temperature": 0,
}
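If you need to pin the model endpoint to a specific vendor and region, the same config_list entry can carry that. The sketch below shows one common shape for an Azure OpenAI deployment; the deployment name, endpoint URL, API version, and environment variable are placeholders, not values from this article.

llm_config_region_pinned = {
    "config_list": [
        {
            # Hypothetical Azure OpenAI deployment pinned to an approved US region.
            "model": "my-gpt4o-mini-deployment",
            "api_type": "azure",
            "base_url": "https://my-approved-us-endpoint.openai.azure.com/",
            "api_version": "2024-02-01",
            "api_key": os.environ["AZURE_OPENAI_API_KEY"],
        }
    ],
    "temperature": 0,
}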
2) Define compliance tools as normal Python functions
Do not ask the model to “figure out” compliance from scratch. Give it deterministic tools for the checks you care about.
import re
from datetime import datetime

POLICIES = {
    "phi_keywords": ["diagnosis", "medical record", "patient id", "ssn", "dob"],
    "allowed_regions": ["us-east-1", "us-west-2"],
}

def detect_phi(text: str) -> dict:
    hits = [k for k in POLICIES["phi_keywords"] if k.lower() in text.lower()]
    return {"contains_phi": len(hits) > 0, "hits": hits}

def check_data_residency(destination_region: str) -> dict:
    allowed = destination_region in POLICIES["allowed_regions"]
    return {"allowed": allowed, "region": destination_region}

def write_audit_log(action: str, payload: str) -> dict:
    record = {
        "timestamp": datetime.utcnow().isoformat(),
        "action": action,
        "payload_preview": payload[:200],
    }
    # Replace with real sink: SIEM, immutable log store, or database table.
    print(record)
    return {"ok": True, "record_id": f"audit-{int(datetime.utcnow().timestamp())}"}
3) Register tools with a proxy agent and run the review loop
This is the core pattern: the assistant proposes checks; the proxy executes approved functions; your code decides whether to escalate.
assistant = AssistantAgent(
    name="compliance_assistant",
    llm_config=llm_config,
    system_message=(
        "You are a healthcare compliance assistant. "
        "Always check for PHI exposure and data residency issues. "
        "If anything is ambiguous or high risk, recommend human review."
    ),
)
user_proxy = UserProxyAgent(
    name="compliance_proxy",
    human_input_mode="NEVER",
    # Tool execution only: disable the code-execution sandbox and cap the auto-reply loop.
    code_execution_config=False,
    max_consecutive_auto_reply=5,
)
user_proxy.register_for_execution(name="detect_phi")(detect_phi)
user_proxy.register_for_execution(name="check_data_residency")(check_data_residency)
user_proxy.register_for_execution(name="write_audit_log")(write_audit_log)
assistant.register_for_llm(name="detect_phi", description="Detect potential PHI in text")(detect_phi)
assistant.register_for_llm(name="check_data_residency", description="Check whether a region is allowed")(check_data_residency)
assistant.register_for_llm(name="write_audit_log", description="Write an immutable audit event")(write_audit_log)
task = """
Review this message for healthcare compliance:
"Please send the patient discharge summary with diagnosis details to our analytics team in eu-central-1."
Return whether it is compliant and why.
"""
chat_result = user_proxy.initiate_chat(assistant, message=task)
print(chat_result.summary)
The pattern above works because the agent is not making final compliance decisions in isolation. It is orchestrating deterministic checks and producing a reasoned recommendation that your application can gate on.
4) Add a policy gate before any outbound action
For healthcare workflows, never let an agent directly send data externally without a final gate. Use the result of tool calls plus your own business logic.
def decide_compliance(text: str, destination_region: str) -> dict:
    phi_result = detect_phi(text)
    residency_result = check_data_residency(destination_region)
    audit_result = write_audit_log("compliance_check", text)
    compliant = (not phi_result["contains_phi"]) and residency_result["allowed"]
    return {
        "compliant": compliant,
        "phi_result": phi_result,
        "residency_result": residency_result,
        "audit_result": audit_result,
        "requires_human_review": not compliant,
    }

result = decide_compliance(
    text="Please send the patient discharge summary with diagnosis details.",
    destination_region="eu-central-1",
)
print(result)
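How you act on that result is application-specific. One possible final gate is sketched below; send_to_destination and notify_compliance_reviewer are placeholder hooks you would replace with your own audited integrations.

def send_to_destination(text: str, region: str) -> None:
    # Placeholder: replace with your real, audited delivery mechanism.
    ...

def notify_compliance_reviewer(decision: dict) -> None:
    # Placeholder: replace with a ticket, pager, or review-queue integration.
    ...

def gated_send(text: str, destination_region: str) -> dict:
    # Final gate: the agent's recommendation never sends data on its own.
    decision = decide_compliance(text, destination_region)
    if decision["compliant"]:
        send_to_destination(text, destination_region)
        return {"status": "sent", "decision": decision}
    notify_compliance_reviewer(decision)
    return {"status": "escalated", "decision": decision}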
Production Considerations
- Deployment boundaries: Keep the agent inside your healthcare cloud boundary or VPC. If prompts may contain PHI, do not route them through unmanaged logs or external observability tools.
- Monitoring: Track false positives on PHI detection, escalation rates, tool-call failures, and policy-version drift. Alert when compliance decisions change after policy updates.
- Guardrails: Enforce allowlisted tools only. The agent should never get raw access to databases or email systems; wrap all side effects behind audited functions.
- Data residency: Pin model endpoints and storage regions to approved jurisdictions. If your org requires US-only processing for patient data, make that constraint part of both infra policy and runtime validation (a minimal runtime check is sketched below).
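A minimal sketch of that runtime validation, reusing the POLICIES dict defined earlier, is a fail-fast check you run at startup and before any outbound call:

def assert_runtime_residency(model_endpoint_region: str, storage_region: str) -> None:
    # Fail fast if either the model endpoint or the storage region sits outside approved regions.
    for label, region in (("model_endpoint", model_endpoint_region), ("storage", storage_region)):
        if region not in POLICIES["allowed_regions"]:
            raise RuntimeError(f"{label} region {region!r} is not in the approved list")

# Example: raises RuntimeError because eu-central-1 is not an approved region.
# assert_runtime_residency("us-east-1", "eu-central-1")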
Common Pitfalls
- Letting the LLM decide compliance without deterministic checks: Avoid this by using code for PHI detection, region validation, approval routing, and audit logging. The model should explain decisions, not invent them.
- Logging sensitive content into traces: Don't dump full prompts or tool payloads into general logs. Redact identifiers like names, MRNs, DOBs, addresses, and free-text clinical notes before persistence (a small redaction sketch follows this list).
- Skipping human review for edge cases: If a request involves external sharing of clinical data, cross-border transfer, research use cases, or de-identification claims, force escalation. In healthcare compliance work, ambiguity is not a green light.
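As a hedged illustration of that redaction step, a simple regex pass can catch obvious identifier patterns before anything is persisted. The patterns below are deliberately simplistic (US-style SSNs, slash dates, numeric MRNs) and are not a complete de-identification solution.

import re

REDACTION_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "mrn": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
    "dob": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

def redact_for_logging(text: str) -> str:
    # Replace obvious identifiers before the text reaches any general-purpose log.
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label.upper()}]", text)
    return text

print(redact_for_logging("Patient DOB 03/14/1962, SSN 123-45-6789, MRN: 00123456"))
# -> Patient DOB [REDACTED_DOB], SSN [REDACTED_SSN], [REDACTED_MRN]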
By Cyprian Aarons, AI Consultant at Topiax.