How to Build a KYC Verification Agent Using AutoGen in Python for Insurance
A KYC verification agent for insurance automates the first pass of customer identity checks: collecting documents, extracting key fields, comparing them against policyholder data, and flagging mismatches for human review. For insurers, this matters because onboarding delays, false approvals, and weak audit trails create compliance risk, fraud exposure, and expensive manual work.
Architecture
- Orchestrator agent
  - Coordinates the workflow and decides when to ask for more evidence or escalate.
  - In AutoGen, this is typically a ConversableAgent or one of its subclasses, such as AssistantAgent.
- Document extraction tool
  - Pulls structured data from IDs, proof-of-address docs, and application forms.
  - Keep this outside the LLM when possible; use deterministic parsers or OCR services.
- Verification agent
  - Compares extracted fields against policy application data.
  - Checks name consistency, DOB match, address validity, document expiry, and sanctions/watchlist flags.
- Compliance reviewer agent
  - Produces a final decision package with reasons.
  - This is where you enforce insurer-specific rules: acceptable doc types, jurisdiction constraints, retention policy.
- Human escalation path
  - Handles edge cases like fuzzy matches, missing documents, or high-risk customers.
  - Never let the model auto-approve ambiguous cases.
- Audit logger
  - Stores inputs, outputs, timestamps, model decisions, and rule hits.
  - Required for traceability in regulated insurance workflows.
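The audit logger can be sketched as an append-only JSON Lines file with one record per decision. This is a minimal sketch under stated assumptions: the record fields mirror the ones listed under Auditability below, hashing sorted-key JSON with SHA-256 is one reasonable choice rather than a requirement, and payload_hash, build_audit_record, and append_audit_record are hypothetical helpers, not part of AutoGen.

```python
import hashlib
import json
from datetime import datetime, timezone


def payload_hash(payload: dict) -> str:
    """SHA-256 of the payload serialized with sorted keys, so key order does not change the hash."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_audit_record(application: dict, extracted: dict, rule_result: dict,
                       model_output: str, reviewer_output: str) -> dict:
    """Hypothetical audit entry: one dict per KYC decision."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "application_hash": payload_hash(application),
        "extracted_fields": extracted,
        "rule_hits": rule_result.get("risk_flags", []),
        "decision": rule_result.get("decision"),
        "model_output": model_output,
        "reviewer_output": reviewer_output,
    }


def append_audit_record(path: str, record: dict) -> None:
    """Append one JSON object per line; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

Hashing a canonical serialization means equal payloads (same keys and values, in any order) always produce the same hash, so an auditor can confirm which application a decision was made on.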
Implementation
1) Install AutoGen and define your agents
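Install the pyautogen package (`pip install pyautogen`) and build a small multi-agent flow. The examples use the classic 0.2-style API (AssistantAgent, UserProxyAgent, initiate_chat); AutoGen 0.4 and later restructured the API, so pin the version you test against. The key pattern is to keep the LLM responsible for reasoning and explanation, while deterministic code handles document parsing and policy checks.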
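```python
import os

from autogen import AssistantAgent, UserProxyAgent

# Shared model settings. Temperature 0 reduces sampling variance;
# it does not make outputs fully deterministic.
llm_config = {
    "config_list": [
        {"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]},
    ],
    "temperature": 0,
}

# Explains the deterministic rule result in a fixed JSON shape.
kyc_agent = AssistantAgent(
    name="kyc_verifier",
    llm_config=llm_config,
    system_message=(
        "You verify KYC for insurance onboarding. "
        "Return only JSON with keys: decision, reasons, missing_items, risk_flags."
    ),
)

# Second pass focused on compliance rules.
reviewer_agent = AssistantAgent(
    name="compliance_reviewer",
    llm_config=llm_config,
    system_message=(
        "You review KYC decisions for insurance compliance. "
        "Reject anything with unresolved identity mismatch or missing mandatory docs."
    ),
)

# Sends prompts on behalf of the service: no human input, no code execution,
# and no automatic follow-up, so each initiate_chat() is one request/response.
user_proxy = UserProxyAgent(
    name="ops_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=0,
    code_execution_config=False,
)
```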
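Because the verifier is told to return only JSON with four keys, it is worth enforcing that contract in code before anything downstream trusts the reply. A minimal sketch, assuming the reply may arrive wrapped in a Markdown code fence (a common model habit, not something AutoGen guarantees); parse_kyc_reply, REQUIRED_KEYS, and the ALLOWED_DECISIONS vocabulary are hypothetical and should match your own decision values.

```python
import json

REQUIRED_KEYS = {"decision", "reasons", "missing_items", "risk_flags"}
# Assumption: align this with the decision vocabulary your rules actually use.
ALLOWED_DECISIONS = {"pass", "reject", "escalate"}


def parse_kyc_reply(text: str) -> dict:
    """Parse the verifier's reply and enforce the JSON contract from its system message."""
    cleaned = text.strip()
    # Strip a surrounding ``` or ```json fence if the model added one.
    if cleaned.startswith("```"):
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        if cleaned.rstrip().endswith("```"):
            cleaned = cleaned.rstrip()[:-3]
    data = json.loads(cleaned)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"reply missing keys: {sorted(missing)}")
    if data["decision"] not in ALLOWED_DECISIONS:
        raise ValueError(f"unexpected decision: {data['decision']!r}")
    return data
```

A reply that fails these checks is a natural candidate for the human escalation path rather than an automatic decision.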
2) Add a deterministic verification function
This is where you keep the business rules explicit. Insurance teams care about repeatability more than cleverness.
```python
def verify_kyc_fields(application: dict, extracted: dict) -> dict:
    """Deterministic KYC checks; the LLM never overrides these results."""
    issues = []
    missing_items = []

    # Case- and whitespace-insensitive exact name comparison.
    if application["full_name"].strip().lower() != extracted["full_name"].strip().lower():
        issues.append("name_mismatch")

    # Compared as strings, so both sides must already use the same date format.
    if application["date_of_birth"] != extracted["date_of_birth"]:
        issues.append("dob_mismatch")

    if extracted.get("document_expired", False):
        issues.append("expired_document")

    if not extracted.get("proof_of_address"):
        issues.append("missing_proof_of_address")
        missing_items.append("proof_of_address")

    return {
        "decision": "reject" if issues else "pass",
        "risk_flags": issues,
        "missing_items": missing_items,
        "reasons": issues or ["all_fields_match"],
    }
```
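verify_kyc_fields trusts an upstream document_expired flag. If your extractor returns the expiry date instead, the flag can be computed deterministically as well. A minimal sketch, assuming ISO 8601 dates (YYYY-MM-DD) and an injectable today value so the check is repeatable in tests; is_document_expired is a hypothetical helper.

```python
from datetime import date
from typing import Optional


def is_document_expired(expiry_date: str, today: Optional[date] = None) -> bool:
    """Return True if the document expired before `today`.

    Assumes ISO 8601 input; a document is treated as valid through the end of
    its expiry date (check whether your jurisdiction agrees).
    """
    today = today or date.today()
    return date.fromisoformat(expiry_date) < today
```

Usage: `extracted_doc["document_expired"] = is_document_expired(expiry_from_ocr)`, where expiry_from_ocr is whatever field your extractor produces.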
3) Let AutoGen generate the explanation and escalation decision
Use initiate_chat() to pass structured input to the verifier. Then feed the result into a compliance reviewer. This keeps the workflow auditable and easy to test.
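```python
import json

application = {
    "full_name": "Alicia Mensah",
    "date_of_birth": "1990-08-14",
    "country": "ZA",
}

extracted_doc = {
    "full_name": "Alicia Mensah",
    "date_of_birth": "1990-08-14",
    "document_expired": False,
    "proof_of_address": True,
}

# Deterministic rules run first; the agents explain and review the result.
rule_result = verify_kyc_fields(application, extracted_doc)

# json.dumps gives the model valid JSON instead of Python dict reprs (true vs True).
prompt = f"""
Application:
{json.dumps(application)}
Document extraction:
{json.dumps(extracted_doc)}
Rule result:
{json.dumps(rule_result)}
Produce a concise JSON decision for insurance onboarding.
"""

kyc_response = user_proxy.initiate_chat(
    kyc_agent,
    message=prompt,
)
# initiate_chat returns a ChatResult; its last message is the verifier's reply.
kyc_output = kyc_response.chat_history[-1]["content"]

# The reviewer sees both the rule result and the verifier's output.
review_prompt = f"""
Review this KYC outcome for insurance compliance.
Rule result:
{json.dumps(rule_result)}
Verifier output:
{kyc_output}
If there is any unresolved mismatch or missing mandatory item,
return reject with a short reason.
"""

review_response = user_proxy.initiate_chat(
    reviewer_agent,
    message=review_prompt,
)
```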
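The reviewer's output is still model text, so one way to keep the rule engine in charge is a strictest-wins merge: the model may tighten a rule result or push it to a human, but never loosen it. A minimal sketch; the three-level severity ordering, the "escalate" value, and reconcile_decisions are assumptions, not AutoGen features.

```python
# Assumed severity order: a higher number is stricter.
SEVERITY = {"pass": 0, "escalate": 1, "reject": 2}


def reconcile_decisions(rule_decision: str, reviewer_decision: str) -> str:
    """Return the stricter of the rule decision and the reviewer's decision.

    A reviewer decision outside SEVERITY (for example, "approve" or an
    unparseable reply) is routed to human escalation instead of being trusted.
    """
    if reviewer_decision not in SEVERITY:
        reviewer_decision = "escalate"
    return max(rule_decision, reviewer_decision, key=SEVERITY.__getitem__)
```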
4) Wrap it in a production-friendly service boundary
The agent should not directly touch raw documents in your API handler. Parse files first, normalize fields, then call the workflow.
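```python
import json


def run_kyc_workflow(application: dict, extracted_doc: dict) -> dict:
    # Inputs are expected to be parsed and normalized already; no raw documents here.
    rule_result = verify_kyc_fields(application, extracted_doc)
    prompt = (
        f"Application={json.dumps(application)}\n"
        f"Extraction={json.dumps(extracted_doc)}\n"
        f"Rules={json.dumps(rule_result)}\n"
        "Return JSON only."
    )
    kyc_chat = user_proxy.initiate_chat(kyc_agent, message=prompt)
    return {
        "rule_result": rule_result,
        # Explanation only: the model's text is recorded, not acted on.
        "agent_output": kyc_chat.chat_history[-1]["content"],
        # The final status always comes from the deterministic rules.
        "final_status": rule_result["decision"],
    }
```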
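Because run_kyc_workflow calls the module-level user_proxy and kyc_agent, exercising it in unit tests needs a live model. One alternative, sketched here under assumptions, is to pass the rule check and the model call in as plain callables. run_kyc_workflow_injected, RuleFn, and ExplainFn are hypothetical names.

```python
import json
from typing import Callable

RuleFn = Callable[[dict, dict], dict]  # e.g. verify_kyc_fields
ExplainFn = Callable[[str], str]  # prompt in, model text out


def run_kyc_workflow_injected(application: dict, extracted_doc: dict,
                              rules: RuleFn, explain: ExplainFn) -> dict:
    """Same output shape as run_kyc_workflow, with rules and model call injected."""
    rule_result = rules(application, extracted_doc)
    prompt = (
        f"Application={json.dumps(application)}\n"
        f"Extraction={json.dumps(extracted_doc)}\n"
        f"Rules={json.dumps(rule_result)}\n"
        "Return JSON only."
    )
    return {
        "rule_result": rule_result,
        "agent_output": explain(prompt),
        # The model's text is recorded but never sets the final status.
        "final_status": rule_result["decision"],
    }
```

In production, explain could wrap the AutoGen call, for example `lambda p: user_proxy.initiate_chat(kyc_agent, message=p).chat_history[-1]["content"]`. In a unit test it can be a stub that always answers "pass", which lets you assert that a misbehaving model still cannot change final_status.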
Production Considerations
- Data residency
  - Keep PII inside approved regions. If your insurer operates across jurisdictions, route EU/UK customer data to compliant endpoints only.
  - Do not send raw ID images to external services unless legal review has cleared that path.
- Auditability
  - Persist every decision input: application payload hash, extracted fields, model output, reviewer output, and timestamp.
  - Record in plain language why each case passed or failed, so compliance teams can read it later.
- Guardrails
  - Hard-fail on expired documents, unsupported countries, sanctions hits, or mismatched DOBs.
  - Use LLMs for explanation and triage; use deterministic rules for approval boundaries.
- Monitoring
  - Track rejection rate by country, document-type failure rate, manual review rate, and average time-to-decision.
  - Sudden shifts usually point to OCR drift, upstream form changes, or model behavior changes.
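The guardrails above can be expressed as a small routing table: flags that must hard-fail, and everything else going to a human. A minimal sketch; apart from dob_mismatch and expired_document, which verify_kyc_fields produces, the flag names and the SUPPORTED_COUNTRIES set are hypothetical placeholders for your own rules.

```python
from typing import Iterable

# Hard-fail flags from the guardrails list; sanctions_hit and
# unsupported_country are hypothetical names for your own checks.
HARD_FAIL_FLAGS = {"expired_document", "dob_mismatch", "sanctions_hit", "unsupported_country"}
SUPPORTED_COUNTRIES = {"ZA"}  # placeholder: load from your jurisdiction config


def route_case(country: str, risk_flags: Iterable[str]) -> str:
    """Map deterministic flags to 'reject', 'escalate', or 'pass'. No model input."""
    flags = set(risk_flags)
    if country not in SUPPORTED_COUNTRIES:
        flags.add("unsupported_country")
    if flags & HARD_FAIL_FLAGS:
        return "reject"
    if flags:
        # Every other flag (missing documents, fuzzy name matches, flags added
        # later) goes to a human, so a new flag can never be silently passed.
        return "escalate"
    return "pass"
```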
Common Pitfalls
- Letting the model make final approval decisions
  - Don’t do this for regulated onboarding.
  - Use explicit rules for pass/fail thresholds and reserve the model for reasoning plus summarization.
- Passing raw unnormalized data into the agent
  - Name casing differences, date formats, and OCR noise will create false mismatches.
  - Normalize names to lowercase trimmed strings and standardize dates before comparison.
- Ignoring jurisdiction-specific compliance rules
  - Insurance KYC in one market is not portable to another.
  - Encode country-level requirements: acceptable ID types, proof-of-address age limits, retention periods, and escalation thresholds.
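A minimal normalization pass for the unnormalized-data pitfall might look like this. It assumes the formats in DATE_FORMATS are the only ones your OCR emits; an ambiguous string such as 03/04/1990 cannot be resolved by parsing alone, so the sketch commits to day-first. normalize_name and normalize_date are hypothetical helpers.

```python
import re
import unicodedata
from datetime import datetime

# Assumption: the formats your extractor emits. Day-first only; adding
# "%m/%d/%Y" next to "%d/%m/%Y" would make dates like 03/04/1990 ambiguous.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%d %B %Y")


def normalize_name(name: str) -> str:
    """Apply NFKC, collapse runs of whitespace, trim, and casefold."""
    name = unicodedata.normalize("NFKC", name)
    return re.sub(r"\s+", " ", name).strip().casefold()


def normalize_date(value: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD) or raise ValueError."""
    value = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")
```

Apply both helpers to the application and the extracted fields before calling verify_kyc_fields, so its exact comparisons operate on canonical values. An unrecognized date should surface as a missing or escalated item, not a silent mismatch.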
If you build it this way, you get a KYC agent that is useful in production: deterministic where it must be strict, agentic where it needs judgment, and traceable enough for insurance compliance reviews.
By Cyprian Aarons, AI Consultant at Topiax.