How to Build a KYC Verification Agent Using AutoGen in Python for Banking

By Cyprian Aarons · Updated 2026-04-21

Tags: kyc-verification, autogen, python, banking

A KYC verification agent automates the first pass of customer due diligence: it collects identity data, checks documents, flags mismatches, and routes suspicious cases for human review. In banking, that matters because KYC is not just an ops cost; it is a regulatory control tied to AML, sanctions screening, fraud prevention, and auditability.

Architecture

A production KYC agent in AutoGen needs these components:

  • Customer intake interface
    • Receives structured customer data, uploaded document metadata, and verification request context.
  • Document extraction layer
    • Pulls fields from passports, national IDs, proof of address, and business registration documents.
  • Verification agent
    • Compares extracted fields against customer profile data and policy rules.
  • Compliance policy engine
    • Encodes bank-specific thresholds for name mismatch, document expiry, jurisdiction restrictions, and escalation criteria.
  • Human review handoff
    • Routes ambiguous or high-risk cases to an analyst with a full audit trail.
  • Audit logging and evidence store
    • Persists prompts, model outputs, rule decisions, timestamps, and references to source documents.
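The components above can be wired together as a simple pipeline. This is an illustrative sketch, not AutoGen code: the names (`KycCase`, `run_kyc_pipeline`) are hypothetical, and each component is injected as a callable so the policy layer, the LLM step, and the audit store stay swappable.

```python
from dataclasses import dataclass, field

@dataclass
class KycCase:
    customer_profile: dict
    extracted_doc: dict
    issues: list = field(default_factory=list)
    status: str = "PENDING"

def run_kyc_pipeline(case: KycCase, policy_check, analyst_review, audit_log) -> KycCase:
    """Run deterministic rules first; treat the LLM output as advisory only."""
    policy = policy_check(case.customer_profile, case.extracted_doc)
    case.issues = policy["issues"]
    recommendation = analyst_review(case)  # LLM recommendation, never the final word
    # Policy escalation always wins over the model's recommendation.
    case.status = "ESCALATE" if policy["escalate"] else recommendation
    audit_log(case, policy, recommendation)  # persist the full decision path
    return case
```

The key design choice is that the policy result gates the final status; the model's recommendation is only accepted when no rule fired.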

Implementation

1) Install AutoGen and define the agents

For this pattern, use AutoGen’s AssistantAgent for reasoning and a UserProxyAgent for tool execution and orchestration. In banking workflows, keep the model from directly “deciding” the final outcome; it should recommend a result that your policy layer can accept or override.

from autogen import AssistantAgent, UserProxyAgent
import os
import json

llm_config = {
    "model": "gpt-4o-mini",
    "api_key": os.environ["OPENAI_API_KEY"],
    "temperature": 0,
}

kyc_analyst = AssistantAgent(
    name="kyc_analyst",
    llm_config=llm_config,
    system_message=(
        "You are a KYC verification assistant for a bank. "
        "Compare customer profile data with extracted document data. "
        "Return only valid JSON with fields: status, reasons, risk_flags, "
        "missing_fields, recommended_action."
    ),
)

orchestrator = UserProxyAgent(
    name="orchestrator",
    human_input_mode="NEVER",
    code_execution_config=False,
)

2) Add a strict policy check before any decision leaves the workflow

This is where you enforce compliance rules. The LLM can summarize discrepancies, but the bank’s policy layer decides whether the case is auto-approved or escalated.

def kyc_policy_check(customer_profile: dict, extracted_doc: dict) -> dict:
    """Deterministic compliance rules; the LLM never overrides these."""
    issues = []

    profile_name = customer_profile.get("full_name", "").strip().lower()
    doc_name = extracted_doc.get("full_name", "").strip().lower()
    if profile_name != doc_name:
        issues.append("NAME_MISMATCH")

    if extracted_doc.get("document_expired") is True:
        issues.append("DOCUMENT_EXPIRED")

    if not extracted_doc.get("document_number"):
        issues.append("MISSING_DOCUMENT_NUMBER")

    # Illustrative restricted-jurisdiction set; use your bank's screening lists.
    if customer_profile.get("country") in {"IR", "KP"}:
        issues.append("RESTRICTED_JURISDICTION")

    return {
        "escalate": len(issues) > 0,
        "issues": issues,
    }
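Exact lowercase comparison is brittle for names that come out of OCR ("DOE, JANE" or a stray double space will trigger NAME_MISMATCH). If your policy allows near-matches, one option is a similarity score from the standard library's `difflib`; this is a sketch, and the 0.9 threshold is an assumption you would tune against your own mismatch data.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means an exact match after cleanup."""
    norm = lambda s: " ".join(s.strip().lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

def name_matches(profile_name: str, doc_name: str, threshold: float = 0.9) -> bool:
    # Threshold is illustrative; anything below it still routes to NAME_MISMATCH.
    return name_similarity(profile_name, doc_name) >= threshold
```

Near-matches below the threshold should still escalate rather than auto-approve; fuzzy matching only absorbs formatting noise, not genuine discrepancies.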

3) Run the AutoGen conversation on structured evidence

The key pattern is to pass only the minimum necessary data into the agent. For banking systems, avoid dumping raw PII into long chat histories; send redacted or normalized fields where possible.
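One way to redact before prompting is to replace sensitive values with stable hashes: the model can still check two fields for equality without ever seeing the raw value. This is a hedged sketch; `SENSITIVE_FIELDS` and the helper name are illustrative, and both sides must be hashed the same way for comparisons to work.

```python
import hashlib

SENSITIVE_FIELDS = {"document_number", "address", "date_of_birth"}  # illustrative set

def redact_for_prompt(record: dict) -> dict:
    """Swap sensitive values for short, deterministic hashes before prompting."""
    redacted = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS and value is not None:
            digest = hashlib.sha256(str(value).encode()).hexdigest()[:12]
            redacted[key] = f"sha256:{digest}"
        else:
            redacted[key] = value
    return redacted
```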

customer_profile = {
    "full_name": "Jane Doe",
    "date_of_birth": "1990-04-11",
    "country": "GB",
}

extracted_doc = {
    "full_name": "Jane Doe",
    "date_of_birth": "1990-04-11",
    "document_number": "P1234567",
    "document_expired": False,
}

policy_result = kyc_policy_check(customer_profile, extracted_doc)

task = f"""
Customer profile:
{json.dumps(customer_profile)}

Extracted document:
{json.dumps(extracted_doc)}

Policy check:
{json.dumps(policy_result)}

Determine whether this KYC case should be APPROVE or ESCALATE.
Return JSON only.
"""

result = orchestrator.initiate_chat(
    kyc_analyst,
    message=task,
    max_turns=1,  # one analyst reply; no open-ended back-and-forth
)

print(result.chat_history[-1]["content"])

4) Parse the model output and route the case

Do not trust free-form text. Parse JSON strictly and route based on your internal controls. If parsing fails or the output is incomplete, default to escalation.

def route_kyc_result(raw_output: str) -> dict:
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError:
        return {"status": "ESCALATE", "reason": "INVALID_JSON"}

    # json.loads can return a list or scalar; only a dict is acceptable here.
    if not isinstance(payload, dict):
        return {"status": "ESCALATE", "reason": "INVALID_PAYLOAD"}

    status = str(payload.get("status", "")).upper()
    if status not in {"APPROVE", "ESCALATE"}:
        return {"status": "ESCALATE", "reason": "INVALID_STATUS"}

    return payload

raw_output = result.chat_history[-1]["content"]
final_decision = route_kyc_result(raw_output)

if final_decision["status"] == "APPROVE":
    print("KYC approved")
else:
    print("Send to compliance analyst:", final_decision)
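Beyond the status field, the routing step can also check that every key the system message requires is actually present. A minimal sketch, using the field list from step 1; the function name is illustrative.

```python
# Keys the system message in step 1 tells the model to return.
REQUIRED_KEYS = {"status", "reasons", "risk_flags", "missing_fields", "recommended_action"}

def validate_kyc_payload(payload: dict) -> dict:
    """Escalate when the model response is missing any mandatory key."""
    missing = REQUIRED_KEYS - payload.keys()
    if missing:
        return {"status": "ESCALATE", "reason": "MISSING_KEYS", "missing": sorted(missing)}
    return payload
```

Run this after JSON parsing succeeds and before routing, so a structurally valid but incomplete response still defaults to escalation.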

Production Considerations

  • Keep PII inside controlled boundaries

    • Redact passport numbers, addresses, and DOB where possible before sending data to the model.
    • Store raw evidence in your regulated systems; store only references in agent logs.
  • Enforce residency and vendor controls

    • If your bank requires EU/UK data residency, pin inference to approved regions and approved model endpoints.
    • Make sure your AutoGen deployment path matches internal third-party risk reviews.
  • Log every decision path

    • Persist prompt version, model version, policy version, input hashes, output JSON, and reviewer action.
    • That audit trail matters when compliance asks why a case was auto-approved or escalated.
  • Add deterministic guardrails

    • Use hard rules for sanctions-country blocks, expired documents, missing mandatory fields, and age thresholds.
    • Let the LLM explain discrepancies; do not let it override regulatory rules.
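The logging guidance above can be sketched as a single audit record builder. The shape is an assumption, not a standard: inputs are hashed rather than stored, so the agent log holds references while raw evidence stays in your regulated document store.

```python
import hashlib
import json
from datetime import datetime, timezone

def build_audit_record(customer_profile: dict, extracted_doc: dict,
                       policy_result: dict, model_output: str,
                       prompt_version: str = "v1",
                       model_version: str = "gpt-4o-mini") -> dict:
    """Assemble one immutable audit entry per KYC decision."""
    def h(obj):
        # sort_keys makes the hash stable across key ordering
        return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_version": prompt_version,
        "model_version": model_version,
        "input_hashes": {"profile": h(customer_profile), "document": h(extracted_doc)},
        "policy_result": policy_result,
        "model_output": model_output,
    }
```

Because the hashes are deterministic, a reviewer can later prove which exact inputs produced a given decision by re-hashing the stored evidence.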

Common Pitfalls

  1. Letting the model make final compliance decisions

    • Avoid this by making the agent advisory only.
    • Final approval should come from your policy engine or a human analyst for borderline cases.
  2. Sending too much raw customer data into chat history

    • Avoid this by normalizing inputs to structured fields and redacting unnecessary PII.
    • The agent needs enough context to compare fields; it does not need full document images in prompt text.
  3. Skipping strict output validation

    • Avoid this by requiring JSON-only responses and validating schema before routing.
    • If parsing fails or required keys are missing, escalate immediately instead of guessing.
  4. Ignoring audit requirements

    • Avoid this by storing prompt versions, outputs, timestamps, reviewer overrides, and source document IDs.
    • Without traceability, a banking KYC workflow is not a defensible control.

By Cyprian Aarons, AI Consultant at Topiax.