How to Build a KYC Verification Agent Using AutoGen in Python for Pension Funds

By Cyprian Aarons · Updated 2026-04-21
Tags: kyc-verification, autogen, python, pension-funds

A KYC verification agent for pension funds collects identity data, checks it against policy and regulatory rules, and routes cases for human review when the evidence is incomplete or risky. For pension administrators, the value is simple: faster onboarding, fewer manual checks, and a cleaner audit trail for regulators, trustees, and internal compliance teams.

Architecture

  • User proxy agent

    • Handles the operator or workflow trigger.
    • In AutoGen, this is typically UserProxyAgent, used to start the interaction and execute approved tool calls.
  • KYC analyst agent

    • Extracts identity fields from submitted documents and structures them into a reviewable format.
    • This agent should be constrained to classification and extraction, not free-form decisions.
  • Compliance reviewer agent

    • Applies pension-fund-specific rules:
      • identity completeness
      • sanctions/PEP flags
      • address consistency
      • source-of-funds escalation thresholds
    • Produces a decision summary with reasons.
  • Tool layer

    • Connects to document OCR, sanctions screening, address verification, and case management APIs.
    • Keep these as explicit Python functions so every external call is auditable.
  • Audit logger

    • Persists prompts, tool calls, outputs, timestamps, and reviewer decisions.
    • This matters for pension funds because compliance teams need evidence of how a decision was reached.
  • Human escalation path

    • Sends incomplete or high-risk cases to a compliance officer.
    • No auto-approval for ambiguous identity matches or missing residency documentation.
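
The escalation path above can be sketched as a small, deterministic routing function that runs in code, not in the model. This is a sketch; the names (`CaseSignals`, `route_case`, the route labels) are illustrative, not part of AutoGen:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CaseSignals:
    """Deterministic facts gathered by the tool layer for one case."""
    sanctions_hit: bool
    pep_match: bool
    address_verified: bool
    missing_fields: List[str] = field(default_factory=list)

def route_case(signals: CaseSignals) -> str:
    """Anything risky or incomplete goes to a human; never auto-approve it."""
    if signals.sanctions_hit or signals.pep_match:
        return "ESCALATE_COMPLIANCE_OFFICER"
    if signals.missing_fields or not signals.address_verified:
        return "ESCALATE_INCOMPLETE"
    return "READY_FOR_REVIEW"
```

Because the routing is plain Python, it is testable and auditable independently of any model behavior.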

Implementation

1) Install and configure AutoGen

Install AutoGen and wire your model config explicitly; the examples below use the pyautogen package and its classic 0.2-style API, so pin the version you validate against. For production work in regulated environments, avoid implicit defaults.

pip install pyautogen

import os
from autogen import AssistantAgent, UserProxyAgent

llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ],
    "temperature": 0,
}
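
If you prefer to keep credentials out of source, pyautogen can also load the config list from a JSON file or an environment variable. A minimal sketch, assuming an OAI_CONFIG_LIST file or env var exists in your environment:

```python
import autogen

# Loads model configs from the OAI_CONFIG_LIST env var or a file of that name,
# keeping only entries for the model this workflow was validated against.
config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={"model": ["gpt-4o-mini"]},
)
llm_config = {"config_list": config_list, "temperature": 0}
```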

2) Define the KYC tools as plain Python functions

Keep external integrations outside the model. That gives you deterministic behavior, easier testing, and better auditability.

from typing import Dict

def screen_sanctions(full_name: str) -> Dict:
    # Replace with real vendor API call
    return {"match": False, "source": "mock_screening"}

def verify_address(address: str) -> Dict:
    # Replace with real address verification service
    return {"verified": True, "country": "ZA"}

def log_case(case_id: str, payload: Dict) -> None:
    # Replace with database or SIEM write
    print(f"[AUDIT] {case_id}: {payload}")
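
Because the tools are plain functions, they are unit-testable before any agent is involved. A quick self-contained check (the mock definitions from above are repeated so the snippet runs on its own):

```python
from typing import Dict

def screen_sanctions(full_name: str) -> Dict:
    # Mock from step 2; replace with real vendor API call.
    return {"match": False, "source": "mock_screening"}

def verify_address(address: str) -> Dict:
    # Mock from step 2; replace with real address verification service.
    return {"verified": True, "country": "ZA"}

def test_mock_tools() -> None:
    # Deterministic behavior: same input, same output, no model involved.
    assert screen_sanctions("Thabo Mokoena")["match"] is False
    assert verify_address("12 Church Street, Cape Town")["country"] == "ZA"

test_mock_tools()
```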

3) Build the agents and register tools

The pattern here is: the user proxy orchestrates execution, the assistant extracts and reasons over KYC data, and tools do the actual verification work. Use register_function so tool usage is explicit. Note that registering a function_map only tells the user proxy how to execute tools; for the assistant to request them itself, it also needs matching tool schemas (for example via register_for_llm or a functions entry in llm_config).

import json
from autogen import AssistantAgent, UserProxyAgent

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3,
)

kyc_agent = AssistantAgent(
    name="kyc_analyst",
    llm_config=llm_config,
    system_message=(
        "You are a KYC analyst for a pension fund. "
        "Extract identity facts from provided input. "
        "Do not approve cases. "
        "Escalate any missing fields, sanctions hits, or residency mismatches."
    ),
)

compliance_agent = AssistantAgent(
    name="compliance_reviewer",
    llm_config=llm_config,
    system_message=(
        "You review KYC cases for pension funds. "
        "Apply strict compliance rules. "
        "Return APPROVE or ESCALATE with concise reasons."
    ),
)

user_proxy.register_function(
    function_map={
        "screen_sanctions": screen_sanctions,
        "verify_address": verify_address,
        "log_case": log_case,
    }
)
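
To guarantee that every external call is captured for audit, one option is to wrap each tool in a logging decorator before passing it to register_function. A sketch, where `audit_log` is an in-memory stand-in for your durable audit store and the mock tool is repeated from step 2 to keep the example self-contained:

```python
import functools
from typing import Any, Callable, Dict, List

audit_log: List[Dict[str, Any]] = []  # stand-in for a durable audit sink

def audited(fn: Callable) -> Callable:
    """Record every tool invocation, its arguments, and its result."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = fn(*args, **kwargs)
        audit_log.append({"tool": fn.__name__, "args": args,
                          "kwargs": kwargs, "result": result})
        return result
    return wrapper

def screen_sanctions(full_name: str) -> Dict:
    # Mock from step 2; replace with real vendor API call.
    return {"match": False, "source": "mock_screening"}

# Register the wrapped version so no call bypasses the audit record.
screen_sanctions = audited(screen_sanctions)
```

The same wrapper applies unchanged to verify_address and any future tool, so the audit guarantee does not depend on each integration remembering to log.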

4) Run a case through the workflow

This example shows a practical control flow: extract facts, call tools, then ask for a compliance decision. The output can be written into your case management system.

case = {
    "case_id": "PF-2026-0017",
    "full_name": "Thabo Mokoena",
    "address": "12 Church Street, Cape Town",
    "id_number": "9001011234087",
}

screen_result = screen_sanctions(case["full_name"])
address_result = verify_address(case["address"])

review_payload = {
    **case,
    "sanctions": screen_result,
    "address_check": address_result,
}

log_case(case["case_id"], review_payload)

prompt = f"""
Review this pension fund KYC case:

{json.dumps(review_payload, indent=2)}

Decision rules:
- APPROVE only if no sanctions match and address is verified.
- ESCALATE if any field is missing or inconsistent.
- Mention audit-relevant reasons.
"""

result = user_proxy.initiate_chat(
    compliance_agent,
    message=prompt,
)

print(result.chat_history[-1]["content"])

If you want multi-agent collaboration instead of a single reviewer pass, add a second AssistantAgent that performs document extraction before compliance review. In practice, that separation helps when you need one model focused on structuring data and another focused on policy interpretation.

Production Considerations

  • Data residency

    • Pension fund member data often has jurisdictional constraints.
    • Keep model endpoints and audit logs in-region where required by local regulation or trustee policy.
  • Audit trail

    • Persist every prompt, tool call, response, and final disposition.
    • Store hashes of source documents so reviewers can prove which evidence was used.
  • Guardrails

    • Block auto-approval on:
      • sanctions hits
      • PEP matches
      • mismatched national ID numbers
      • missing proof of address
    • Force human review for anything ambiguous.
  • Monitoring

    • Track false positives on screening tools.
    • Measure escalation rate by country and document type.
    • Alert when the agent starts approving too many borderline cases.
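
The "store hashes of source documents" item above is cheap to implement with the standard library:

```python
import hashlib

def document_fingerprint(content: bytes) -> str:
    """SHA-256 hex digest of a source document. Stored with the case,
    it lets reviewers later prove exactly which evidence was used."""
    return hashlib.sha256(content).hexdigest()
```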
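
The guardrails above can be enforced as a deterministic gate that runs in code after the model output but before any disposition is written. The flag names here are illustrative; note the fail-closed default, where a missing flag blocks auto-approval:

```python
from typing import Dict

# Conditions that always block auto-approval.
BLOCKING_CONDITIONS = ("sanctions_hit", "pep_match", "id_mismatch", "missing_poa")

def can_auto_approve(flags: Dict[str, bool]) -> bool:
    """Allow auto-approval only when every blocking condition is
    explicitly present and False; an absent flag counts as blocking."""
    return not any(flags.get(flag, True) for flag in BLOCKING_CONDITIONS)
```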

Common Pitfalls

  1. Letting the LLM make final compliance decisions without deterministic checks

    • Fix it by running sanctions screening, ID validation, and address verification in code first.
    • The model should summarize evidence; it should not invent policy outcomes.
  2. Skipping audit logging

    • Fix it by writing every case payload and decision to durable storage.
    • For pension funds, “we think the model approved it” is not defensible during an audit.
  3. Using one generic prompt for all jurisdictions

    • Fix it by separating rules per country or fund entity.
    • Data residency requirements in one market may differ from AML/KYC obligations in another.
  4. Allowing uncontrolled tool access

    • Fix it by registering only approved functions with UserProxyAgent.register_function.
    • Never expose raw database writes or unrestricted HTTP calls to the agent layer.
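
The per-jurisdiction rules from pitfall 3 can live in a simple lookup keyed by country or fund entity. The required-field sets below are illustrative placeholders, not a statement of any jurisdiction's actual requirements:

```python
from typing import Dict, List, Set

# Illustrative only; confirm required fields against local regulation.
JURISDICTION_RULES: Dict[str, Set[str]] = {
    "ZA": {"full_name", "id_number", "address", "tax_number"},
    "GB": {"full_name", "passport_number", "address", "ni_number"},
}

def missing_fields(case: Dict[str, str], country: str) -> List[str]:
    """Return the jurisdiction-required fields absent or empty in the case."""
    required = JURISDICTION_RULES.get(country, set())
    return sorted(f for f in required if not case.get(f))
```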

A pension-fund KYC agent works when it stays narrow: extract facts, check them against policy-backed tools, and escalate uncertainty fast. AutoGen gives you the orchestration layer; your production value comes from strict controls around data handling, auditability, and human review.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

Get the Starter Kit
