How to Build a KYC Verification Agent Using AutoGen in Python for Lending
A KYC verification agent for lending collects borrower identity data, checks it against policy and external sources, and returns a decision package that a loan officer or downstream workflow can trust. This matters because lending teams need fast onboarding without breaching compliance rules, and every verification step must be auditable for regulators, risk teams, and internal model governance.
Architecture
- User-facing intake agent
  - Collects borrower-submitted fields like name, DOB, address, ID number, employer, and consent status.
  - Normalizes the request into a structured case object (see the sketch after this list).
- Verification agent
  - Runs KYC checks against internal policy rules.
  - Decides which checks are required based on product type, geography, and risk tier.
- Tool layer
  - Wraps deterministic systems: sanctions screening, document OCR, address validation, watchlist lookup, and CRM retrieval.
  - Keeps the LLM away from raw external APIs.
- Supervisor / orchestrator
  - Coordinates the multi-agent conversation.
  - Ensures the workflow ends with a machine-readable recommendation: approve, review, or reject.
- Audit logger
  - Stores prompts, tool calls, outputs, timestamps, and policy decisions.
  - Required for lending compliance reviews and dispute handling.
- Human review handoff
  - Escalates ambiguous cases to an analyst.
  - Preserves all evidence so the reviewer does not need to re-run checks.
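To make the intake and decision contracts concrete, here is a minimal sketch of the case object and the decision package the agents pass around. The names (KycCase, KycDecision, consent_given, risk_band) are illustrative assumptions, not a fixed schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical case object produced by the intake agent.
# Field names are illustrative; align them with your loan-origination schema.
@dataclass
class KycCase:
    full_name: str
    dob: str                  # ISO date string, e.g. "1990-05-17"
    address: str
    country: str
    postal_code: str
    id_number: str
    employer: Optional[str] = None
    consent_given: bool = False
    loan_product: str = "personal_loan"

# Hypothetical decision package the verification agent hands downstream.
@dataclass
class KycDecision:
    decision: str             # "approve" | "review" | "reject"
    reason: str
    risk_band: str = "medium"
    evidence: List[dict] = field(default_factory=list)   # raw tool outputs
```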
Implementation
1) Install AutoGen and define your tools
For lending workflows, keep external checks in Python functions and expose them as tools. The agent should call deterministic code for sanctions screening or residency checks instead of inventing results.
```bash
pip install pyautogen
```

```python
from typing import Dict
import json

def check_sanctions(full_name: str) -> Dict:
    # Replace with your real sanctions provider integration
    hits = ["John Doe"] if full_name.lower() == "john doe" else []
    return {"provider": "mock_sanctions", "hits": hits, "match": len(hits) > 0}

def validate_address(country: str, postal_code: str) -> Dict:
    supported = country.upper() in {"US", "GB", "CA"}
    return {
        "country": country,
        "postal_code": postal_code,
        "valid": supported and len(postal_code) >= 4,
    }

def assess_kyc_risk(case: Dict) -> Dict:
    risk = "high" if case.get("country") in {"IR", "KP"} else "medium"
    return {"risk_band": risk}
```
2) Create the agents with AutoGen’s actual API
Use AssistantAgent for reasoning and UserProxyAgent to execute Python tool functions. This pattern works well because lending systems need tool execution to stay deterministic and observable.
```python
import os

from autogen import AssistantAgent, UserProxyAgent

llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ],
    "temperature": 0,
}

kyc_agent = AssistantAgent(
    name="kyc_verifier",
    llm_config=llm_config,
    system_message=(
        "You are a KYC verification agent for lending. "
        "Return only structured JSON with fields: decision, reason, evidence."
    ),
)

tool_runner = UserProxyAgent(
    name="tool_runner",
    human_input_mode="NEVER",
    code_execution_config=False,
)
```
3) Register tools and run a single-case verification flow
AutoGen’s register_for_llm and register_for_execution methods let the assistant discover callable tools while the user proxy executes them. This keeps the conversation grounded in actual system output.
```python
# os and json are already imported in the earlier snippets.

# Expose tools to the assistant
@kyc_agent.register_for_llm(name="check_sanctions", description="Check sanctions list matches")
@tool_runner.register_for_execution(name="check_sanctions")
def _check_sanctions(full_name: str) -> dict:
    return check_sanctions(full_name)

@kyc_agent.register_for_llm(name="validate_address", description="Validate borrower address")
@tool_runner.register_for_execution(name="validate_address")
def _validate_address(country: str, postal_code: str) -> dict:
    return validate_address(country, postal_code)

@kyc_agent.register_for_llm(name="assess_kyc_risk", description="Assess KYC risk band")
@tool_runner.register_for_execution(name="assess_kyc_risk")
def _assess_kyc_risk(case: dict) -> dict:
    return assess_kyc_risk(case)

case = {
    "full_name": "Jane Smith",
    "country": "US",
    "postal_code": "94105",
    "loan_product": "personal_loan",
}

result = tool_runner.initiate_chat(
    kyc_agent,
    message=(
        f"Verify this lending KYC case:\n{json.dumps(case)}\n"
        "Call the necessary tools and return final JSON."
    ),
    max_turns=6,  # cap the conversation so an undecided chat cannot loop indefinitely
)

print(result.summary)
```
In production you would parse result.summary, validate it against a schema, then persist it to your audit store. If your process requires explicit multi-step control rather than free-form chat, wrap this in an orchestrator service that calls each tool in sequence and uses the agent only for exception handling or narrative explanation.
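Continuing from the snippet above, here is a minimal sketch of that post-processing step, assuming the agent returned the decision/reason/evidence JSON requested in its system message. parse_agent_summary, build_audit_record, and the audit-store call are illustrative names, not AutoGen APIs:

```python
import hashlib
import json
from datetime import datetime, timezone

REQUIRED_FIELDS = ("decision", "reason", "evidence")
ALLOWED_DECISIONS = {"approve", "review", "reject"}

def parse_agent_summary(summary: str) -> dict:
    """Parse and validate the agent's final JSON; raise if it is malformed."""
    payload = json.loads(summary)
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        raise ValueError(f"Agent output missing fields: {missing}")
    if payload["decision"] not in ALLOWED_DECISIONS:
        raise ValueError(f"Unexpected decision value: {payload['decision']}")
    return payload

def build_audit_record(case: dict, payload: dict, model: str, prompt_version: str) -> dict:
    """Assemble the artifact to persist for compliance review."""
    return {
        "input_hash": hashlib.sha256(json.dumps(case, sort_keys=True).encode()).hexdigest(),
        "decision": payload["decision"],
        "reason": payload["reason"],
        "evidence": payload["evidence"],
        "model_version": model,
        "prompt_version": prompt_version,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

# Validate the chat result, then hand the record to your audit store.
decision_payload = parse_agent_summary(result.summary)
audit_record = build_audit_record(case, decision_payload, "gpt-4o-mini", "kyc-prompt-v1")
# audit_store.save(audit_record)  # placeholder for your persistence layer
```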
4) Add a reviewer handoff path
Lending operations need a clean escalation path when confidence is low or policy is violated. Use the agent to produce a recommendation plus evidence bundle that an analyst can review without re-running everything.
```python
def route_case(decision_payload: dict) -> str:
    if decision_payload["decision"] == "review":
        return "human_review_queue"
    if decision_payload["decision"] == "reject":
        return "decline_and_log"
    return "approve_and_continue"

# Example downstream routing
payload = {
    "decision": "review",
    "reason": "Address validation failed; additional proof of residence required.",
}
queue = route_case(payload)
print(queue)
```
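When a case lands in the review queue, hand the analyst everything the agent already gathered. A hedged sketch of that handoff package, reusing the audit record built in step 3 (the queue client and field names are assumptions):

```python
def build_review_handoff(case: dict, audit_record: dict) -> dict:
    """Bundle the recommendation plus evidence so the analyst never re-runs checks."""
    return {
        "case": case,                          # normalized intake data
        "recommendation": audit_record["decision"],
        "reason": audit_record["reason"],
        "evidence": audit_record["evidence"],  # raw tool outputs captured during the chat
        "audit_ref": audit_record["input_hash"],
    }

# Hypothetical enqueue call; replace with your case-management or queue integration.
# review_queue.publish(build_review_handoff(case, audit_record))
```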
Production Considerations
- Keep PII inside controlled boundaries
  - Mask sensitive fields before sending them to the model when possible (see the sketch after this list).
  - For lending data residency requirements, pin inference to approved regions and avoid cross-border prompt logging.
- Log every decision artifact
  - Store the input payload hash, tool outputs, timestamps, model version, prompt version, and final decision.
  - Regulators will ask why a loan was held or declined; “the model said so” is not enough.
- Use deterministic guardrails
  - Hard-fail on sanctions hits or missing consent (also sketched below).
  - Do not let the LLM override policy rules for required checks like ID validity or adverse media thresholds.
- Monitor drift by segment
  - Track false positives by geography, product type, branch channel, and customer segment.
  - A KYC agent that works for salaried borrowers in one country may break badly for thin-file applicants elsewhere.
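A minimal sketch of the first and third points above: mask obvious PII before it reaches the prompt, and enforce hard-fail rules in code regardless of what the model recommends. Field names and rules here are illustrative assumptions, not real policy:

```python
from typing import Optional

def mask_case_for_llm(case: dict) -> dict:
    """Redact fields the model does not need to see verbatim."""
    masked = dict(case)
    if "id_number" in masked:
        masked["id_number"] = "***" + str(masked["id_number"])[-4:]
    masked.pop("dob", None)  # the model reasons on risk band, not raw date of birth
    return masked

def enforce_hard_rules(tool_outputs: dict, case: dict) -> Optional[str]:
    """Deterministic policy checks the LLM can never override."""
    if tool_outputs.get("check_sanctions", {}).get("match"):
        return "reject"   # sanctions hit is an automatic decline
    if not case.get("consent_given"):
        return "review"   # missing consent always goes to a human
    return None           # no hard rule fired; the agent's recommendation stands
```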
Common Pitfalls
- Letting the LLM make compliance decisions directly
  - Fix: use tools for facts and policy code for decisions.
  - The model can explain outcomes; it should not invent sanctions results or waive required checks.
- Skipping audit trails
  - Fix: persist raw tool output plus the final rationale for every case.
  - In lending reviews you need evidence of consent capture, identity verification steps, and any escalation reason.
- Ignoring regional policy differences
  - Fix: parameterize rules by jurisdiction and product (see the sketch after this list).
  - A mortgage file in one market may require different residency proofs than an unsecured personal loan elsewhere.
- Using one generic prompt for all borrowers
  - Fix: split flows by risk tier and channel.
  - Digital onboarding with government ID is not the same as branch-assisted applications with manual document uploads.
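One way to parameterize checks by jurisdiction and product is a plain lookup table that the verification agent consults before calling tools. The entries below are illustrative, not real policy:

```python
# Illustrative policy matrix: which checks are mandatory per jurisdiction and product.
REQUIRED_CHECKS = {
    ("GB", "mortgage"):      ["check_sanctions", "validate_address", "proof_of_residence"],
    ("GB", "personal_loan"): ["check_sanctions", "validate_address"],
    ("US", "personal_loan"): ["check_sanctions", "validate_address", "assess_kyc_risk"],
}

def required_checks(country: str, loan_product: str) -> list:
    """Fall back to the strictest known set when a combination is not listed."""
    default = ["check_sanctions", "validate_address", "assess_kyc_risk", "proof_of_residence"]
    return REQUIRED_CHECKS.get((country.upper(), loan_product), default)
```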
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.