How to Build a KYC Verification Agent Using LangGraph in Python for Retail Banking
A KYC verification agent automates the first pass of customer onboarding: it collects identity data, checks document completeness, validates against policy rules, and routes edge cases to a human reviewer. For retail banking, that matters because onboarding speed, compliance accuracy, and auditability all sit on the same path; if you get one wrong, you create fraud exposure, regulatory risk, or abandoned applications.
Architecture
- Input normalization node
  - Takes raw application data from web forms, mobile apps, or branch systems.
  - Normalizes names, addresses, dates of birth, and document metadata into a canonical schema.
- Document validation node
  - Checks whether required KYC artifacts are present.
  - Verifies file types, expiry dates, image quality flags, and basic consistency between fields.
- Policy/rules engine node
  - Applies bank-specific KYC policy.
  - Flags missing fields, sanctions screening triggers, PEP indicators, mismatched identities, and residency constraints.
- LLM reasoning node
  - Summarizes issues in plain language for internal ops teams.
  - Produces structured decision support, not final compliance decisions.
- Human review routing node
  - Sends high-risk or ambiguous cases to an analyst queue.
  - Keeps low-risk cases moving automatically with a full audit trail.
- Audit persistence layer
  - Stores every state transition, rule result, and reviewer action.
  - Supports regulatory review, model governance, and incident investigations.
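The document validation node is described as checking expiry dates as well as presence. A minimal sketch of that check follows, assuming a hypothetical metadata shape (document type mapped to an ISO expiry date, or `None` for documents that do not expire). The `find_document_issues` helper and the issue codes are illustrative, not part of LangGraph.

```python
from datetime import date
from typing import Dict, List, Optional

# Hypothetical metadata shape: document type -> ISO expiry date, or None
# for documents that do not expire (for example a utility bill).
DocumentMeta = Dict[str, Optional[str]]

REQUIRED_DOCUMENTS = ("passport", "proof_of_address")  # assumed policy


def find_document_issues(documents: DocumentMeta, today: date) -> List[str]:
    """Return issue codes for missing or expired required documents."""
    issues = []
    for doc_type in REQUIRED_DOCUMENTS:
        if doc_type not in documents:
            issues.append(f"missing:{doc_type}")
            continue
        expiry = documents[doc_type]
        if expiry is not None and date.fromisoformat(expiry) < today:
            issues.append(f"expired:{doc_type}")
    return issues


issues = find_document_issues(
    {"passport": "2020-01-31", "proof_of_address": None},
    today=date(2024, 6, 1),
)
print(issues)
```

Passing `today` explicitly keeps the check deterministic, which makes it easier to replay a past decision during an audit.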
Implementation
1) Define the state your graph will carry
For KYC workflows, keep the state explicit. You want a typed object that captures inputs, findings, risk flags, and the final decision path.
from typing import TypedDict, List, Optional

from langgraph.graph import StateGraph, START, END


class KYCState(TypedDict):
    applicant_id: str
    full_name: str
    date_of_birth: str
    country_of_residence: str
    documents: List[str]
    normalized_name: Optional[str]
    missing_fields: List[str]
    risk_flags: List[str]
    decision: Optional[str]
    analyst_notes: Optional[str]
2) Add deterministic nodes for validation and policy checks
Do not start with an LLM. Start with rules. Banking KYC needs predictable behavior for missing docs and policy violations.
def normalize_input(state: KYCState) -> KYCState:
    # Collapse whitespace and lowercase the name so later comparisons are stable.
    state["normalized_name"] = " ".join(state["full_name"].strip().lower().split())
    return state


def validate_documents(state: KYCState) -> KYCState:
    # Record every required document type the applicant did not supply.
    required = ["passport", "proof_of_address"]
    state["missing_fields"] = [doc for doc in required if doc not in state["documents"]]
    return state


def apply_kyc_policy(state: KYCState) -> KYCState:
    flags = []
    if state["country_of_residence"] in {"IR", "KP", "SY"}:
        flags.append("restricted_jurisdiction")
    if state["missing_fields"]:
        flags.append("incomplete_kyc")
    # Strip first so whitespace-only or padded names do not pass the length check.
    if len(state["full_name"].strip()) < 3:
        flags.append("invalid_name")
    state["risk_flags"] = flags
    return state
3) Route based on risk using LangGraph conditional edges
This is where LangGraph fits well. The graph stays deterministic until you need escalation logic.
def route_case(state: KYCState) -> str:
    # Any risk flag, including "invalid_name", sends the case to an analyst.
    # Only a case with no flags is approved automatically.
    if state["risk_flags"]:
        return "manual_review"
    return "auto_approve"


def manual_review(state: KYCState) -> KYCState:
    state["decision"] = "pending_manual_review"
    state["analyst_notes"] = (
        f"KYC review required for {state['applicant_id']}: "
        f"{', '.join(state['risk_flags'])}"
    )
    return state


def auto_approve(state: KYCState) -> KYCState:
    state["decision"] = "approved"
    return state

graph = StateGraph(KYCState)
graph.add_node("normalize_input", normalize_input)
graph.add_node("validate_documents", validate_documents)
graph.add_node("apply_kyc_policy", apply_kyc_policy)
graph.add_node("manual_review", manual_review)
graph.add_node("auto_approve", auto_approve)
graph.add_edge(START, "normalize_input")
graph.add_edge("normalize_input", "validate_documents")
graph.add_edge("validate_documents", "apply_kyc_policy")
graph.add_conditional_edges(
    "apply_kyc_policy",
    route_case,
    {
        "manual_review": "manual_review",
        "auto_approve": "auto_approve",
    },
)
graph.add_edge("manual_review", END)
graph.add_edge("auto_approve", END)
app = graph.compile()
4) Run the agent with sample applicant data
The compiled graph behaves like a normal runnable. In production you would wrap this with authz checks, persistence hooks, and PII controls before execution.
initial_state: KYCState = {
    "applicant_id": "CUST-100245",
    "full_name": "Jane Doe",
    "date_of_birth": "1991-04-12",
    "country_of_residence": "GB",
    "documents": ["passport"],
    "normalized_name": None,
    "missing_fields": [],
    "risk_flags": [],
    "decision": None,
    "analyst_notes": None,
}
result = app.invoke(initial_state)
print(result["decision"])       # pending_manual_review
print(result["risk_flags"])     # ['incomplete_kyc']
print(result["analyst_notes"])  # KYC review required for CUST-100245: incomplete_kyc
If you want to persist intermediate states for auditability, pass a checkpointer to graph.compile(checkpointer=...) and invoke the graph with a thread_id in its config so each application gets its own saved history. That gives you traceable transitions across retries and human handoffs without building your own workflow engine.
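Checkpointing records graph state, but audit records usually also need a timestamp and an operator identity per step. One way to capture those, independent of LangGraph, is to wrap each node function before registering it. This is only a sketch: the `audited` helper, the record fields, and the in-memory list standing in for an append-only store are all assumptions.

```python
import copy
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
AuditLog = List[Dict[str, Any]]


def audited(
    name: str,
    node: Callable[[State], State],
    log: AuditLog,
    operator: str,
) -> Callable[[State], State]:
    """Wrap a node so each call appends one audit record to `log`."""

    def wrapper(state: State) -> State:
        before = copy.deepcopy(state)  # snapshot before the node mutates it
        result = node(state)
        log.append({
            "node": name,
            "operator": operator,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "input": before,
            "output": copy.deepcopy(result),
        })
        return result

    return wrapper


def flag_incomplete(state: State) -> State:
    # Stand-in for a real rule node.
    state["risk_flags"] = ["incomplete_kyc"] if state["missing_fields"] else []
    return state


audit_log: AuditLog = []  # hypothetical stand-in for an append-only store
node = audited("apply_kyc_policy", flag_incomplete, audit_log, operator="svc-kyc-agent")
node({"applicant_id": "CUST-100245", "missing_fields": ["proof_of_address"]})
print(audit_log[0]["node"], audit_log[0]["output"]["risk_flags"])
```

The wrapped function is still a plain callable, so it can be passed to graph.add_node in place of the original. Because the records contain applicant data, they belong in the same approved systems as the raw KYC data.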
Production Considerations
- Use a strict PII boundary
  - Keep passport numbers, national IDs, and address data out of prompts unless absolutely necessary.
  - Mask sensitive fields before any LLM call and store raw values only in approved banking systems.
- Add audit-grade persistence
  - Persist every node input/output with timestamps and operator identity.
  - Regulators care about why a case was approved or escalated; your graph should make that answer easy to reconstruct.
- Respect data residency
  - Route EU customer data to EU-hosted infrastructure.
  - If your bank has regional processing constraints, enforce them at the orchestration layer before the graph runs.
- Treat the LLM as advisory only
  - Let deterministic rules make eligibility decisions.
  - Use the model for summarization or analyst assistance; never let it override sanctions logic or mandatory policy checks.
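The PII boundary and the advisory-only rule can be combined in a single node. The sketch below masks an assumed set of sensitive fields before building a prompt, then writes only `analyst_notes`. The `summarize` callable is a stand-in for whatever model client you use, and `SENSITIVE_FIELDS`, `mask_for_prompt`, and `make_advisory_node` are hypothetical names.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]

# Assumed list of fields that must never reach a prompt.
SENSITIVE_FIELDS = {"full_name", "date_of_birth", "passport_number", "address"}


def mask_for_prompt(state: State) -> State:
    """Return a copy of the state with sensitive values replaced."""
    return {
        key: "[REDACTED]" if key in SENSITIVE_FIELDS else value
        for key, value in state.items()
    }


def make_advisory_node(summarize: Callable[[str], str]) -> Callable[[State], State]:
    """Build a node that adds analyst notes but never sets the decision."""

    def advisory_node(state: State) -> State:
        safe = mask_for_prompt(state)
        prompt = f"Summarize these KYC findings for an analyst: {safe}"
        # Only the notes field is written; the decision stays rule-driven.
        return {**state, "analyst_notes": summarize(prompt)}

    return advisory_node


def fake_summarize(prompt: str) -> str:
    # Stand-in for a real model client, which is out of scope here.
    return f"DRAFT: {prompt}"


state = {
    "applicant_id": "CUST-100245",
    "full_name": "Jane Doe",
    "date_of_birth": "1991-04-12",
    "risk_flags": ["incomplete_kyc"],
    "decision": "pending_manual_review",
}
result = make_advisory_node(fake_summarize)(state)
print(result["analyst_notes"])
```

Injecting the summarizer keeps the node testable without a model, and the node has no code path that can change `decision`.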
Common Pitfalls
- Using the LLM as the first decision point
  - Bad pattern: ask the model whether a customer is “KYC compliant.”
  - Fix it by running deterministic validation first and only using the model for explanation or triage text.
- Not separating policy from orchestration
  - If your business rules are buried inside node code with no versioning, audits become painful.
  - Put jurisdiction rules and threshold logic in config-backed modules so compliance can review changes independently.
- Ignoring human-in-the-loop design
  - A banking agent that only returns approve/reject will fail on edge cases.
  - Add explicit pending_manual_review states so analysts can resolve ambiguous cases without breaking the workflow history.
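To illustrate separating policy from orchestration, the sketch below moves the jurisdiction list, required documents, and name-length threshold into a versioned config document. The JSON layout, the `version` value, and the `evaluate_policy` helper are assumptions for illustration; the rule values mirror the ones used in the nodes earlier.

```python
import json
from typing import Any, Dict, List

# Hypothetical policy document; in practice this might live in a
# version-controlled file that compliance reviews.
POLICY_JSON = """
{
  "version": "2024-06-01",
  "restricted_jurisdictions": ["IR", "KP", "SY"],
  "required_documents": ["passport", "proof_of_address"],
  "min_name_length": 3
}
"""


def evaluate_policy(applicant: Dict[str, Any], policy: Dict[str, Any]) -> List[str]:
    """Apply the configured rules and return risk flags."""
    flags = []
    if applicant["country_of_residence"] in policy["restricted_jurisdictions"]:
        flags.append("restricted_jurisdiction")
    if any(doc not in applicant["documents"] for doc in policy["required_documents"]):
        flags.append("incomplete_kyc")
    if len(applicant["full_name"].strip()) < policy["min_name_length"]:
        flags.append("invalid_name")
    return flags


policy = json.loads(POLICY_JSON)
flags = evaluate_policy(
    {"country_of_residence": "GB", "documents": ["passport"], "full_name": "Jane Doe"},
    policy,
)
print(policy["version"], flags)
```

Recording the policy version alongside each decision lets a reviewer see which rule set produced a given flag.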
By Cyprian Aarons, AI Consultant at Topiax.