How to Build a KYC Verification Agent in Python Using LangGraph for Banking
A KYC verification agent automates the boring but critical parts of customer onboarding: collecting identity data, checking it against policy, flagging missing evidence, and routing edge cases to a human reviewer. In banking this matters because KYC failures create regulatory exposure, onboarding delays, and audit gaps. The goal is not for the agent to “decide” identity on its own; it is to produce a controlled, explainable workflow that can be reviewed and signed off.
Architecture
A production KYC agent in LangGraph usually needs these components:
- Input intake node
  - Accepts customer profile data, document metadata, and onboarding context.
  - Normalizes fields like name, DOB, address, country, and document type.
- Policy evaluation node
  - Checks required fields against bank-specific KYC rules.
  - Flags missing documents, expired IDs, mismatch conditions, or high-risk jurisdictions.
- Document analysis node
  - Extracts structured signals from uploaded files or OCR output.
  - Should return evidence, not just a yes/no decision.
- Risk scoring node
  - Assigns a risk level based on geography, PEP/sanctions hits, document quality, and profile inconsistencies.
  - Keeps thresholds configurable by compliance teams.
- Decision router
  - Routes to approve, request more information, or escalate to manual review.
  - This is where LangGraph’s conditional edges fit well.
- Audit log / state persistence
  - Stores every decision input and output for traceability.
  - Required for compliance review and internal audit.
Implementation
1) Define the graph state
Use a typed state object so every step has predictable inputs and outputs. For banking workflows, keep both the customer payload and the audit trail in state.
```python
from typing import TypedDict, Literal, List, Dict, Any

from langgraph.graph import StateGraph, START, END

class KYCState(TypedDict):
    customer: Dict[str, Any]
    extracted_docs: Dict[str, Any]
    policy_flags: List[str]
    risk_score: int
    decision: Literal["approve", "manual_review", "request_more_info"]
    audit_log: List[str]
```
2) Implement deterministic nodes
Keep the first version deterministic. You can add LLM assistance later for document summarization or exception handling, but the core decision path should be rule-based and auditable.
```python
def intake_node(state: KYCState) -> KYCState:
    customer = state["customer"]
    audit = state.get("audit_log", [])
    audit.append(f"Intake received for {customer.get('customer_id')}")
    return {**state, "audit_log": audit}

def policy_node(state: KYCState) -> KYCState:
    customer = state["customer"]
    flags = []
    required_fields = ["full_name", "date_of_birth", "country", "id_number"]
    for field in required_fields:
        if not customer.get(field):
            flags.append(f"missing_{field}")
    if customer.get("country") in {"IR", "KP", "SY"}:
        flags.append("high_risk_jurisdiction")
    audit = state.get("audit_log", [])
    audit.append(f"Policy check produced {len(flags)} flag(s)")
    return {**state, "policy_flags": flags, "audit_log": audit}

def risk_node(state: KYCState) -> KYCState:
    score = 0
    for flag in state.get("policy_flags", []):
        if flag.startswith("missing_"):
            score += 20
        elif flag == "high_risk_jurisdiction":
            score += 50
    docs = state.get("extracted_docs", {})
    if docs.get("document_expired") is True:
        score += 30
    audit = state.get("audit_log", [])
    audit.append(f"Risk score computed as {score}")
    return {**state, "risk_score": score, "audit_log": audit}
```
3) Route decisions with LangGraph conditional edges
This is the part that makes LangGraph useful. You define explicit branching logic instead of burying it inside one large function.
```python
def decide_next(state: KYCState) -> str:
    flags = state.get("policy_flags", [])
    risk = state.get("risk_score", 0)
    if any(flag.startswith("missing_") for flag in flags):
        return "request_more_info"
    if risk >= 50 or "high_risk_jurisdiction" in flags:
        return "manual_review"
    return "approve"

def decision_node(state: KYCState) -> KYCState:
    decision = decide_next(state)
    audit = state.get("audit_log", [])
    audit.append(f"Final decision routed to {decision}")
    return {**state, "decision": decision, "audit_log": audit}
```
Now wire the graph:
```python
graph = StateGraph(KYCState)
graph.add_node("intake", intake_node)
graph.add_node("policy_check", policy_node)
graph.add_node("risk_score", risk_node)
graph.add_node("decide", decision_node)

graph.add_edge(START, "intake")
graph.add_edge("intake", "policy_check")
graph.add_edge("policy_check", "risk_score")
graph.add_edge("risk_score", "decide")

graph.add_conditional_edges(
    "decide",
    lambda state: state["decision"],
    {
        "approve": END,
        "manual_review": END,
        "request_more_info": END,
    },
)

app = graph.compile()
```
Run it with a real payload:
```python
result = app.invoke({
    "customer": {
        "customer_id": "CUST-1001",
        "full_name": "Jane Doe",
        "date_of_birth": "1990-04-12",
        "country": "GB",
        "id_number": "GB1234567",
    },
    "extracted_docs": {
        "document_expired": False,
    },
    "policy_flags": [],
    "risk_score": 0,
    "decision": "manual_review",
    "audit_log": [],
})

print(result["decision"])
print(result["audit_log"])
```
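To see the request-more-info branch fire, it helps to exercise the same policy and routing rules on an incomplete payload. The sketch below restates that logic as plain functions so it runs without LangGraph installed; it mirrors `policy_node` and `decide_next` above rather than importing them.

```python
# Standalone restatement of the policy check and router above,
# runnable without langgraph, to show the missing-field branch.

def check_policy(customer: dict) -> list[str]:
    flags = []
    for field in ["full_name", "date_of_birth", "country", "id_number"]:
        if not customer.get(field):
            flags.append(f"missing_{field}")
    if customer.get("country") in {"IR", "KP", "SY"}:
        flags.append("high_risk_jurisdiction")
    return flags

def route(flags: list[str], risk: int) -> str:
    if any(f.startswith("missing_") for f in flags):
        return "request_more_info"
    if risk >= 50 or "high_risk_jurisdiction" in flags:
        return "manual_review"
    return "approve"

incomplete = {"full_name": "Jane Doe", "country": "GB"}  # no DOB, no id_number
flags = check_policy(incomplete)
print(flags)                  # ['missing_date_of_birth', 'missing_id_number']
print(route(flags, risk=40))  # request_more_info
```

Missing fields always win over the risk score here, which is the behavior you want: the bank asks for evidence before it assesses it.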
4) Add human review as an explicit branch
For banking workflows you should not auto-resolve ambiguous cases. Use a manual review path with captured reasons and reviewer notes.
You can extend the graph by adding a human_review node later and routing high-risk cases there. The important pattern is that every branch remains visible in code and every transition lands in an auditable terminal state.
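One way such a node could look, as a sketch: it decides nothing itself, it only records why the case was parked for a person. The escalation-reason wording is illustrative, not part of the code above.

```python
# Hypothetical human-review node: it records the escalation reasons in the
# audit trail and leaves the decision pinned at manual_review for a reviewer.
def human_review_node(state: dict) -> dict:
    reasons = [f for f in state.get("policy_flags", [])
               if f.startswith("missing_") or f == "high_risk_jurisdiction"]
    audit = state.get("audit_log", []) + [
        f"Escalated to human review; reasons: {reasons or ['risk_threshold']}"
    ]
    return {**state, "decision": "manual_review", "audit_log": audit}
```

Wiring it in would mean `graph.add_node("human_review", human_review_node)` and mapping the `"manual_review"` key in `add_conditional_edges` to `"human_review"` instead of `END`.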
Production Considerations
- Persist graph runs with full traceability
  - Store input payloads, node outputs, timestamps, and final decisions.
  - Keep immutable logs for audit teams and model risk reviews.
- Separate policy from code where possible
  - Put thresholds like jurisdiction lists and risk scores into config or a rules service.
  - Compliance teams need change control without redeploying application code.
- Respect data residency
  - Keep PII inside approved regions.
  - If you use external LLMs for extraction or summarization, make sure routing complies with local banking regulations and vendor contracts.
- Add guardrails around automation
  - Never let the agent approve high-risk cases without deterministic checks.
  - Enforce human review for sanctions hits, mismatched identity data, expired documents, or incomplete records.
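Separating policy from code can be as simple as loading the jurisdiction list and score weights from a config document. The keys and values below are illustrative assumptions, not a mandated schema:

```python
import json

# Illustrative policy config -- in production this would come from a
# version-controlled file or rules service owned by compliance.
POLICY_JSON = """
{
  "high_risk_countries": ["IR", "KP", "SY"],
  "weights": {"missing_field": 20, "high_risk_jurisdiction": 50},
  "manual_review_threshold": 50
}
"""

policy = json.loads(POLICY_JSON)

def score_flags(flags: list[str], policy: dict) -> int:
    # Same scoring shape as risk_node, but every weight comes from config.
    w = policy["weights"]
    score = 0
    for flag in flags:
        if flag.startswith("missing_"):
            score += w["missing_field"]
        elif flag == "high_risk_jurisdiction":
            score += w["high_risk_jurisdiction"]
    return score

print(score_flags(["missing_id_number", "high_risk_jurisdiction"], policy))  # 70
```

With this shape, compliance can raise `manual_review_threshold` or add a country through change control, without an application deploy.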
Common Pitfalls
- Using an LLM as the final decision-maker
  - Don’t ask a model to “approve” identity on its own.
  - Use it for extraction or summarization only; keep approval logic rule-based and auditable.
- Hiding compliance logic inside one node
  - A single monolithic function makes audits painful.
  - Split intake, policy checks, scoring, and routing into separate nodes so each step is inspectable.
- Ignoring incomplete-state handling
  - Real onboarding data is messy.
  - Always handle missing fields explicitly and route those cases to `request_more_info` instead of guessing.
- Skipping trace storage
  - If you can’t reconstruct why a case was approved or escalated, you don’t have a banking-grade workflow.
  - Persist `audit_log`, input snapshots, and node-level outputs for every run.
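A minimal way to persist node-level traces is an append-only JSON Lines log, one record per node execution. The record fields here are an assumption for illustration, not a regulatory format:

```python
import io
import json
import time

def write_trace(fp, run_id: str, node: str, payload: dict) -> None:
    # One immutable record per node execution: which run, which node,
    # when it ran, and what it produced.
    record = {"run_id": run_id, "node": node,
              "ts": time.time(), "payload": payload}
    fp.write(json.dumps(record, sort_keys=True) + "\n")

# In production fp would be an append-only file or log sink;
# a StringIO keeps the sketch self-contained.
buf = io.StringIO()
write_trace(buf, "RUN-1", "policy_check", {"policy_flags": ["missing_id_number"]})
write_trace(buf, "RUN-1", "decide", {"decision": "request_more_info"})

records = [json.loads(line) for line in buf.getvalue().splitlines()]
print(len(records), records[-1]["payload"]["decision"])  # 2 request_more_info
```

Because each line is a complete JSON object, audit teams can replay a run or filter by `run_id` with standard tooling.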
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.