How to Build a KYC Verification Agent Using LangGraph in Python for Retail Banking

By Cyprian Aarons | Updated 2026-04-21
Tags: kyc-verification, langgraph, python, retail-banking

A KYC verification agent automates the first pass of customer onboarding: it collects identity data, checks document completeness, validates against policy rules, and routes edge cases to a human reviewer. For retail banking, that matters because onboarding speed, compliance accuracy, and auditability all sit on the same path; if you get one wrong, you create fraud exposure, regulatory risk, or abandoned applications.

Architecture

  • Input normalization node

    • Takes raw application data from web forms, mobile apps, or branch systems.
    • Normalizes names, addresses, dates of birth, and document metadata into a canonical schema.
  • Document validation node

    • Checks whether required KYC artifacts are present.
    • Verifies file types, expiry dates, image quality flags, and basic consistency between fields.
  • Policy/rules engine node

    • Applies bank-specific KYC policy.
    • Flags missing fields, sanctions screening triggers, PEP indicators, mismatched identities, and residency constraints.
  • LLM reasoning node

    • Summarizes issues in plain language for internal ops teams.
    • Produces structured decision support, not final compliance decisions.
  • Human review routing node

    • Sends high-risk or ambiguous cases to an analyst queue.
    • Keeps low-risk cases moving automatically with a full audit trail.
  • Audit persistence layer

    • Stores every state transition, rule result, and reviewer action.
    • Supports regulatory review, model governance, and incident investigations.

Implementation

1) Define the state your graph will carry

For KYC workflows, keep the state explicit. You want a typed object that captures inputs, findings, risk flags, and the final decision path.

from typing import TypedDict, List, Optional
from langgraph.graph import StateGraph, START, END

class KYCState(TypedDict):
    applicant_id: str
    full_name: str
    date_of_birth: str
    country_of_residence: str
    documents: List[str]
    normalized_name: Optional[str]
    missing_fields: List[str]
    risk_flags: List[str]
    decision: Optional[str]
    analyst_notes: Optional[str]
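
Hand-writing every key for each applicant is easy to get wrong. One option, sketched here, is a small factory that fills in the empty findings so callers supply only the intake fields. new_kyc_state is a hypothetical helper, not part of LangGraph, and the KYCState definition is repeated only so the snippet runs on its own.

```python
from typing import List, Optional, TypedDict


class KYCState(TypedDict):
    applicant_id: str
    full_name: str
    date_of_birth: str
    country_of_residence: str
    documents: List[str]
    normalized_name: Optional[str]
    missing_fields: List[str]
    risk_flags: List[str]
    decision: Optional[str]
    analyst_notes: Optional[str]


def new_kyc_state(
    applicant_id: str,
    full_name: str,
    date_of_birth: str,
    country_of_residence: str,
    documents: List[str],
) -> KYCState:
    """Hypothetical helper: build a fresh state with no findings yet."""
    return KYCState(
        applicant_id=applicant_id,
        full_name=full_name,
        date_of_birth=date_of_birth,
        country_of_residence=country_of_residence,
        documents=list(documents),  # copy so the caller's list is never shared
        normalized_name=None,
        missing_fields=[],
        risk_flags=[],
        decision=None,
        analyst_notes=None,
    )


state = new_kyc_state("CUST-100245", "Jane Doe", "1991-04-12", "GB", ["passport"])
```

Because a TypedDict is a plain dict at runtime, the returned value can be passed straight to the compiled graph.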

2) Add deterministic nodes for validation and policy checks

Do not start with an LLM. Start with rules. Banking KYC needs predictable behavior for missing docs and policy violations.

def normalize_input(state: KYCState) -> dict:
    # Collapse whitespace and lowercase so later comparisons are consistent.
    normalized = " ".join(state["full_name"].strip().lower().split())
    # Return only the keys this node changes; LangGraph merges them into the state.
    return {"normalized_name": normalized}

def validate_documents(state: KYCState) -> dict:
    # Required artifacts for this example; a real list would come from policy config.
    required = ["passport", "proof_of_address"]
    missing = [doc for doc in required if doc not in state["documents"]]
    return {"missing_fields": missing}

def apply_kyc_policy(state: KYCState) -> dict:
    flags = []

    # Illustrative restricted list (ISO 3166-1 alpha-2 country codes).
    if state["country_of_residence"] in {"IR", "KP", "SY"}:
        flags.append("restricted_jurisdiction")

    if state["missing_fields"]:
        flags.append("incomplete_kyc")

    # Strip first so a whitespace-only name cannot pass the length check.
    if len(state["full_name"].strip()) < 3:
        flags.append("invalid_name")

    return {"risk_flags": flags}
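
The architecture also calls for expiry checks, which the presence-only validation above does not cover. The following is a minimal sketch of how that could look, assuming each document carries an expiry date. DocumentMeta, find_document_issues, and the issue-code format are hypothetical names introduced for illustration; the article's state stores only document type strings.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DocumentMeta:
    """Hypothetical document record used only in this sketch."""
    doc_type: str
    expires_on: Optional[date]  # None for documents that carry no expiry date


def find_document_issues(
    documents: List[DocumentMeta],
    required: List[str],
    today: date,
) -> List[str]:
    """Return issue codes for required documents that are missing or expired."""
    by_type = {doc.doc_type: doc for doc in documents}
    issues = []
    for doc_type in required:
        doc = by_type.get(doc_type)
        if doc is None:
            issues.append(f"missing:{doc_type}")
        # Assumption: a document is still valid on its expiry date.
        elif doc.expires_on is not None and doc.expires_on < today:
            issues.append(f"expired:{doc_type}")
    return issues


docs = [DocumentMeta("passport", date(2020, 1, 31))]
issues = find_document_issues(
    docs, ["passport", "proof_of_address"], today=date(2026, 4, 21)
)
print(issues)  # ['expired:passport', 'missing:proof_of_address']
```

Passing today as an argument, rather than reading the clock inside the function, keeps the check deterministic and easy to replay during an audit.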

3) Route based on risk using LangGraph conditional edges

This is where LangGraph fits well. A conditional edge calls a plain Python function on the current state and follows the branch it names, so escalation logic stays deterministic and easy to test.

def route_case(state: KYCState) -> str:
    # Any risk flag, including "invalid_name", sends the case to an analyst.
    if state["risk_flags"]:
        return "manual_review"
    return "auto_approve"

def manual_review(state: KYCState) -> dict:
    notes = (
        f"KYC review required for {state['applicant_id']}: "
        f"{', '.join(state['risk_flags'])}"
    )
    # Return only the updated keys; LangGraph merges them into the state.
    return {"decision": "pending_manual_review", "analyst_notes": notes}

def auto_approve(state: KYCState) -> dict:
    return {"decision": "approved"}

graph = StateGraph(KYCState)

graph.add_node("normalize_input", normalize_input)
graph.add_node("validate_documents", validate_documents)
graph.add_node("apply_kyc_policy", apply_kyc_policy)
graph.add_node("manual_review", manual_review)
graph.add_node("auto_approve", auto_approve)

graph.add_edge(START, "normalize_input")
graph.add_edge("normalize_input", "validate_documents")
graph.add_edge("validate_documents", "apply_kyc_policy")

graph.add_conditional_edges(
    "apply_kyc_policy",
    route_case,
    {
        "manual_review": "manual_review",
        "auto_approve": "auto_approve",
    },
)

graph.add_edge("manual_review", END)
graph.add_edge("auto_approve", END)

app = graph.compile()

4) Run the agent with sample applicant data

The compiled graph behaves like a normal runnable. In production you would wrap this with authorization checks, persistence hooks, and PII controls before execution.

initial_state: KYCState = {
    "applicant_id": "CUST-100245",
    "full_name": "Jane Doe",
    "date_of_birth": "1991-04-12",
    "country_of_residence": "GB",
    "documents": ["passport"],
    "normalized_name": None,
    "missing_fields": [],
    "risk_flags": [],
    "decision": None,
    "analyst_notes": None,
}

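result = app.invoke(initial_state)

# Only "passport" was supplied, so proof_of_address is missing and the case escalates.
print(result["decision"])       # pending_manual_review
print(result["risk_flags"])     # ['incomplete_kyc']
print(result["analyst_notes"])  # KYC review required for CUST-100245: incomplete_kyc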

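If you want to persist intermediate states for auditability, pass a checkpointer to graph.compile(checkpointer=...) and invoke the app with a thread_id under the "configurable" key of its config. LangGraph then saves a checkpoint of the state as the graph runs, which gives you traceable transitions across retries and human handoffs without rebuilding your own workflow engine.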
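
Checkpoints capture state, but an audit trail usually also needs to say which node ran and when. A plain-Python complement, sketched below, is a decorator that records each node's input and output. This is not LangGraph's checkpointer; audited and AUDIT_LOG are hypothetical names, and the in-memory list stands in for whatever append-only store your bank has approved.

```python
import copy
import functools
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

AUDIT_LOG: List[Dict[str, Any]] = []  # stand-in for an append-only audit store


def audited(node: Callable[[Dict[str, Any]], Dict[str, Any]]):
    """Wrap a node so each call records its input, output, and a UTC timestamp."""

    @functools.wraps(node)
    def wrapper(state: Dict[str, Any]) -> Dict[str, Any]:
        before = copy.deepcopy(state)  # snapshot before the node can touch it
        result = node(state)
        AUDIT_LOG.append(
            {
                "node": node.__name__,
                "at": datetime.now(timezone.utc).isoformat(),
                "input": before,
                "output": copy.deepcopy(result),
            }
        )
        return result

    return wrapper


@audited
def auto_approve(state: Dict[str, Any]) -> Dict[str, Any]:
    return {**state, "decision": "approved"}


auto_approve({"applicant_id": "CUST-100245", "decision": None})
print(AUDIT_LOG[0]["node"], AUDIT_LOG[0]["output"]["decision"])  # auto_approve approved
```

A wrapped function is still an ordinary callable, so it can be registered with graph.add_node in the same way as the unwrapped nodes. Note that the log entries contain raw state, so the store needs the same PII controls as the rest of the pipeline.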

Production Considerations

  • Use a strict PII boundary

    • Keep passport numbers, national IDs, and address data out of prompts unless absolutely necessary.
    • Mask sensitive fields before any LLM call and store raw values only in approved banking systems.
  • Add audit-grade persistence

    • Persist every node input/output with timestamps and operator identity.
    • Regulators care about why a case was approved or escalated; your graph should make that answer easy to reconstruct.
  • Respect data residency

    • Route EU customer data to EU-hosted infrastructure.
    • If your bank has regional processing constraints, enforce them at the orchestration layer before the graph runs.
  • Treat the LLM as advisory only

    • Let deterministic rules make eligibility decisions.
    • Use the model for summarization or analyst assistance; never let it override sanctions logic or mandatory policy checks.
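
To make the PII boundary concrete, here is a minimal sketch of masking sensitive fields before any text reaches a model. The field names in SENSITIVE_FIELDS and the helpers mask_value and redact_for_prompt are assumptions for illustration; align them with your own schema and masking policy.

```python
from typing import Any, Dict

# Hypothetical field names; the article's state does not define these keys.
SENSITIVE_FIELDS = {"passport_number", "national_id", "address"}


def mask_value(value: str, visible: int = 2) -> str:
    """Keep the last `visible` characters and replace the rest with '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def redact_for_prompt(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `record` with sensitive fields masked."""
    redacted = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS and value is not None:
            redacted[key] = mask_value(str(value))
        else:
            redacted[key] = value
    return redacted


record = {
    "applicant_id": "CUST-100245",
    "passport_number": "X1234567",
    "risk_flags": ["incomplete_kyc"],
}
print(redact_for_prompt(record)["passport_number"])  # ******67
```

The function returns a new dict and leaves the original record untouched, so raw values stay in the system of record while only the redacted copy is used to build prompts.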

Common Pitfalls

  • Using the LLM as the first decision point

    • Bad pattern: ask the model whether a customer is “KYC compliant.”
    • Fix it by running deterministic validation first and only using the model for explanation or triage text.
  • Not separating policy from orchestration

    • If your business rules are buried inside node code with no versioning, audits become painful.
    • Put jurisdiction rules and threshold logic in config-backed modules so compliance can review changes independently.
  • Ignoring human-in-the-loop design

    • A banking agent that only returns approve/reject will fail on edge cases.
    • Add explicit pending_manual_review states so analysts can resolve ambiguous cases without breaking the workflow history.
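
One way to separate policy from orchestration, sketched here under assumptions, is to pass a versioned policy object into the rule function instead of hard-coding values in the node. KYCPolicy, POLICY_V1, evaluate_policy, and the version label are hypothetical; in practice the values would be loaded from a config file that compliance reviews.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List


@dataclass(frozen=True)
class KYCPolicy:
    """Hypothetical policy object, ideally loaded from reviewed configuration."""
    version: str
    restricted_countries: FrozenSet[str]
    min_name_length: int


POLICY_V1 = KYCPolicy(
    version="2026-04-01",  # illustrative version label
    restricted_countries=frozenset({"IR", "KP", "SY"}),
    min_name_length=3,
)


def evaluate_policy(state: Dict[str, Any], policy: KYCPolicy) -> Dict[str, Any]:
    """Return risk flags together with the policy version that produced them."""
    flags: List[str] = []
    if state["country_of_residence"] in policy.restricted_countries:
        flags.append("restricted_jurisdiction")
    if state["missing_fields"]:
        flags.append("incomplete_kyc")
    if len(state["full_name"].strip()) < policy.min_name_length:
        flags.append("invalid_name")
    return {"risk_flags": flags, "policy_version": policy.version}


result = evaluate_policy(
    {"country_of_residence": "IR", "missing_fields": [], "full_name": "Jane Doe"},
    POLICY_V1,
)
print(result)  # {'risk_flags': ['restricted_jurisdiction'], 'policy_version': '2026-04-01'}
```

Recording the policy version next to each result lets a reviewer tell which rule set was in force when a case was approved or escalated.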

By Cyprian Aarons, AI Consultant at Topiax.
