How to Build a KYC Verification Agent Using LangGraph in Python for Fintech

By Cyprian Aarons. Updated 2026-04-21

Tags: kyc-verification, langgraph, python, fintech

A KYC verification agent takes a customer application, checks identity documents, extracts and validates key fields, screens for risk signals, and returns a decision with an audit trail. For fintech, this matters because onboarding speed is directly tied to conversion, but every automated decision still has to survive compliance review, explainability requirements, and regulator scrutiny.

Architecture

A production KYC agent in LangGraph usually needs these components:

• Input normalization
  - Accepts raw applicant payloads: name, DOB, address, document images, and metadata.
  - Converts them into a consistent internal state before any checks run.
• Document extraction node
  - Uses OCR or a document parser to extract passport/ID fields.
  - Produces structured data like document number, expiry date, and issuing country.
• Validation node
  - Compares extracted document data against user-provided data.
  - Checks format rules, expiry status, and field consistency.
• Risk screening node
  - Runs sanctions/PEP checks, fraud heuristics, and country-risk logic.
  - Produces a risk score plus reasons for escalation.
• Decision node
  - Applies deterministic policy: approve, reject, or manual review.
  - Keeps the final decision explainable and auditable.
• Audit logging layer
  - Persists inputs, intermediate outputs, model/tool calls, and final outcome.
  - This is non-negotiable in fintech.
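The input normalization component listed first could look roughly like the sketch below. It is only an illustration: the function name `normalize_applicant`, the accepted date layouts, and the payload keys are assumptions, not part of any library or of the implementation that follows.

```python
import unicodedata
from datetime import datetime


def normalize_applicant(raw: dict) -> dict:
    """Return a cleaned copy of a raw applicant payload (hypothetical field names)."""
    # Normalize Unicode and collapse repeated whitespace so name comparisons are stable.
    name = unicodedata.normalize("NFKC", raw.get("full_name", ""))
    name = " ".join(name.split())

    # Accept a few assumed date layouts and emit ISO 8601 (YYYY-MM-DD).
    dob_raw = raw.get("date_of_birth", "").strip()
    dob = None
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y"):
        try:
            dob = datetime.strptime(dob_raw, fmt).date().isoformat()
            break
        except ValueError:
            continue
    if dob is None:
        raise ValueError(f"Unrecognized date_of_birth: {dob_raw!r}")

    return {
        "applicant_id": str(raw["applicant_id"]).strip(),
        "full_name": name,
        "date_of_birth": dob,
        "document_type": raw.get("document_type", "").strip().lower(),
    }
```

Emitting dates in ISO 8601 matters later on, because string comparison of dates is only safe when every date uses the same zero-padded layout.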

Implementation

1) Define the state model

Use TypedDict for graph state so every node knows exactly what it can read and write. Keep the state small; don’t shove raw images or PII blobs through every edge if you can store them separately and reference them by ID.

from typing import TypedDict, Optional, List
from langgraph.graph import StateGraph, START, END

class KYCState(TypedDict):
    applicant_id: str
    full_name: str
    date_of_birth: str
    document_type: str
    document_number: Optional[str]
    document_country: Optional[str]
    document_expiry: Optional[str]
    extracted_name: Optional[str]
    extracted_dob: Optional[str]
    risk_score: int
    flags: List[str]
    decision: Optional[str]
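One way to keep raw images out of the graph state is to store them once and carry only an opaque reference. The sketch below assumes a hypothetical in-memory store and hypothetical names (`SlimKYCState`, `store_document`, `load_document`); a real deployment would presumably use encrypted object storage instead of a dictionary.

```python
import uuid
from typing import Dict, TypedDict

# Hypothetical in-memory stand-in for encrypted object storage.
_DOCUMENT_STORE: Dict[str, bytes] = {}


class SlimKYCState(TypedDict):
    applicant_id: str
    document_ref: str  # opaque ID, not the image bytes


def store_document(image_bytes: bytes) -> str:
    """Persist the raw document and return a reference that is safe to put in state."""
    ref = f"doc_{uuid.uuid4().hex}"
    _DOCUMENT_STORE[ref] = image_bytes
    return ref


def load_document(ref: str) -> bytes:
    """Fetch the raw bytes only inside the node that actually needs them."""
    return _DOCUMENT_STORE[ref]


ref = store_document(b"\x89PNG...fake image bytes")
state: SlimKYCState = {"applicant_id": "app_001", "document_ref": ref}
```

Only the extraction node would call `load_document`; every other node sees just the reference string.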

2) Build deterministic nodes for extraction, validation, and screening

For fintech KYC, keep the core decision path deterministic where possible. If you use an LLM anywhere in the workflow, constrain it to extraction or explanation generation; do not let it make the final compliance decision.

def extract_document(state: KYCState) -> dict:
    # Replace this with OCR / doc parser output from a real service.
    return {
        "document_number": "P12345678",
        "document_country": "KE",
        "document_expiry": "2028-04-01",
        "extracted_name": state["full_name"],
        "extracted_dob": state["date_of_birth"],
    }

from datetime import date

def validate_identity(state: KYCState) -> dict:
    flags = []
    if state["full_name"].strip().lower() != (state.get("extracted_name") or "").strip().lower():
        flags.append("NAME_MISMATCH")
    if state["date_of_birth"] != state.get("extracted_dob"):
        flags.append("DOB_MISMATCH")
    # ISO 8601 dates (YYYY-MM-DD) sort lexicographically, so comparing strings is safe here.
    today = date.today().isoformat()
    if state.get("document_expiry") and state["document_expiry"] < today:
        flags.append("DOCUMENT_EXPIRED")
    return {"flags": flags}

def screen_risk(state: KYCState) -> dict:
    flags = list(state.get("flags", []))
    risk_score = len(flags) * 40
    if state.get("document_country") in {"IR", "KP", "SY"}:
        flags.append("HIGH_RISK_JURISDICTION")
        risk_score += 30
    return {"risk_score": min(risk_score, 100), "flags": flags}
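If an OCR service or an LLM produces the extracted fields, it may help to validate that output before it reaches the state. The following is a minimal sketch under that assumption; `clean_extraction` and the flag names it emits are hypothetical and not part of the graph above.

```python
import re
from datetime import date

# Two uppercase letters, as in ISO 3166-1 alpha-2 codes (format check only).
ISO_COUNTRY = re.compile(r"^[A-Z]{2}$")


def clean_extraction(raw: dict) -> dict:
    """Validate parser output and return state updates plus any flags."""
    flags = []
    update = {"document_number": None, "document_country": None, "document_expiry": None}

    number = str(raw.get("document_number") or "").strip().upper()
    if number:
        update["document_number"] = number
    else:
        flags.append("MISSING_DOCUMENT_NUMBER")

    country = str(raw.get("document_country") or "").strip().upper()
    if ISO_COUNTRY.match(country):
        update["document_country"] = country
    else:
        flags.append("INVALID_DOCUMENT_COUNTRY")

    try:
        # fromisoformat raises ValueError for anything that is not an ISO date.
        update["document_expiry"] = date.fromisoformat(str(raw.get("document_expiry"))).isoformat()
    except ValueError:
        flags.append("INVALID_DOCUMENT_EXPIRY")

    update["flags"] = flags
    return update
```

Because state keys without a reducer are overwritten by each node's return value, any node that returns `flags` after this one should start from the existing list rather than an empty one.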

3) Add a policy node and conditional routing with LangGraph

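This is where LangGraph is useful. You define explicit branches with add_conditional_edges, so the path a request takes (approval, rejection, or manual review) is visible in the graph itself rather than buried in application code. In this minimal example every branch ends at END; in a real system each branch would typically lead to its own node, such as a handler that opens a manual-review case.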

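def decide(state: KYCState) -> dict:
    flags = state["flags"]
    # Hard rule: a high-risk jurisdiction hit alone scores 30, below the review
    # threshold, so it must be checked explicitly or it would be auto-approved.
    if "HIGH_RISK_JURISDICTION" in flags:
        return {"decision": "rejected"}
    if any(flag in flags for flag in ["NAME_MISMATCH", "DOB_MISMATCH", "DOCUMENT_EXPIRED"]):
        return {"decision": "manual_review"}
    if state["risk_score"] >= 50:
        return {"decision": "manual_review"}
    return {"decision": "approved"}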

def route_after_decision(state: KYCState) -> str:
    return state["decision"] or "manual_review"

graph = StateGraph(KYCState)

graph.add_node("extract_document", extract_document)
graph.add_node("validate_identity", validate_identity)
graph.add_node("screen_risk", screen_risk)
graph.add_node("decide", decide)

graph.add_edge(START, "extract_document")
graph.add_edge("extract_document", "validate_identity")
graph.add_edge("validate_identity", "screen_risk")
graph.add_edge("screen_risk", "decide")

graph.add_conditional_edges(
    "decide",
    route_after_decision,
    {
        "approved": END,
        "manual_review": END,
        "rejected": END,
    },
)

kyc_app = graph.compile()

4) Run the graph and persist the output

In production you want the result plus the intermediate evidence. That means storing the full execution trace or at least the input/output of each node alongside a case ID.

initial_state = {
    "applicant_id": "app_001",
    "full_name": "Jane Doe",
    "date_of_birth": "1992-08-12",
    "document_type": "passport",
    "document_number": None,
    "document_country": None,
    "document_expiry": None,
    "extracted_name": None,
    "extracted_dob": None,
    "risk_score": 0,
    "flags": [],
    "decision": None,
}

result = kyc_app.invoke(initial_state)

print(result["decision"])
print(result["risk_score"])
print(result["flags"])

If you need more control over execution timing or streaming intermediate events into an audit log pipeline, use stream() on the compiled app instead of invoke(). That makes it easier to emit step-by-step evidence into your case management system.
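As a rough sketch of that idea, the helper below turns per-node updates into audit records. It assumes the chunk shape that `stream(..., stream_mode="updates")` yields, a dict mapping node name to that node's output; `collect_audit_events` and the record fields are hypothetical, and the stand-in data exists only so the example runs without the graph.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List


def collect_audit_events(case_id: str, updates: Iterable[Dict[str, Any]]) -> List[dict]:
    """Turn per-node updates into audit records, one per node output."""
    events = []
    for step, chunk in enumerate(updates, start=1):
        for node_name, node_output in chunk.items():
            events.append({
                "case_id": case_id,
                "step": step,
                "node": node_name,
                "output": node_output,
                "logged_at": datetime.now(timezone.utc).isoformat(),
            })
    return events


# With the compiled graph from this article, the call would look like:
# events = collect_audit_events(
#     "case_001", kyc_app.stream(initial_state, stream_mode="updates")
# )

# Stand-in chunks shaped like stream_mode="updates" output.
fake_updates = [
    {"extract_document": {"document_country": "KE"}},
    {"decide": {"decision": "approved"}},
]
events = collect_audit_events("case_001", fake_updates)
for event in events:
    print(json.dumps(event))
```

In practice the records would go to an append-only store rather than standard output, and the node outputs would be filtered so that raw PII does not end up in logs.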

Production Considerations

• Data residency
  - Keep PII inside approved regions only.
  - If your OCR or LLM provider processes data cross-border, that can become a compliance issue fast.
  - Use region-pinned storage and region-specific deployments for customer records.
• Auditability
  - Log every node input/output with timestamps and immutable case IDs.
  - Store policy version alongside each decision so you can explain why a case was approved last week but rejected today.
  - In regulated environments, “the model said so” is not acceptable evidence.
• Guardrails
  - Hard-code rejection rules for expired documents, mismatched identity fields, sanctioned jurisdictions, and missing mandatory data.
  - Do not allow free-form LLM outputs to directly set decision.
  - Use LLMs only for extraction support or human-readable summaries.
• Monitoring
  - Track manual-review rate, false positives on screening rules, OCR failure rate, and average time-to-decision.
  - Watch drift by geography and document type; fraud patterns are rarely uniform.
  - Alert on spikes in high-risk jurisdiction matches or repeated retries from the same device fingerprint.
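The auditability points above (timestamps, case IDs, a stored policy version) might be combined into a single record like the one sketched here. The `POLICY_VERSION` label, the function name, and the record layout are all assumptions for illustration; the SHA-256 digest is one possible way to make later tampering detectable, not a requirement stated in this article.

```python
import hashlib
import json
from datetime import datetime, timezone

POLICY_VERSION = "kyc-policy-2026.04"  # hypothetical version label


def build_audit_record(case_id: str, final_state: dict) -> dict:
    """Build an audit record for one decision, stamped with the policy version."""
    # Record only decision-relevant fields, not raw PII blobs.
    payload = {
        "case_id": case_id,
        "policy_version": POLICY_VERSION,
        "decision": final_state.get("decision"),
        "risk_score": final_state.get("risk_score"),
        "flags": sorted(final_state.get("flags", [])),
    }
    # A digest over canonical JSON lets a reviewer detect later edits to the payload.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {
        **payload,
        "sha256": digest,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


record = build_audit_record(
    "case_001", {"decision": "manual_review", "risk_score": 40, "flags": ["NAME_MISMATCH"]}
)
print(record["policy_version"], record["decision"], record["sha256"][:12])
```

The timestamp is deliberately left out of the hashed payload so that the same decision under the same policy always produces the same digest.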

Common Pitfalls

• Letting the LLM make compliance decisions
  - Bad pattern: asking an LLM “approve or reject this applicant.”
  - Fix it by making decisions deterministic inside LangGraph nodes like decide(), with explicit thresholds and rule checks.
• Passing raw sensitive data through every node
  - This increases blast radius and complicates retention policies.
  - Pass references or extracted fields only when possible.
  - Store images and documents in secure object storage with short-lived access tokens.
• Skipping versioning on policy logic
  - A KYC workflow without versioned rules is impossible to defend during audits.
  - Version your graph code, screening lists, thresholds, and vendor integrations together.
  - Persist those versions with each case so reviewers can reconstruct the exact decision path later.

By Cyprian Aarons, AI Consultant at Topiax.