How to Build a KYC Verification Agent Using LangGraph in Python for Lending

By Cyprian Aarons · Updated 2026-04-21
kyc-verification · langgraph · python · lending

A KYC verification agent for lending takes an applicant’s identity data, checks it against internal and external sources, flags risk, and decides whether the file is complete enough to move forward. For lenders, this matters because bad KYC creates compliance exposure, slows underwriting, and increases fraud losses.

Architecture

A production KYC agent for lending usually needs these components:

  • Input normalizer

    • Cleans applicant payloads from web forms, CRM systems, or loan origination platforms.
    • Maps fields like full_name, dob, address, document_id, and country.
  • Document verification node

    • Checks ID document presence, expiry, format, and image quality.
    • Can call OCR or document validation services.
  • Sanctions / watchlist screening node

    • Runs the applicant through PEP, sanctions, and adverse media checks.
    • Returns match confidence and escalation flags.
  • Risk decision node

    • Applies lending policy rules.
    • Decides approved, needs_review, or rejected.
  • Audit logger

    • Persists every decision, input hash, tool result, and policy reason.
    • This is non-negotiable for lending audits.
  • Human review handoff

    • Routes edge cases to an analyst queue when confidence is low or a match is ambiguous.
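
As a concrete sketch of the input normalizer's job, here is a minimal field-mapping step; the source key names (applicantName, countryCode, and so on) are assumptions for illustration, since every origination platform names these differently:

```python
# Hypothetical mapping from a loan-origination payload to the agent's
# canonical field names; the source keys are illustrative assumptions.
FIELD_MAP = {
    "applicantName": "full_name",
    "dateOfBirth": "dob",
    "residentialAddress": "address",
    "idNumber": "document_id",
    "countryCode": "country",
}

def map_fields(raw: dict) -> dict:
    """Rename known source fields and drop anything unmapped."""
    return {canonical: raw[src] for src, canonical in FIELD_MAP.items() if src in raw}
```

Dropping unmapped keys at the boundary also limits how much raw PII flows into the rest of the workflow.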

Implementation

1. Define the state and the graph nodes

LangGraph works well here because KYC is not a single LLM call. It is a controlled workflow with branching logic and explicit state transitions.

from typing import TypedDict, Literal, Optional
from langgraph.graph import StateGraph, START, END

class KYCInput(TypedDict):
    # Fields supplied by the loan application pipeline
    applicant_id: str
    full_name: str
    dob: str
    country: str
    document_id: str

class KYCState(KYCInput, total=False):
    # Derived fields, filled in by the graph nodes
    doc_valid: bool
    sanctions_match: bool
    risk_score: int
    decision: Literal["approved", "needs_review", "rejected"]
    audit_reason: Optional[str]

def normalize_input(state: KYCState) -> KYCState:
    state["full_name"] = state["full_name"].strip().title()
    state["country"] = state["country"].strip().upper()
    return state

def verify_document(state: KYCState) -> KYCState:
    # Replace with OCR / document API call in production
    state["doc_valid"] = len(state["document_id"]) >= 8
    return state

def screen_sanctions(state: KYCState) -> KYCState:
    # Replace with real screening provider integration
    blocked_names = {"John Doe", "Test Sanctioned"}
    state["sanctions_match"] = state["full_name"] in blocked_names
    return state

def score_risk(state: KYCState) -> KYCState:
    score = 0
    if not state["doc_valid"]:
        score += 60
    if state["sanctions_match"]:
        score += 100
    if state["country"] not in {"US", "CA", "GB", "AU"}:
        score += 15
    state["risk_score"] = score
    return state

def decide(state: KYCState) -> KYCState:
    if state["sanctions_match"]:
        state["decision"] = "rejected"
        state["audit_reason"] = "Sanctions/watchlist match"
    elif not state["doc_valid"]:
        state["decision"] = "needs_review"
        state["audit_reason"] = "Invalid or missing identity document"
    elif state["risk_score"] >= 20:
        state["decision"] = "needs_review"
        state["audit_reason"] = "Policy threshold exceeded"
    else:
        state["decision"] = "approved"
        state["audit_reason"] = "KYC passed"
    return state

graph = StateGraph(KYCState)
graph.add_node("normalize_input", normalize_input)
graph.add_node("verify_document", verify_document)
graph.add_node("screen_sanctions", screen_sanctions)
graph.add_node("score_risk", score_risk)
graph.add_node("decide", decide)

graph.add_edge(START, "normalize_input")
graph.add_edge("normalize_input", "verify_document")
graph.add_edge("verify_document", "screen_sanctions")
graph.add_edge("screen_sanctions", "score_risk")
graph.add_edge("score_risk", "decide")
graph.add_edge("decide", END)

kyc_app = graph.compile()

This pattern keeps each compliance step isolated. That makes it easier to test, audit, and swap implementations without rewriting the workflow.
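
For example, because each node is a plain function over the state dict, policy thresholds can be unit-tested without compiling the graph at all. The block below repeats the score_risk logic from above as a standalone copy so the checks run in isolation:

```python
# Standalone copy of the score_risk policy from the workflow above,
# repeated here so its thresholds can be tested without LangGraph.
def score_risk(state: dict) -> dict:
    score = 0
    if not state["doc_valid"]:
        score += 60
    if state["sanctions_match"]:
        score += 100
    if state["country"] not in {"US", "CA", "GB", "AU"}:
        score += 15
    state["risk_score"] = score
    return state

def test_clean_domestic_applicant_scores_zero():
    state = {"doc_valid": True, "sanctions_match": False, "country": "US"}
    assert score_risk(state)["risk_score"] == 0

def test_bad_document_crosses_review_threshold():
    state = {"doc_valid": False, "sanctions_match": False, "country": "US"}
    assert score_risk(state)["risk_score"] == 60

test_clean_domestic_applicant_scores_zero()
test_bad_document_crosses_review_threshold()
```

When compliance changes a weight, the failing test tells you exactly which policy rule moved.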

2. Add branching for manual review

In lending, you do not want every borderline case to be auto-rejected. You want deterministic routing into review when confidence is low or policy requires escalation.

from langgraph.graph import StateGraph, START, END

def route_decision(state: KYCState) -> str:
    return state["decision"]

review_graph = StateGraph(KYCState)
review_graph.add_node("normalize_input", normalize_input)
review_graph.add_node("verify_document", verify_document)
review_graph.add_node("screen_sanctions", screen_sanctions)
review_graph.add_node("score_risk", score_risk)
review_graph.add_node("decide", decide)

review_graph.add_edge(START, "normalize_input")
review_graph.add_edge("normalize_input", "verify_document")
review_graph.add_edge("verify_document", "screen_sanctions")
review_graph.add_edge("screen_sanctions", "score_risk")
review_graph.add_edge("score_risk", "decide")

def human_review(state: KYCState) -> KYCState:
    # Replace with a push to your analyst review queue in production
    state["audit_reason"] = f"Escalated to analyst: {state['audit_reason']}"
    return state

review_graph.add_node("human_review", human_review)

review_graph.add_conditional_edges(
    "decide",
    route_decision,
    {
        "approved": END,
        "needs_review": "human_review",
        "rejected": END,
    },
)
review_graph.add_edge("human_review", END)

kyc_app_with_branching = review_graph.compile()

The key point is that the graph still ends in a deterministic output. The downstream loan system can consume decision without guessing what happened inside the agent.

3. Run it with lending-specific input

Use structured payloads from your loan application pipeline. Keep PII minimal in memory and store only what you need for audit.

input_state: KYCState = {
    "applicant_id": "app_10291",
    "full_name": " jane smith ",
    "dob": "1991-04-12",
    "country": "us",
    "document_id": "DL-88392011",
}

result = kyc_app.invoke(input_state)

print(result["decision"])
print(result["audit_reason"])
print(result["risk_score"])

For a lender, this output should be written into your case management system along with timestamps, model/version metadata, and source references for any external check.
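
A minimal sketch of such a record, assuming a SHA-256 hash over the raw input and an illustrative workflow version tag (both names here are assumptions, not part of the graph above):

```python
import hashlib
import json
from datetime import datetime, timezone

WORKFLOW_VERSION = "kyc-graph-1.0"  # illustrative version tag

def build_case_record(input_state: dict, result: dict) -> dict:
    """Assemble an auditable record for the case management system."""
    raw = json.dumps(input_state, sort_keys=True).encode()
    return {
        "applicant_id": input_state["applicant_id"],
        "decision": result["decision"],
        "audit_reason": result["audit_reason"],
        "risk_score": result["risk_score"],
        "input_sha256": hashlib.sha256(raw).hexdigest(),
        "workflow_version": WORKFLOW_VERSION,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```

Hashing the input instead of storing it verbatim gives you tamper-evidence for audits while keeping raw PII out of the decision log.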

Production Considerations

  • Keep the workflow deterministic where possible

    • Use LangGraph for orchestration and only use LLMs where they add value.
    • For sanctions screening and document validation, prefer rule-based or vendor APIs over free-form generation.
  • Log every decision path

    • Store node outputs, branch decisions, and final reason codes.
    • Auditors will ask why an applicant was rejected or escalated.
  • Respect data residency

    • If you operate across regions, pin execution and storage to approved jurisdictions.
    • Do not send raw PII to services outside your regulatory boundary without a legal basis.
  • Add guardrails around human escalation

    • Any sanctions hit or ambiguous identity match should go to manual review.
    • Never let an LLM override hard compliance rules in lending.
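
One way to sketch that guardrail: run the hard rules in plain code after any model output, so a model can never upgrade a sanctioned or undocumented applicant. The llm_suggestion parameter here is an assumed input, not something produced by the workflow above:

```python
def enforce_hard_rules(llm_suggestion: str, state: dict) -> str:
    """Apply non-negotiable compliance rules on top of any LLM suggestion."""
    if state.get("sanctions_match"):
        return "rejected"          # hard rule: a sanctions hit always rejects
    if not state.get("doc_valid", False):
        return "needs_review"      # never auto-approve without a valid document
    return llm_suggestion          # otherwise the suggestion may pass through
```

The model's output becomes at most an input to the decision, never the decision itself.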

Common Pitfalls

  1. Using an LLM as the final decision maker

    • Mistake: letting the model approve or reject applicants directly.
    • Avoid it by making the LLM assist with extraction or summarization only; keep policy decisions in code.
  2. Not preserving audit evidence

    • Mistake: storing only the final decision.
    • Avoid it by recording node-level outputs, rule thresholds, vendor response IDs, and a reason code for every outcome.
  3. Ignoring jurisdiction-specific compliance rules

    • Mistake: applying one global policy to all applicants.
    • Avoid it by parameterizing rules per country or product line. Lending compliance changes with geography, product type, and customer segment.
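
One way to sketch that parameterization is a per-country policy table with a default fallback; the thresholds and flags below are illustrative, not real compliance values:

```python
# Illustrative per-country policy table; real thresholds and document
# requirements come from your compliance team, not from this example.
POLICIES = {
    "DEFAULT": {"review_threshold": 20, "require_proof_of_address": True},
    "US": {"review_threshold": 20, "require_proof_of_address": False},
    "GB": {"review_threshold": 15, "require_proof_of_address": True},
}

def get_policy(country: str) -> dict:
    """Look up the policy for a country, falling back to the default."""
    return POLICIES.get(country.upper(), POLICIES["DEFAULT"])
```

A decide node can then read its threshold from get_policy(state["country"]) instead of a hard-coded constant, so new jurisdictions become a data change rather than a code change.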

If you build the agent this way, you get a workflow that is explainable enough for compliance teams and strict enough for production lending systems. That is the bar.



By Cyprian Aarons, AI Consultant at Topiax.
