How to Build a KYC Verification Agent Using LangGraph in Python for Investment Banking

By Cyprian Aarons | Updated 2026-04-21

Tags: kyc-verification, langgraph, python, investment-banking

A KYC verification agent in investment banking collects client data, checks it against policy and external sources, flags risk, and routes exceptions to humans. It matters because onboarding speed directly affects revenue, but weak KYC creates regulatory exposure, audit failures, and downstream AML problems.

Architecture

• Input intake layer
  - Accepts client profile data, corporate documents, beneficial ownership details, and jurisdiction metadata.
  - Normalizes raw input into a structured state object.
• Document extraction and validation
  - Pulls fields from passports, certificates of incorporation, proof of address, and ownership registers.
  - Validates completeness, expiry dates, and document consistency.
• Risk and sanctions screening
  - Checks names, entities, UBOs, and countries against sanctions lists, PEP lists, adverse media feeds, and internal watchlists.
  - Produces explicit risk flags with evidence.
• Policy engine
  - Applies bank-specific KYC rules by client type, geography, product line, and entity structure.
  - Decides whether the case is auto-approved, needs remediation, or must be escalated.
• Human review handoff
  - Routes ambiguous or high-risk cases to compliance analysts.
  - Preserves the full decision trail for audit.
• Audit and persistence layer
  - Stores state transitions, extracted evidence, model outputs, and final decisions.
  - Supports retention requirements and regulator review.

Implementation

1) Define the state model

For production KYC flows, keep the graph state explicit. You want every node to read and write a typed payload so you can inspect decisions later.

from typing import TypedDict, List, Optional
from langgraph.graph import StateGraph, END

class KYCState(TypedDict):
    customer_id: str
    jurisdiction: str
    entity_type: str
    documents: List[str]
    extracted_fields: dict
    screening_hits: list
    risk_score: int
    decision: Optional[str]
    notes: List[str]

This state is small on purpose. In real deployments you would attach document references or blob IDs instead of raw files to avoid moving sensitive data through every node.
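
As a rough illustration of the reference-only approach, the sketch below replaces the plain list of file names with small pointer records. The `DocumentRef` type, its field names, and the `document_ids` helper are assumptions made for this example; they are not part of LangGraph or of the state model above.

```python
from typing import List, TypedDict


class DocumentRef(TypedDict):
    """Pointer to a stored document; the file bytes never enter graph state."""
    doc_id: str          # hypothetical ID issued by the document store
    doc_type: str        # e.g. "certificate_of_incorporation"
    sha256: str          # content hash, so reviewers can confirm which file was used
    storage_region: str  # where the underlying blob lives


class KYCStateWithRefs(TypedDict):
    customer_id: str
    documents: List[DocumentRef]


def document_ids(state: KYCStateWithRefs) -> List[str]:
    """Return only the IDs a node would hand to the document service."""
    return [doc["doc_id"] for doc in state["documents"]]


example_state: KYCStateWithRefs = {
    "customer_id": "CUST-10001",
    "documents": [
        {
            "doc_id": "doc-001",
            "doc_type": "certificate_of_incorporation",
            "sha256": "0" * 64,  # placeholder hash for illustration only
            "storage_region": "eu-west-2",
        }
    ],
}

print(document_ids(example_state))
```

A node that needs the document content would resolve `doc_id` inside a controlled service at execution time, so the state that flows between nodes stays small and free of raw PII.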

2) Build deterministic nodes for extraction, screening, and policy

Use deterministic functions where possible. For investment banking workflows, the agent should not “reason” its way around compliance rules.

def extract_documents(state: KYCState) -> KYCState:
    # Placeholder extraction: a real node would parse the files referenced in
    # state["documents"]. The values are hard-coded here for illustration.
    extracted = {
        "name": "Acme Holdings Ltd",
        "registration_number": "12345678",
        "incorporation_country": "GB",
        "ubo_count": 2,
    }
    return {**state, "extracted_fields": extracted}

def screen_customer(state: KYCState) -> KYCState:
    # Placeholder screening: stands in for a call to a sanctions/PEP screening service.
    name = state["extracted_fields"].get("name", "")
    hits = []
    if "Acme" in name:
        hits.append({"source": "sanctions", "match": "low_confidence"})
    return {**state, "screening_hits": hits}

def apply_policy(state: KYCState) -> KYCState:
    hits = state.get("screening_hits", [])
    country = state["extracted_fields"].get("incorporation_country")
    
    if hits:
        return {**state, "risk_score": 85}
    if country not in {"GB", "US", "DE", "FR"}:
        return {**state, "risk_score": 70}
    return {**state, "risk_score": 20}

These nodes are intentionally boring. That is what you want in regulated workflows: predictable behavior that compliance can explain line by line.
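
To make the screening step slightly less of a stub while keeping it deterministic, the sketch below normalizes names and compares them with the standard library's `difflib.SequenceMatcher`. The watchlist entries, the `0.8` threshold, and the hit fields are assumptions for illustration; production screening would normally rely on a dedicated vendor or internal matching service rather than this simple similarity ratio.

```python
import unicodedata
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical watchlist entries; a real list would come from a versioned feed.
WATCHLIST = [
    {"list": "sanctions", "entry_id": "S-001", "name": "Acme Holding Limited"},
    {"list": "pep", "entry_id": "P-114", "name": "Jane Q. Example"},
]


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    cleaned = "".join(ch if ch.isalnum() else " " for ch in ascii_name.lower())
    return " ".join(cleaned.split())


def screen_name(name: str, threshold: float = 0.8) -> List[Dict]:
    """Return one hit, with evidence, per entry whose similarity meets the threshold."""
    hits = []
    candidate = normalize(name)
    for entry in WATCHLIST:
        score = SequenceMatcher(None, candidate, normalize(entry["name"])).ratio()
        if score >= threshold:
            hits.append({
                "source": entry["list"],
                "entry_id": entry["entry_id"],
                "matched_name": entry["name"],
                "score": round(score, 2),
            })
    return hits


print(screen_name("Acme Holdings Ltd"))
```

Each hit carries the list, the entry ID, and the similarity score, so an analyst can see why the flag was raised instead of receiving a bare boolean.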

3) Add routing logic with `add_conditional_edges`

The graph should branch based on risk score. Low-risk cases can auto-complete; medium-risk cases need remediation; high-risk cases go to human review.

def route_case(state: KYCState):
    score = state["risk_score"]
    if score >= 80:
        return "human_review"
    if score >= 50:
        return "remediation"
    return "approve"

def remediation_node(state: KYCState) -> KYCState:
    # Build a new list instead of appending in place, so the incoming state is not mutated.
    notes = state.get("notes", []) + ["Request UBO chart and source-of-funds letter."]
    return {**state, "decision": "pending_remediation", "notes": notes}

def human_review_node(state: KYCState) -> KYCState:
    notes = state.get("notes", []) + ["Escalated to compliance analyst due to screening hit."]
    return {**state, "decision": "manual_review_required", "notes": notes}

def approve_node(state: KYCState) -> KYCState:
    notes = state.get("notes", []) + ["KYC passed policy checks."]
    return {**state, "decision": "approved", "notes": notes}

graph = StateGraph(KYCState)
graph.add_node("extract_documents", extract_documents)
graph.add_node("screen_customer", screen_customer)
graph.add_node("apply_policy", apply_policy)
graph.add_node("remediation", remediation_node)
graph.add_node("human_review", human_review_node)
graph.add_node("approve", approve_node)

graph.set_entry_point("extract_documents")
graph.add_edge("extract_documents", "screen_customer")
graph.add_edge("screen_customer", "apply_policy")
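graph.add_conditional_edges(
    "apply_policy",
    route_case,
    {
        "remediation": "remediation",
        "human_review": "human_review",
        # Route to the approve node (not END) so it records the decision and note.
        "approve": "approve",
    },
)

# Each outcome node ends the run.
graph.add_edge("remediation", END)
graph.add_edge("human_review", END)
graph.add_edge("approve", END)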

app = graph.compile()

Notice the use of StateGraph, add_node, add_edge, add_conditional_edges, set_entry_point, and compile(). That is the core LangGraph pattern for regulated orchestration.

4) Run the workflow and persist the result

In a real bank you would attach a checkpointer or external persistence layer. Even without that here, you can execute the graph deterministically and store the final output in your case management system.

initial_state: KYCState = {
    "customer_id": "CUST-10001",
    "jurisdiction": "UK",
    "entity_type": "corporate",
    "documents": ["certificate_of_incorporation.pdf", "ubo_register.pdf"],
    "extracted_fields": {},
    "screening_hits": [],
    "risk_score": 0,
    "decision": None,
    "notes": [],  # keep analyst-visible commentary here
}

result = app.invoke(initial_state)
print(result["decision"])
print(result["notes"])

If you want resumability for analyst workflows or long-running onboarding cases, wire in a LangGraph checkpointer so an interrupted case can resume without losing state. That matters when a client uploads documents across multiple days or when an analyst needs to add evidence before re-running the flow.
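
Independently of a checkpointer, the final state returned by `app.invoke` can be written to a store keyed by customer. The sketch below uses `sqlite3` purely as a stand-in for a case management system; the table name `kyc_case_results` and both helper functions are hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def save_case_result(conn: sqlite3.Connection, result: dict) -> None:
    """Insert or replace the final graph state for one customer as JSON."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS kyc_case_results (
            customer_id TEXT PRIMARY KEY,
            decision    TEXT,
            state_json  TEXT NOT NULL,
            saved_at    TEXT NOT NULL
        )
        """
    )
    conn.execute(
        "INSERT OR REPLACE INTO kyc_case_results VALUES (?, ?, ?, ?)",
        (
            result["customer_id"],
            result.get("decision"),
            json.dumps(result, sort_keys=True),
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()


def load_case_result(conn: sqlite3.Connection, customer_id: str) -> Optional[dict]:
    """Return the stored state for a customer, or None if nothing was saved."""
    row = conn.execute(
        "SELECT state_json FROM kyc_case_results WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = sqlite3.connect(":memory:")  # in-memory database for the example
save_case_result(conn, {"customer_id": "CUST-10001",
                        "decision": "manual_review_required",
                        "notes": ["Escalated to compliance analyst due to screening hit."]})
print(load_case_result(conn, "CUST-10001")["decision"])
```

In the workflow above, `result` from `app.invoke(initial_state)` could be passed to `save_case_result` directly, since every field in the state is JSON-serializable.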

Production Considerations

• Data residency
  - Keep PII inside your approved region. If your bank operates under UK/EU constraints or local banking secrecy laws, do not send raw customer data to non-compliant infrastructure.
  - Store only references in graph state where possible; fetch sensitive artifacts from region-bound storage at node execution time.
• Auditability
  - Log every node transition with timestamp, input hash, output hash, and rule version.
  - Regulators care about why a case was approved or escalated. Preserve the exact screening source version and policy snapshot used at decision time.
• Guardrails
  - Never let an LLM make the final approval decision on its own.
  - Use LLMs for extraction summaries or analyst assistance only; keep sanctions matching and policy enforcement deterministic.
• Monitoring
  - Track false positives on screening hits, average time-to-decision by jurisdiction, manual-review rate by entity type, and remediation completion rate.
  - A spike in manual reviews often means your extraction quality dropped or a policy rule changed without proper rollout control.
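
The auditability point above (timestamp, input hash, output hash, rule version per node transition) can be sketched as a thin wrapper around any node function. The `audited` helper, the `RULE_VERSION` value, and the in-memory `AUDIT_LOG` list are assumptions for illustration; a real deployment would write to an append-only store.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Callable, Dict, List

RULE_VERSION = "kyc-policy-2026.04"  # hypothetical policy snapshot identifier
AUDIT_LOG: List[Dict] = []           # stand-in for an append-only audit store


def state_hash(state: Dict) -> str:
    """Stable SHA-256 over the sorted-key JSON form of a state dict."""
    payload = json.dumps(state, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def audited(node_name: str, node_fn: Callable[[Dict], Dict]) -> Callable[[Dict], Dict]:
    """Wrap a node so that each call appends one audit record."""
    def wrapper(state: Dict) -> Dict:
        input_hash = state_hash(state)  # hash before the node runs
        new_state = node_fn(state)
        AUDIT_LOG.append({
            "node": node_name,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "input_hash": input_hash,
            "output_hash": state_hash(new_state),
            "rule_version": RULE_VERSION,
        })
        return new_state
    return wrapper


def score_low_risk(state: Dict) -> Dict:
    """Toy node used only to exercise the wrapper."""
    return {**state, "risk_score": 20}


result_state = audited("apply_policy", score_low_risk)({"customer_id": "CUST-10001"})
print(AUDIT_LOG[0]["node"], AUDIT_LOG[0]["rule_version"])
```

Because the wrapper returns an ordinary callable, it could be registered in place of the bare function, for example `graph.add_node("apply_policy", audited("apply_policy", apply_policy))`. Hashing the state rather than logging it keeps PII out of the audit trail while still proving which inputs produced which outputs.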

Common Pitfalls

• Using free-form agent reasoning for compliance decisions
  - Don’t ask the model “should we onboard this client?” as an open-ended question.
  - Split extraction from policy enforcement. The model can assist with field normalization; the rule engine decides outcome.
• Storing raw sensitive documents in graph state
  - This bloats memory usage and increases exposure during debugging.
  - Store document IDs or signed URLs instead. Retrieve content inside controlled services with access logging.
• Skipping exception paths
  - Many teams build only the happy path: approve or reject.
  - Real investment banking onboarding needs remediation loops for missing UBO data, expired documents, mismatched addresses, and enhanced due diligence triggers.
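
One way to keep both the decision and the exception paths explicit is to express policy as a small table of deterministic rules, each with an ID, a predicate, an outcome, and a required action. The rule IDs, case field names, and severity ordering below are hypothetical; actual rules would come from the bank's KYC policy.

```python
from datetime import date
from typing import Callable, Dict, NamedTuple


class Rule(NamedTuple):
    rule_id: str
    applies: Callable[[Dict], bool]  # deterministic predicate over case facts
    outcome: str                     # "remediation" or "human_review"
    action: str                      # what the client or analyst must do next


# Hypothetical rules covering the exception paths listed above.
RULES = [
    Rule("R-UBO-01", lambda c: c["ubo_declared"] < c["ubo_expected"],
         "remediation", "Request missing UBO details."),
    Rule("R-DOC-02", lambda c: c["passport_expiry"] < c["as_of"],
         "remediation", "Request a current passport."),
    Rule("R-ADDR-03", lambda c: c["registered_address"] != c["proof_of_address"],
         "remediation", "Resolve the address mismatch."),
    Rule("R-EDD-04", lambda c: c["high_risk_jurisdiction"],
         "human_review", "Open enhanced due diligence."),
]

SEVERITY = {"approve": 0, "remediation": 1, "human_review": 2}


def evaluate(case: Dict) -> Dict:
    """Run every rule; the most severe outcome wins and all fired rules are kept."""
    fired = [rule for rule in RULES if rule.applies(case)]
    outcome = max((rule.outcome for rule in fired),
                  key=SEVERITY.__getitem__, default="approve")
    return {
        "outcome": outcome,
        "fired_rules": [rule.rule_id for rule in fired],
        "actions": [rule.action for rule in fired],
    }


clean_case = {
    "ubo_declared": 2, "ubo_expected": 2,
    "passport_expiry": date(2030, 1, 1), "as_of": date(2026, 4, 21),
    "registered_address": "1 High St", "proof_of_address": "1 High St",
    "high_risk_jurisdiction": False,
}
print(evaluate({**clean_case, "ubo_declared": 1}))
```

The returned `outcome` could feed a routing function such as `route_case`, while `fired_rules` and `actions` give the analyst and the audit trail a concrete reason for each remediation request.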


By Cyprian Aarons, AI Consultant at Topiax.
