How to Build a KYC Verification Agent for Insurance Using LangGraph in Python

By Cyprian Aarons · Updated 2026-04-21
kyc-verification · langgraph · python · insurance

A KYC verification agent for insurance collects identity data, checks it against policy and compliance rules, and routes the case to approve, reject, or escalate for manual review. It matters because insurers need to onboard customers quickly without breaking AML, sanctions, and fraud controls, and every decision has to be auditable.

Architecture

  • Input intake node

    • Receives applicant data: name, DOB, address, ID number, policy type, and jurisdiction.
    • Normalizes the payload before any checks run.
  • Document verification node

    • Validates uploaded identity documents.
    • Extracts fields and compares them with the application data.
  • Compliance rules node

    • Checks sanction lists, PEP flags, age restrictions, and country-specific insurance requirements.
    • Applies deterministic rules before any LLM-driven reasoning.
  • Risk scoring node

    • Produces a KYC risk score from document quality, mismatch count, geography, and prior fraud signals.
    • Decides whether the case is straight-through or needs escalation.
  • Human review route

    • Sends borderline cases to an underwriter or compliance analyst.
    • Preserves all evidence for audit trails.
  • Audit log sink

    • Stores every state transition, model output, rule hit, and final decision.
    • Needed for regulator review and internal investigations.

Implementation

1. Define the state model

Use a typed state object so every node reads and writes the same structure. For insurance workflows, keep both the raw applicant data and the decision metadata in state.

from typing import TypedDict, Annotated, Literal
from langgraph.graph import StateGraph, START, END
import operator

class KYCState(TypedDict):
    applicant: dict
    extracted: dict
    compliance_flags: list[str]
    risk_score: int
    decision: str
    audit_log: Annotated[list[str], operator.add]

This pattern matters because LangGraph merges state updates across nodes. The Annotated[list[str], operator.add] field lets each node append audit events without overwriting earlier entries.
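To see the merge semantics in isolation, here is a standalone sketch of what the operator.add reducer does with two audit-log updates; the log strings mirror the ones the nodes below emit:

```python
import operator

# What LangGraph does under the hood for an Annotated[list[str], operator.add]
# channel: each node's returned list is appended to the existing value,
# never assigned over it.
existing_log = ["Normalized applicant for US"]
node_update = ["Compliance flags: ['none']"]

merged = operator.add(existing_log, node_update)
print(merged)
# A plain (un-annotated) field would instead be replaced by node_update.
```

Without the reducer, the last node to write audit_log would silently erase every earlier entry.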

2. Build deterministic verification nodes

Keep the first pass rule-based. In insurance KYC, you want obvious failures caught before you spend tokens or call external services.

def normalize_input(state: KYCState):
    app = state["applicant"]
    normalized = {
        "full_name": app["full_name"].strip().title(),
        "dob": app["dob"],
        "country": app["country"].upper(),
        "id_number": app["id_number"].replace(" ", ""),
    }
    return {
        "applicant": normalized,
        "audit_log": [f"Normalized applicant for {normalized['country']}"]
    }

def check_compliance(state: KYCState):
    flags = []
    app = state["applicant"]

    sanctioned_countries = {"IR", "KP", "SY"}
    if app["country"] in sanctioned_countries:
        flags.append("sanctions_country")

    if len(app["id_number"]) < 8:
        flags.append("invalid_id_format")

    return {
        "compliance_flags": flags,
        "audit_log": [f"Compliance flags: {flags or ['none']}"]
    }

def score_risk(state: KYCState):
    score = 0
    score += 70 if "sanctions_country" in state["compliance_flags"] else 0
    score += 30 if "invalid_id_format" in state["compliance_flags"] else 0
    score += 10 if state["applicant"]["country"] not in {"US", "GB", "CA"} else 0

    return {
        "risk_score": min(score, 100),
        "audit_log": [f"Risk score computed: {min(score, 100)}"]
    }
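Because these nodes are plain functions over state, you can unit-test the scoring logic without compiling a graph. A standalone sketch, restating score_risk from above:

```python
def score_risk(state):
    # Same deterministic scoring as the graph node above.
    score = 0
    score += 70 if "sanctions_country" in state["compliance_flags"] else 0
    score += 30 if "invalid_id_format" in state["compliance_flags"] else 0
    score += 10 if state["applicant"]["country"] not in {"US", "GB", "CA"} else 0
    return {
        "risk_score": min(score, 100),
        "audit_log": [f"Risk score computed: {min(score, 100)}"],
    }

# A sanctioned country plus a malformed ID caps at 100 and will hit the reject path.
high = score_risk({
    "applicant": {"country": "IR"},
    "compliance_flags": ["sanctions_country", "invalid_id_format"],
})
print(high["risk_score"])  # 100

# A clean US applicant scores 0 and goes straight through.
low = score_risk({"applicant": {"country": "US"}, "compliance_flags": []})
print(low["risk_score"])  # 0
```

Pinning the boundary cases in tests like this is what lets you show a regulator that the thresholds behave exactly as documented.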

3. Add routing for approve/review/reject

LangGraph shines when you need explicit branching. Use add_conditional_edges so your decision path is visible in code and in logs.

def decide_route(state: KYCState) -> Literal["approve", "review", "reject"]:
    if state["risk_score"] >= 80:
        return "reject"
    if state["risk_score"] >= 30:
        return "review"
    return "approve"

def approve_case(state: KYCState):
    return {
        "decision": "approved",
        "audit_log": ["Case approved automatically"]
    }

def reject_case(state: KYCState):
    return {
        "decision": "rejected",
        "audit_log": ["Case rejected automatically"]
    }

def manual_review(state: KYCState):
    # In production this would create a task in your underwriting/compliance queue.
    return {
        "decision": "manual_review",
        "audit_log": ["Case sent to human reviewer"]
    }

Now wire it into a StateGraph.

graph = StateGraph(KYCState)

graph.add_node("normalize_input", normalize_input)
graph.add_node("check_compliance", check_compliance)
graph.add_node("score_risk", score_risk)
graph.add_node("approve_case", approve_case)
graph.add_node("reject_case", reject_case)
graph.add_node("manual_review", manual_review)

graph.add_edge(START, "normalize_input")
graph.add_edge("normalize_input", "check_compliance")
graph.add_edge("check_compliance", "score_risk")

graph.add_conditional_edges(
    "score_risk",
    decide_route,
    {
        "approve": "approve_case",
        "review": "manual_review",
        "reject": "reject_case",
    },
)

graph.add_edge("approve_case", END)
graph.add_edge("reject_case", END)
graph.add_edge("manual_review", END)

kyc_app = graph.compile()

Run it with an applicant payload:

result = kyc_app.invoke({
    "applicant": {
        "full_name": "jane doe",
        "dob": "1992-04-11",
        "country": "US",
        "id_number": "A12345678"
    },
    "extracted": {},
    "compliance_flags": [],
    "risk_score": 0,
    "decision": "",
    # Start with an empty audit log; LangGraph merges appended entries
    # because of Annotated[list[str], operator.add].
    "audit_log": [],
})
print(result["decision"])
print(result["audit_log"])

4. Extend with real document checks

In production you usually add an OCR or document-extraction node before compliance scoring. Keep that node isolated so you can swap vendors without touching routing logic.

A common pattern is:

  • OCR node calls AWS Textract / Azure Document Intelligence / Google Document AI.
  • Extraction result gets written into state["extracted"].
  • Compliance node compares extracted fields to application fields.
  • Mismatch count feeds risk scoring.

That separation keeps your graph testable and makes vendor replacement much easier during procurement reviews.
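A minimal sketch of the comparison step, assuming extraction has already populated state["extracted"]; the node name and field names here are illustrative, not a vendor schema:

```python
def compare_extracted_fields(state):
    # Hypothetical node: compare OCR-extracted fields against the application
    # and emit one flag per mismatch, which risk scoring can then weight.
    applicant = state["applicant"]
    extracted = state["extracted"]
    mismatches = [
        field for field in ("full_name", "dob", "id_number")
        if field in extracted and extracted[field] != applicant.get(field)
    ]
    return {
        "compliance_flags": [f"mismatch_{f}" for f in mismatches],
        "audit_log": [f"Field mismatches: {mismatches or ['none']}"],
    }

state = {
    "applicant": {"full_name": "Jane Doe", "dob": "1992-04-11", "id_number": "A12345678"},
    "extracted": {"full_name": "Jane Doe", "dob": "1992-04-12", "id_number": "A12345678"},
}
print(compare_extracted_fields(state)["compliance_flags"])  # ['mismatch_dob']
```

Because the node only reads state["extracted"], swapping Textract for Document AI changes nothing downstream of extraction.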

Production Considerations

  • Auditability

    • Persist every audit_log entry with timestamps and graph version.
    • Regulators will ask why a policyholder was rejected; your logs must show the exact rule path.
  • Data residency

    • Keep PII inside approved regions only.
    • If you use external OCR or LLM services, verify where documents are processed and stored before sending passport or driver’s license data.
  • Guardrails

    • Do not let an LLM make final approval decisions on its own.
    • Use deterministic rules for sanctions, age limits, jurisdiction restrictions, and known fraud indicators; reserve models for extraction and summarization.
  • Monitoring

    • Track approval rate, manual review rate, false reject rate, and time-to-decision by country and product line.
    • A spike in one geography often means bad OCR quality or a broken downstream vendor integration.
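The audit requirement above can be sketched as a thin persistence layer that stamps each audit_log entry with a timestamp and graph version before it leaves the process. The version tag and the in-memory sink are stand-ins; in production the sink would be an append-only store:

```python
import json
from datetime import datetime, timezone

GRAPH_VERSION = "kyc-graph-1.0.0"  # illustrative; tie this to your deploy version

def persist_audit_entries(case_id, audit_log, sink):
    # Wrap each audit event with case id, timestamp, and graph version so the
    # exact rule path can be reconstructed during a regulator review.
    for entry in audit_log:
        sink.append(json.dumps({
            "case_id": case_id,
            "graph_version": GRAPH_VERSION,
            "ts": datetime.now(timezone.utc).isoformat(),
            "event": entry,
        }))

sink = []
persist_audit_entries(
    "case-42",
    ["Normalized applicant for US", "Case approved automatically"],
    sink,
)
print(len(sink))  # 2
```

Recording the graph version matters because a rejection must be explainable against the rules that were live at decision time, not the rules running today.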

Common Pitfalls

  1. Putting policy decisions inside free-form LLM output

    • Avoid this by making approval logic deterministic in graph nodes like score_risk and decide_route.
    • Use models for extraction or summarization only.
  2. Overwriting state instead of merging it

    • If you store logs or flags as plain lists without merge semantics, later nodes can erase earlier evidence.
    • Use Annotated[..., operator.add] for append-only fields like audit trails.
  3. Ignoring jurisdiction-specific requirements

    • Insurance KYC is not one-size-fits-all.
    • Separate rule sets by country or line of business so UK motor onboarding does not share the same thresholds as US life insurance onboarding.
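One way to keep those rule sets separate is a jurisdiction-aware variant of the routing function, keyed by country and line of business. The thresholds below are illustrative placeholders, not real compliance values:

```python
# Illustrative per-jurisdiction thresholds; real values come from compliance policy.
JURISDICTION_RULES = {
    ("GB", "motor"): {"review_threshold": 25, "reject_threshold": 70},
    ("US", "life"):  {"review_threshold": 35, "reject_threshold": 85},
}
DEFAULT_RULES = {"review_threshold": 30, "reject_threshold": 80}

def rules_for(country, line_of_business):
    # Fall back to conservative defaults for unconfigured markets.
    return JURISDICTION_RULES.get((country, line_of_business), DEFAULT_RULES)

def route_for_jurisdiction(state, country, line_of_business):
    rules = rules_for(country, line_of_business)
    if state["risk_score"] >= rules["reject_threshold"]:
        return "reject"
    if state["risk_score"] >= rules["review_threshold"]:
        return "review"
    return "approve"

# The same score routes differently per market.
print(route_for_jurisdiction({"risk_score": 30}, "GB", "motor"))  # review
print(route_for_jurisdiction({"risk_score": 30}, "US", "life"))   # approve
```

Keeping the table in config rather than code also lets compliance review threshold changes without a redeploy.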


By Cyprian Aarons, AI Consultant at Topiax.
