How to Build a KYC Verification Agent for Pension Funds Using LangGraph in Python

By Cyprian Aarons · Updated 2026-04-21
kyc-verification · langgraph · python · pension-funds

A KYC verification agent for pension funds checks whether a member, beneficiary, employer, or intermediary has submitted enough valid identity and compliance evidence to onboard or update records. It matters because pension funds sit under strict regulatory scrutiny: bad KYC creates fraud risk, blocked contributions, failed benefit payments, audit findings, and expensive remediation.

Architecture

  • Document intake layer

    • Accepts PDFs, scans, ID images, proof of address, tax forms, and employer onboarding files.
    • Normalizes inputs into text and structured metadata before the agent reasons over them.
  • Extraction and classification node

    • Uses an LLM to identify document type, entity type, jurisdiction, and missing fields.
    • Separates member KYC from beneficiary KYC and employer/introducer KYC.
  • Policy rules engine

    • Applies pension-fund-specific checks:
      • mandatory identity fields
      • sanctions/PEP screening result
      • address recency
      • tax residency flags
      • trustee-approved exceptions
    • Produces deterministic pass/fail outcomes where possible.
  • Human review branch

    • Routes borderline cases to compliance analysts.
    • Captures reviewer decisions for audit trails and model improvement.
  • Audit store

    • Persists every decision, extracted field, rule hit, and reviewer override.
    • Must be immutable or append-only for regulator review.
  • Integration layer

    • Pushes verified KYC status into the pension administration system.
    • Emits events for downstream workflows like contribution activation or benefit payout approval.
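The intake layer above can be sketched as a small normalization step. This is a minimal, illustrative shape (the `NormalizedDocument` name and `extract_text` callable are assumptions, not part of any library): hash the original bytes for the audit store, then extract plain text for the agent.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class NormalizedDocument:
    """Output of the intake layer: text plus metadata the agent reasons over."""
    raw_text: str
    source_filename: str
    content_sha256: str
    metadata: dict = field(default_factory=dict)

def normalize_upload(filename: str, content: bytes, extract_text) -> NormalizedDocument:
    """Hash the original bytes (for the audit store) and extract plain text.

    extract_text is a pluggable OCR/PDF-to-text callable; a real system
    would dispatch on file type here.
    """
    return NormalizedDocument(
        raw_text=extract_text(content),
        source_filename=filename,
        content_sha256=hashlib.sha256(content).hexdigest(),
        metadata={"size_bytes": len(content)},
    )

doc = normalize_upload("passport.pdf", b"Passport copy", lambda b: b.decode())
print(doc.raw_text)  # Passport copy
```

Hashing before any transformation matters: the audit store needs a fingerprint of what the member actually submitted, not of the OCR output.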

Implementation

1) Define the state model and graph nodes

Use StateGraph with a typed state object. Keep the state explicit so every transition is inspectable during audits.

from typing import TypedDict, List, Optional
from langgraph.graph import StateGraph, START, END

class KYCState(TypedDict):
    raw_text: str
    doc_type: Optional[str]
    entity_type: Optional[str]
    extracted_fields: dict
    missing_fields: List[str]
    risk_score: int
    decision: Optional[str]
    reviewer_notes: Optional[str]

def classify_document(state: KYCState) -> KYCState:
    text = state["raw_text"].lower()
    if "passport" in text or "national id" in text:
        doc_type = "identity_document"
    elif "proof of address" in text or "utility bill" in text:
        doc_type = "address_proof"
    else:
        doc_type = "other"

    return {**state, "doc_type": doc_type}

def extract_fields(state: KYCState) -> KYCState:
    fields = {}

    if state["doc_type"] == "identity_document":
        # Placeholder values; swap in real OCR/LLM extraction in production.
        fields["name"] = "Extracted Name"
        fields["id_number"] = "Extracted ID"
        fields["country"] = "Extracted Country"

    missing = [k for k in ["name", "id_number", "country"] if k not in fields]
    return {**state, "extracted_fields": fields, "missing_fields": missing}

def assess_risk(state: KYCState) -> KYCState:
    score = 0
    if state["doc_type"] == "other":
        score += 40
    score += len(state["missing_fields"]) * 20
    return {**state, "risk_score": score}

2) Add deterministic routing for auto-approve vs review

For pension funds you want hard thresholds. Don’t let the model make final decisions when policy can do it.

def decide_route(state: KYCState) -> str:
    if state["risk_score"] >= 40:
        return "manual_review"
    return "auto_approve"

def auto_approve(state: KYCState) -> KYCState:
    return {**state, "decision": "approved"}

def manual_review(state: KYCState) -> KYCState:
    # In production this would create a case in your compliance queue.
    return {
        **state,
        "decision": "needs_review",
        "reviewer_notes": f"Missing fields: {', '.join(state['missing_fields'])}"
    }
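A threshold like the 40 above is policy, not code. One way to keep it deterministic but changeable without a deploy (a sketch, assuming a simple versioned JSON policy document; the `POLICY` structure is illustrative) is to load it from config and log the version alongside every decision:

```python
import json

# Versioned policy config; in production this would come from a policy
# store, and POLICY["version"] would be logged with every decision.
POLICY = json.loads('{"version": "2026-04", "manual_review_threshold": 40}')

def decide_route(state) -> str:
    # Route on the configured threshold, never on model output.
    if state["risk_score"] >= POLICY["manual_review_threshold"]:
        return "manual_review"
    return "auto_approve"

print(decide_route({"risk_score": 60}))  # manual_review
print(decide_route({"risk_score": 10}))  # auto_approve
```

This keeps compliance in control of the threshold while the graph structure stays fixed.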

3) Wire the graph with add_conditional_edges

This is the actual LangGraph pattern you’ll use for branching workflows.

builder = StateGraph(KYCState)

builder.add_node("classify_document", classify_document)
builder.add_node("extract_fields", extract_fields)
builder.add_node("assess_risk", assess_risk)
builder.add_node("auto_approve", auto_approve)
builder.add_node("manual_review", manual_review)

builder.add_edge(START, "classify_document")
builder.add_edge("classify_document", "extract_fields")
builder.add_edge("extract_fields", "assess_risk")

builder.add_conditional_edges(
    "assess_risk",
    decide_route,
    {
        "auto_approve": "auto_approve",
        "manual_review": "manual_review",
    },
)

builder.add_edge("auto_approve", END)
builder.add_edge("manual_review", END)

graph = builder.compile()

4) Run the agent with a real input payload

You can invoke the compiled graph synchronously. For production systems, wrap this behind an API endpoint or queue consumer.

initial_state = {
    "raw_text": """
        Pension fund onboarding document.
        Passport copy attached.
        Proof of address included.
        Member name appears on page one.
        """,
    "doc_type": None,
    "entity_type": None,
    "extracted_fields": {},
    "missing_fields": [],
    "risk_score": 0,
    "decision": None,
    "reviewer_notes": None,
}

result = graph.invoke(initial_state)
print(result["decision"])            # approved
print(result["risk_score"])          # 0
print(result.get("reviewer_notes"))  # None

If you want LLM extraction instead of placeholder logic, replace extract_fields with a node that calls your model client. Keep the routing logic outside the model so your compliance thresholds stay deterministic.
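One way to do that swap, as a sketch: build the node around any text-in/text-out client so the model is injectable and the parsing stays testable. The `call_model` callable and `REQUIRED_FIELDS` list are assumptions for illustration; the client is assumed to return a JSON object string, and anything unparseable simply yields missing fields for the risk node to score.

```python
import json
from typing import Callable

REQUIRED_FIELDS = ["name", "id_number", "country"]

def make_llm_extract_node(call_model: Callable[[str], str]):
    """Build an extract_fields node around any text-in/text-out model client."""
    def extract_fields(state):
        prompt = (
            "Extract name, id_number and country from this KYC document. "
            "Reply with a JSON object only.\n\n" + state["raw_text"]
        )
        try:
            fields = json.loads(call_model(prompt))
        except (json.JSONDecodeError, TypeError):
            # Malformed model output is treated as "nothing extracted",
            # which downstream scoring flags as missing fields.
            fields = {}
        missing = [k for k in REQUIRED_FIELDS if not fields.get(k)]
        return {**state, "extracted_fields": fields, "missing_fields": missing}
    return extract_fields

# Stubbed client for illustration; swap in a real model call in production.
fake_client = lambda prompt: '{"name": "A. Member", "id_number": "X1", "country": "NL"}'
node = make_llm_extract_node(fake_client)
out = node({"raw_text": "passport copy"})
print(out["missing_fields"])  # []
```

Note that the node still emits the same state keys as before, so `assess_risk` and `decide_route` need no changes.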

Production Considerations

  • Deployment and data residency

    • Keep document processing inside the jurisdiction required by your pension fund policy.
    • If member data must stay on-prem or in-region cloud, do not send raw PII to external endpoints without approval.
  • Auditability

    • Persist every input document hash, extracted field set, rule outcome, and human override.
    • Use immutable logs so internal audit and regulators can reconstruct why a case was approved or escalated.
  • Guardrails

    • Hard-code policy checks for sanctions hits, expired IDs, missing proof of address windows, and unverifiable beneficiaries.
    • Never allow an LLM to override a failed mandatory control without explicit human approval.
  • Monitoring

    • Track approval rate by jurisdiction, review rate by document type, and false-positive rates from compliance analysts.
    • Alert when one country or employer suddenly spikes in manual reviews; that often indicates template drift or fraud attempts.
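The append-only audit requirement above can be approximated with hash chaining. This is a minimal in-memory sketch (function names are illustrative; production would use an append-only store or WORM storage): each entry embeds the SHA-256 of the previous entry, so any later edit breaks the chain and is detectable on replay.

```python
import hashlib, json, time

def append_audit_record(log, record):
    """Append a record whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    body = {"ts": time.time(), "prev_hash": prev_hash, **record}
    entry_hash = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    log.append({**body, "entry_hash": entry_hash})
    return log

def verify_chain(log):
    """Replay the log and confirm every link and every entry hash."""
    prev = "0" * 64
    for entry in log:
        if entry["prev_hash"] != prev:
            return False
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if digest != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True

log = []
append_audit_record(log, {"node": "assess_risk", "risk_score": 40})
append_audit_record(log, {"node": "manual_review", "decision": "needs_review"})
print(verify_chain(log))  # True
```

Running `verify_chain` during internal audit gives you a cheap integrity check before handing logs to a regulator.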

Common Pitfalls

  1. Letting the model decide final compliance outcomes

    • Fix this by keeping approvals rule-based and using the LLM only for extraction and classification.
  2. Skipping entity-specific logic

    • A pension fund has different checks for members, beneficiaries, employers, advisers, and trustees.
    • Model these separately or you’ll approve incomplete files with the wrong acceptance criteria.
  3. Not storing an audit trail

    • If you cannot show what was extracted, what rule failed, and who overrode it, you will struggle during audits.
    • Log graph inputs/outputs at each node and retain versioned policy rules alongside them.
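Pitfall 2 is easy to avoid with explicit per-entity checklists. A minimal sketch, with illustrative field names; the real requirements come from your fund's policy, per jurisdiction:

```python
# Illustrative per-entity checklists; real requirements come from fund policy.
REQUIRED_BY_ENTITY = {
    "member": ["name", "id_number", "country", "address_proof"],
    "beneficiary": ["name", "id_number", "relationship_to_member"],
    "employer": ["company_name", "registration_number", "authorized_signatory"],
}

def missing_for_entity(entity_type: str, fields: dict) -> list:
    """Return the fields this entity type still needs before approval."""
    required = REQUIRED_BY_ENTITY.get(entity_type, [])
    return [k for k in required if not fields.get(k)]

print(missing_for_entity("beneficiary", {"name": "B. Smith"}))
# ['id_number', 'relationship_to_member']
```

Wiring this into `assess_risk` keyed on `entity_type` means a beneficiary file can no longer pass on member criteria.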


By Cyprian Aarons, AI Consultant at Topiax.
