How to Build a KYC Verification Agent for Pension Funds Using LangGraph in Python

By Cyprian Aarons · Updated 2026-04-21
kyc-verification · langgraph · python · pension-funds

A KYC verification agent for pension funds checks whether a member, beneficiary, employer, or intermediary has submitted enough valid identity and compliance evidence to onboard or update records. It matters because pension funds sit under strict regulatory scrutiny: bad KYC creates fraud risk, blocked contributions, failed benefit payments, audit findings, and expensive remediation.

Architecture

  • Document intake layer

    • Accepts PDFs, scans, ID images, proof of address, tax forms, and employer onboarding files.
    • Normalizes inputs into text and structured metadata before the agent reasons over them.
  • Extraction and classification node

    • Uses an LLM to identify document type, entity type, jurisdiction, and missing fields.
    • Separates member KYC from beneficiary KYC and employer/introducer KYC.
  • Policy rules engine

    • Applies pension-fund-specific checks:
      • mandatory identity fields
      • sanctions/PEP screening result
      • address recency
      • tax residency flags
      • trustee-approved exceptions
    • Produces deterministic pass/fail outcomes where possible.
  • Human review branch

    • Routes borderline cases to compliance analysts.
    • Captures reviewer decisions for audit trails and model improvement.
  • Audit store

    • Persists every decision, extracted field, rule hit, and reviewer override.
    • Must be immutable or append-only for regulator review.
  • Integration layer

    • Pushes verified KYC status into the pension administration system.
    • Emits events for downstream workflows like contribution activation or benefit payout approval.
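The intake layer above can be sketched as a small normalization step. This is a minimal, illustrative shape (the `NormalizedDocument` name and `extract_text` callable are assumptions, not part of any library): hash the original bytes for the audit store, then extract plain text for the agent.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class NormalizedDocument:
    """Output of the intake layer: text plus metadata the agent reasons over."""
    raw_text: str
    source_filename: str
    content_sha256: str
    metadata: dict = field(default_factory=dict)

def normalize_upload(filename: str, content: bytes, extract_text) -> NormalizedDocument:
    """Hash the original bytes (for the audit store) and extract plain text.

    extract_text is a pluggable OCR/PDF-to-text callable; a real system
    would dispatch on file type here.
    """
    return NormalizedDocument(
        raw_text=extract_text(content),
        source_filename=filename,
        content_sha256=hashlib.sha256(content).hexdigest(),
        metadata={"size_bytes": len(content)},
    )

doc = normalize_upload("passport.pdf", b"Passport copy", lambda b: b.decode())
print(doc.raw_text)  # Passport copy
```

Hashing before any transformation matters: the audit store needs a fingerprint of what the member actually submitted, not of the OCR output.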

Implementation

1) Define the state model and graph nodes

Use StateGraph with a typed state object. Keep the state explicit so every transition is inspectable during audits.

from typing import TypedDict, List, Optional
from langgraph.graph import StateGraph, START, END

class KYCState(TypedDict):
    raw_text: str
    doc_type: Optional[str]
    entity_type: Optional[str]
    extracted_fields: dict
    missing_fields: List[str]
    risk_score: int
    decision: Optional[str]
    reviewer_notes: Optional[str]

def classify_document(state: KYCState) -> KYCState:
    text = state["raw_text"].lower()
    if "passport" in text or "national id" in text:
        doc_type = "identity_document"
    elif "proof of address" in text or "utility bill" in text:
        doc_type = "address_proof"
    else:
        doc_type = "other"

    return {**state, "doc_type": doc_type}

def extract_fields(state: KYCState) -> KYCState:
    fields = {}

    if state["doc_type"] == "identity_document":
        # Placeholder values; swap in real OCR/LLM extraction in production.
        fields["name"] = "Extracted Name"
        fields["id_number"] = "Extracted ID"
        fields["country"] = "Extracted Country"

    missing = [k for k in ["name", "id_number", "country"] if k not in fields]
    return {**state, "extracted_fields": fields, "missing_fields": missing}

def assess_risk(state: KYCState) -> KYCState:
    score = 0
    if state["doc_type"] == "other":
        score += 40
    score += len(state["missing_fields"]) * 20
    return {**state, "risk_score": score}

2) Add deterministic routing for auto-approve vs review

For pension funds you want hard thresholds. Don’t let the model make final decisions when policy can do it.

def decide_route(state: KYCState) -> str:
    if state["risk_score"] >= 40:
        return "manual_review"
    return "auto_approve"

def auto_approve(state: KYCState) -> KYCState:
    return {**state, "decision": "approved"}

def manual_review(state: KYCState) -> KYCState:
    # In production this would create a case in your compliance queue.
    return {
        **state,
        "decision": "needs_review",
        "reviewer_notes": f"Missing fields: {', '.join(state['missing_fields'])}"
    }
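A threshold like the 40 above is policy, not code. One way to keep it deterministic but changeable without a deploy (a sketch, assuming a simple versioned JSON policy document; the `POLICY` structure is illustrative) is to load it from config and log the version alongside every decision:

```python
import json

# Versioned policy config; in production this would come from a policy
# store, and POLICY["version"] would be logged with every decision.
POLICY = json.loads('{"version": "2026-04", "manual_review_threshold": 40}')

def decide_route(state) -> str:
    # Route on the configured threshold, never on model output.
    if state["risk_score"] >= POLICY["manual_review_threshold"]:
        return "manual_review"
    return "auto_approve"

print(decide_route({"risk_score": 60}))  # manual_review
print(decide_route({"risk_score": 10}))  # auto_approve
```

This keeps compliance in control of the threshold while the graph structure stays fixed.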

3) Wire the graph with add_conditional_edges

This is the actual LangGraph pattern you’ll use for branching workflows.

builder = StateGraph(KYCState)

builder.add_node("classify_document", classify_document)
builder.add_node("extract_fields", extract_fields)
builder.add_node("assess_risk", assess_risk)
builder.add_node("auto_approve", auto_approve)
builder.add_node("manual_review", manual_review)

builder.add_edge(START, "classify_document")
builder.add_edge("classify_document", "extract_fields")
builder.add_edge("extract_fields", "assess_risk")

builder.add_conditional_edges(
    "assess_risk",
    decide_route,
    {
        "auto_approve": "auto_approve",
        "manual_review": "manual_review",
    },
)

builder.add_edge("auto_approve", END)
builder.add_edge("manual_review", END)

graph = builder.compile()

4) Run the agent with a real input payload

You can invoke the compiled graph synchronously. For production systems, wrap this behind an API endpoint or queue consumer.

initial_state = {
    "raw_text": """
        Pension fund onboarding document.
        Passport copy attached.
        Proof of address included.
        Member name appears on page one.
        """,
    "doc_type": None,
    "entity_type": None,
    "extracted_fields": {},
    "missing_fields": [],
    "risk_score": 0,
    "decision": None,
    "reviewer_notes": None,
}

result = graph.invoke(initial_state)
print(result["decision"])            # approved
print(result["risk_score"])          # 0
print(result.get("reviewer_notes"))  # None

If you want LLM extraction instead of placeholder logic, replace extract_fields with a node that calls your model client. Keep the routing logic outside the model so your compliance thresholds stay deterministic.
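One way to do that swap, as a sketch: build the node around any text-in/text-out client so the model is injectable and the parsing stays testable. The `call_model` callable and `REQUIRED_FIELDS` list are assumptions for illustration; the client is assumed to return a JSON object string, and anything unparseable simply yields missing fields for the risk node to score.

```python
import json
from typing import Callable

REQUIRED_FIELDS = ["name", "id_number", "country"]

def make_llm_extract_node(call_model: Callable[[str], str]):
    """Build an extract_fields node around any text-in/text-out model client."""
    def extract_fields(state):
        prompt = (
            "Extract name, id_number and country from this KYC document. "
            "Reply with a JSON object only.\n\n" + state["raw_text"]
        )
        try:
            fields = json.loads(call_model(prompt))
        except (json.JSONDecodeError, TypeError):
            # Malformed model output is treated as "nothing extracted",
            # which downstream scoring flags as missing fields.
            fields = {}
        missing = [k for k in REQUIRED_FIELDS if not fields.get(k)]
        return {**state, "extracted_fields": fields, "missing_fields": missing}
    return extract_fields

# Stubbed client for illustration; swap in a real model call in production.
fake_client = lambda prompt: '{"name": "A. Member", "id_number": "X1", "country": "NL"}'
node = make_llm_extract_node(fake_client)
out = node({"raw_text": "passport copy"})
print(out["missing_fields"])  # []
```

Note that the node still emits the same state keys as before, so `assess_risk` and `decide_route` need no changes.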

Production Considerations

  • Deployment and data residency

    • Keep document processing inside the jurisdiction required by your pension fund policy.
    • If member data must stay on-prem or in-region cloud, do not send raw PII to external endpoints without approval.
  • Auditability

    • Persist every input document hash, extracted field set, rule outcome, and human override.
    • Use immutable logs so internal audit and regulators can reconstruct why a case was approved or escalated.
  • Guardrails

    • Hard-code policy checks for sanctions hits, expired IDs, missing proof of address windows, and unverifiable beneficiaries.
    • Never allow an LLM to override a failed mandatory control without explicit human approval.
  • Monitoring

    • Track approval rate by jurisdiction, review rate by document type, and false-positive rates from compliance analysts.
    • Alert when one country or employer suddenly spikes in manual reviews; that often indicates template drift or fraud attempts.
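The append-only audit requirement above can be approximated with hash chaining. This is a minimal in-memory sketch (function names are illustrative; production would use an append-only store or WORM storage): each entry embeds the SHA-256 of the previous entry, so any later edit breaks the chain and is detectable on replay.

```python
import hashlib, json, time

def append_audit_record(log, record):
    """Append a record whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    body = {"ts": time.time(), "prev_hash": prev_hash, **record}
    entry_hash = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    log.append({**body, "entry_hash": entry_hash})
    return log

def verify_chain(log):
    """Replay the log and confirm every link and every entry hash."""
    prev = "0" * 64
    for entry in log:
        if entry["prev_hash"] != prev:
            return False
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if digest != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True

log = []
append_audit_record(log, {"node": "assess_risk", "risk_score": 40})
append_audit_record(log, {"node": "manual_review", "decision": "needs_review"})
print(verify_chain(log))  # True
```

Running `verify_chain` during internal audit gives you a cheap integrity check before handing logs to a regulator.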

Common Pitfalls

  1. Letting the model decide final compliance outcomes

    • Fix this by keeping approvals rule-based and using the LLM only for extraction and classification.
  2. Skipping entity-specific logic

    • A pension fund has different checks for members, beneficiaries, employers, advisers, and trustees.
    • Model these separately or you’ll approve incomplete files with the wrong acceptance criteria.
  3. Not storing an audit trail

    • If you cannot show what was extracted, what rule failed, and who overrode it, you will struggle during audits.
    • Log graph inputs/outputs at each node and retain versioned policy rules alongside them.
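Pitfall 2 is easy to avoid with explicit per-entity checklists. A minimal sketch, with illustrative field names; the real requirements come from your fund's policy, per jurisdiction:

```python
# Illustrative per-entity checklists; real requirements come from fund policy.
REQUIRED_BY_ENTITY = {
    "member": ["name", "id_number", "country", "address_proof"],
    "beneficiary": ["name", "id_number", "relationship_to_member"],
    "employer": ["company_name", "registration_number", "authorized_signatory"],
}

def missing_for_entity(entity_type: str, fields: dict) -> list:
    """Return the fields this entity type still needs before approval."""
    required = REQUIRED_BY_ENTITY.get(entity_type, [])
    return [k for k in required if not fields.get(k)]

print(missing_for_entity("beneficiary", {"name": "B. Smith"}))
# ['id_number', 'relationship_to_member']
```

Wiring this into `assess_risk` keyed on `entity_type` means a beneficiary file can no longer pass on member criteria.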


By Cyprian Aarons, AI Consultant at Topiax.
