How to Build a KYC Verification Agent for Pension Funds Using LangGraph in Python
A KYC verification agent for pension funds checks whether a member, beneficiary, employer, or intermediary has submitted enough valid identity and compliance evidence to onboard or update records. It matters because pension funds sit under strict regulatory scrutiny: bad KYC creates fraud risk, blocked contributions, failed benefit payments, audit findings, and expensive remediation.
Architecture
- Document intake layer
  - Accepts PDFs, scans, ID images, proof of address, tax forms, and employer onboarding files.
  - Normalizes inputs into text and structured metadata before the agent reasons over them.
- Extraction and classification node
  - Uses an LLM to identify document type, entity type, jurisdiction, and missing fields.
  - Separates member KYC from beneficiary KYC and employer/introducer KYC.
- Policy rules engine
  - Applies pension-fund-specific checks:
    - mandatory identity fields
    - sanctions/PEP screening results
    - address recency
    - tax residency flags
    - trustee-approved exceptions
  - Produces deterministic pass/fail outcomes where possible.
- Human review branch
  - Routes borderline cases to compliance analysts.
  - Captures reviewer decisions for audit trails and model improvement.
- Audit store
  - Persists every decision, extracted field, rule hit, and reviewer override.
  - Must be immutable or append-only for regulator review.
- Integration layer
  - Pushes verified KYC status into the pension administration system.
  - Emits events for downstream workflows such as contribution activation or benefit payout approval.
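The policy rules engine lends itself to plain, deterministic predicates. Here is a minimal sketch; the field names, the 90-day address window, and the screening-result values are illustrative assumptions, not regulatory guidance — substitute your fund's actual policy.

```python
from datetime import date, timedelta
from typing import Tuple

# Illustrative mandatory fields for an identity document (an assumption).
MANDATORY_IDENTITY_FIELDS = {"name", "id_number", "country", "date_of_birth"}

def check_mandatory_fields(fields: dict) -> Tuple[bool, str]:
    missing = MANDATORY_IDENTITY_FIELDS - fields.keys()
    return (not missing, f"missing: {sorted(missing)}" if missing else "ok")

def check_address_recency(issue_date: date, max_age_days: int = 90) -> Tuple[bool, str]:
    age = (date.today() - issue_date).days
    return (age <= max_age_days, f"proof of address is {age} days old")

def check_sanctions(screening_result: str) -> Tuple[bool, str]:
    # Screening itself happens upstream; this rule only gates on its outcome.
    return (screening_result == "clear", f"screening result: {screening_result}")

def run_policy(fields: dict, address_issue_date: date, screening: str) -> dict:
    # Every check is a pure function returning (passed, reason), so each
    # outcome is inspectable and reproducible for auditors.
    checks = {
        "mandatory_fields": check_mandatory_fields(fields),
        "address_recency": check_address_recency(address_issue_date),
        "sanctions": check_sanctions(screening),
    }
    passed = all(ok for ok, _ in checks.values())
    return {"passed": passed, "checks": checks}
```

Because every rule returns a reason string alongside its verdict, the same structure feeds both the routing decision and the audit store.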
Implementation
1) Define the state model and graph nodes
Use StateGraph with a typed state object. Keep the state explicit so every transition is inspectable during audits.
from typing import TypedDict, List, Optional
from langgraph.graph import StateGraph, START, END

class KYCState(TypedDict):
    raw_text: str
    doc_type: Optional[str]
    entity_type: Optional[str]
    extracted_fields: dict
    missing_fields: List[str]
    risk_score: int
    decision: Optional[str]
    reviewer_notes: Optional[str]
def classify_document(state: KYCState) -> KYCState:
    text = state["raw_text"].lower()
    if "passport" in text or "national id" in text:
        doc_type = "identity_document"
    elif "proof of address" in text or "utility bill" in text:
        doc_type = "address_proof"
    else:
        doc_type = "other"
    return {**state, "doc_type": doc_type}
def extract_fields(state: KYCState) -> KYCState:
    # Placeholder logic; in production, replace with LLM-based extraction.
    fields = {}
    if state["doc_type"] == "identity_document":
        fields["name"] = "Extracted Name"
        fields["id_number"] = "Extracted ID"
        fields["country"] = "Extracted Country"
    missing = [k for k in ["name", "id_number", "country"] if k not in fields]
    return {**state, "extracted_fields": fields, "missing_fields": missing}
def assess_risk(state: KYCState) -> KYCState:
    score = 0
    if state["doc_type"] == "other":
        score += 40
    score += len(state["missing_fields"]) * 20
    return {**state, "risk_score": score}
2) Add deterministic routing for auto-approve vs review
For pension funds you want hard thresholds. Don’t let the model make final decisions when policy can do it.
def decide_route(state: KYCState) -> str:
    if state["risk_score"] >= 40:
        return "manual_review"
    return "auto_approve"

def auto_approve(state: KYCState) -> KYCState:
    return {**state, "decision": "approved"}

def manual_review(state: KYCState) -> KYCState:
    # In production this would create a case in your compliance queue.
    return {
        **state,
        "decision": "needs_review",
        "reviewer_notes": f"Missing fields: {', '.join(state['missing_fields'])}",
    }
3) Wire the graph with add_conditional_edges
This is the actual LangGraph pattern you’ll use for branching workflows.
builder = StateGraph(KYCState)
builder.add_node("classify_document", classify_document)
builder.add_node("extract_fields", extract_fields)
builder.add_node("assess_risk", assess_risk)
builder.add_node("auto_approve", auto_approve)
builder.add_node("manual_review", manual_review)

builder.add_edge(START, "classify_document")
builder.add_edge("classify_document", "extract_fields")
builder.add_edge("extract_fields", "assess_risk")
builder.add_conditional_edges(
    "assess_risk",
    decide_route,
    {
        "auto_approve": "auto_approve",
        "manual_review": "manual_review",
    },
)
builder.add_edge("auto_approve", END)
builder.add_edge("manual_review", END)

graph = builder.compile()
4) Run the agent with a real input payload
You can invoke the compiled graph synchronously. For production systems, wrap this behind an API endpoint or queue consumer.
initial_state = {
    "raw_text": """
    Pension fund onboarding document.
    Passport copy attached.
    Proof of address included.
    Member name appears on page one.
    """,
    "doc_type": None,
    "entity_type": None,
    "extracted_fields": {},
    "missing_fields": [],
    "risk_score": 0,
    "decision": None,
    "reviewer_notes": None,
}

result = graph.invoke(initial_state)
print(result["decision"])             # approved
print(result["risk_score"])           # 0
print(result.get("reviewer_notes"))   # None
If you want LLM extraction instead of placeholder logic, replace extract_fields with a node that calls your model client. Keep the routing logic outside the model so your compliance thresholds stay deterministic.
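A minimal sketch of such a node follows. It assumes any prompt-in, text-out callable as the model client (for example, wrapping `langchain_openai.ChatOpenAI` so the callable returns the message's `content`); the prompt wording and JSON contract are illustrative assumptions. Note that the routing thresholds stay untouched: the LLM only fills `extracted_fields`.

```python
import json
from typing import Callable

def make_llm_extract_fields(call_model: Callable[[str], str]):
    """Build an extract_fields node backed by an injected model client.

    `call_model` takes a prompt string and returns raw model text. Injecting
    it keeps the node testable with a stub and keeps the client swappable.
    The state dict is the KYCState from step 1.
    """
    def extract_fields(state: dict) -> dict:
        prompt = (
            "Extract name, id_number and country from this KYC document. "
            "Respond with a JSON object containing exactly those keys, "
            "using null for anything you cannot find.\n\n" + state["raw_text"]
        )
        try:
            fields = json.loads(call_model(prompt))
        except json.JSONDecodeError:
            fields = {}  # Treat unparseable model output as nothing extracted.
        fields = {k: v for k, v in fields.items() if v}  # Drop nulls/empties.
        missing = [k for k in ["name", "id_number", "country"] if k not in fields]
        return {**state, "extracted_fields": fields, "missing_fields": missing}
    return extract_fields
```

Register it with `builder.add_node("extract_fields", make_llm_extract_fields(my_client))`; the rest of the graph is unchanged, and anything the model fails to return still surfaces as a missing field for the deterministic risk scoring.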
Production Considerations
- Deployment and data residency
  - Keep document processing inside the jurisdiction required by your pension fund policy.
  - If member data must stay on-prem or in an in-region cloud, do not send raw PII to external endpoints without approval.
- Auditability
  - Persist every input document hash, extracted field set, rule outcome, and human override.
  - Use immutable logs so internal audit and regulators can reconstruct why a case was approved or escalated.
- Guardrails
  - Hard-code policy checks for sanctions hits, expired IDs, proof-of-address documents outside the allowed recency window, and unverifiable beneficiaries.
  - Never allow an LLM to override a failed mandatory control without explicit human approval.
- Monitoring
  - Track approval rate by jurisdiction, review rate by document type, and false-positive rates reported by compliance analysts.
  - Alert when one country or employer suddenly spikes in manual reviews; that often indicates template drift or fraud attempts.
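One way to make the audit store tamper-evident is hash chaining: each record carries the hash of its predecessor, so any retroactive edit breaks the chain. The sketch below is illustrative and not a substitute for proper WORM storage or append-only database permissions.

```python
import hashlib
import json
from datetime import datetime, timezone

class AuditLog:
    """Append-only audit log with SHA-256 hash chaining (a sketch)."""

    GENESIS = "0" * 64

    def __init__(self):
        self._records = []
        self._last_hash = self.GENESIS

    def append(self, case_id: str, node: str, payload: dict) -> dict:
        record = {
            "case_id": case_id,
            "node": node,
            "payload": payload,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev_hash": self._last_hash,
        }
        # Hash a canonical serialization of the record body.
        raw = json.dumps(record, sort_keys=True).encode()
        record["hash"] = hashlib.sha256(raw).hexdigest()
        self._last_hash = record["hash"]
        self._records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered record fails."""
        prev = self.GENESIS
        for record in self._records:
            body = {k: v for k, v in record.items() if k != "hash"}
            if body["prev_hash"] != prev:
                return False
            raw = json.dumps(body, sort_keys=True).encode()
            if hashlib.sha256(raw).hexdigest() != record["hash"]:
                return False
            prev = record["hash"]
        return True
```

Each graph node appends its extracted fields and rule outcomes as `payload`, and `verify()` gives auditors a cheap integrity check over the whole case history.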
Common Pitfalls
- Letting the model decide final compliance outcomes
  - Fix: keep approvals rule-based and use the LLM only for extraction and classification.
- Skipping entity-specific logic
  - A pension fund has different checks for members, beneficiaries, employers, advisers, and trustees.
  - Model these separately or you will approve incomplete files against the wrong acceptance criteria.
- Not storing an audit trail
  - If you cannot show what was extracted, which rule failed, and who overrode it, you will struggle during audits.
  - Log graph inputs and outputs at each node and retain versioned policy rules alongside them.
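Logging inputs and outputs at each node can be done with a small wrapper applied before node registration. A sketch, with a plain list standing in for your audit sink:

```python
from copy import deepcopy
from functools import wraps

def logged_node(name: str, fn, sink: list):
    """Wrap a graph node so its input and output state are captured.

    `sink` is any object with append(); a list here, your audit store in
    production. Snapshots are deep-copied so later mutations of the state
    dict cannot silently rewrite the log.
    """
    @wraps(fn)
    def wrapper(state: dict) -> dict:
        before = deepcopy(state)
        result = fn(state)
        sink.append({"node": name, "input": before, "output": deepcopy(result)})
        return result
    return wrapper
```

Usage mirrors the graph wiring from step 3, for example `builder.add_node("assess_risk", logged_node("assess_risk", assess_risk, audit_events))`, so every transition in the compiled graph leaves a record without changing node logic.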
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist plus starter code
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.