How to Build a KYC Verification Agent Using CrewAI in Python for Pension Funds
A KYC verification agent for pension funds collects identity evidence, checks it against policy rules, flags missing or inconsistent data, and produces an auditable decision trail for compliance teams. It matters because pension funds handle long-lived member accounts, regulated onboarding, and strict recordkeeping requirements where weak KYC creates fraud risk, regulatory exposure, and downstream payout issues.
Architecture
- Intake layer
  - Accepts member-submitted documents and structured form fields.
  - Normalizes names, dates of birth, national IDs, tax IDs, and address data.
- Document extraction layer
  - Pulls text from passports, national IDs, utility bills, bank letters, or proof-of-address documents.
  - Uses OCR or document parsing before the agent reasons over the content.
- KYC policy engine
  - Applies pension-fund-specific rules:
    - minimum identity match thresholds
    - acceptable document types by jurisdiction
    - expired-document checks
    - sanctions/PEP escalation triggers
  - Keeps the decision logic deterministic where possible.
- CrewAI agent layer
  - Uses a small set of specialized agents:
    - document analyst
    - compliance reviewer
    - escalation handler
  - Produces a structured verification outcome.
- Audit and evidence store
  - Persists every input, extracted field, decision rationale, and final status.
  - Required for internal audit and regulator review.
- Case management integration
  - Sends approved cases to onboarding systems.
  - Routes exceptions to a human compliance queue.
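The KYC policy engine described above can be sketched as a small deterministic rule table that runs before any agent reasoning. The document types, jurisdictions, and rule wording here are illustrative assumptions, not real policy:

```python
from datetime import date

# Illustrative policy table: acceptable document types per jurisdiction.
# Real rules come from your compliance team, not from this sketch.
ACCEPTED_DOCS = {
    "ZA": {"passport", "national_id", "utility_bill"},
    "GB": {"passport", "driving_licence", "bank_letter"},
}

def run_policy_checks(country: str, doc_types: set[str], doc_expiry: date,
                      sanctions_hit: bool, pep_hit: bool) -> list[str]:
    """Return a list of deterministic policy violations (empty list = pass)."""
    violations = []
    if sanctions_hit or pep_hit:
        # Hard-fail: the agent layer must not be able to override this.
        violations.append("ESCALATE: sanctions/PEP hit")
    allowed = ACCEPTED_DOCS.get(country, set())
    if not doc_types & allowed:
        violations.append(f"no acceptable document type for {country}")
    if doc_expiry < date.today():
        violations.append("document expired")
    return violations
```

Because these checks are plain code, the same inputs always produce the same violations, which is what auditors expect from a policy engine.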
Implementation
1. Install CrewAI and define the KYC task inputs
For production KYC work, keep the agent focused on one job: verify submitted evidence against policy. Feed it structured inputs instead of raw chatty prompts so you can validate and audit every field.
```shell
pip install crewai crewai-tools pydantic
```

```python
from typing import List, Optional

from pydantic import BaseModel, Field

class KYCInput(BaseModel):
    full_name: str = Field(..., description="Member full legal name")
    date_of_birth: str = Field(..., description="YYYY-MM-DD")
    national_id: Optional[str] = None
    country_of_residence: str
    documents: List[str] = Field(default_factory=list)
    pep_hit: bool = False
    sanctions_hit: bool = False

class KYCResult(BaseModel):
    status: str
    risk_level: str
    reasons: List[str]
    missing_items: List[str]
```
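As a quick check that the schema rejects malformed submissions before any agent runs, you can exercise KYCInput directly (the model is repeated here so the snippet runs standalone):

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError

class KYCInput(BaseModel):  # repeated from above so this snippet is self-contained
    full_name: str = Field(..., description="Member full legal name")
    date_of_birth: str = Field(..., description="YYYY-MM-DD")
    national_id: Optional[str] = None
    country_of_residence: str
    documents: List[str] = Field(default_factory=list)
    pep_hit: bool = False
    sanctions_hit: bool = False

# A complete submission validates; optional fields take their defaults.
ok = KYCInput(full_name="Jane Doe", date_of_birth="1990-01-01",
              country_of_residence="GB")

# An incomplete submission is rejected at the intake layer,
# so the agents never see an unvalidated case.
try:
    KYCInput(full_name="Jane Doe", date_of_birth="1990-01-01")
    rejected = False
except ValidationError:
    rejected = True
```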
2. Build the agents with explicit roles
Use separate agents for extraction/review rather than one general-purpose agent. In regulated environments, that separation gives you better traceability when compliance asks why a case was escalated.
```python
from crewai import Agent

document_analyst = Agent(
    role="KYC Document Analyst",
    goal="Extract identity facts from submitted documents and identify inconsistencies",
    backstory=(
        "You review pension fund onboarding documents and summarize only verifiable facts."
    ),
    verbose=True,
)

compliance_reviewer = Agent(
    role="KYC Compliance Reviewer",
    goal="Decide whether the case passes pension fund KYC policy or needs escalation",
    backstory=(
        "You apply strict KYC rules for pension fund onboarding with audit-ready reasoning."
    ),
    verbose=True,
)
```
3. Define tasks that produce structured outputs
Use Task objects to keep responsibilities narrow. The first task extracts facts; the second applies policy and returns a clear decision that downstream systems can consume.
```python
from crewai import Task

# "{case}" is interpolated from the inputs dict passed to crew.kickoff() in step 4.
extract_task = Task(
    description=(
        "Review the following member data and documents:\n{case}\n"
        "Extract verified identity details, note mismatches, expired documents, "
        "and any missing required items."
    ),
    expected_output="A concise factual summary of verified identity data and anomalies.",
    agent=document_analyst,
)

review_task = Task(
    description=(
        "Apply pension fund KYC policy to the extracted facts. "
        "If sanctions_hit or pep_hit is true, escalate. "
        "Otherwise decide approve or reject with reasons."
    ),
    expected_output="A structured KYC decision with status, risk level, reasons, and missing items.",
    agent=compliance_reviewer,
)
```
4. Run the crew and parse the result into your workflow
This is the actual execution pattern with Crew, Process.sequential, and kickoff. In production you would wrap this in an API endpoint or queue consumer.
```python
from crewai import Crew, Process

def run_kyc_verification(kyc_input: KYCInput):
    crew = Crew(
        agents=[document_analyst, compliance_reviewer],
        tasks=[extract_task, review_task],
        process=Process.sequential,
        verbose=True,
    )
    prompt = f"""
Member data:
- Full name: {kyc_input.full_name}
- Date of birth: {kyc_input.date_of_birth}
- National ID: {kyc_input.national_id}
- Country of residence: {kyc_input.country_of_residence}
- Documents: {", ".join(kyc_input.documents)}
- PEP hit: {kyc_input.pep_hit}
- Sanctions hit: {kyc_input.sanctions_hit}

Return a clear KYC decision for a pension fund onboarding workflow.
"""
    result = crew.kickoff(inputs={"case": prompt})
    return result

if __name__ == "__main__":
    sample = KYCInput(
        full_name="Amina Ndlovu",
        date_of_birth="1982-04-11",
        national_id="ZA123456789",
        country_of_residence="South Africa",
        documents=["passport.pdf", "proof_of_address.pdf"],
        pep_hit=False,
        sanctions_hit=False,
    )
    output = run_kyc_verification(sample)
    print(output)
```
If you need stronger control over outputs, add a post-processing step that maps the model response into your KYCResult schema before writing to your case system; recent CrewAI versions also support setting output_pydantic=KYCResult on the Task itself. Either way, your downstream services stop depending on free-form text.
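One way to sketch that post-processing step is a parser that pulls the first JSON object out of the crew's free-form answer and validates it against KYCResult (repeated here so the snippet runs standalone); the "review_required" fallback status is an illustrative convention, not a CrewAI feature:

```python
import json
import re
from typing import List

from pydantic import BaseModel

class KYCResult(BaseModel):  # repeated from step 1 so this snippet is self-contained
    status: str
    risk_level: str
    reasons: List[str]
    missing_items: List[str]

def parse_kyc_result(raw: str) -> KYCResult:
    """Extract the first JSON object from model output and validate it.

    Falls back to a 'review_required' result so downstream systems never
    receive unstructured text; production code should also log the raw
    output for audit.
    """
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            return KYCResult(**json.loads(match.group(0)))
        except (json.JSONDecodeError, TypeError, ValueError):
            pass
    return KYCResult(status="review_required", risk_level="unknown",
                     reasons=["unparseable model output"], missing_items=[])
```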
Production Considerations
- Data residency
  - Keep member PII in-region.
  - If your pension fund operates across jurisdictions, route cases to region-specific model endpoints or self-hosted inference.
  - Never send raw identity documents to tools or services outside approved residency boundaries.
- Auditability
  - Persist:
    - input payload hash
    - extracted fields
    - final decision
    - timestamps
    - model version
  - Store reasoning summaries separately from raw PII so auditors can review decisions without broad access.
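The persistence list above can be assembled into one audit record per case. This is a sketch: the field names and model_version value are illustrative, and the write to your actual audit store is left out. The raw payload itself stays in your residency-controlled store; this record carries only its hash.

```python
import hashlib
import json
from datetime import datetime, timezone

def build_audit_record(input_payload: dict, extracted_fields: dict,
                       decision: str, model_version: str) -> dict:
    """Assemble an audit row: hash of the raw input, extracted fields,
    decision, UTC timestamp, and model version."""
    # sort_keys makes the hash independent of dict key order,
    # so identical payloads always hash identically.
    payload_bytes = json.dumps(input_payload, sort_keys=True).encode("utf-8")
    return {
        "input_hash": hashlib.sha256(payload_bytes).hexdigest(),
        "extracted_fields": extracted_fields,
        "decision": decision,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
    }
```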
- Guardrails
  - Hard-fail on sanctions hits and PEP matches.
  - Do not let the agent override deterministic policy checks.
  - Require human review for:
    - expired IDs close to threshold dates
    - address mismatches
    - incomplete proof-of-address sets
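Those review triggers can live in a deterministic routing function that runs on extracted fields and that the agent cannot override; the 90-day expiry margin is an illustrative threshold, not a regulatory figure:

```python
from datetime import date, timedelta

def needs_human_review(doc_expiry: date, address_match: bool,
                       proof_of_address_count: int,
                       expiry_margin_days: int = 90) -> list[str]:
    """Return the reasons a case must go to the human compliance queue.

    An empty list means no guardrail fired; it does NOT mean approval,
    which still requires the policy engine and reviewer agent to agree.
    """
    reasons = []
    if doc_expiry <= date.today() + timedelta(days=expiry_margin_days):
        reasons.append("ID expires within threshold window")
    if not address_match:
        reasons.append("address mismatch")
    if proof_of_address_count < 1:
        reasons.append("incomplete proof-of-address set")
    return reasons
```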
- Monitoring
  - Track approval rate by country and document type.
  - Alert on spikes in escalations or false rejections.
  - Sample completed cases weekly to catch drift in extraction quality or policy interpretation.
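A minimal in-process sketch of the approval-rate tracking; in production these counters would feed your observability stack rather than live in memory:

```python
from collections import defaultdict

class KYCMonitor:
    """Track approval rate keyed by (country, document type)."""

    def __init__(self) -> None:
        self.totals: dict = defaultdict(int)
        self.approved: dict = defaultdict(int)

    def record(self, country: str, doc_type: str, status: str) -> None:
        key = (country, doc_type)
        self.totals[key] += 1
        if status == "approved":
            self.approved[key] += 1

    def approval_rate(self, country: str, doc_type: str) -> float:
        key = (country, doc_type)
        # 0.0 for unseen segments avoids division by zero in alerting code.
        return self.approved[key] / self.totals[key] if self.totals[key] else 0.0
```

An alert rule can then compare each segment's rate against its trailing baseline to catch the escalation spikes mentioned above.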
Common Pitfalls
- Letting the agent make policy up as it goes
  - Avoid vague prompts like “decide if this looks valid.”
  - Encode hard rules outside the LLM where possible, then let CrewAI explain or summarize them.
- Sending raw PDFs straight into reasoning
  - Extract text first with OCR/document parsing.
  - The agent should reason over normalized fields and evidence snippets, not binary files.
- Skipping human escalation paths
  - Pension funds need defensible decisions for edge cases.
  - Build an explicit “review required” state for incomplete records, conflicting identifiers, name changes after marriage/divorce, or cross-border residency ambiguity.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.