How to Build a KYC Verification Agent Using CrewAI in Python for Pension Funds

By Cyprian Aarons. Updated 2026-04-21
Tags: kyc-verification, crewai, python, pension-funds

A KYC verification agent for pension funds collects identity evidence, checks it against policy rules, flags missing or inconsistent data, and produces an auditable decision trail for compliance teams. It matters because pension funds handle long-lived member accounts, regulated onboarding, and strict recordkeeping requirements where weak KYC creates fraud risk, regulatory exposure, and downstream payout issues.

Architecture

  • Intake layer

    • Accepts member-submitted documents and structured form fields.
    • Normalizes names, dates of birth, national IDs, tax IDs, and address data.
  • Document extraction layer

    • Pulls text from passports, national IDs, utility bills, bank letters, or proof-of-address documents.
    • Uses OCR or document parsing before the agent reasons over the content.
  • KYC policy engine

    • Applies pension-fund-specific rules:
      • minimum identity match thresholds
      • acceptable document types by jurisdiction
      • expired-document checks
      • sanctions/PEP escalation triggers
    • Keeps the decision logic deterministic where possible.
  • CrewAI agent layer

    • Uses a small set of specialized agents:
      • document analyst
      • compliance reviewer
      • escalation handler
    • Produces a structured verification outcome.
  • Audit and evidence store

    • Persists every input, extracted field, decision rationale, and final status.
    • Required for internal audit and regulator review.
  • Case management integration

    • Sends approved cases to onboarding systems.
    • Routes exceptions to a human compliance queue.
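To make "keep the decision logic deterministic where possible" concrete, here is a minimal sketch of the KYC policy engine in plain Python, running outside the LLM. The accepted-document table, the 0.90 name-match threshold, and the Document, PolicyOutcome and apply_policy names are hypothetical placeholders, not values from any regulation or from CrewAI. Name similarity uses difflib.SequenceMatcher as a stand-in for whatever matcher your fund has approved.

```python
from dataclasses import dataclass, field
from datetime import date
from difflib import SequenceMatcher
from typing import Dict, List, Set

# Hypothetical policy values for illustration only; real thresholds and
# accepted document lists must come from your fund's compliance policy.
ACCEPTED_ID_DOCS: Dict[str, Set[str]] = {
    "ZA": {"passport", "national_id_card"},
    "GB": {"passport", "driving_licence"},
}
MIN_NAME_MATCH = 0.90  # hypothetical minimum name-similarity ratio


@dataclass
class Document:
    doc_type: str
    holder_name: str
    expiry: date


@dataclass
class PolicyOutcome:
    status: str  # "approved", "rejected" or "escalated"
    reasons: List[str] = field(default_factory=list)


def _normalize(name: str) -> str:
    return " ".join(name.casefold().split())


def name_match_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after casefolding and whitespace collapse."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def apply_policy(full_name: str, country_code: str, documents: List[Document],
                 pep_hit: bool, sanctions_hit: bool, today: date) -> PolicyOutcome:
    # Screening hits escalate unconditionally; no later rule can override them.
    if sanctions_hit or pep_hit:
        return PolicyOutcome("escalated", ["sanctions/PEP screening hit"])

    hard_fails: List[str] = []  # rule breaches that reject outright
    review: List[str] = []      # discrepancies a human should look at
    accepted = ACCEPTED_ID_DOCS.get(country_code, set())
    id_docs = [d for d in documents if d.doc_type in accepted]
    if not id_docs:
        hard_fails.append(f"no accepted identity document for {country_code}")
    for doc in id_docs:
        if doc.expiry < today:
            hard_fails.append(f"{doc.doc_type} expired on {doc.expiry.isoformat()}")
        if name_match_ratio(full_name, doc.holder_name) < MIN_NAME_MATCH:
            review.append(f"name on {doc.doc_type} is below the match threshold")

    if hard_fails:
        return PolicyOutcome("rejected", hard_fails + review)
    if review:
        return PolicyOutcome("escalated", review)
    return PolicyOutcome("approved")
```

Because the outcome is computed in code, the compliance reviewer agent can be asked to explain a decision rather than make it, and a sanctions or PEP hit can never be argued into an approval.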

Implementation

1. Install CrewAI and define the KYC task inputs

For production KYC work, keep the agent focused on one job: verify submitted evidence against policy. Feed it structured inputs instead of raw chatty prompts so you can validate and audit every field.

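```bash
pip install crewai crewai-tools pydantic
```

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class KYCInput(BaseModel):
    """Validated member data passed to the crew."""
    full_name: str = Field(..., description="Member full legal name")
    date_of_birth: str = Field(..., description="YYYY-MM-DD")
    national_id: Optional[str] = None
    country_of_residence: str
    documents: List[str] = Field(default_factory=list)  # document names or IDs, e.g. "passport.pdf"
    pep_hit: bool = False        # result of upstream PEP screening
    sanctions_hit: bool = False  # result of upstream sanctions screening


class KYCResult(BaseModel):
    """Structured decision that downstream systems consume."""
    status: str = Field(..., description="approved, rejected, or escalated")
    risk_level: str
    reasons: List[str]
    missing_items: List[str]
```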
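Structured inputs only help if they are cleaned before they reach KYCInput. The following is a stdlib-only sketch of the intake layer's normalization and validation. prepare_intake and its helpers are hypothetical, and the national ID format (uppercase, no separators) is an assumption you should replace with each jurisdiction's real rules.

```python
import re
import unicodedata
from datetime import date
from typing import Dict, List, Optional, Tuple

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def normalize_name(raw: str) -> str:
    """NFC-normalize and collapse whitespace so equivalent spellings compare equal."""
    return " ".join(unicodedata.normalize("NFC", raw).split())


def normalize_national_id(raw: str) -> str:
    """Uppercase and strip spaces, dots and dashes (a hypothetical house format)."""
    return re.sub(r"[\s.\-]", "", raw).upper()


def validate_dob(raw: str, today: date) -> List[str]:
    """Require strict YYYY-MM-DD, a real calendar date, and a date in the past."""
    if not ISO_DATE.fullmatch(raw):
        return [f"date_of_birth {raw!r} is not YYYY-MM-DD"]
    try:
        dob = date.fromisoformat(raw)
    except ValueError:
        return [f"date_of_birth {raw!r} is not a real calendar date"]
    return [] if dob < today else ["date_of_birth is not in the past"]


def prepare_intake(form: Dict[str, str], today: date) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Return (cleaned fields, validation errors) for a raw onboarding form."""
    dob = form.get("date_of_birth", "").strip()
    raw_id = form.get("national_id")
    clean = {
        "full_name": normalize_name(form.get("full_name", "")),
        "date_of_birth": dob,
        "national_id": normalize_national_id(raw_id) if raw_id else None,
        "country_of_residence": form.get("country_of_residence", "").strip(),
    }
    errors = validate_dob(dob, today)
    if not clean["full_name"]:
        errors.append("full_name is empty")
    return clean, errors
```

When errors is empty, the cleaned dict can be passed to KYCInput(**clean); otherwise the case goes back to the member before any model is called.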

2. Build the agents with explicit roles

Use separate agents for extraction/review rather than one general-purpose agent. In regulated environments, that separation gives you better traceability when compliance asks why a case was escalated.

```python
from crewai import Agent

# One narrow role per agent keeps each step traceable in the audit trail.
document_analyst = Agent(
    role="KYC Document Analyst",
    goal="Extract identity facts from submitted documents and identify inconsistencies",
    backstory=(
        "You review pension fund onboarding documents and summarize only verifiable facts."
    ),
    verbose=True,
)

compliance_reviewer = Agent(
    role="KYC Compliance Reviewer",
    goal="Decide whether the case passes pension fund KYC policy or needs escalation",
    backstory=(
        "You apply strict KYC rules for pension fund onboarding with audit-ready reasoning."
    ),
    verbose=True,
)
```

3. Define tasks that produce structured outputs

Use Task objects to keep responsibilities narrow. The first task extracts facts; the second applies policy and returns a clear decision that downstream systems can consume.

```python
from crewai import Task

# The {case} placeholder is filled from crew.kickoff(inputs={"case": ...}).
# Without it, the member data never reaches the agents.
extract_task = Task(
    description=(
        "Review the provided member data and documents:\n{case}\n\n"
        "Extract verified identity details, note mismatches, expired documents, "
        "and any missing required items."
    ),
    expected_output="A concise factual summary of verified identity data and anomalies.",
    agent=document_analyst,
)

review_task = Task(
    description=(
        "Apply pension fund KYC policy to the extracted facts for this case:\n{case}\n\n"
        "If the PEP hit or sanctions hit flag is True, set status to escalated. "
        "Otherwise set status to approved or rejected and give reasons."
    ),
    expected_output="A JSON object with keys status, risk_level, reasons, and missing_items.",
    agent=compliance_reviewer,
    context=[extract_task],  # pass the analyst's summary to the reviewer explicitly
)
```

4. Run the crew and parse the result into your workflow

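This step wires both agents and tasks into a Crew with Process.sequential and runs it with kickoff(). Each key passed to kickoff(inputs=...) is substituted into the matching {key} placeholder in the task descriptions. In production you would wrap this in an API endpoint or queue consumer.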

```python
from crewai import Crew, Process


def run_kyc_verification(kyc_input: KYCInput):
    crew = Crew(
        agents=[document_analyst, compliance_reviewer],
        tasks=[extract_task, review_task],
        process=Process.sequential,  # tasks run in list order
        verbose=True,
    )

    # Render the validated input as plain text. Task descriptions must
    # contain a {case} placeholder for kickoff() to insert it.
    prompt = f"""
Member data:
- Full name: {kyc_input.full_name}
- Date of birth: {kyc_input.date_of_birth}
- National ID: {kyc_input.national_id or 'not provided'}
- Country of residence: {kyc_input.country_of_residence}
- Documents: {', '.join(kyc_input.documents) or 'none'}
- PEP hit: {kyc_input.pep_hit}
- Sanctions hit: {kyc_input.sanctions_hit}

Return a clear KYC decision for a pension fund onboarding workflow.
"""

    # kickoff() returns a CrewOutput; its .raw attribute holds the final task's text.
    return crew.kickoff(inputs={"case": prompt})


if __name__ == "__main__":
    sample = KYCInput(
        full_name="Amina Ndlovu",
        date_of_birth="1982-04-11",
        national_id="ZA123456789",
        country_of_residence="South Africa",
        documents=["passport.pdf", "proof_of_address.pdf"],
        pep_hit=False,
        sanctions_hit=False,
    )

    output = run_kyc_verification(sample)
    print(output.raw)
```

If you need stronger control over outputs, add a post-processing step that maps the model response into your KYCResult schema before writing to your case system. That keeps your downstream services from depending on free-form text.
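One way to do that post-processing, sketched with the standard library only, is to pull the JSON object out of the final task's text (for example parse_kyc_result(output.raw)), check it against the KYCResult fields, and fall back to escalation when anything is off. ParsedKYCResult is a hypothetical stdlib mirror of KYCResult so the sketch runs without pydantic, and the allowed status values are an assumption based on the approve/reject/escalate wording in the review task. With pydantic v2 you could instead call KYCResult.model_validate_json on the extracted JSON; CrewAI tasks can also take an output_pydantic model, which is worth checking against your installed version.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List

ALLOWED_STATUSES = {"approved", "rejected", "escalated"}  # assumed status vocabulary


@dataclass
class ParsedKYCResult:
    """Stdlib mirror of the KYCResult fields."""
    status: str
    risk_level: str
    reasons: List[str] = field(default_factory=list)
    missing_items: List[str] = field(default_factory=list)


def _str_list(value: Any) -> List[str]:
    if not isinstance(value, list):
        raise TypeError("expected a JSON array")
    return [str(v) for v in value]


def parse_kyc_result(raw_text: str) -> ParsedKYCResult:
    """Map the final task's text onto KYCResult fields, escalating on anything unexpected."""
    try:
        # Tolerate prose around the JSON by slicing from the first "{" to the last "}".
        start, end = raw_text.index("{"), raw_text.rindex("}") + 1
        data = json.loads(raw_text[start:end])
        result = ParsedKYCResult(
            status=str(data["status"]).strip().lower(),
            risk_level=str(data["risk_level"]).strip().lower(),
            reasons=_str_list(data.get("reasons", [])),
            missing_items=_str_list(data.get("missing_items", [])),
        )
    except (ValueError, KeyError, TypeError):
        return ParsedKYCResult("escalated", "high", ["model output could not be parsed"])
    if result.status not in ALLOWED_STATUSES:
        return ParsedKYCResult("escalated", "high", [f"unexpected status {result.status!r}"])
    return result
```

Defaulting to escalation means a malformed or off-script response lands in the human queue instead of silently approving a member.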

Production Considerations

  • Data residency

    • Keep member PII in-region.
    • If your pension fund operates across jurisdictions, route cases to region-specific model endpoints or self-hosted inference.
    • Never send raw identity documents to tools or services outside approved residency boundaries.
  • Auditability

    • Persist:
      • input payload hash
      • extracted fields
      • final decision
      • timestamps
      • model version
    • Store reasoning summaries separately from raw PII so auditors can review decisions without broad access.
  • Guardrails

    • Hard-fail on sanctions hits and PEP matches.
    • Do not let the agent override deterministic policy checks.
    • Require human review for:
      • IDs at or near their expiry date
      • address mismatches
      • incomplete proof-of-address sets
  • Monitoring

    • Track approval rate by country and document type.
    • Alert on spikes in escalations or false rejections.
    • Sample completed cases weekly to catch drift in extraction quality or policy interpretation.
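The auditability list maps naturally onto two records per case. In this sketch (the function names and record layout are hypothetical), the audit entry holds the payload hash, decision, reasoning summary, model version and timestamps, while the raw input and extracted fields go into a separate PII record kept behind stricter access controls. Hashing canonical JSON (sorted keys, fixed separators) keeps the hash stable when field order changes.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Tuple


def payload_hash(payload: Dict[str, Any]) -> str:
    """SHA-256 of a canonical JSON encoding: same input, same hash, regardless of key order."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_audit_records(
    case_id: str,
    payload: Dict[str, Any],
    extracted_fields: Dict[str, Any],
    decision: str,
    reasoning_summary: str,
    model_version: str,
    received_at: datetime,
    decided_at: Optional[datetime] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (audit_entry, pii_record) so auditors can read decisions without raw PII."""
    decided_at = decided_at or datetime.now(timezone.utc)
    audit_entry = {
        "case_id": case_id,
        "input_sha256": payload_hash(payload),  # ties the decision to one exact input
        "decision": decision,
        "reasoning_summary": reasoning_summary,  # must itself be free of PII
        "model_version": model_version,
        "received_at": received_at.isoformat(),
        "decided_at": decided_at.isoformat(),
    }
    # Raw input and extracted fields go to a separately access-controlled store.
    pii_record = {"case_id": case_id, "payload": payload, "extracted_fields": extracted_fields}
    return audit_entry, pii_record
```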

Common Pitfalls

  1. Letting the agent make policy up as it goes

    • Avoid vague prompts like “decide if this looks valid.”
    • Encode hard rules outside the LLM where possible, then let CrewAI explain or summarize them.
  2. Sending raw PDFs straight into reasoning

    • Extract text first with OCR/document parsing.
    • The agent should reason over normalized fields and evidence snippets, not binary files.
  3. Skipping human escalation paths

    • Pension funds need defensible decisions for edge cases.
    • Build an explicit “review required” state for incomplete records, conflicting identifiers, name changes after marriage/divorce, or cross-border residency ambiguity.
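An explicit review-required state can be as simple as an enum plus a function that lists every trigger it found. The sketch below covers the edge cases named above (incomplete records, conflicting identifiers, name changes, cross-border residency). CaseState, route_case and the normalization rules inside it are hypothetical and deliberately conservative, so any discrepancy goes to a person.

```python
from enum import Enum
from typing import List, Tuple


class CaseState(Enum):
    CLEAR = "clear"                      # no edge-case trigger; continue the normal policy flow
    REVIEW_REQUIRED = "review_required"  # route to the human compliance queue


def route_case(missing_items: List[str], id_numbers: List[str], names: List[str],
               residency_countries: List[str]) -> Tuple[CaseState, List[str]]:
    """Collect every review trigger; any trigger at all means a human decides."""
    reasons: List[str] = []
    if missing_items:
        reasons.append("incomplete record: " + ", ".join(missing_items))
    if len({i.replace(" ", "").upper() for i in id_numbers}) > 1:
        reasons.append("conflicting identifiers across documents")
    if len({" ".join(n.casefold().split()) for n in names}) > 1:
        reasons.append("name differs across documents (possible marriage or divorce name change)")
    if len({c.strip().upper() for c in residency_countries}) > 1:
        reasons.append("residency evidence points to more than one country")
    return (CaseState.REVIEW_REQUIRED if reasons else CaseState.CLEAR), reasons
```

Storing the returned reasons with the case gives the compliance reviewer a starting point and leaves auditors a record of why the case was routed.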


By Cyprian Aarons, AI Consultant at Topiax.
