How to Build a KYC Verification Agent Using CrewAI in Python for Insurance

By Cyprian Aarons, updated 2026-04-21

Tags: kyc-verification, crewai, python, insurance

A KYC verification agent for insurance automates the intake, validation, and risk triage of customer identity data before a policy is issued or a claim is paid. It matters because insurers need to reduce fraud, satisfy AML/KYC obligations, and keep an auditable record of every decision without turning onboarding into a manual review queue.

Architecture

A production KYC agent for insurance needs a small set of focused components:

• Document intake layer
  - Accepts PDFs, images, and structured form data from onboarding or claims systems.
  - Normalizes input into text and metadata for downstream checks.
• Identity extraction agent
  - Pulls out names, DOB, address, document numbers, expiry dates, and issuing country.
  - Produces structured JSON instead of free-form summaries.
• Verification agent
  - Compares extracted fields against internal policyholder records and external verification results.
  - Flags mismatches, expired documents, missing fields, and suspicious patterns.
• Compliance reasoning agent
  - Applies insurance-specific rules such as sanctions screening requirements, PEP flags, residency restrictions, and escalation thresholds.
  - Produces an auditable rationale for pass/fail/manual-review outcomes.
• Audit and case logging
  - Stores inputs, outputs, timestamps, model versions, and tool calls.
  - Gives compliance teams a defensible trail for regulators and internal audit.
• Human review handoff
  - Routes edge cases to operations when confidence is low or policy rules require manual approval.
  - Prevents the agent from making unsupported approval decisions.
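
As a rough illustration of the verification component described above, the sketch below compares extracted fields against a stored record and flags missing fields, mismatches, and expired documents. The function name `compare_identity_fields`, the field names, and the ISO date format are assumptions made for this example; none of them come from CrewAI.

```python
from datetime import date

# Hypothetical field set; adjust to whatever your extraction step produces.
REQUIRED_FIELDS = ("full_name", "dob", "address", "document_number", "expiry_date")


def compare_identity_fields(extracted: dict, record: dict, today: date) -> list[str]:
    """Return human-readable flags; an empty list means no issue was found."""
    flags = []

    # Missing or empty fields are flagged first so reviewers see gaps immediately.
    for field in REQUIRED_FIELDS:
        if not str(extracted.get(field) or "").strip():
            flags.append(f"missing:{field}")

    # Compare only fields present on both sides, ignoring case and outer whitespace.
    for field in ("full_name", "dob", "address"):
        got, expected = extracted.get(field), record.get(field)
        if got and expected and got.strip().lower() != expected.strip().lower():
            flags.append(f"mismatch:{field}")

    # Expiry dates are assumed to be ISO 8601 strings (YYYY-MM-DD).
    expiry = extracted.get("expiry_date")
    if expiry:
        try:
            if date.fromisoformat(expiry) < today:
                flags.append("expired_document")
        except ValueError:
            flags.append("invalid:expiry_date")

    return flags
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the check deterministic and easy to replay during an audit.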

Implementation

1) Install CrewAI and define your verification tools

CrewAI works best when the agent can call deterministic Python tools for tasks such as record lookup or document checks. Keep the LLM focused on reasoning and keep validation logic in code. Install the packages first, for example with pip install crewai crewai-tools.

from crewai import Agent, Task, Crew
# Recent CrewAI releases expose the decorator from crewai.tools;
# older releases used `from crewai_tools import tool`.
from crewai.tools import tool
from pydantic import BaseModel
from typing import Dict

class KycResult(BaseModel):
    """Shape of the decision the agent is expected to return."""
    full_name: str
    dob: str
    address: str
    document_type: str
    document_number: str
    status: str  # pass | manual_review | fail
    reasons: list[str]

@tool("check_policyholder_record")
def check_policyholder_record(query: str) -> Dict:
    """Look up a policyholder by full name and return the stored record, or {} if none."""
    # Replace with CRM / policy admin / MDM lookup
    records = {
        "john doe": {
            "full_name": "John Doe",
            "dob": "1988-01-12",
            "address": "10 King St, London",
            "risk_flag": False,
        }
    }
    return records.get(query.lower(), {})

@tool("check_document_expiry")
def check_document_expiry(document_number: str) -> Dict:
    """Return the validity and expiry status of an identity document."""
    # Replace with OCR + doc verification provider response
    return {"valid": True, "expiry_status": "active"}

@tool("screen_sanctions")
def screen_sanctions(name: str) -> Dict:
    """Screen a name against sanctions/PEP lists and return hit status and match score."""
    # Replace with sanctions/PEP vendor integration
    return {"hit": False, "match_score": 0.02}
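
The lookup stub above keys on `query.lower()`, which would miss inputs with extra whitespace or accented characters. One way to keep that logic deterministic is a small normalization helper applied both when records are indexed and when they are queried. The sketch below is stdlib-only; `normalize_name` is a hypothetical helper, and stripping accents is an assumption that may raise false matches in some markets.

```python
import unicodedata


def normalize_name(raw: str) -> str:
    """Casefold, strip accents, and collapse whitespace so lookups are consistent."""
    # NFKD splits accented characters into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # split() with no arguments drops leading, trailing, and repeated whitespace.
    return " ".join(without_marks.casefold().split())
```

With this in place, the stub could index its records by `normalize_name(full_name)` and look up `normalize_name(query)` instead of `query.lower()`.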

2) Create a specialist KYC agent with strict output expectations

The important part here is constraining the agent’s job. Don’t ask it to “do compliance.” Ask it to extract evidence, compare it to source systems, and return a structured decision.

kyc_agent = Agent(
    role="KYC Verification Analyst",
    goal=(
        "Verify customer identity for insurance onboarding using provided tools "
        "and produce an auditable KYC decision."
    ),
    backstory=(
        "You are an insurance compliance analyst. You must be precise, "
        "risk-aware, and conservative when evidence is incomplete."
    ),
    tools=[check_policyholder_record, check_document_expiry, screen_sanctions],
    verbose=True,
)

3) Define the task and run the crew

Use a single task first. Once this works reliably, you can split extraction and compliance into separate agents.

kyc_task = Task(
    description=(
        "Review the applicant details below for insurance KYC verification.\n"
        "- Name: John Doe\n"
        "- DOB: 1988-01-12\n"
        "- Address: 10 King St, London\n"
        "- Document Number: ID1234567\n"
        "- Document Type: Passport\n\n"
        "Call the available tools to verify identity consistency, document status,\n"
        "and sanctions screening. Return a concise decision with reasons."
    ),
    expected_output=(
        "A JSON-like assessment with fields: full_name, dob, address,\n"
        "document_type, document_number, status (pass|manual_review|fail),\n"
        "and reasons as a list."
    ),
    agent=kyc_agent,
)

crew = Crew(
    agents=[kyc_agent],
    tasks=[kyc_task],
)

result = crew.kickoff()
print(result)
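
Because `expected_output` only asks for a "JSON-like" assessment, the model may wrap the JSON in prose or a code fence. A minimal sketch of a defensive parser follows, using only the standard library. The name `parse_agent_output` and the key set are assumptions based on the fields listed in the task above. Recent CrewAI versions also appear to accept an `output_pydantic` argument on `Task`, so check the documentation for your installed version before relying on either approach.

```python
import json
import re

ALLOWED_STATUSES = {"pass", "manual_review", "fail"}
REQUIRED_KEYS = {
    "full_name", "dob", "address", "document_type",
    "document_number", "status", "reasons",
}


def parse_agent_output(raw: str) -> dict:
    """Parse the text between the first '{' and the last '}' and check its shape.

    Raises ValueError when no valid assessment can be recovered, so the caller
    can route the case to manual review instead of guessing.
    """
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in agent output")
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        raise ValueError(f"agent output is not valid JSON: {exc}") from exc

    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["status"] not in ALLOWED_STATUSES:
        raise ValueError(f"unexpected status: {data['status']!r}")
    if not isinstance(data["reasons"], list):
        raise ValueError("reasons must be a list")
    return data
```

A call such as `parse_agent_output(str(result))` would then either return a dictionary that is safe to persist or raise, which is a natural trigger for the human review handoff.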

4) Wrap the output in your own policy layer

CrewAI gives you reasoning output; your application should make the final system decision. In insurance workflows that means enforcing deterministic policy rules after the model responds.

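def apply_kyc_policy(agent_output: str) -> str:
    """Map the agent's free-text answer to a final decision, escalating when in doubt."""
    # Crude keyword matching: phrases such as "no sanctions hit" or "match_score"
    # also trigger manual review, so this errs toward escalation. Prefer parsing
    # structured fields once the agent returns reliable JSON.
    text = agent_output.lower()
    if "sanctions" in text and ("hit" in text or "match" in text):
        return "manual_review"
    if "missing" in text or "incomplete" in text:
        return "manual_review"
    if "fail" in text:
        return "fail"
    return "pass"

decision = apply_kyc_policy(str(result))
print({"decision": decision})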
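
Keyword checks on free text are fragile, so a policy layer is usually easier to defend when it works on structured inputs. The sketch below assumes three inputs that are not defined in the original code: a parsed assessment dictionary with a `status` key, the raw dictionary returned by the sanctions tool, and a list of verification flags. The function name `decide` and the threshold value are placeholders for illustration.

```python
# Placeholder threshold; the real value belongs to your compliance team.
SANCTIONS_REVIEW_THRESHOLD = 0.85
KNOWN_STATUSES = {"pass", "manual_review", "fail"}


def decide(assessment: dict, sanctions: dict, flags: list[str]) -> str:
    """Combine the agent's recommendation with hard rules that code always enforces."""
    # Hard rule: a sanctions hit or a high match score forces manual review,
    # whatever the agent recommended.
    if sanctions.get("hit") or sanctions.get("match_score", 0.0) >= SANCTIONS_REVIEW_THRESHOLD:
        return "manual_review"

    # Hard rule: any verification flag blocks automatic approval.
    if flags:
        return "manual_review"

    # Fail closed: an unknown or missing status is never treated as approval.
    status = assessment.get("status")
    if status not in KNOWN_STATUSES:
        return "manual_review"
    return status
```

The agent's recommendation is consulted last, so it can only be accepted when no hard rule has already fired.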

Production Considerations

• Data residency
  - Keep PII inside approved regions.
  - If your insurer operates across jurisdictions, route EU customer data to EU-hosted infrastructure only.
• Auditability
  - Log every tool call with request IDs, timestamps, model version, prompt version, and final outcome.
  - Regulators care about why a case was approved or escalated more than whether the answer looked “smart.”
• Guardrails
  - Never let the LLM override sanctions hits or mandatory manual-review triggers.
  - Enforce hard rules in code before any downstream policy issuance event fires.
• Monitoring
  - Track false positives on name matching, manual review rates, average turnaround time, and vendor failure rates.
  - Watch drift by country because document formats and naming conventions vary heavily across markets.
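
The auditability point above can be sketched as an append-only JSON Lines log with one record per tool call. The helper names, field names, and file-based storage here are assumptions for illustration; a production system would more likely write to a managed log store. Since requests and responses may contain PII, the log should live in the same approved region as the data.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Optional


def build_audit_record(
    case_id: str,
    tool_name: str,
    request: dict[str, Any],
    response: dict[str, Any],
    model_version: str,
    prompt_version: str,
    outcome: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble one audit entry for a single tool call."""
    return {
        "request_id": str(uuid.uuid4()),  # unique per tool call
        "case_id": case_id,  # ties all calls for one applicant together
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool_name": tool_name,
        "request": request,
        "response": response,
        "model_version": model_version,
        "prompt_version": prompt_version,
        "outcome": outcome,  # filled in once the final decision is known
    }


def append_audit_record(path: str, record: dict[str, Any]) -> None:
    """Append the record as one JSON line; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
```

Each tool wrapper could call `build_audit_record` with its inputs and outputs and then `append_audit_record`, giving reviewers a per-case timeline they can filter by `case_id`.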

Common Pitfalls

• Using the agent as the source of truth
  - The model should recommend; your policy engine should decide.
  - Fix this by applying deterministic business rules after crew.kickoff() returns.
• Skipping structured outputs
  - Free-text answers are painful to audit and integrate.
  - Fix this by requiring JSON-like fields in expected_output and parsing them before persistence.
• Ignoring jurisdiction-specific compliance rules
  - Insurance KYC is not one-size-fits-all across regions.
  - Fix this by encoding residency checks, retention policies, consent requirements, and escalation thresholds per market.
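
One way to encode per-market rules, as the last pitfall suggests, is a small immutable configuration table that the policy layer consults. Every value below is a placeholder invented for this example and is not regulatory guidance; the class, table, and function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarketRules:
    allowed_residency: frozenset[str]  # ISO country codes accepted for this market
    retention_days: int  # how long KYC evidence is kept
    requires_consent: bool  # whether explicit consent must be recorded
    sanctions_review_threshold: float  # match score that forces escalation


# Placeholder values for illustration only; real values come from compliance.
MARKET_RULES = {
    "GB": MarketRules(frozenset({"GB"}), 1825, True, 0.80),
    "DE": MarketRules(frozenset({"DE", "AT"}), 3650, True, 0.70),
}


def rules_for(market: str) -> MarketRules:
    """Return the rules for a market, refusing to fall back to a default."""
    try:
        return MARKET_RULES[market.upper()]
    except KeyError:
        raise ValueError(f"no KYC rules configured for market {market!r}") from None


def needs_escalation(
    market: str, residency_country: str, consent_given: bool, match_score: float
) -> bool:
    """True when any market-specific rule requires manual review."""
    rules = rules_for(market)
    return (
        residency_country.upper() not in rules.allowed_residency
        or (rules.requires_consent and not consent_given)
        or match_score >= rules.sanctions_review_threshold
    )
```

Raising on an unknown market, instead of silently applying default rules, keeps an unconfigured jurisdiction from being approved by accident.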

If you want this to survive production traffic in an insurer’s onboarding flow, where compliance actually reviews the logs later, keep the LLM narrow. Put identity checks in tools. Put approval logic in code. Keep every decision traceable.

By Cyprian Aarons, AI Consultant at Topiax.
