How to Build a KYC Verification Agent Using CrewAI in Python for Investment Banking

By Cyprian Aarons · Updated 2026-04-21
Tags: kyc-verification, crewai, python, investment-banking

A KYC verification agent in investment banking automates the first-pass review of client identity, entity structure, sanctions exposure, and document completeness. It matters because onboarding delays cost deal flow, while weak KYC creates regulatory risk, audit findings, and reputational damage.

Architecture

  • Intake layer

    • Accepts client-submitted documents, entity data, and onboarding forms.
    • Normalizes inputs into a structured case payload.
  • Document extraction layer

    • Pulls names, addresses, registration numbers, beneficial owners, and dates from PDFs or OCR text.
    • Flags missing or inconsistent fields.
  • Risk screening layer

    • Checks against internal policy rules, PEP/sanctions/watchlist results, and jurisdiction-specific requirements.
    • Produces a risk score and explicit reasons.
  • Compliance reasoning layer

    • Uses an LLM to summarize findings against KYC policy.
    • Generates an analyst-ready recommendation: approve, reject, or escalate.
  • Audit and evidence layer

    • Stores every input, output, and decision artifact.
    • Preserves traceability for model review and regulator requests.
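The intake layer above can be sketched as a small normalization step. This is a minimal, illustrative example with hypothetical field names (`name`, `country`, `uploads`), not a production mapping:

```python
# Minimal sketch of the intake layer, assuming a raw onboarding form
# arrives as a loosely structured dict. Field names are illustrative.

def normalize_intake(raw: dict) -> dict:
    """Normalize a client-submitted form into a structured case payload."""
    payload = {
        "client_name": raw.get("name", "").strip(),
        "jurisdiction": raw.get("country", "").upper(),
        # Keep only document uploads; the extraction layer handles PDFs/OCR.
        "documents": [d for d in raw.get("uploads", []) if d.endswith(".pdf")],
        "missing_fields": [],
    }
    # Downstream layers expect explicit flags for gaps, not silent absences.
    for field in ("client_name", "jurisdiction"):
        if not payload[field]:
            payload["missing_fields"].append(field)
    return payload

case = normalize_intake(
    {"name": " Acme Ltd ", "country": "uk", "uploads": ["coi.pdf", "logo.png"]}
)
```

The point is that everything after intake operates on one predictable shape, so missing data surfaces as a flagged field rather than a `KeyError` three layers later.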

Implementation

1) Install CrewAI and define the case schema

Keep the case object explicit. In banking systems, loose dictionaries become audit problems fast.

from pydantic import BaseModel
from typing import List, Optional

class KYCCase(BaseModel):
    client_name: str
    entity_type: str
    jurisdiction: str
    registration_number: Optional[str] = None
    beneficial_owners: List[str] = []
    documents: List[str] = []
    sanctions_hit: bool = False
    pep_hit: bool = False
    notes: Optional[str] = None
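To see why the explicit schema beats a loose dictionary, here is a quick sketch of pydantic rejecting an incomplete payload instead of letting it drift into the workflow (the model is trimmed to three fields for brevity):

```python
from pydantic import BaseModel, ValidationError

class MiniCase(BaseModel):  # trimmed stand-in for KYCCase, for illustration
    client_name: str
    entity_type: str
    jurisdiction: str

try:
    MiniCase(client_name="Acme Ltd", entity_type="LLC")  # jurisdiction omitted
except ValidationError as exc:
    # pydantic reports exactly which required fields are missing.
    missing = [err["loc"][0] for err in exc.errors()]
```

A loose dict would have carried the gap silently until an agent produced a vague answer; the schema fails fast with an auditable error.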

2) Create specialized agents with narrow responsibilities

Use separate agents for extraction and compliance review. That keeps prompts smaller and makes reviews easier when compliance asks why a decision was made.

from crewai import Agent

document_analyst = Agent(
    role="KYC Document Analyst",
    goal="Extract identity and ownership details from onboarding documents with high precision.",
    backstory=(
        "You work in investment banking onboarding. "
        "You are strict about missing fields, mismatched names, and stale documentation."
    ),
    verbose=True,
)

compliance_officer = Agent(
    role="KYC Compliance Officer",
    goal="Assess KYC completeness and determine whether the case can be approved or escalated.",
    backstory=(
        "You apply bank policy, AML expectations, sanctions screening results, "
        "and escalation thresholds used in investment banking."
    ),
    verbose=True,
)

3) Define tasks that produce auditable outputs

The key pattern is to force structured outputs in the task description. CrewAI still calls the LLM under the hood, but your downstream system should receive consistently structured text blocks that can be parsed and logged.

from crewai import Task

extract_task = Task(
    description=(
        "Review the client onboarding data and extract: legal name consistency, "
        "entity type, registration number presence, beneficial owner names, "
        "and any missing or conflicting fields. Return a concise JSON-style summary."
    ),
    expected_output=(
        "A structured summary with fields: legal_name_match, entity_type_ok, "
        "registration_present, owners_found_count, missing_fields, conflicts."
    ),
    agent=document_analyst,
)

review_task = Task(
    description=(
        "Using the extracted summary plus sanctions_hit={sanctions_hit}, pep_hit={pep_hit}, "
        "jurisdiction={jurisdiction}, determine whether the case is APPROVE, ESCALATE, or REJECT. "
        "Include policy reasons suitable for an audit trail."
    ),
    expected_output=(
        "A decision with rationale, risk factors, and next action for compliance operations."
    ),
    agent=compliance_officer,
)

4) Run the crew on a real case payload

In production, you would wire this into your case management system after OCR and screening have already run. The example below shows the standard CrewAI pattern using Crew, Process, and kickoff().

from crewai import Crew, Process

case = KYCCase(
    client_name="Northbridge Capital Markets Ltd",
    entity_type="Private Limited Company",
    jurisdiction="UK",
    registration_number="12345678",
    beneficial_owners=["Alice Morgan", "David Chen"],
    documents=["certificate_of_incorporation.pdf", "ubo_declaration.pdf"],
    sanctions_hit=False,
    pep_hit=False,
)

crew = Crew(
    agents=[document_analyst, compliance_officer],
    tasks=[extract_task, review_task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff(inputs=case.model_dump())
print(result)

In a bank workflow, you would usually wrap this in a service that:

  • stores the raw input payload,
  • persists each task output,
  • attaches reviewer metadata,
  • writes an immutable audit record.
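The wrapper described above can be sketched as follows. The class and storage backend here are hypothetical (an in-memory list stands in for your append-only store); the shape of the record is what matters:

```python
import hashlib
import json
from datetime import datetime, timezone

class AuditLog:
    """In-memory stand-in for an immutable audit store (illustrative only)."""

    def __init__(self):
        self.records = []

    def write(self, case_payload: dict, task_outputs: list, reviewer: str) -> str:
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "input": case_payload,    # raw input payload
            "outputs": task_outputs,  # each task output, in order
            "reviewer": reviewer,     # reviewer metadata
        }
        # A content hash gives each record a tamper-evident identifier.
        record["id"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()[:16]
        self.records.append(record)
        return record["id"]

log = AuditLog()
rec_id = log.write(
    {"client_name": "Acme Ltd"}, ["extraction summary", "ESCALATE"], "j.doe"
)
```

In a real deployment the append would go to WORM storage or an append-only table, but the discipline is identical: every kickoff produces exactly one record linking input, outputs, and reviewer.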

Production Considerations

  • Enforce data residency

    • Route UK/EU client cases to approved regions only.
    • Do not send customer PII to unmanaged endpoints or consumer LLM tooling.
  • Log every decision path

    • Persist task prompts, model outputs, timestamps, screening inputs, and final disposition.
    • Compliance teams need replayable evidence during audits and model validation.
  • Add hard guardrails before approval

    • If sanctions or PEP hits are present, force escalation regardless of model output.
    • Never let the agent override deterministic policy rules.
  • Separate inference from control

    • Let the agent recommend; let workflow code decide.
    • In investment banking onboarding, human sign-off should remain mandatory for high-risk jurisdictions or complex ownership structures.
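The guardrail and separation-of-control points above reduce to a small deterministic function that sits between the agent and the case system. A sketch, with hypothetical names:

```python
def final_disposition(model_recommendation: str, sanctions_hit: bool,
                      pep_hit: bool, high_risk_jurisdiction: bool) -> str:
    """Deterministic policy rules take precedence over the agent's output."""
    if sanctions_hit or pep_hit:
        # Hard guardrail: screening hits always escalate, regardless of
        # what the model recommended.
        return "ESCALATE"
    if high_risk_jurisdiction and model_recommendation == "APPROVE":
        # Human sign-off stays mandatory for high-risk jurisdictions.
        return "ESCALATE"
    if model_recommendation in {"APPROVE", "ESCALATE", "REJECT"}:
        return model_recommendation
    # Anything unrecognized fails closed into human review.
    return "ESCALATE"
```

The agent recommends; this function decides. Note that it never upgrades a decision, only downgrades toward escalation.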

Common Pitfalls

  • Using one generic agent for everything

    • That creates vague outputs and weak accountability.
    • Split extraction from compliance reasoning so each step is testable.
  • Letting free-form text drive decisions

    • If your downstream system reads unstructured prose directly, you will get brittle automation.
    • Force consistent output formats and parse them before writing to case systems.
  • Ignoring screening precedence

    • A clean narrative does not cancel a sanctions hit.
    • Apply deterministic rules first; use CrewAI for analysis and summarization after mandatory checks run.
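To avoid the free-form-text pitfall, parse the agent's output into a fixed shape before it touches the case system, and fail closed when parsing fails. A minimal sketch, assuming you instruct the agent to emit a `DECISION:` line (the label is an illustrative convention, not a CrewAI feature):

```python
import re

def parse_decision(text: str) -> dict:
    """Extract a DECISION line from agent output; unparseable text fails closed."""
    match = re.search(r"DECISION:\s*(APPROVE|ESCALATE|REJECT)",
                      text, re.IGNORECASE)
    if match is None:
        # Never let unstructured prose drive automation: route to a human.
        return {"decision": "ESCALATE", "parsed": False}
    return {"decision": match.group(1).upper(), "parsed": True}

result = parse_decision("Rationale: docs complete, UBOs verified.\nDECISION: approve")
```

Log both the raw text and the parsed result, so audits can confirm the parser did not distort the model's recommendation.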

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

