How to Build a KYC Verification Agent Using CrewAI in Python for Healthcare

By Cyprian Aarons | Updated 2026-04-21
Tags: kyc-verification, crewai, python, healthcare

A KYC verification agent for healthcare checks whether a patient, provider, or payer identity record is complete, consistent, and compliant before the workflow moves forward. In practice, that means validating documents, matching identity data across systems, flagging missing fields, and producing an audit trail that holds up under HIPAA obligations, internal compliance reviews, and payer onboarding rules.

Architecture

  • Input intake layer

    • Receives structured identity payloads from a portal, FHIR gateway, CRM, or onboarding form.
    • Normalizes fields like name, DOB, address, license number, NPI, and government ID.
  • Document verification tool

    • Extracts and validates data from uploaded documents.
    • Checks expiry dates, document type, and field consistency.
  • Policy/compliance checker

    • Applies healthcare-specific rules.
    • Examples: minimum required fields present, state license valid for service region, consent captured, and no prohibited data exposure.
  • CrewAI orchestration layer

    • Coordinates specialized agents for extraction, validation, escalation, and final decisioning.
    • Keeps the workflow explicit and auditable.
  • Decision/audit store

    • Persists every step: inputs received, checks performed, output reason codes, and reviewer overrides.
    • Needed for compliance evidence and incident review.
  • Human review handoff

    • Routes edge cases to compliance staff when confidence is low or a policy conflict appears.
    • Prevents over-automation on regulated decisions.
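The intake normalization and document verification steps can start as plain deterministic Python before being wrapped as a CrewAI tool. The sketch below assumes an upstream OCR or extraction step has already produced a dict with the hypothetical keys `document_type`, `full_name`, `date_of_birth`, and `expiry_date` (dates as ISO 8601 strings). The accepted document types and reason-code names are illustrative, not a standard, and exact name matching is deliberately simple: honorifics such as "Dr." would still cause a mismatch.

```python
import unicodedata
from datetime import date
from typing import Optional

# Illustrative allow-list (assumption); replace with the types your policy accepts.
ACCEPTED_DOCUMENT_TYPES = {"passport", "drivers_license", "state_id"}


def normalize_name(name: str) -> str:
    """Strip accents, casefold, and collapse whitespace so names compare consistently."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.casefold().split())


def check_document(document: dict, full_name: str, date_of_birth: str,
                   today: Optional[date] = None) -> list[str]:
    """Return reason codes for an extracted document; an empty list means it passed."""
    today = today or date.today()
    issues: list[str] = []

    # Document type must be on the allow-list.
    if document.get("document_type") not in ACCEPTED_DOCUMENT_TYPES:
        issues.append("unsupported_document_type")

    # Expiry date must be present, parseable, and not in the past.
    expiry = document.get("expiry_date")
    if not expiry:
        issues.append("missing_expiry_date")
    else:
        try:
            if date.fromisoformat(expiry) < today:
                issues.append("document_expired")
        except ValueError:
            issues.append("unreadable_expiry_date")

    # Field consistency: the document must match the submitted identity.
    if normalize_name(document.get("full_name", "")) != normalize_name(full_name):
        issues.append("name_mismatch")
    if document.get("date_of_birth") != date_of_birth:
        issues.append("dob_mismatch")

    return issues


extracted = {
    "document_type": "passport",
    "full_name": "MAYA  PATEL",
    "date_of_birth": "1984-02-14",
    "expiry_date": "2030-01-01",
}
print(check_document(extracted, "Maya Patel", "1984-02-14", today=date(2026, 4, 21)))
# → []
```

Returning reason codes rather than a boolean keeps the output compatible with the audit and human-review steps described above.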

Implementation

1. Install CrewAI and define the data model

Use a small typed payload so your agent doesn’t drift into free-form input handling. In healthcare workflows you want deterministic fields because downstream policy checks depend on them.

from typing import Optional

from pydantic import BaseModel


class KYCRequest(BaseModel):
    full_name: str
    date_of_birth: str                      # ISO 8601, e.g. "1984-02-14"
    address: str
    government_id: str
    provider_license: Optional[str] = None  # providers only
    npi: Optional[str] = None               # 10-digit National Provider Identifier
    consent_signed: bool = False
    state: Optional[str] = None             # two-letter state code used for license checks

Install the dependencies:

pip install crewai crewai-tools pydantic

2. Create tools for document lookup and policy checks

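CrewAI agents are more reliable when they call tools for facts and rules instead of generating them. For healthcare KYC, keep tools narrow: one for document verification and one for compliance rules. The compliance tool below encodes the policy as plain Python, with an explicit argument schema so the agent knows which fields to pass.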

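from typing import Optional, Type

from crewai.tools import BaseTool
from pydantic import BaseModel


class PolicyCheckInput(BaseModel):
    """Arguments the agent passes to policy_check; optional fields default to None."""
    full_name: str
    date_of_birth: str
    consent_signed: bool
    provider_license: Optional[str] = None
    npi: Optional[str] = None
    state: Optional[str] = None


class PolicyCheckTool(BaseTool):
    name: str = "policy_check"
    description: str = "Validate healthcare KYC requirements against policy rules."
    args_schema: Type[BaseModel] = PolicyCheckInput

    def _run(self, full_name: str, date_of_birth: str, consent_signed: bool,
             provider_license: Optional[str] = None,
             npi: Optional[str] = None,
             state: Optional[str] = None) -> str:
        issues = []

        # Deterministic rules: the model calls this tool but never decides these itself.
        if not consent_signed:
            issues.append("missing_consent")

        if not full_name or not date_of_birth:
            issues.append("missing_core_identity_fields")

        # A license can only be validated against a state.
        if provider_license is not None and state is None:
            issues.append("missing_state_for_license_validation")

        # An NPI must be exactly 10 digits.
        if npi is not None and not (len(npi) == 10 and npi.isdigit()):
            issues.append("invalid_npi_format")

        return "approved" if not issues else f"review_required:{','.join(issues)}"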
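A length check alone only catches obviously malformed NPIs. CMS defines the NPI's last digit as a Luhn check digit computed over the number prefixed with `80840`, so a stricter format check can verify it. The helper name `is_valid_npi` is hypothetical. Note that the sample NPI `1234567890` in the `__main__` block further down has the right length but fails this check, while `1234567893` passes.

```python
def is_valid_npi(npi: str) -> bool:
    """True if npi is 10 ASCII digits with a valid Luhn check digit (CMS '80840' prefix rule)."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    digits = [int(ch) for ch in "80840" + npi]
    total = 0
    # Walk right to left; double every second digit, starting left of the check digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(is_valid_npi("1234567893"))  # → True
print(is_valid_npi("1234567890"))  # → False
```

To enforce it, the NPI condition in `PolicyCheckTool._run` could call `is_valid_npi(npi)` instead of checking the length. A passing check digit only proves the number is well formed; confirming that the NPI is assigned to this provider still requires a registry lookup.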

3. Build specialized agents and tasks

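Split responsibilities: one agent extracts and normalizes the record, another applies policy, and a third writes the final decision with reasons that can be audited later. The example below wires up the policy and decision agents; an extraction agent would be added the same way, as the first task in the sequence.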

from crewai import Agent, Task, Crew, Process


# Applies the deterministic policy tool; verbose=True is for development only.
compliance_agent = Agent(
    role="Healthcare Compliance Analyst",
    goal="Validate KYC records against healthcare onboarding rules.",
    backstory="You review identity records for providers and patients in regulated healthcare workflows.",
    tools=[PolicyCheckTool()],
    verbose=True,
)

# Turns the compliance result into an audit-friendly decision; needs no tools.
decision_agent = Agent(
    role="KYC Decision Writer",
    goal="Produce a concise decision with reason codes for audit logging.",
    backstory="You convert compliance outcomes into a structured operational decision.",
    verbose=True,
)

# {request} is filled in from crew.kickoff(inputs={"request": ...}).
verify_task = Task(
    description=(
        "Review this healthcare KYC request:\n"
        "{request}\n\n"
        "Use the policy_check tool to determine whether the record is approved "
        "or requires human review. Return the result with reason codes."
    ),
    expected_output="A decision string containing approved or review_required plus reason codes.",
    agent=compliance_agent,
)

write_task = Task(
    description=(
        "Take the compliance result and produce a final audit-friendly summary "
        "with decision, reasons, and recommended next action."
    ),
    expected_output="A short audit summary with decision status and next step.",
    agent=decision_agent,
    context=[verify_task],  # pass the compliance result in explicitly
)
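Because the decision agent writes prose, routing is safer when it relies on the policy tool's compact string (`approved` or `review_required:code1,code2`) rather than on the summary text. A minimal parser for that format is sketched below; `PolicyDecision` and `parse_policy_result` are hypothetical names, and anything that does not match the format fails closed to human review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyDecision:
    status: str                        # "approved" or "review_required"
    reason_codes: tuple[str, ...] = ()


def parse_policy_result(raw: str) -> PolicyDecision:
    """Parse 'approved' or 'review_required:code1,code2'; any other text fails closed."""
    text = raw.strip()
    if text == "approved":
        return PolicyDecision("approved")
    prefix = "review_required:"
    if text.startswith(prefix):
        codes = tuple(c.strip() for c in text[len(prefix):].split(",") if c.strip())
        return PolicyDecision("review_required", codes)
    # Unknown format: never treat it as an approval.
    return PolicyDecision("review_required", ("unparseable_policy_output",))


print(parse_policy_result("review_required:missing_consent,invalid_npi_format"))
```

If the compliance task is prompted to return the tool's string verbatim, its output (for example `result.tasks_output[0].raw` on the `CrewOutput` that `crew.kickoff` returns) can be passed to this parser. Any other shape routes to review instead of being approved by default.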

4. Run the crew and persist the outcome

For production you want an execution path that returns both the raw result and an audit record. Keep PHI out of logs unless your environment is explicitly approved for it.

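import json
import uuid
from datetime import datetime, timezone


def run_kyc(request_data: dict):
    """Validate the payload, run the crew, and return (crew result, audit record)."""
    request = KYCRequest(**request_data)

    crew = Crew(
        agents=[compliance_agent, decision_agent],
        tasks=[verify_task, write_task],
        process=Process.sequential,
        verbose=True,  # development only; disable where logs are not approved for PHI
    )

    # Pass the request as a JSON string so the {request} placeholder interpolates cleanly.
    result = crew.kickoff(inputs={"request": request.model_dump_json()})

    # Audit record: flags and the decision text, not the raw identity fields.
    audit_record = {
        "request_id": str(uuid.uuid4()),  # or the case ID from your intake system
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "decision_output": result.raw,
        "consent_signed": request.consent_signed,
        "state": request.state,
        "npi_present": request.npi is not None,
    }

    print(json.dumps(audit_record, indent=2))
    return result, audit_record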


if __name__ == "__main__":
    sample_request = {
        "full_name": "Dr. Maya Patel",
        "date_of_birth": "1984-02-14",
        "address": "12 Market Street, Boston, MA",
        "government_id": "A123456789",
        "provider_license": "MA-PL-998877",
        "npi": "1234567890",
        "consent_signed": True,
        "state": "MA",
    }

    run_kyc(sample_request)
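`run_kyc` builds an audit record but does not persist it. One low-dependency option, sketched under the assumption that a local append-only JSON Lines file is acceptable in your environment (production systems would more likely use a database or write-once storage), is to chain each line to the previous one with a SHA-256 hash so later edits are detectable. `append_audit_record` and `verify_audit_chain` are hypothetical names.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

GENESIS_HASH = "0" * 64


def append_audit_record(path: Path, record: dict) -> dict:
    """Append record as one JSON line that stores the SHA-256 of the previous line."""
    previous_hash = GENESIS_HASH
    if path.exists():
        lines = path.read_text(encoding="utf-8").splitlines()
        if lines:
            previous_hash = hashlib.sha256(lines[-1].encode("utf-8")).hexdigest()

    entry = {
        **record,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "previous_hash": previous_hash,
    }
    line = json.dumps(entry, sort_keys=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return entry


def verify_audit_chain(path: Path) -> bool:
    """False if an earlier line was edited, removed, or reordered.

    The final line is only protected once another record is appended after it.
    """
    expected = GENESIS_HASH
    for line in path.read_text(encoding="utf-8").splitlines():
        if json.loads(line)["previous_hash"] != expected:
            return False
        expected = hashlib.sha256(line.encode("utf-8")).hexdigest()
    return True
```

Re-reading the whole file on every append keeps the sketch short but scales linearly with the log; a real store would keep the last hash in memory or in the database row.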

Production Considerations

  • Data residency

    • Keep patient/provider identity data in-region if your policy requires it.
    • If you use hosted LLM endpoints or vector stores, confirm where prompts and traces are stored.
  • Auditability

    • Log every decision with reason codes.
    • Store tool outputs separately from model text so auditors can reconstruct why a record was approved or escalated.
  • Guardrails

    • Block free-text decisions when mandatory fields are missing.
    • Add deterministic validation before any LLM call so obvious failures never reach the model.
  • Monitoring

    • Track approval rate by source system, false positive review rate, manual override rate, and tool failure rate.
    • Alert on spikes in “review_required” outcomes; that usually means upstream data quality regressed.
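The guardrail of validating deterministically before any LLM call can be a small gate in front of `crew.kickoff`. This sketch assumes the mandatory fields are the string fields of `KYCRequest` and that any record failing the gate goes straight to the human review queue; `precheck`, `route`, and the reason codes are hypothetical.

```python
MANDATORY_FIELDS = ("full_name", "date_of_birth", "address", "government_id")


def precheck(request: dict) -> list[str]:
    """Deterministic checks that run before any LLM call; non-empty means fail closed."""
    issues = [f"missing_{name}" for name in MANDATORY_FIELDS
              if not str(request.get(name) or "").strip()]
    if request.get("consent_signed") is not True:
        issues.append("missing_consent")
    if request.get("provider_license") and not request.get("state"):
        issues.append("missing_state_for_license_validation")
    return issues


def route(request: dict) -> str:
    """Return 'human_review:<codes>' without calling the model, or 'send_to_crew'."""
    issues = precheck(request)
    if issues:
        return "human_review:" + ",".join(issues)
    return "send_to_crew"


print(route({
    "full_name": "Dr. Maya Patel",
    "date_of_birth": "1984-02-14",
    "address": "",
    "government_id": "A123456789",
    "consent_signed": False,
}))
# → human_review:missing_address,missing_consent
```

Calling `route` before `run_kyc` means a missing consent flag never becomes a prompt the model could reason its way around.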

Common Pitfalls

  1. Using one general-purpose agent for everything

    • This creates vague outputs and weak audit trails.
    • Split extraction, policy evaluation, and final decisioning into separate agents or at least separate tasks.
  2. Letting the LLM decide on missing compliance data

    • If consent is absent or a license field is malformed, don’t ask the model to infer intent.
    • Fail closed with deterministic validation first.
  3. Logging PHI into debug output

    • CrewAI’s verbose=True is useful during development but dangerous in regulated environments if logs are not controlled.
    • Redact identifiers before logging and route traces to approved storage only.
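One way to redact identifiers before they reach log handlers is a standard-library `logging.Filter` attached to each handler. The regex patterns below are illustrative assumptions (a 10-digit run for NPI-like values, an uppercase letter followed by eight or more digits for the sample government ID format, and ISO dates for DOB). They will not catch every identifier format, so treat this as a backstop behind not logging PHI in the first place. CrewAI's verbose console output may not go through Python's `logging` module at all, so this filter is not a substitute for turning `verbose` off.

```python
import logging
import re

# Illustrative patterns only (assumption); tune them to the identifiers in your data.
REDACTION_PATTERNS = [
    (re.compile(r"\b\d{10}\b"), "[REDACTED_NPI]"),
    (re.compile(r"\b[A-Z]\d{8,}\b"), "[REDACTED_ID]"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "[REDACTED_DATE]"),
]


def redact(text: str) -> str:
    """Apply each pattern in order and return the masked text."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Replace each record's message with a redacted copy before handlers format it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # message is already formatted; skip a second %-format pass
        return True


handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger = logging.getLogger("kyc")
logger.addHandler(handler)
logger.warning("NPI %s for DOB %s", "1234567890", "1984-02-14")
```

Attaching the filter to the handler rather than the logger matters: logger-level filters do not run for records propagated from child loggers, while handler-level filters see everything the handler emits.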

A healthcare KYC agent built this way stays narrow in scope: validate identity inputs, apply explicit policy rules through CrewAI tools and tasks (Agent, Task, Crew, Process), and hand anything ambiguous to a human reviewer. That is the pattern that survives real compliance reviews, not just a demo.


By Cyprian Aarons, AI Consultant at Topiax.
