How to Build a KYC Verification Agent Using LangGraph in Python for Healthcare

By Cyprian Aarons · Updated 2026-04-21
kyc-verification · langgraph · python · healthcare

A KYC verification agent for healthcare checks whether a patient, provider, or payer identity is valid before the system allows access to regulated workflows. In practice, it reduces fraud, blocks duplicate or synthetic identities, and creates an auditable trail for compliance-heavy operations like onboarding, claims processing, prior auth, and telehealth access.

Architecture

  • Input intake layer

    • Accepts structured identity payloads: name, DOB, government ID hash, address, NPI/insurance member ID, and consent flags.
    • Normalizes fields before any verification step runs.
  • Policy engine

    • Decides which checks are required based on entity type.
    • Example: patient onboarding may require ID + address; provider onboarding may require license + NPI + sanctions screening.
  • Verification tools

    • External API calls for identity validation, address verification, sanctions checks, and document checks.
    • Keep these as isolated tools so they can be audited and rate-limited.
  • Stateful workflow graph

    • Orchestrates the verification sequence with conditional routing.
    • Tracks whether the request is pending, verified, rejected, or needs manual review.
  • Audit and evidence store

    • Persists every decision input/output with timestamps and correlation IDs.
    • Required for HIPAA-aligned auditability and internal review.
  • Human review queue

    • Handles exceptions: mismatched records, expired documents, or low-confidence matches.
    • Avoids hard failures when a manual adjudication path is required.
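
The policy engine above can be sketched as a plain mapping from entity type to required checks. The check names below are illustrative placeholders, not a vetted compliance matrix; map them to your real tools and vendors:

```python
# Illustrative policy table: entity type -> required verification checks.
REQUIRED_CHECKS = {
    "patient": ["id_document", "address_verification"],
    "provider": ["license", "npi_lookup", "sanctions_screening"],
    "payer": ["business_registration", "sanctions_screening"],
}

def checks_for(entity_type: str) -> list[str]:
    """Return the checks an entity must pass before verification can succeed."""
    if entity_type not in REQUIRED_CHECKS:
        raise ValueError(f"Unknown entity type: {entity_type}")
    return REQUIRED_CHECKS[entity_type]
```

Keeping this table as data rather than branching logic makes the policy auditable on its own: compliance can review the mapping without reading graph code.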

Implementation

1) Define the state model

Use a typed state object so every node reads and writes a known schema. For healthcare workflows, include fields for compliance decisions and audit metadata.

from typing import TypedDict, Optional, Literal

class KYCState(TypedDict):
    entity_type: Literal["patient", "provider", "payer"]
    full_name: str
    dob: str
    id_number: str
    address: str
    consent_given: bool

    normalized_name: Optional[str]
    risk_score: Optional[int]
    status: Optional[Literal["pending", "verified", "rejected", "manual_review"]]
    reason: Optional[str]
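
TypedDict here is a static typing aid only: at runtime LangGraph passes an ordinary dict, and each node's returned dict is merged into the state by key. A minimal illustration of that merge behavior, with MiniState as a toy stand-in for KYCState:

```python
from typing import Optional, TypedDict

# total=False lets the state be built incrementally, the way graph nodes do.
class MiniState(TypedDict, total=False):
    full_name: str
    status: Optional[str]

state: MiniState = {"full_name": "Jane Doe"}
state.update({"status": "pending"})  # a node returning {"status": "pending"} has this effect
assert state == {"full_name": "Jane Doe", "status": "pending"}
```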

2) Add tool functions for normalization and checks

Keep each function small. In production these would call internal services or vetted vendors with contracts that cover data processing and residency requirements.

def normalize_identity(state: KYCState) -> dict:
    return {
        "normalized_name": state["full_name"].strip().lower(),
        "status": "pending",
    }

def verify_consent(state: KYCState) -> dict:
    if not state["consent_given"]:
        return {"status": "rejected", "reason": "Consent missing"}
    return {}

def score_risk(state: KYCState) -> dict:
    # Replace with real scoring logic from your risk service
    score = 20 if state["entity_type"] == "patient" else 45
    return {"risk_score": score}

def decide_route(state: KYCState) -> str:
    if state.get("status") == "rejected":
        return END
    if (state.get("risk_score") or 0) >= 40:
        return "manual_review"
    return "verified"

def mark_verified(state: KYCState) -> dict:
    return {"status": "verified", "reason": None}

def mark_manual_review(state: KYCState) -> dict:
    return {"status": "manual_review", "reason": "Risk threshold exceeded"}
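
Because decide_route is a pure function of state, its thresholds can be sanity-checked without running the graph. A sketch that mirrors its logic, using the string "end" in place of LangGraph's END sentinel:

```python
def route(state: dict) -> str:
    # Mirrors decide_route: rejection ends the run, risk >= 40 goes to review.
    if state.get("status") == "rejected":
        return "end"
    if (state.get("risk_score") or 0) >= 40:
        return "manual_review"
    return "verified"

assert route({"status": "rejected"}) == "end"
assert route({"risk_score": 45}) == "manual_review"
assert route({"risk_score": 20}) == "verified"
assert route({}) == "verified"  # a missing score is treated as 0
```

Note the boundary: a score of exactly 40 escalates. Document that threshold wherever adjudicators see reason codes.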

3) Build the LangGraph workflow

This is the actual pattern you want in production: linear preprocessing first, then conditional branching based on computed state.

from langgraph.graph import StateGraph, START, END

graph = StateGraph(KYCState)

graph.add_node("normalize_identity", normalize_identity)
graph.add_node("verify_consent", verify_consent)
graph.add_node("score_risk", score_risk)
graph.add_node("mark_verified", mark_verified)
graph.add_node("mark_manual_review", mark_manual_review)

graph.add_edge(START, "normalize_identity")
graph.add_edge("normalize_identity", "verify_consent")
graph.add_edge("verify_consent", "score_risk")

graph.add_conditional_edges(
    "score_risk",
    decide_route,
    {
        "verified": "mark_verified",
        "manual_review": "mark_manual_review",
        END: END,
    },
)

graph.add_edge("mark_verified", END)
graph.add_edge("mark_manual_review", END)

kyc_app = graph.compile()

result = kyc_app.invoke(
    {
        "entity_type": "patient",
        "full_name": "Jane Doe",
        "dob": "1987-04-11",
        "id_number": "ID12345",
        "address": "10 Main St",
        "consent_given": True,
    }
)

print(result)
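
Because each node is a plain function returning a partial update, a path can also be traced offline without the graph runtime, which is handy in unit tests. This sketch re-applies the node updates in graph order for a hypothetical provider, whose baseline score of 45 crosses the manual-review threshold:

```python
# Offline trace of the provider path; re-states the node logic from step 2.
state = {
    "entity_type": "provider",
    "full_name": "  Dr. Ada Ng  ",  # hypothetical input
    "consent_given": True,
}

# normalize_identity
state.update({"normalized_name": state["full_name"].strip().lower(), "status": "pending"})
# verify_consent: consent present, so no update
# score_risk: non-patients start at 45 in the placeholder scorer
state.update({"risk_score": 20 if state["entity_type"] == "patient" else 45})
# decide_route: 45 >= 40, so the conditional edge picks mark_manual_review
state.update({"status": "manual_review", "reason": "Risk threshold exceeded"})

assert state["status"] == "manual_review"
assert state["normalized_name"] == "dr. ada ng"
```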

4) Add audit logging around execution

For healthcare systems, don’t rely on graph state alone. Persist inputs and outputs with a correlation ID so compliance teams can reconstruct why a record was accepted or escalated.

import uuid
from datetime import datetime, timezone

def run_kyc_with_audit(payload: dict):
    correlation_id = str(uuid.uuid4())
    started_at = datetime.now(timezone.utc).isoformat()

    result = kyc_app.invoke(payload)

    audit_event = {
        "correlation_id": correlation_id,
        "started_at": started_at,
        "finished_at": datetime.now(timezone.utc).isoformat(),
        "input_entity_type": payload["entity_type"],
        "final_status": result.get("status"),
        "risk_score": result.get("risk_score"),
        # Store this in your audit DB / SIEM / immutable log store
    }
    return result, audit_event
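
One way to realize the "immutable log store" comment is an append-only JSON Lines file per service instance, shipped onward to your SIEM. The file path below is illustrative; a production deployment would write to a WORM bucket or a dedicated audit service:

```python
import json
from pathlib import Path

def append_audit_event(event: dict, log_path: str = "kyc_audit.jsonl") -> None:
    """Append one audit event as a single JSON line (append-only by construction)."""
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")
```

Paired with run_kyc_with_audit, the call site is one line: append_audit_event(audit_event) just before returning.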

Production Considerations

  • Data residency

    • Keep PHI-adjacent identity data in-region if your regulatory posture requires it.
    • If you call third-party verifiers, confirm where data is processed and retained.
  • Monitoring

    • Track rejection rate by entity type, manual-review rate, vendor latency, and false positives.
    • Alert when a provider onboarding spike causes abnormal rejection patterns; that often signals upstream data quality issues.
  • Guardrails

    • Redact sensitive fields before sending them to logs or observability tools.
    • Enforce consent checks before any external lookup. If consent is missing, stop early and write an audit event.
  • Access control

    • Restrict who can replay workflows or inspect raw state.
    • Healthcare identity data should be handled with least privilege and strong service-to-service authentication.
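
The redaction guardrail above can be a small helper applied to any payload before it reaches logs or tracing. The field names mirror KYCState; the truncated digest is an illustrative choice that keeps values correlatable across log lines without being readable:

```python
import hashlib

SENSITIVE_FIELDS = {"full_name", "dob", "id_number", "address"}

def redact(payload: dict) -> dict:
    """Replace sensitive values with a short SHA-256 digest before logging."""
    redacted = {}
    for key, value in payload.items():
        if key in SENSITIVE_FIELDS:
            redacted[key] = "sha256:" + hashlib.sha256(str(value).encode()).hexdigest()[:12]
        else:
            redacted[key] = value
    return redacted
```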

Common Pitfalls

  1. Treating all entities the same

    • Patients, providers, and payers have different evidence requirements.
    • Fix it by branching policy early in the graph instead of using one generic verification path.
  2. Logging raw identifiers

    • Shipping full DOBs or government IDs into logs creates avoidable exposure.
    • Fix it by hashing identifiers at ingestion and redacting sensitive fields in every debug path.
  3. Skipping manual review paths

    • A hard binary approve/reject model breaks down fast in healthcare where records are messy.
    • Fix it by routing borderline scores to manual_review and storing the exact reason code for adjudication.
  4. Ignoring compliance evidence

    • A green checkmark is not enough if you cannot explain why it happened.
    • Fix it by persisting node outputs, timestamps, vendor responses, and final decision metadata in an immutable audit store.
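
Pitfall 2's fix, hashing identifiers at ingestion, is worth making concrete. A keyed hash (HMAC) keeps tokens joinable across systems without exposing the raw ID; the key handling below is illustrative, and in production the key would come from a KMS or secret manager:

```python
import hashlib
import hmac

def hash_identifier(raw_id: str, key: bytes) -> str:
    """Keyed hash: stable for joins, useless to anyone without the key."""
    return hmac.new(key, raw_id.encode(), hashlib.sha256).hexdigest()

token = hash_identifier("ID12345", key=b"demo-key")  # demo key, not for production
assert token == hash_identifier("ID12345", key=b"demo-key")   # deterministic
assert token != hash_identifier("ID12345", key=b"other-key")  # key-dependent
```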


By Cyprian Aarons, AI Consultant at Topiax.
