How to Build a KYC Verification Agent for Healthcare Using LangGraph in Python
A KYC verification agent for healthcare checks whether a patient, provider, or payer identity is valid before the system allows access to regulated workflows. In practice, it reduces fraud, blocks duplicate or synthetic identities, and creates an auditable trail for compliance-heavy operations like onboarding, claims processing, prior auth, and telehealth access.
Architecture
- Input intake layer
  - Accepts structured identity payloads: name, DOB, government ID hash, address, NPI/insurance member ID, and consent flags.
  - Normalizes fields before any verification step runs.
- Policy engine
  - Decides which checks are required based on entity type.
  - Example: patient onboarding may require ID + address; provider onboarding may require license + NPI + sanctions screening.
- Verification tools
  - External API calls for identity validation, address verification, sanctions checks, and document checks.
  - Keep these as isolated tools so they can be audited and rate-limited.
- Stateful workflow graph
  - Orchestrates the verification sequence with conditional routing.
  - Tracks whether the request is pending, verified, rejected, or needs manual review.
- Audit and evidence store
  - Persists every decision input/output with timestamps and correlation IDs.
  - Required for HIPAA-aligned auditability and internal review.
- Human review queue
  - Handles exceptions: mismatched records, expired documents, or low-confidence matches.
  - Avoids hard failures when a manual adjudication path is required.
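The policy engine above can be sketched as a simple lookup from entity type to required checks. This is a minimal illustration; the check names and the `REQUIRED_CHECKS` table are assumptions, not a fixed vocabulary.

```python
# Map each entity type to the checks its policy requires.
# Check names here are illustrative placeholders.
REQUIRED_CHECKS = {
    "patient": ["id_document", "address"],
    "provider": ["license", "npi", "sanctions_screening"],
    "payer": ["id_document", "sanctions_screening"],
}


def checks_for(entity_type: str) -> list[str]:
    """Return the ordered list of checks required for an entity type."""
    try:
        return REQUIRED_CHECKS[entity_type]
    except KeyError:
        raise ValueError(f"Unknown entity type: {entity_type}")


print(checks_for("provider"))  # ['license', 'npi', 'sanctions_screening']
```

Keeping this table outside the graph makes policy changes reviewable without touching orchestration code.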
Implementation
1) Define the state model
Use a typed state object to keep the graph deterministic. For healthcare workflows, include fields for compliance decisions and audit metadata.
```python
from typing import TypedDict, Optional, Literal


class KYCState(TypedDict):
    entity_type: Literal["patient", "provider", "payer"]
    full_name: str
    dob: str
    id_number: str
    address: str
    consent_given: bool
    normalized_name: Optional[str]
    risk_score: Optional[int]
    status: Optional[Literal["pending", "verified", "rejected", "manual_review"]]
    reason: Optional[str]
```
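A `TypedDict` adds no runtime validation, so the intake layer should reject malformed payloads before the graph ever runs. A minimal sketch (the field list comes from `KYCState`; the helper name `build_initial_state` is an assumption):

```python
REQUIRED_FIELDS = ("entity_type", "full_name", "dob",
                   "id_number", "address", "consent_given")


def build_initial_state(payload: dict) -> dict:
    """Validate a raw intake payload and return an initial KYC state."""
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        raise ValueError(f"Missing required fields: {missing}")
    if payload["entity_type"] not in ("patient", "provider", "payer"):
        raise ValueError(f"Unknown entity type: {payload['entity_type']}")
    # Start every run in a known state; downstream nodes fill in the rest.
    return {**payload, "normalized_name": None, "risk_score": None,
            "status": "pending", "reason": None}
```

Failing fast here keeps bad data out of the audit trail and out of external verifier calls.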
2) Add tool functions for normalization and checks
Keep each function small. In production these would call internal services or vetted vendors with contracts that cover data processing and residency requirements.
```python
def normalize_identity(state: KYCState) -> dict:
    return {
        "normalized_name": state["full_name"].strip().lower(),
        "status": "pending",
    }


def verify_consent(state: KYCState) -> dict:
    if not state["consent_given"]:
        return {"status": "rejected", "reason": "Consent missing"}
    return {}


def score_risk(state: KYCState) -> dict:
    # Replace with real scoring logic from your risk service
    score = 20 if state["entity_type"] == "patient" else 45
    return {"risk_score": score}


def decide_route(state: KYCState) -> str:
    if state.get("status") == "rejected":
        return END
    if (state.get("risk_score") or 0) >= 40:
        return "manual_review"
    return "verified"


def mark_verified(state: KYCState) -> dict:
    return {"status": "verified", "reason": None}


def mark_manual_review(state: KYCState) -> dict:
    return {"status": "manual_review", "reason": "Risk threshold exceeded"}
```
3) Build the LangGraph workflow
This is the actual pattern you want in production: linear preprocessing first, then conditional branching based on computed state.
```python
from langgraph.graph import StateGraph, START, END

graph = StateGraph(KYCState)

graph.add_node("normalize_identity", normalize_identity)
graph.add_node("verify_consent", verify_consent)
graph.add_node("score_risk", score_risk)
graph.add_node("mark_verified", mark_verified)
graph.add_node("mark_manual_review", mark_manual_review)

graph.add_edge(START, "normalize_identity")
graph.add_edge("normalize_identity", "verify_consent")
graph.add_edge("verify_consent", "score_risk")

graph.add_conditional_edges(
    "score_risk",
    decide_route,
    {
        "verified": "mark_verified",
        "manual_review": "mark_manual_review",
        END: END,
    },
)

graph.add_edge("mark_verified", END)
graph.add_edge("mark_manual_review", END)

kyc_app = graph.compile()

result = kyc_app.invoke(
    {
        "entity_type": "patient",
        "full_name": "Jane Doe",
        "dob": "1987-04-11",
        "id_number": "ID12345",
        "address": "10 Main St",
        "consent_given": True,
    }
)
print(result)
```
4) Add audit logging around execution
For healthcare systems, don’t rely on graph state alone. Persist inputs and outputs with a correlation ID so compliance teams can reconstruct why a record was accepted or escalated.
```python
import uuid
from datetime import datetime, timezone


def run_kyc_with_audit(payload: dict):
    correlation_id = str(uuid.uuid4())
    started_at = datetime.now(timezone.utc).isoformat()
    result = kyc_app.invoke(payload)
    audit_event = {
        "correlation_id": correlation_id,
        "started_at": started_at,
        "finished_at": datetime.now(timezone.utc).isoformat(),
        "input_entity_type": payload["entity_type"],
        "final_status": result.get("status"),
        "risk_score": result.get("risk_score"),
    }
    # Store audit_event in your audit DB / SIEM / immutable log store
    return result, audit_event
```
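One way to make that trail tamper-evident is an append-only log where each event carries a hash of the previous one. This is a sketch under the assumption of an in-memory store; it illustrates the chaining idea, not a substitute for a real immutable log service.

```python
import hashlib
import json


class AuditLog:
    """Append-only, hash-chained audit log (tamper-evident sketch)."""

    def __init__(self):
        self.events: list[dict] = []
        self._prev_hash = "0" * 64  # genesis value

    def append(self, event: dict) -> dict:
        # Each record binds to its predecessor via prev_hash.
        body = {**event, "prev_hash": self._prev_hash}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        record = {**body, "hash": digest}
        self.events.append(record)
        self._prev_hash = digest
        return record

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered event breaks it."""
        prev = "0" * 64
        for record in self.events:
            body = {k: v for k, v in record.items() if k != "hash"}
            if record["prev_hash"] != prev:
                return False
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if recomputed != record["hash"]:
                return False
            prev = record["hash"]
        return True
```

In practice you would feed `audit_event` from `run_kyc_with_audit` into `append` and persist each record as it is written.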
Production Considerations
- Data residency
  - Keep PHI-adjacent identity data in-region if your regulatory posture requires it.
  - If you call third-party verifiers, confirm where data is processed and retained.
- Monitoring
  - Track rejection rate by entity type, manual-review rate, vendor latency, and false positives.
  - Alert when a provider onboarding spike causes abnormal rejection patterns; that often signals upstream data quality issues.
- Guardrails
  - Redact sensitive fields before sending them to logs or observability tools.
  - Enforce consent checks before any external lookup. If consent is missing, stop early and write an audit event.
- Access control
  - Restrict who can replay workflows or inspect raw state.
  - Healthcare identity data should be handled with least privilege and strong service-to-service authentication.
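The redaction guardrail can be sketched as a small helper applied to every payload before it reaches a log sink. The set of sensitive field names below is an assumption based on the `KYCState` fields in this article:

```python
# Fields that must never reach logs or observability tools in the clear.
SENSITIVE_FIELDS = {"full_name", "dob", "id_number", "address", "normalized_name"}


def redact(event: dict) -> dict:
    """Return a copy of the event that is safe to log."""
    return {
        k: ("[REDACTED]" if k in SENSITIVE_FIELDS and v is not None else v)
        for k, v in event.items()
    }


safe = redact({"entity_type": "patient", "full_name": "Jane Doe",
               "dob": "1987-04-11", "status": "verified"})
print(safe)
# {'entity_type': 'patient', 'full_name': '[REDACTED]', 'dob': '[REDACTED]', 'status': 'verified'}
```

Routing every debug and observability path through one redaction function makes the guarantee auditable instead of relying on each call site remembering it.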
Common Pitfalls
- Treating all entities the same
  - Patients, providers, and payers have different evidence requirements.
  - Fix it by branching policy early in the graph instead of using one generic verification path.
- Logging raw identifiers
  - Shipping full DOBs or government IDs into logs creates avoidable exposure.
  - Fix it by hashing identifiers at ingestion and redacting sensitive fields in every debug path.
- Skipping manual review paths
  - A hard binary approve/reject model breaks down fast in healthcare, where records are messy.
  - Fix it by routing borderline scores to manual_review and storing the exact reason code for adjudication.
- Ignoring compliance evidence
  - A green checkmark is not enough if you cannot explain why it happened.
  - Fix it by persisting node outputs, timestamps, vendor responses, and final decision metadata in an immutable audit store.
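Hashing identifiers at ingestion can be sketched with a keyed hash, so the raw government ID never enters the pipeline while matching still works on the digest. The secret handling below is an assumption; in production the key would come from a secrets manager or KMS, not source code:

```python
import hashlib
import hmac


def hash_identifier(raw_id: str, secret: bytes) -> str:
    """HMAC the raw identifier so only the keyed digest is stored or logged."""
    # Normalize first so trivially different inputs map to the same token.
    return hmac.new(secret, raw_id.strip().upper().encode(),
                    hashlib.sha256).hexdigest()


secret = b"load-from-kms-not-source-code"  # illustrative only
token = hash_identifier("ID12345", secret)
# The same identifier always yields the same token, so de-duplication
# and record matching still work downstream.
assert token == hash_identifier(" id12345 ", secret)
```

A keyed HMAC (rather than a bare SHA-256) matters here because low-entropy identifiers like member IDs are otherwise easy to brute-force from the digest alone.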
By Cyprian Aarons, AI Consultant at Topiax.