How to Build a KYC Verification Agent Using LlamaIndex in TypeScript for Healthcare
A KYC (Know Your Customer) verification agent for healthcare checks whether a patient, caregiver, provider, or vendor is who they claim to be, then decides whether the identity evidence is sufficient to proceed. In healthcare, that matters because weak identity verification leads to fraud, claim abuse, unauthorized access to records, and compliance problems under HIPAA and local data protection rules.
Architecture
- •
Document ingestion layer
- •Pulls passports, driver’s licenses, provider licenses, utility bills, insurance cards, and intake forms from approved storage.
- •Normalizes PDFs, images, and text into a consistent document format for LlamaIndex.
- •
Identity extraction pipeline
- •Uses LlamaIndex VectorStoreIndex plus structured extraction prompts to pull fields like full name, DOB, address, document number, issuing authority, and expiration date.
- •Keeps extracted data separate from raw documents for auditability.
- •
Verification rules engine
- •Applies deterministic checks: field matching, expiry validation, address consistency, duplicate identity detection.
- •Flags cases that need manual review instead of forcing an automated pass/fail.
- •
Policy and compliance layer
- •Enforces healthcare-specific controls: minimum necessary access, retention rules, PII redaction, and region-specific storage.
- •Produces an audit trail for every decision.
- •
Human review queue
- •Sends low-confidence or high-risk cases to compliance staff.
- •Stores the agent’s reasoning plus source citations so reviewers can validate the decision fast.
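The rules engine's duplicate identity detection can be illustrated with a small in-memory sketch. The ExtractedIdentity shape and the function names are hypothetical, chosen only for this example, and the normalization is ASCII-only, so names with accented characters would need a more careful approach.

```typescript
// Hypothetical shape for fields produced by the extraction pipeline.
type ExtractedIdentity = {
  caseId: string;
  fullName: string;
  dob: string; // ISO date string, e.g. "1988-04-12"
  documentNumber: string;
};

// Build a comparison key that ignores case, punctuation, and extra whitespace.
// Assumption: ASCII-only normalization is good enough for this illustration.
function identityKey(id: ExtractedIdentity): string {
  const norm = (s: string) =>
    s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
  return [norm(id.fullName), id.dob.trim(), norm(id.documentNumber)].join("|");
}

// Return groups of case IDs that share the same identity key.
function findDuplicateIdentities(records: ExtractedIdentity[]): string[][] {
  const byKey = new Map<string, string[]>();
  for (const record of records) {
    const key = identityKey(record);
    const cases = byKey.get(key) ?? [];
    cases.push(record.caseId);
    byKey.set(key, cases);
  }
  return Array.from(byKey.values()).filter((cases) => cases.length > 1);
}
```

Any group this returns is a candidate for the human review queue rather than an automatic rejection, since shared identifiers can also come from re-submitted paperwork.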
Implementation
1) Install dependencies and define your document model
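Use LlamaIndex TypeScript packages with a local or hosted LLM. For healthcare workflows, keep raw identity docs in secure object storage and only index what you need. The imports below assume a llamaindex release that exports OpenAI from the main package; newer releases may ship it in the separate @llamaindex/openai package, so check the version you install.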
npm install llamaindex zod
import {
  Document,
  VectorStoreIndex,
  Settings,
  OpenAI,
} from "llamaindex";

// Temperature 0 keeps extraction output as repeatable as the model allows.
Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

type KycInput = {
  caseId: string;
  docType: "passport" | "driver_license" | "provider_license" | "insurance_card";
  text: string;
};

// Wrap one intake artifact as a LlamaIndex Document with audit-friendly metadata.
// Note: id_ reuses the case ID, so this assumes one document per case.
function toDocument(input: KycInput) {
  return new Document({
    id_: input.caseId,
    text: input.text,
    metadata: {
      docType: input.docType,
      caseId: input.caseId,
      sourceSystem: "healthcare-intake",
    },
  });
}
2) Build the index over verified identity documents
For a real system, index only approved intake artifacts. Don’t mix operational notes with regulated identity evidence unless your access controls are tight.
const docs = [
  toDocument({
    caseId: "case_1001",
    docType: "passport",
    text: "Passport holder: Jane Doe. DOB: 1988-04-12. Passport No: X1234567. Expiry: 2031-09-01.",
  }),
  toDocument({
    caseId: "case_1002",
    docType: "insurance_card",
    text: "Member Name: Jane Doe. Policy ID: HLT998877. Group ID: ACME42.",
  }),
];

const index = await VectorStoreIndex.fromDocuments(docs);
const queryEngine = index.asQueryEngine();
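One way to enforce "index only approved intake artifacts" is an allowlist check that runs before anything reaches the index. The sketch below is an illustration under assumptions: IntakeArtifact, the two allowlists, and partitionForIndexing are hypothetical names, and the allowed values simply mirror the docType union and sourceSystem tag used earlier.

```typescript
// Hypothetical intake record: the KycInput fields plus a source tag.
type IntakeArtifact = {
  caseId: string;
  docType: string;
  sourceSystem: string;
  text: string;
};

// Assumed allowlists; in practice these would come from reviewed configuration.
const APPROVED_DOC_TYPES = new Set([
  "passport",
  "driver_license",
  "provider_license",
  "insurance_card",
]);
const APPROVED_SOURCES = new Set(["healthcare-intake"]);

// Split artifacts into those safe to index and those to hold back.
function partitionForIndexing(artifacts: IntakeArtifact[]) {
  const approved: IntakeArtifact[] = [];
  const rejected: IntakeArtifact[] = [];
  for (const artifact of artifacts) {
    const ok =
      APPROVED_DOC_TYPES.has(artifact.docType) &&
      APPROVED_SOURCES.has(artifact.sourceSystem);
    if (ok) {
      approved.push(artifact);
    } else {
      rejected.push(artifact);
    }
  }
  return { approved, rejected };
}
```

Only the approved list would then be mapped through toDocument and indexed; the rejected list is worth logging, because a clinical note showing up in the KYC intake path usually points to an upstream routing mistake.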
3) Ask for structured KYC verification output
The agent should not just answer “verified” or “not verified.” It should return a decision with reasons and evidence references so compliance teams can audit it later.
const prompt = `
You are a healthcare KYC verification agent.
Verify the identity evidence against standard onboarding checks:
- full name match
- date of birth present
- document expiration valid
- consistency across documents
- flag anything requiring manual review
Return JSON with:
{
  "decision": "approve" | "reject" | "manual_review",
  "confidence": number,
  "checks": [{ "name": string, "status": "pass" | "fail" | "review", "evidence": string }],
  "riskFlags": string[]
}
`;

// The TypeScript query engine takes the prompt under the "query" key.
const response = await queryEngine.query({ query: prompt });
console.log(String(response));
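The model's reply is free text, so it is worth validating before anything downstream trusts it. The following is a minimal sketch of a hand-rolled validator for the JSON shape requested in the prompt; parseKycDecision is a hypothetical helper, and the 0 to 1 range for confidence is an assumption, since the prompt only asks for a number. A schema library such as zod, which the install step already pulls in, could express the same checks more compactly.

```typescript
type KycDecision = {
  decision: "approve" | "reject" | "manual_review";
  confidence: number;
  checks: { name: string; status: "pass" | "fail" | "review"; evidence: string }[];
  riskFlags: string[];
};

// Parse the model's text output. Returns null on anything malformed so the
// caller can route the case to manual review instead of trusting the reply.
function parseKycDecision(raw: string): KycDecision | null {
  // Models sometimes wrap JSON in prose or code fences; take the outermost braces.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let data: unknown;
  try {
    data = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { decision, confidence, checks, riskFlags } = data as Record<string, unknown>;

  const decisions = ["approve", "reject", "manual_review"];
  const statuses = ["pass", "fail", "review"];
  if (typeof decision !== "string" || !decisions.includes(decision)) return null;
  // Assumption: confidence is expressed on a 0..1 scale.
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) return null;
  if (!Array.isArray(checks) || !Array.isArray(riskFlags)) return null;
  if (!riskFlags.every((flag) => typeof flag === "string")) return null;
  const checksOk = checks.every(
    (c) =>
      typeof c === "object" &&
      c !== null &&
      typeof c.name === "string" &&
      typeof c.status === "string" &&
      statuses.includes(c.status) &&
      typeof c.evidence === "string",
  );
  if (!checksOk) return null;

  return {
    decision: decision as KycDecision["decision"],
    confidence,
    checks: checks as KycDecision["checks"],
    riskFlags: riskFlags as string[],
  };
}
```

Calling parseKycDecision(String(response)) would then yield either a typed decision or null, and treating null as "send to manual review" keeps a malformed reply from ever becoming an approval.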
4) Add deterministic policy checks before final approval
LLMs are good at extraction and summarization. They are not your final control plane. Use code for hard requirements like expiry dates and required-field presence.
function validateKycFields(fields: {
  fullName?: string;
  dob?: string;
  expiryDate?: string;
}) {
  const flags: string[] = [];
  if (!fields.fullName) flags.push("missing_full_name");
  if (!fields.dob) flags.push("missing_dob");
  if (fields.expiryDate) {
    const expiry = new Date(fields.expiryDate);
    if (Number.isNaN(expiry.getTime())) {
      // An unparseable date is a data-quality problem, not proof of expiry.
      flags.push("invalid_expiry_date");
    } else if (expiry < new Date()) {
      flags.push("document_expired");
    }
  } else {
    flags.push("missing_expiry_date");
  }
  return flags;
}
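Field matching across documents is another check that belongs in code. As a rough sketch, the name comparison below distinguishes an exact match from a partial one (for example, a missing middle name or a reordered "DOE, JANE") so that partial matches can go to review. The function names and the three-way result are assumptions made for illustration, and the normalization handles ASCII letters only.

```typescript
type NameMatch = "match" | "partial" | "mismatch";

// Lowercase, drop punctuation and digits, and collapse whitespace.
// Assumption: ASCII letters only; accented names need locale-aware handling.
function normalizeName(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Compare the names found on two documents.
// "partial" means every token of the shorter name appears in the longer one,
// or the same tokens appear in a different order.
function compareNames(a: string, b: string): NameMatch {
  const tokensA = normalizeName(a).split(" ").filter(Boolean);
  const tokensB = normalizeName(b).split(" ").filter(Boolean);
  if (tokensA.length === 0 || tokensB.length === 0) return "mismatch";
  if (tokensA.join(" ") === tokensB.join(" ")) return "match";

  const [shorter, longer] =
    tokensA.length <= tokensB.length ? [tokensA, tokensB] : [tokensB, tokensA];
  const isSubset = shorter.every((token) => longer.includes(token));
  return isSubset ? "partial" : "mismatch";
}
```

With the sample documents above, comparing the passport holder and the insurance member name gives "match"; a "partial" result is a natural trigger for a review flag rather than a rejection.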
A practical flow is:
- •Ingest documents.
- •Index them with VectorStoreIndex.
- •Query for structured verification output.
- •Run deterministic policy checks.
- •Route approved cases automatically and send edge cases to human review.
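The final routing step of this flow could look something like the sketch below. It is an illustration under assumptions, not a prescribed policy: decideOutcome and the 0.9 threshold are hypothetical, the model input is whatever typed decision you parsed from the LLM reply (or null if parsing failed), and the design choice here is that only code-level checks can reject while the model alone can never approve or reject.

```typescript
type Outcome = "approve" | "reject" | "manual_review";

// Hypothetical threshold; tune it against your own false-accept data.
const MIN_AUTO_APPROVE_CONFIDENCE = 0.9;

// Combine the model's suggestion with deterministic policy flags.
function decideOutcome(
  model: { decision: Outcome; confidence: number } | null,
  policyFlags: string[],
): Outcome {
  // Unparseable or missing model output never auto-approves.
  if (model === null) return "manual_review";

  // A hard failure found by code wins over anything the model says.
  if (policyFlags.includes("document_expired")) return "reject";

  // Any other policy flag means a human should look at the case.
  if (policyFlags.length > 0) return "manual_review";

  // Auto-approve only when the model agrees and is confident enough.
  if (model.decision === "approve" && model.confidence >= MIN_AUTO_APPROVE_CONFIDENCE) {
    return "approve";
  }

  // Low confidence, model-suggested rejections, and everything else go to review.
  return "manual_review";
}
```

For example, decideOutcome({ decision: "approve", confidence: 0.95 }, validateKycFields(fields)) approves only when validateKycFields returns no flags.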
Production Considerations
- •
Data residency
- •Keep PHI/PII in-region if your jurisdiction requires it.
- •If you use managed LLM endpoints, confirm where prompts and embeddings are processed and stored.
- •
Audit logging
- •Log every decision with case ID, timestamp, source document IDs, model version, prompt version, and final outcome.
- •Store citations from the query response so auditors can trace why the agent made a call.
- •
Guardrails
- •Redact unnecessary health data before indexing.
- •Limit retrieval scope to identity evidence; do not let the agent browse clinical notes or unrelated patient history.
- •Force manual review on low confidence or conflicting fields.
- •
Monitoring
- •Track false accepts, false rejects, manual review rate, latency, and document parse failures.
- •Alert on spikes in rejected documents from a single source system; that often means an upstream ingestion problem or fraud pattern.
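The audit logging fields listed above map naturally onto a single record type. Here is a minimal sketch, assuming a hypothetical AuditRecord shape and buildAuditRecord helper; where the record is persisted (append-only table, log pipeline) is left open, and the citations would come from whatever source references your query response exposes.

```typescript
// Hypothetical audit record covering the fields named in the list above.
type AuditRecord = {
  caseId: string;
  timestamp: string; // ISO 8601, UTC
  sourceDocumentIds: string[];
  modelVersion: string;
  promptVersion: string;
  outcome: "approve" | "reject" | "manual_review";
  citations: string[];
};

// Stamp a decision with the current time. The clock is injectable so the
// function stays deterministic in tests.
function buildAuditRecord(
  input: Omit<AuditRecord, "timestamp">,
  now: Date = new Date(),
): AuditRecord {
  return { ...input, timestamp: now.toISOString() };
}

// Serialize as one JSON line, a common format for append-only logs.
function toAuditLine(record: AuditRecord): string {
  return JSON.stringify(record);
}
```

Keeping promptVersion and modelVersion on every record means a later prompt change cannot silently alter how past decisions are interpreted.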
Common Pitfalls
- •
Using the LLM as the final verifier
- •Avoid letting model output directly approve onboarding.
- •Use the model for extraction and reasoning; use code for mandatory checks like expiration dates and field matching.
- •
Mixing clinical data with identity data
- •Don’t index progress notes or lab results into the KYC workflow.
- •Keep the retrieval corpus limited to identity artifacts so you stay within minimum necessary access principles.
- •
Skipping provenance
- •If you cannot point to the source document behind each check, your audit trail is weak.
- •Persist document IDs, metadata, and extracted fields alongside every decision.
- •
Ignoring regional compliance constraints
- •A healthcare onboarding flow that works in one country may fail in another because of residency or retention rules.
- •Make storage location and retention policy part of the deployment configuration, not an afterthought.
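Treating residency and retention as deployment configuration might look like the following sketch. Everything in it is a placeholder for illustration: the RegionPolicy shape, the jurisdiction keys, the region names, and the retention periods are hypothetical values, not legal guidance.

```typescript
type RegionPolicy = {
  storageRegion: string; // where raw documents and indexes must live
  retentionDays: number; // how long identity evidence may be kept
};

// Hypothetical placeholder values; real ones come from your compliance team.
const REGION_POLICIES: Record<string, RegionPolicy> = {
  eu: { storageRegion: "eu-central", retentionDays: 365 },
  us: { storageRegion: "us-east", retentionDays: 730 },
};

// Fail loudly when a jurisdiction has no policy instead of falling back to a default.
function resolvePolicy(jurisdiction: string): RegionPolicy {
  const policy = REGION_POLICIES[jurisdiction];
  if (!policy) {
    throw new Error(`No region policy configured for "${jurisdiction}"`);
  }
  return policy;
}

// Compute the date after which the evidence should be purged.
function purgeDate(collectedAt: Date, policy: RegionPolicy): Date {
  const result = new Date(collectedAt.getTime());
  result.setUTCDate(result.getUTCDate() + policy.retentionDays);
  return result;
}
```

Throwing on an unknown jurisdiction is deliberate: a missing policy should block onboarding in that region rather than quietly inherit another region's rules.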
By Cyprian Aarons, AI Consultant at Topiax.