How to Build a KYC Verification Agent Using LangChain in TypeScript for Healthcare
A healthcare KYC verification agent checks a patient’s or provider’s identity, validates submitted documents, and decides whether the record is ready for onboarding, claims access, or telehealth enrollment. It matters because bad identity data creates compliance risk, slows intake, and can expose protected health information to the wrong workflow.
Architecture
- **Input layer**
  - Accepts structured applicant data: name, DOB, address, government ID metadata, insurance details.
  - Normalizes raw form fields before they hit the model.
- **Document extraction layer**
  - Pulls text from uploaded PDFs, images, or scanned forms.
  - In production, this is usually OCR plus deterministic parsing before any LLM step.
- **Verification chain**
  - Uses LangChain to classify completeness, detect mismatches, and produce a decision.
  - Keeps the model on a strict schema so downstream systems can trust the output.
- **Policy and compliance guardrail**
  - Enforces healthcare-specific rules: minimum-necessary access, consent checks, residency constraints, audit logging.
  - Blocks unsupported decisions when evidence is incomplete.
- **Audit trail store**
  - Records inputs, model version, prompts, outputs, and final decision.
  - Required for internal review and regulatory traceability.
- **Human review queue**
  - Routes borderline cases to operations staff.
  - Prevents automatic rejection when evidence quality is poor or data conflicts are ambiguous.
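The input layer's normalization step can be sketched as a pure function. The field names and rules below are illustrative, not a fixed contract — real intake forms will have more fields and messier formats:

```typescript
// Hypothetical raw-intake shape; real forms will differ.
interface RawApplicant {
  fullName: string;
  dateOfBirth: string; // e.g. "04/12/1991" or "1991-04-12"
  address: string;
}

// Convert US-style MM/DD/YYYY to ISO YYYY-MM-DD; pass ISO through unchanged.
function toIsoDate(value: string): string {
  const us = value.trim().match(/^(\d{2})\/(\d{2})\/(\d{4})$/);
  if (us) {
    return `${us[3]}-${us[1]}-${us[2]}`;
  }
  return value.trim();
}

// Normalize form fields before any OCR or model step sees them.
export function normalizeApplicant(raw: RawApplicant): RawApplicant {
  return {
    fullName: raw.fullName.trim().replace(/\s+/g, " "),
    dateOfBirth: toIsoDate(raw.dateOfBirth),
    address: raw.address.trim().replace(/\s+/g, " ").toUpperCase(),
  };
}
```

Doing this deterministically, in code, keeps trivial formatting differences from ever reaching the model as "mismatches."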
Implementation
1) Define the verification schema
Use Zod so the agent returns structured output. For healthcare workflows, you want explicit fields for risk and review status instead of free-form text.
```typescript
import { z } from "zod";

export const KycVerificationSchema = z.object({
  fullNameMatch: z.boolean(),
  dateOfBirthMatch: z.boolean(),
  addressMatch: z.boolean(),
  documentType: z.enum(["passport", "driver_license", "national_id", "insurance_card", "other"]),
  riskScore: z.number().min(0).max(100),
  decision: z.enum(["approved", "needs_review", "rejected"]),
  reasons: z.array(z.string()).min(1),
});

export type KycVerificationResult = z.infer<typeof KycVerificationSchema>;
```
2) Build a LangChain ChatPromptTemplate with strict instructions
Keep the prompt narrow. The agent should verify identity signals only; it should not infer clinical conditions or make eligibility decisions outside KYC scope.
```typescript
import { ChatPromptTemplate } from "@langchain/core/prompts";

export const prompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    [
      "You are a healthcare KYC verification agent.",
      "Validate identity consistency using only the provided applicant data and extracted document text.",
      "Do not infer medical information.",
      "If evidence is incomplete or conflicting, set decision to needs_review.",
      "Return only structured output that matches the schema.",
    ].join(" "),
  ],
  [
    "human",
    `Applicant record:
{applicantJson}
Extracted document text:
{documentText}
Return a verification result.`,
  ],
]);
```
3) Wire the model with structured output using withStructuredOutput
This is the part that makes the workflow production-friendly. LangChain’s withStructuredOutput gives you typed results instead of parsing fragile prose.
```typescript
import { ChatOpenAI } from "@langchain/openai";
import { prompt } from "./prompt";
import { KycVerificationSchema } from "./schema";

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

// Binds the Zod schema so responses come back typed, not as prose.
const verifier = model.withStructuredOutput(KycVerificationSchema);

export const kycChain = prompt.pipe(verifier);

export async function verifyApplicant(applicantJson: string, documentText: string) {
  const result = await kycChain.invoke({
    applicantJson,
    documentText,
  });
  return result;
}
```
4) Add deterministic post-processing and human-review routing
For healthcare use cases, don’t let the LLM be the final policy engine. Use simple rules after the model response to enforce thresholds and escalation paths.
```typescript
import type { KycVerificationResult } from "./schema";
import { verifyApplicant } from "./verify";

function routeDecision(result: KycVerificationResult) {
  if (result.decision === "approved" && result.riskScore <= 20) {
    return "auto_approve";
  }
  if (result.decision === "needs_review" || result.riskScore > 20) {
    return "send_to_human_review";
  }
  return "reject";
}

async function run() {
  const applicantJson = JSON.stringify({
    fullName: "Amina Patel",
    dateOfBirth: "1991-04-12",
    address: "12 Cedar Lane, Austin, TX",
    documentType: "driver_license",
    consentGiven: true,
    region: "us-east-1",
  });

  const documentText = `
Texas Driver License
Name: Amina Patel
DOB: 04/12/1991
Address: 12 Cedar Lane Austin TX
ID Number: D12345678
`;

  const result = await verifyApplicant(applicantJson, documentText);

  console.log({
    result,
    route: routeDecision(result),
  });
}

run().catch(console.error);
```
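Because the routing layer is plain code, it can be unit-tested with no model call at all. This snippet repeats the `routeDecision` rules so it runs standalone:

```typescript
type Decision = "approved" | "needs_review" | "rejected";

interface RoutedResult {
  decision: Decision;
  riskScore: number;
}

// Same thresholds as routeDecision above.
function routeDecision(result: RoutedResult): string {
  if (result.decision === "approved" && result.riskScore <= 20) {
    return "auto_approve";
  }
  if (result.decision === "needs_review" || result.riskScore > 20) {
    return "send_to_human_review";
  }
  return "reject";
}

// A clean approval auto-approves; a risky "approved" still goes to a human.
console.log(routeDecision({ decision: "approved", riskScore: 10 })); // auto_approve
console.log(routeDecision({ decision: "approved", riskScore: 55 })); // send_to_human_review
console.log(routeDecision({ decision: "rejected", riskScore: 5 }));  // reject
```

Checks like these are cheap insurance that a threshold tweak does not quietly start auto-approving high-risk applicants.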
Production Considerations
- **Deploy in-region**
  - Keep inference and storage in approved regions if your healthcare contracts require data residency.
  - If PHI or identity documents are processed, ensure your cloud setup matches your compliance boundary.
- **Log for audit without leaking sensitive data**
  - Store prompt version, model name, decision fields, timestamps, and hash references to documents.
  - Avoid dumping raw identity documents into application logs.
- **Add guardrails around PHI**
  - Redact unnecessary identifiers before sending content to the model.
  - Use a minimum-necessary policy so the agent only sees what it needs for identity verification.
- **Monitor drift and false approvals**
  - Track approval rate by document type and region.
  - Alert when “needs_review” spikes or when manual reviewers frequently overturn automated decisions.
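For the hash references mentioned above, Node's built-in `crypto` module is enough: storing a digest lets you prove which document a decision was based on without ever logging its contents. The audit-record field names below are illustrative:

```typescript
import { createHash } from "node:crypto";

// Stable SHA-256 reference for an uploaded document. Store this in the
// audit trail instead of the raw document text or image bytes.
export function documentHash(documentBytes: string | Uint8Array): string {
  return createHash("sha256").update(documentBytes).digest("hex");
}

// Example audit record: decision metadata plus a hash reference, no PHI body.
const auditEntry = {
  caseId: "case-0001",            // illustrative case ID
  promptVersion: "kyc-prompt-v1", // illustrative version tag
  model: "gpt-4o-mini",
  decision: "needs_review",
  documentSha256: documentHash("Texas Driver License ..."),
  at: new Date().toISOString(),
};
```

When a reviewer or auditor later asks "what exactly did the agent see?", you re-hash the archived document and compare digests instead of keeping sensitive copies in the log pipeline.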
Common Pitfalls
- **Using free-form text output**
  - This breaks downstream automation fast.
  - Fix it by using `withStructuredOutput` and a Zod schema so every response has predictable fields.
- **Letting the model make policy decisions**
  - The LLM should assess evidence; your application should enforce policy.
  - Fix it by separating verification scoring from final routing logic in code.
- **Sending too much sensitive data to the model**
  - Healthcare teams often over-share full records because it’s convenient.
  - Fix it by pre-redacting non-essential fields and limiting inputs to identity-relevant attributes only.
- **Skipping auditability**
  - If you can’t explain why an applicant was approved or routed to review, you will have problems during compliance reviews.
  - Fix it by storing prompt versioning, structured outputs, reviewer actions, and immutable event logs tied to each case.
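The pre-redaction fix can be as simple as an allow-list projection over the intake record. The allowed fields here are one plausible minimum-necessary set for identity verification, not a compliance standard — your privacy team decides the real list:

```typescript
// Fields the KYC model is allowed to see. Everything else is dropped.
const IDENTITY_FIELDS = new Set([
  "fullName",
  "dateOfBirth",
  "address",
  "documentType",
]);

// Allow-list projection: safer than a deny-list, because fields added
// upstream later (e.g. clinical data) are excluded by default.
export function redactForKyc(
  record: Record<string, unknown>,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(record)) {
    if (IDENTITY_FIELDS.has(key)) {
      out[key] = value;
    }
  }
  return out;
}
```

Call `redactForKyc` on the intake record before building `applicantJson`, so over-sharing becomes structurally impossible rather than a matter of discipline.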
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.