How to Build a KYC Verification Agent Using LlamaIndex in TypeScript for Banking
A KYC verification agent checks customer documents, extracts identity data, compares it against bank policy, and flags cases that need manual review. In banking, that matters because onboarding speed is tied directly to fraud risk, compliance obligations, and customer drop-off.
Architecture
A production KYC agent for banking needs a small but strict set of components:
- Document ingestion
  - Accepts PDFs, scans, and image-based IDs.
  - Normalizes files before extraction.
- LLM-backed extraction
  - Pulls structured fields like name, DOB, address, document number, and expiry date.
  - Uses a controlled schema so outputs are predictable.
- Policy evaluation layer
  - Compares extracted fields against bank rules.
  - Handles age checks, document validity, jurisdiction constraints, and missing-field logic.
- Case decision engine
  - Produces `approve`, `reject`, or `manual_review`.
  - Keeps the final decision deterministic.
- Audit logging
  - Stores prompts, model outputs, rule hits, and timestamps.
  - Needed for compliance review and dispute handling.
- Human review handoff
  - Routes ambiguous cases to an analyst.
  - Prevents the agent from making unsupported decisions.
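The components above can be wired as a linear pipeline. The sketch below is one way to do it, assuming each stage is an injectable async function; `KycStages`, `runPipeline`, and the stage signatures are hypothetical names for illustration, not LlamaIndex APIs. Injecting the stages keeps the decision path testable with stubs and makes the human review handoff an explicit step.

```typescript
type Decision = "approve" | "reject" | "manual_review";

// Hypothetical stage contracts mirroring the architecture list.
interface KycStages {
  ingest(file: Uint8Array): Promise<string>; // ingestion + normalization -> text
  extract(text: string): Promise<Record<string, string>>; // LLM-backed extraction
  evaluate(fields: Record<string, string>): { decision: Decision; reasons: string[] }; // policy + decision
  audit(entry: object): Promise<void>; // audit logging
  handoff(caseId: string, reasons: string[]): Promise<void>; // human review queue
}

export async function runPipeline(caseId: string, file: Uint8Array, s: KycStages) {
  const text = await s.ingest(file);
  const fields = await s.extract(text);
  // Deterministic step: no LLM call decides the outcome.
  const verdict = s.evaluate(fields);
  await s.audit({ caseId, fields, verdict, at: new Date().toISOString() });
  if (verdict.decision === "manual_review") {
    // An analyst makes the final call on ambiguous cases.
    await s.handoff(caseId, verdict.reasons);
  }
  return verdict;
}
```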
Implementation
1) Configure LlamaIndex Settings
For KYC workflows, keep extraction narrow. Don’t ask the model to “decide if the customer is trustworthy”; ask it to extract fields into a schema you can validate.
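```typescript
import { Document, OpenAI, Settings } from "llamaindex";
// Note: newer LlamaIndex.TS releases ship the OpenAI class in the separate
// "@llamaindex/openai" package; adjust this import to your installed version.

// temperature 0 reduces run-to-run variance in the extracted fields.
Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

// Ask for JSON only, with fixed formats, so the reply can be parsed and validated.
const kycPrompt = `
Extract KYC fields from the document text.
Return only a JSON object with these keys (omit a key if the value is not present):
- fullName
- dateOfBirth (YYYY-MM-DD)
- documentType
- documentNumber
- issuingCountry (ISO 3166-1 alpha-2 code, e.g. "US")
- expiryDate (YYYY-MM-DD)
- address
- nationality
- confidenceNotes
Do not add any text outside the JSON object.
`;
```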
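Because the prompt asks for a fixed set of keys, the model's reply can be validated before any policy code sees it. A minimal sketch, assuming the model returns a JSON object that may be wrapped in a markdown code fence; `parseKycJson` and `KYC_FIELDS` are hypothetical names. Unknown keys and non-string values are dropped, and unparseable output returns `null` so the caller can route the case to manual review instead of crashing.

```typescript
// Keys requested in the extraction prompt.
const KYC_FIELDS = [
  "fullName", "dateOfBirth", "documentType", "documentNumber",
  "issuingCountry", "expiryDate", "address", "nationality", "confidenceNotes",
] as const;

type KycField = (typeof KYC_FIELDS)[number];
type ParsedKyc = Partial<Record<KycField, string>>;

export function parseKycJson(reply: string): ParsedKyc | null {
  // Strip a leading ```json (or bare ```) fence and a trailing ``` if present.
  const body = reply.trim().replace(/^`{3}(?:json)?\s*/i, "").replace(/\s*`{3}$/, "");
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;

  const out: ParsedKyc = {};
  for (const key of KYC_FIELDS) {
    const value = (data as Record<string, unknown>)[key];
    // Keep only non-empty strings; anything else is treated as missing.
    if (typeof value === "string" && value.trim() !== "") out[key] = value.trim();
  }
  return out;
}
```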
2) Build an extraction function with Document and a query engine pattern
Use LlamaIndex to turn raw text into an indexable document. For a single KYC packet, a simple vector index is enough; the important part is keeping the output structured and auditable.
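```typescript
import { VectorStoreIndex } from "llamaindex";

// Fields requested by kycPrompt. All optional: OCR text is often incomplete.
type KycResult = {
  fullName?: string;
  dateOfBirth?: string;
  documentType?: string;
  documentNumber?: string;
  issuingCountry?: string;
  expiryDate?: string;
  address?: string;
  nationality?: string;
  confidenceNotes?: string;
};

export async function extractKycFields(rawText: string): Promise<KycResult> {
  // Document and kycPrompt come from step 1.
  const doc = new Document({ text: rawText });
  const index = await VectorStoreIndex.fromDocuments([doc]);
  const queryEngine = index.asQueryEngine();

  const response = await queryEngine.query({
    query: `${kycPrompt}\n\nDocument text:\n${rawText}`,
  });

  // Models sometimes wrap JSON in a markdown fence; strip it before parsing.
  const text = (response.response ?? "")
    .trim()
    .replace(/^`{3}(?:json)?\s*/i, "")
    .replace(/\s*`{3}$/, "");

  try {
    return JSON.parse(text) as KycResult;
  } catch {
    // Unparseable output becomes an empty result; the policy layer then
    // routes the case to manual review instead of crashing the pipeline.
    return {};
  }
}
```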
This works well when your upstream OCR pipeline already produced clean text. If you’re dealing with scanned IDs directly, put OCR before this step; LlamaIndex should not be your OCR layer.
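If OCR runs upstream, a small normalization pass before extraction keeps the prompt clean. A minimal sketch; `normalizeOcrText` is a hypothetical helper, and the specific rules (Unicode NFKC folding, dropping control characters, collapsing whitespace, limiting blank lines) are assumptions to adapt to your OCR engine's output.

```typescript
export function normalizeOcrText(raw: string): string {
  return raw
    .normalize("NFKC") // fold compatibility forms, e.g. full-width digits to ASCII
    .replace(/\r\n?/g, "\n") // unify line endings
    .replace(/[\u0000-\u0008\u000B-\u001F\u007F]/g, "") // drop control chars except \t and \n
    .split("\n")
    .map((line) => line.replace(/[ \t]+/g, " ").trim()) // collapse runs of spaces and tabs
    // Keep at most one blank line in a row, and none at the start.
    .filter((line, i, lines) => line !== "" || (i > 0 && lines[i - 1] !== ""))
    .join("\n")
    .trim();
}
```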
3) Add deterministic policy checks
The model extracts data. Your code decides whether it passes policy. This is where banking controls belong.
```typescript
type Decision = "approve" | "reject" | "manual_review";

// Jurisdictions this example policy accepts without analyst review.
const SUPPORTED_COUNTRIES = ["US", "GB", "CA", "AU"];

function evaluateKyc(result: KycResult): { decision: Decision; reasons: string[] } {
  const reasons: string[] = [];

  // Missing-field checks: every gap is recorded as a reason.
  if (!result.fullName) reasons.push("Missing full name");
  if (!result.dateOfBirth) reasons.push("Missing date of birth");
  if (!result.documentNumber) reasons.push("Missing document number");
  if (!result.expiryDate) reasons.push("Missing expiry date");

  // Jurisdiction check: documents from unsupported issuers need an analyst.
  if (result.issuingCountry && !SUPPORTED_COUNTRIES.includes(result.issuingCountry)) {
    reasons.push(`Unsupported issuing country: ${result.issuingCountry}`);
  }

  // Incomplete or out-of-scope cases always go to manual review. "reject" is
  // reserved for hard rule failures (for example, an expired document), never
  // for missing data that may be an OCR or extraction gap.
  if (reasons.length > 0) {
    return { decision: "manual_review", reasons };
  }

  return { decision: "approve", reasons };
}
```
This keeps compliance logic outside the model. That matters because regulators will ask why a customer was rejected, and “the model said so” is not a valid answer.
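The architecture also calls for age and document-validity checks, which belong in the same deterministic layer. A minimal sketch of both, assuming the extractor returns dates as `YYYY-MM-DD` strings (as the prompt in step 1 requests) and a minimum age of 18; `checkDates`, `MIN_AGE`, and the choice to reject rather than review an expired or under-age case are illustrative policy assumptions, not regulatory requirements. Unparseable dates go to manual review, consistent with not auto-rejecting on uncertain OCR.

```typescript
type DateCheck = { decision: "pass" | "reject" | "manual_review"; reasons: string[] };

const MIN_AGE = 18; // assumed policy threshold; set per product and jurisdiction

// Parse a strict YYYY-MM-DD string as a UTC date, or return null.
function parseIsoDate(s: string | undefined): Date | null {
  if (!s || !/^\d{4}-\d{2}-\d{2}$/.test(s)) return null;
  const d = new Date(`${s}T00:00:00Z`);
  // Catch impossible dates such as 2024-02-30, which some engines roll over.
  return !Number.isNaN(d.getTime()) && d.toISOString().startsWith(s) ? d : null;
}

// Completed years between birth and today, computed in UTC.
function ageOn(birth: Date, today: Date): number {
  const years = today.getUTCFullYear() - birth.getUTCFullYear();
  const beforeBirthday =
    today.getUTCMonth() < birth.getUTCMonth() ||
    (today.getUTCMonth() === birth.getUTCMonth() && today.getUTCDate() < birth.getUTCDate());
  return beforeBirthday ? years - 1 : years;
}

export function checkDates(
  dateOfBirth: string | undefined,
  expiryDate: string | undefined,
  today: Date = new Date(),
): DateCheck {
  const dob = parseIsoDate(dateOfBirth);
  const exp = parseIsoDate(expiryDate);
  // Uncertain input is reviewed, never auto-rejected.
  if (!dob || !exp) {
    return { decision: "manual_review", reasons: ["Unreadable date of birth or expiry date"] };
  }
  // The document is treated as valid through its expiry date (inclusive), in UTC days.
  const expIso = exp.toISOString().slice(0, 10);
  if (expIso < today.toISOString().slice(0, 10)) {
    return { decision: "reject", reasons: [`Document expired on ${expIso}`] };
  }
  if (ageOn(dob, today) < MIN_AGE) {
    return { decision: "reject", reasons: [`Applicant is under ${MIN_AGE}`] };
  }
  return { decision: "pass", reasons: [] };
}
```

One way to combine this with `evaluateKyc`: return immediately on `reject`, add the reason and route to review on `manual_review`, and change nothing on `pass`.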
4) Wire it together with audit output
You need a traceable record for each verification request. Store the source text hash, the extracted fields, the decision with its reasons, and a timestamp; when an analyst overrides a decision, record the override against the same record.
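```typescript
import { createHash } from "node:crypto";

export async function runKycVerification(rawText: string) {
  // Hash the source text so the audit trail can prove exactly which input
  // was evaluated.
  const sourceHash = createHash("sha256").update(rawText).digest("hex");

  const extracted = await extractKycFields(rawText);
  const verdict = evaluateKyc(extracted);

  const auditRecord = {
    sourceHash,
    extracted,
    verdict,
    timestamp: new Date().toISOString(),
  };

  // Demo only: this prints extracted PII to stdout. In production, write the
  // record to access-controlled audit storage, not application logs.
  console.log(JSON.stringify(auditRecord, null, 2));
  return auditRecord;
}
```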
In production, send auditRecord to immutable storage or your SIEM. Keep PII access-controlled and region-bound if your banking policy requires data residency in a specific jurisdiction.
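Immutability is best enforced by the storage layer (for example, write-once storage), but the application can also make tampering detectable. A minimal sketch, assuming an in-memory array stands in for real storage; `AuditChain` is a hypothetical class. Each entry's hash covers the previous entry's hash plus its own payload, so editing any record, or removing one from the middle, breaks verification. Truncating the tail is only detectable if the latest hash is also stored somewhere else.

```typescript
import { createHash } from "node:crypto";

type ChainedEntry = { payload: string; prevHash: string; hash: string };

const GENESIS = "0".repeat(64); // placeholder "previous hash" for the first entry

export class AuditChain {
  private entries: ChainedEntry[] = [];

  // Append a record; its hash covers the previous hash and the serialized payload.
  append(record: object): ChainedEntry {
    const prevHash = this.entries[this.entries.length - 1]?.hash ?? GENESIS;
    const payload = JSON.stringify(record);
    const hash = createHash("sha256").update(prevHash).update(payload).digest("hex");
    const entry = { payload, prevHash, hash };
    this.entries.push(entry);
    return entry;
  }

  // Recompute every hash from the start; any mismatch means the chain was altered.
  verify(entries: readonly ChainedEntry[] = this.entries): boolean {
    let prev = GENESIS;
    for (const e of entries) {
      const expected = createHash("sha256").update(prev).update(e.payload).digest("hex");
      if (e.prevHash !== prev || e.hash !== expected) return false;
      prev = e.hash;
    }
    return true;
  }

  // Return copies so callers cannot mutate the stored chain.
  snapshot(): ChainedEntry[] {
    return this.entries.map((e) => ({ ...e }));
  }
}
```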
Production Considerations
- Keep the final decision deterministic
  - Use LLMs for extraction and summarization only.
  - Put approval/rejection logic in code with versioned rules.
- Log every step
  - Store prompt version, model version, OCR input hash, extracted output, and rule outcomes.
  - This is essential for audits and internal model risk reviews.
- Enforce data residency
  - Route documents through region-specific infrastructure.
  - Don’t ship customer identity documents across borders unless legal/compliance has signed off.
- Add human-in-the-loop thresholds
  - Any low-confidence extraction or conflicting field should go to manual review.
  - Banks should not auto-reject on uncertain OCR or ambiguous identity matches.
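Data residency is easiest to enforce as a fail-closed lookup: a case is only sent to an inference endpoint explicitly approved for the customer's jurisdiction, which is known from the application before any document is processed. A minimal sketch; the jurisdiction map, the placeholder `example.com` URLs, `ResidencyError`, and `resolveInferenceEndpoint` are all hypothetical and would be owned by legal/compliance, not real services.

```typescript
// Hypothetical approval map: customer jurisdiction -> endpoint cleared for that data.
const APPROVED_ENDPOINTS = new Map<string, string>([
  ["GB", "https://uk.inference.example.com"],
  ["US", "https://us.inference.example.com"],
  ["CA", "https://ca.inference.example.com"],
  ["AU", "https://au.inference.example.com"],
]);

export class ResidencyError extends Error {}

// Fail closed: a missing or unapproved jurisdiction never falls back to a default region.
export function resolveInferenceEndpoint(jurisdiction: string | undefined): string {
  const code = jurisdiction?.trim().toUpperCase();
  const endpoint = code ? APPROVED_ENDPOINTS.get(code) : undefined;
  if (endpoint === undefined) {
    throw new ResidencyError(`No approved inference region for ${code || "unknown jurisdiction"}`);
  }
  return endpoint;
}
```

How the resolved URL reaches your LLM client depends on that client's configuration options, so check its documentation for a base-URL setting.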
Common Pitfalls
- Using the LLM as the policy engine
  - Mistake: asking the model whether a customer “passes KYC.”
  - Fix: extract fields with LlamaIndex; enforce policy in TypeScript rules you can test and version.
- Skipping auditability
  - Mistake: storing only the final decision.
  - Fix: persist prompt version, source hash, extracted fields, rule hits, and reviewer overrides.
- Ignoring regional compliance constraints
  - Mistake: sending identity documents to whatever endpoint is available.
  - Fix: pin inference infrastructure to approved regions and redact unnecessary PII before logging or analytics.
- Letting low-confidence outputs auto-pass
  - Mistake: treating partial extraction as good enough.
  - Fix: route incomplete or conflicting results to manual review every time.
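For the redaction fix above, one option is to mask identifying fields before a record reaches general logs or analytics, keeping full values only in the access-controlled audit store. A minimal sketch; which fields count as unnecessary PII and the mask format (keep the last four characters of the document number, keep only the birth year) are assumptions to settle with your compliance team, and `redactForLogs` is a hypothetical helper.

```typescript
type KycFields = {
  fullName?: string;
  dateOfBirth?: string;
  documentNumber?: string;
  address?: string;
  [key: string]: string | undefined;
};

// Mask all but the last `keep` characters; short values are masked entirely.
function maskTail(value: string, keep = 4): string {
  return value.length <= keep
    ? "*".repeat(value.length)
    : "*".repeat(value.length - keep) + value.slice(-keep);
}

export function redactForLogs(fields: KycFields): KycFields {
  const out: KycFields = { ...fields }; // never mutate the caller's record
  if (out.fullName) out.fullName = "[REDACTED]";
  if (out.address) out.address = "[REDACTED]";
  if (out.dateOfBirth) out.dateOfBirth = out.dateOfBirth.slice(0, 4); // birth year only
  if (out.documentNumber) out.documentNumber = maskTail(out.documentNumber);
  return out;
}
```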
A solid KYC agent is boring in the right places. The model extracts data; your code enforces policy; your logs prove what happened. That’s the pattern banks can actually ship.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.