How to Build a KYC Verification Agent Using LlamaIndex in TypeScript for Retail Banking
A KYC verification agent checks customer identity documents, extracts key fields, compares them against policy and external data sources, and flags mismatches for human review. In retail banking, that matters because onboarding speed is tied directly to conversion, while weak verification creates compliance risk, fraud exposure, and audit pain.
Architecture
A production KYC agent for retail banking needs a narrow, auditable shape:
- Document ingestion layer
  - Accepts passports, national IDs, utility bills, and proof-of-income files.
  - Normalizes PDFs and images into text plus metadata.
- Policy retrieval layer
  - Stores bank-specific KYC rules, jurisdiction rules, and escalation thresholds.
  - Uses VectorStoreIndex and asQueryEngine() to retrieve the right policy snippets.
- Extraction and classification layer
  - Pulls structured fields like name, DOB, address, document number, and expiry date.
  - Classifies the case as pass / review / fail based on extracted evidence.
- Risk decision layer
  - Compares extracted data against application form data and internal watchlist signals.
  - Applies deterministic rules before any LLM-driven recommendation.
- Audit trail layer
  - Persists prompts, retrieved policy chunks, model outputs, timestamps, and reviewer overrides.
  - Required for model governance and regulatory review.
- Human-in-the-loop escalation
  - Routes low-confidence or high-risk cases to ops analysts.
  - Prevents the agent from making final decisions on ambiguous cases.
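The layering above can be sketched as a small routing function. This is only an illustration under stated assumptions: the `CaseSignals` shape, the `routeCase` name, the 0.8 default threshold, and the choice to fail expired documents outright are all hypothetical and would come from your own policy.

```typescript
// Hypothetical types for illustration; the field names are assumptions.
type Decision = "pass" | "review" | "fail";

interface CaseSignals {
  nameMatches: boolean;         // extracted name equals the application name
  documentExpired: boolean;     // expiry date is in the past
  watchlistHit: boolean;        // internal watchlist signal
  extractionConfidence: number; // 0..1, reported by the extraction step
}

interface RoutedCase {
  decision: Decision;
  reasons: string[];
}

// Deterministic rules only; nothing in this function calls an LLM.
function routeCase(signals: CaseSignals, minConfidence = 0.8): RoutedCase {
  // Assumption: an expired document is a hard fail rather than a review.
  if (signals.documentExpired) {
    return { decision: "fail", reasons: ["Document expired"] };
  }

  const reasons: string[] = [];
  if (signals.watchlistHit) reasons.push("Watchlist signal");
  if (!signals.nameMatches) reasons.push("Name mismatch");
  // Written as a negated >= so that NaN confidence also lands in review.
  if (!(signals.extractionConfidence >= minConfidence)) {
    reasons.push("Low extraction confidence");
  }

  // Any open reason sends the case to a human analyst.
  return { decision: reasons.length > 0 ? "review" : "pass", reasons };
}
```

Keeping the router free of model calls means the same inputs always give the same outcome, which is what makes the decision explainable in an audit.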
Implementation
1) Build a policy index for KYC rules
Start by loading your KYC policy documents into a VectorStoreIndex. In retail banking, keep policies split by jurisdiction so the retrieval step can return only applicable rules.
import { Document, VectorStoreIndex } from "llamaindex";

// One Document per jurisdiction, tagged in metadata so retrieval can be scoped later.
const docs = [
  new Document({
    text: `
UK Retail Banking KYC Policy:
- Verify government-issued ID
- Proof of address must be within 90 days
- Escalate if document expiry is within 30 days
- Manual review required for mismatched names
`,
    metadata: { jurisdiction: "UK", source: "kyc-policy" },
  }),
  new Document({
    text: `
EU Retail Banking KYC Policy:
- Verify government-issued ID
- Proof of address must be within 90 days
- Sanctions screening required before approval
`,
    metadata: { jurisdiction: "EU", source: "kyc-policy" },
  }),
];

// Top-level await requires an ES module context.
const index = await VectorStoreIndex.fromDocuments(docs);
const queryEngine = index.asQueryEngine();
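One way to keep policies split by jurisdiction, without depending on any particular filter API, is to group the policy records before indexing. The sketch below is a minimal example in plain TypeScript; the `PolicyRecord` shape and the `groupByJurisdiction` helper are hypothetical names introduced for illustration.

```typescript
// Hypothetical record shape; mirrors the text + metadata passed to Document above.
interface PolicyRecord {
  text: string;
  metadata: { jurisdiction: string; source: string };
}

// Group policy records so each jurisdiction can be indexed and queried on its own.
function groupByJurisdiction(
  records: PolicyRecord[],
): Map<string, PolicyRecord[]> {
  const groups = new Map<string, PolicyRecord[]>();
  for (const record of records) {
    const key = record.metadata.jurisdiction;
    const bucket = groups.get(key) ?? [];
    bucket.push(record);
    groups.set(key, bucket);
  }
  return groups;
}
```

Each group could then be turned into Document objects and passed to its own VectorStoreIndex.fromDocuments call, so a UK query never sees EU-only chunks. Whether separate indexes or a single filtered index fits better depends on how many jurisdictions you serve.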
2) Define a strict extraction prompt and call the LLM through LlamaIndex
For KYC you want structured output. Use the OpenAI LLM class with complete() and a prompt that asks for JSON only. Then parse and validate the result in your service layer before any decisioning.
import { OpenAI } from "llamaindex";

const llm = new OpenAI({
  model: "gpt-4o-mini",
});

const extractPrompt = `
Extract KYC fields from this customer document text.
Return only valid JSON with keys:
full_name, date_of_birth, document_type, document_number,
expiry_date, address, issuing_country, confidence
Document text:
{{text}}
`;

async function extractKycFields(text: string) {
  const response = await llm.complete({
    // A function replacer inserts the text literally, so "$" sequences in the
    // document are not interpreted as replacement patterns.
    prompt: extractPrompt.replace("{{text}}", () => text),
  });
  // JSON.parse throws on malformed output; callers should catch that and route to review.
  return JSON.parse(response.text);
}
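Model output is not guaranteed to be clean JSON, so it helps to parse defensively before decisioning. The following is a sketch, not part of LlamaIndex: `parseKycJson` and its result type are hypothetical, and the key list simply repeats the keys named in the prompt above.

```typescript
// Keys the extraction prompt asks for.
const REQUIRED_KEYS = [
  "full_name", "date_of_birth", "document_type", "document_number",
  "expiry_date", "address", "issuing_country", "confidence",
] as const;

type KycFields = Record<(typeof REQUIRED_KEYS)[number], unknown>;

type ParseResult =
  | { ok: true; fields: KycFields }
  | { ok: false; error: string };

// Hypothetical helper: tolerate text around the JSON object and report
// problems as values instead of throwing.
function parseKycJson(raw: string): ParseResult {
  // Take the span from the first "{" to the last "}" to skip any wrapper text.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    return { ok: false, error: "No JSON object found" };
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return { ok: false, error: "Response is not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "Response is not a JSON object" };
  }

  const obj = parsed as Record<string, unknown>;
  const missing = REQUIRED_KEYS.filter((key) => !(key in obj));
  if (missing.length > 0) {
    return { ok: false, error: `Missing keys: ${missing.join(", ")}` };
  }
  return { ok: true, fields: obj as KycFields };
}
```

A failed parse is a natural trigger for manual review, since it means the extraction step produced nothing the rules can safely evaluate.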
3) Retrieve applicable policy and compare against application data
This is where LlamaIndex earns its keep. Retrieve the relevant policy chunk using the customer’s jurisdiction, then apply deterministic checks before producing a recommendation.
type ApplicationData = {
  fullName: string;
  dob: string;
  jurisdiction: "UK" | "EU";
};

async function verifyKyc(documentText: string, app: ApplicationData) {
  const extracted = await extractKycFields(documentText);

  // Retrieve the policy text that applies to the customer's jurisdiction.
  const policyResponse = await queryEngine.query({
    query: `${app.jurisdiction} retail banking KYC requirements for proof of identity and address`,
  });
  const policyText =
    typeof policyResponse.response === "string"
      ? policyResponse.response
      : String(policyResponse.response);

  // Deterministic checks: the LLM output is evidence, not a decision.
  const nameMatches =
    extracted.full_name?.toLowerCase() === app.fullName.toLowerCase();
  const confidenceOk = Number(extracted.confidence) >= 0.8;
  const idTypeSupported = extracted.document_type === "passport";
  const needsReview = !nameMatches || !confidenceOk || !idTypeSupported;

  return {
    extracted,
    policyText,
    decision: needsReview ? "manual_review" : "approve",
    reasons: [
      !nameMatches ? "Name mismatch" : null,
      !confidenceOk ? "Low extraction confidence" : null,
      !idTypeSupported ? "Unsupported ID type" : null,
      // The type guard narrows the result to string[] for downstream logging.
    ].filter((reason): reason is string => reason !== null),
  };
}
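The sample UK policy also contains date rules (proof of address within 90 days, escalation when expiry is within 30 days) that are easy to encode deterministically. The helpers below are a sketch under assumptions: the function names are hypothetical, dates are assumed to be ISO 8601 strings such as "2024-06-01", and unparseable dates are treated conservatively.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days from `from` to `to` (negative if `to` is earlier).
function daysBetween(from: Date, to: Date): number {
  return Math.floor((to.getTime() - from.getTime()) / MS_PER_DAY);
}

// Proof of address must have been issued within the last `maxAgeDays` days.
function proofOfAddressIsFresh(
  issuedOn: string,
  today: Date,
  maxAgeDays = 90,
): boolean {
  const issued = new Date(issuedOn);
  if (isNaN(issued.getTime())) return false; // unparseable date: not fresh
  const age = daysBetween(issued, today);
  return age >= 0 && age <= maxAgeDays; // future-dated documents are rejected
}

// Escalate when the ID has already expired or expires within `windowDays` days.
function expiryNeedsEscalation(
  expiryDate: string,
  today: Date,
  windowDays = 30,
): boolean {
  const expiry = new Date(expiryDate);
  if (isNaN(expiry.getTime())) return true; // unparseable date: escalate
  return daysBetween(today, expiry) <= windowDays;
}
```

Passing `today` in as a parameter, rather than calling `new Date()` inside, keeps the checks reproducible when a decision is replayed during an audit.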
4) Add audit logging before returning any result
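In retail banking you need to be able to reconstruct every decision later. Log inputs, outputs, retrieved policy text, and model version to an immutable store. The function below writes a JSON line to stdout as a stand-in; in production, send the same record to an append-only store.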
async function auditKycRun(payload: {
  applicationId: string;
  decision: string;
  reasons: string[];
}) {
  // One structured JSON line per completed verification.
  console.log(JSON.stringify({
    eventType: "kyc_verification_completed",
    timestamp: new Date().toISOString(),
    ...payload,
    model: "gpt-4o-mini",
    system: "llamaindex-typescript",
  }));
}
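To capture the retrieved policy text and policy version alongside the decision, the audit record can be built by a pure function and then handed to whatever store you use. This is a sketch only: `AuditInput`, `buildAuditEvent`, and the `policyVersion` field are hypothetical names, and freezing the object guards against in-process mutation, not against tampering in storage.

```typescript
interface AuditInput {
  applicationId: string;
  decision: string;
  reasons: readonly string[];
  policyText: string;    // retrieved policy snippet used for the decision
  policyVersion: string; // hypothetical version label for the policy corpus
  model: string;         // model identifier, e.g. "gpt-4o-mini"
}

interface AuditEvent extends AuditInput {
  eventType: "kyc_verification_completed";
  timestamp: string; // ISO 8601, UTC
}

// Build an audit record without side effects; `now` is injectable for tests.
function buildAuditEvent(
  input: AuditInput,
  now: Date = new Date(),
): Readonly<AuditEvent> {
  return Object.freeze({
    eventType: "kyc_verification_completed" as const,
    timestamp: now.toISOString(),
    ...input,
    // Copy and freeze so later changes to the caller's array cannot alter the record.
    reasons: Object.freeze([...input.reasons]),
  });
}
```

The returned object can be serialized with JSON.stringify and written wherever the existing audit function writes today.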
Production Considerations
- Keep final decisions deterministic
  - Use LLMs for extraction and summarization only.
  - Approval logic should be rule-based so compliance can explain outcomes.
- Enforce data residency
  - Store customer PII in-region.
  - If your bank operates in the UK or EU, make sure embeddings, logs, and vector stores stay inside approved jurisdictions.
- Instrument everything
  - Track extraction confidence, manual review rate, false positive rate on name mismatches, and average time to decision.
- Add guardrails around sensitive data
  - Redact passport numbers and addresses in logs.
  - Never send raw documents to downstream analytics without masking.
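As a rough illustration of the redaction guardrail, the helpers below mask document numbers and drop addresses before a record is logged. The names `maskIdentifier` and `redactForLogs` are hypothetical, and the set of sensitive keys is an assumption based on the extraction fields used earlier; a real deployment would take that list from its data classification policy.

```typescript
// Replace all but the last `visible` characters with "*".
function maskIdentifier(value: string, visible = 2): string {
  // Short values are masked entirely so nothing meaningful leaks.
  if (visible <= 0 || value.length <= visible) {
    return "*".repeat(value.length);
  }
  return "*".repeat(value.length - visible) + value.slice(-visible);
}

// Assumed sensitive keys; adjust to your own field names.
const SENSITIVE_KEYS = new Set(["document_number", "address"]);

// Return a copy of the extracted fields that is safe to write to logs.
function redactForLogs(
  fields: Record<string, unknown>,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(fields)) {
    if (!SENSITIVE_KEYS.has(key)) {
      out[key] = value; // non-sensitive fields pass through unchanged
    } else if (key === "document_number" && typeof value === "string") {
      out[key] = maskIdentifier(value); // keep a short suffix for traceability
    } else {
      out[key] = "[REDACTED]"; // addresses and non-string values are dropped
    }
  }
  return out;
}
```

Because the function returns a copy, the unredacted fields remain available to the decision logic while only the masked version reaches the log pipeline.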
Common Pitfalls
- Letting the LLM make the final KYC decision
  - Don’t do this.
  - Use the model to extract facts; use deterministic rules to approve or escalate.
- Mixing policies across jurisdictions
  - A UK customer should not be evaluated against EU-only checks unless both apply.
  - Partition your Document corpus by jurisdiction metadata and filter retrieval accordingly.
- Skipping auditability
  - If you can’t show what policy was retrieved and what evidence was used, you’ll fail internal review fast.
  - Persist prompts, responses, timestamps, reviewer actions, and versioned policies.
- Ignoring low-confidence extractions
  - OCR noise on scans is common in retail onboarding.
  - Anything below your threshold should go straight to manual review instead of being auto-approved.
By Cyprian Aarons, AI Consultant at Topiax.