How to Build a KYC Verification Agent Using LlamaIndex in TypeScript for Lending
A KYC verification agent for lending takes borrower documents, extracts the right identity signals, checks them against policy, and returns a decision package that a human or downstream system can trust. For lenders, this matters because onboarding speed, fraud control, and regulatory compliance all sit on the same workflow.
Architecture
Build this agent as a narrow workflow, not a general chatbot.
- •Document ingestion layer
  - •Accept PDFs, scans, bank statements, utility bills, passports, and application forms.
  - •Normalize files into text before any LLM call.
- •KYC policy index
  - •Store your internal KYC rules, acceptable document lists, jurisdiction-specific requirements, and escalation thresholds.
  - •Use LlamaIndex retrieval so the agent grounds every decision in policy text.
- •Extraction and validation layer
  - •Pull structured fields like full name, DOB, address, ID number, expiry date, and document type.
  - •Compare extracted fields across documents and against application data.
- •Decision engine
  - •Produce one of: pass, review, or reject.
  - •Keep the decision deterministic enough for audit trails.
- •Audit logging
  - •Persist prompts, retrieved policy chunks, extracted fields, model output, and final decision.
  - •This is non-negotiable in lending.
- •Human review handoff
  - •Route mismatches, low-confidence OCR results, or policy exceptions to an analyst queue.
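The decision engine and review handoff described above can be kept deterministic by mapping validation results to a decision in plain code. The following is a minimal sketch, not part of LlamaIndex: the type names, reason codes, the 0.8 OCR confidence threshold, and the choice to reject expired IDs are all assumptions that your own policy would replace.

```typescript
// Hypothetical types for illustration; none of these come from LlamaIndex.
type Decision = "pass" | "review" | "reject";

interface KycChecks {
  nameMatch: boolean;
  dobMatch: boolean;
  addressMatch: boolean;
  idExpired: boolean;
  idExpiresWithin30Days: boolean;
  ocrConfidence: number; // 0..1, as reported by an upstream OCR step
}

interface DecisionPackage {
  decision: Decision;
  reasonCodes: string[];
}

// Assumed threshold; a real value would come from the policy pack.
const MIN_OCR_CONFIDENCE = 0.8;

function decide(checks: KycChecks): DecisionPackage {
  const rejectCodes: string[] = [];
  const reviewCodes: string[] = [];

  // Assumption: an expired ID is the only hard reject in this sketch.
  if (checks.idExpired) rejectCodes.push("ID_EXPIRED");

  // Everything else is routed to an analyst rather than rejected.
  if (!checks.nameMatch) reviewCodes.push("NAME_MISMATCH");
  if (!checks.dobMatch) reviewCodes.push("DOB_MISMATCH");
  if (!checks.addressMatch) reviewCodes.push("ADDRESS_MISMATCH");
  if (checks.idExpiresWithin30Days) reviewCodes.push("ID_EXPIRING_SOON");
  if (checks.ocrConfidence < MIN_OCR_CONFIDENCE) reviewCodes.push("LOW_OCR_CONFIDENCE");

  if (rejectCodes.length > 0) {
    return { decision: "reject", reasonCodes: rejectCodes.concat(reviewCodes) };
  }
  if (reviewCodes.length > 0) {
    return { decision: "review", reasonCodes: reviewCodes };
  }
  return { decision: "pass", reasonCodes: [] };
}
```

Because the same inputs always produce the same decision and reason codes, the output can be replayed later when an auditor asks why a case was routed the way it was.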
Implementation
1) Install dependencies and set up the LlamaIndex client
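Use the TypeScript package from LlamaIndex plus dotenv for configuration. PDF parsing and OCR are out of scope here, so the examples work from text that has already been extracted. The pattern below assumes Node.js with ESM support, because the code uses top-level await.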
npm install llamaindex dotenv
Create a .env file with your model key:
OPENAI_API_KEY=your_key_here
2) Load policy documents into a vector index
For KYC you usually have a policy pack: acceptable IDs by country, proof-of-address rules, sanctions escalation steps, and manual review triggers. Index those documents so the agent can retrieve exact policy text before making a decision.
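import "dotenv/config";
import {
  Document,
  VectorStoreIndex,
  Settings,
  OpenAI,
} from "llamaindex";

// Depending on your llamaindex version, OpenAI may need to be imported
// from the separate "@llamaindex/openai" package instead.
Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
});

// In production, load the real policy pack; one inline document keeps the example short.
const kycPolicyDocs = [
  new Document({
    text: `
KYC Policy v3:
- Government-issued photo ID required.
- Proof of address must be dated within 90 days.
- If name mismatch exceeds one token difference, route to manual review.
- If document expiry is within 30 days, route to manual review.
- For high-risk jurisdictions, require enhanced due diligence.
`,
    metadata: { source: "kyc-policy-v3" },
  }),
];

// Embed the policy text and build an in-memory vector index.
const index = await VectorStoreIndex.fromDocuments(kycPolicyDocs);

// Retriever for fetching raw policy chunks (useful for audit logs).
const retriever = index.asRetriever({ similarityTopK: 3 });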
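The numeric thresholds in the sample policy (90 days, 30 days, one name token) are easier to audit when they are also checked in plain code rather than left entirely to the model. The sketch below is one possible reading of those rules. The helper names are hypothetical, dates are assumed to be ISO `YYYY-MM-DD` strings, and "token difference" is interpreted as the number of name tokens present in one name but not the other.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days from `fromIso` to `toIso`, both treated as UTC midnights.
function daysBetween(fromIso: string, toIso: string): number {
  const from = Date.parse(`${fromIso}T00:00:00Z`);
  const to = Date.parse(`${toIso}T00:00:00Z`);
  return Math.round((to - from) / MS_PER_DAY);
}

// "Proof of address must be dated within 90 days."
function isProofOfAddressFresh(issueDateIso: string, asOfIso: string): boolean {
  const age = daysBetween(issueDateIso, asOfIso);
  return age >= 0 && age <= 90;
}

// "If document expiry is within 30 days, route to manual review."
function expiresWithin30Days(expiryIso: string, asOfIso: string): boolean {
  const remaining = daysBetween(asOfIso, expiryIso);
  return remaining >= 0 && remaining <= 30;
}

// Lowercase, drop dots and commas, split on whitespace.
function nameTokens(name: string): string[] {
  return name
    .toLowerCase()
    .replace(/[.,]/g, " ")
    .split(/\s+/)
    .filter((t) => t.length > 0);
}

// Count tokens that appear in one name but not the other (assumed interpretation).
function nameTokenDifference(a: string, b: string): number {
  const ta = nameTokens(a);
  const tb = nameTokens(b);
  const onlyInA = ta.filter((t) => tb.indexOf(t) === -1).length;
  const onlyInB = tb.filter((t) => ta.indexOf(t) === -1).length;
  return onlyInA + onlyInB;
}
```

Passing the evaluation date in as `asOfIso` instead of reading the clock inside the function keeps the checks reproducible, which matters when a decision is re-examined months later.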
3) Build an extraction prompt that returns structured JSON
Do not ask the model for free-form prose. Ask for a strict schema so you can validate it downstream. LlamaIndex’s queryEngine gives you retrieval grounding; the prompt keeps output machine-readable.
import { QueryEngineTool } from "llamaindex";

// QueryEngineTool is built with its constructor; it wraps the policy index
// so an agent can call it as a tool later.
const kycTool = new QueryEngineTool({
  queryEngine: index.asQueryEngine(),
  metadata: {
    name: "kyc_policy_lookup",
    description: "Retrieves internal KYC policy text for lending decisions",
  },
});
const borrowerPacket = `
Application:
Full Name: Sarah A. Khan
DOB: 1991-04-12
Address: 18 King Street, London
Country: GB
Documents:
1) Passport OCR:
Name: Sarah Khan
DOB: 1991-04-12
Expiry Date: 2029-08-01
2) Utility Bill OCR:
Name: Sarah Khan
Address: 18 King St., London
Issue Date: 2026-01-10
`;
const prompt = `
You are a KYC verification engine for lending.
Return JSON only with keys:
{
  "decision": "pass" | "review" | "reject",
  "reason_codes": string[],
  "extracted_fields": {
    "name_match": boolean,
    "dob_match": boolean,
    "address_match": boolean,
    "id_expiry_within_30_days": boolean
  },
  "policy_citations": string[]
}
Use the retrieved policy text to justify every decision.
`;
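Asking for JSON only helps if the response is actually validated before anything downstream trusts it. The sketch below checks a raw model response against the schema from the prompt using plain type guards. The `parseKycOutput` name and the choice to return `null` on any violation are assumptions; a schema library would work equally well.

```typescript
type Decision = "pass" | "review" | "reject";

interface KycModelOutput {
  decision: Decision;
  reason_codes: string[];
  extracted_fields: {
    name_match: boolean;
    dob_match: boolean;
    address_match: boolean;
    id_expiry_within_30_days: boolean;
  };
  policy_citations: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((x) => typeof x === "string");
}

// Returns the parsed output, or null if the text is not valid JSON
// or does not match the schema requested in the prompt.
function parseKycOutput(raw: string): KycModelOutput | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const obj = data as Record<string, unknown>;

  const rawFields = obj.extracted_fields;
  if (typeof rawFields !== "object" || rawFields === null) return null;
  const fields = rawFields as Record<string, unknown>;

  const fieldKeys = ["name_match", "dob_match", "address_match", "id_expiry_within_30_days"];
  const decisionOk =
    obj.decision === "pass" || obj.decision === "review" || obj.decision === "reject";
  const fieldsOk = fieldKeys.every((k) => typeof fields[k] === "boolean");

  if (!decisionOk || !fieldsOk) return null;
  if (!isStringArray(obj.reason_codes) || !isStringArray(obj.policy_citations)) return null;
  return data as KycModelOutput;
}
```

A `null` result is a natural trigger for a retry or for routing the case to manual review instead of guessing at a partial response.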
4) Query the index and make the final decision
This is the actual runtime pattern: retrieve policy context first, then send borrower data plus instructions. Keep the result auditable.
const queryEngine = index.asQueryEngine();

// Retrieve and summarize the policy rules relevant to this check.
const policyContext = await queryEngine.query({
  query: `
    What are the lending KYC rules for identity match tolerance,
    proof-of-address age limits, and expiry-based escalation?
  `,
});

// Combine instructions, grounded policy text, and the borrower data.
const agentInput = `
${prompt}
Policy Context:
${policyContext.response}
Borrower Packet:
${borrowerPacket}
`;

// complete() takes a params object with the prompt, not a bare string.
const result = await Settings.llm.complete({ prompt: agentInput });
console.log(result.text);
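To keep the result of this step auditable, the prompt, policy context, raw model output, and final decision can be captured in one immutable record. This is a minimal sketch under stated assumptions: the `AuditRecord` shape and `buildAuditRecord` helper are hypothetical, the request ID is supplied by the caller, and persistence to a store is left out.

```typescript
type Decision = "pass" | "review" | "reject";

interface AuditRecord {
  readonly requestId: string;
  readonly createdAt: string; // ISO 8601 timestamp
  readonly prompt: string;
  readonly policyChunks: readonly string[];
  readonly modelOutput: string;
  readonly decision: Decision;
}

// Builds a frozen record so later code cannot silently alter what was logged.
function buildAuditRecord(input: {
  requestId: string;
  now: Date;
  prompt: string;
  policyChunks: string[];
  modelOutput: string;
  decision: Decision;
}): AuditRecord {
  const record: AuditRecord = {
    requestId: input.requestId,
    createdAt: input.now.toISOString(),
    prompt: input.prompt,
    // Copy before freezing so the caller's array is left untouched.
    policyChunks: Object.freeze(input.policyChunks.slice()),
    modelOutput: input.modelOutput,
    decision: input.decision,
  };
  return Object.freeze(record);
}
```

In the flow above, `prompt` would be `agentInput`, `policyChunks` could hold `policyContext.response` or the raw chunks returned by the retriever, and `modelOutput` would be `result.text`.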
If you later need orchestration across multiple tools (an OCR tool, a sanctions screening tool, an address validation tool), wrap them in an OpenAIAgent. For most lending workflows, though, keep KYC as a deterministic pipeline with retrieval plus structured output.
Production Considerations
- •Auditability
  - •Store every retrieved policy chunk and final JSON output alongside an immutable request ID.
  - •Regulators will ask why a loan was delayed or rejected.
- •Data residency
  - •Keep borrower PII in-region if your lending book spans multiple jurisdictions.
  - •If your compliance team requires EU-only processing, do not ship raw identity docs to out-of-region services.
- •Monitoring
  - •Track manual-review rate by country, document type failure rate, OCR confidence distribution, and false positive mismatch rate.
  - •A spike in review rates often means your extraction logic drifted or a new document template appeared.
- •Guardrails
  - •Hard-block decisions when required fields are missing instead of letting the model guess.
  - •Never allow the model to override sanctions screening or AML escalation rules.
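The guardrails above can be enforced as a final step that runs after the model and can only make a decision stricter, never looser. The sketch below is an assumption-laden illustration: the `applyGuardrails` helper, the reason codes, and the choice to route blocked cases to review (rather than reject) are hypothetical, and the sanctions and AML flags are assumed to come from separate screening systems.

```typescript
type Decision = "pass" | "review" | "reject";

interface GuardrailInput {
  modelDecision: Decision;
  missingRequiredFields: string[];
  sanctionsHit: boolean; // from a separate screening system (assumed)
  amlEscalation: boolean; // from a separate AML rule engine (assumed)
}

// Higher number means stricter outcome.
const SEVERITY: Record<Decision, number> = { pass: 0, review: 1, reject: 2 };

function applyGuardrails(input: GuardrailInput): { decision: Decision; reasonCodes: string[] } {
  const reasonCodes: string[] = [];
  let floor: Decision = "pass";

  // Missing fields block an automatic pass instead of letting the model guess.
  for (const field of input.missingRequiredFields) {
    reasonCodes.push(`MISSING_FIELD:${field}`);
    floor = "review";
  }
  if (input.sanctionsHit) {
    reasonCodes.push("SANCTIONS_HIT");
    floor = "review";
  }
  if (input.amlEscalation) {
    reasonCodes.push("AML_ESCALATION");
    floor = "review";
  }

  // The model may be stricter than the floor, but never more lenient.
  const decision =
    SEVERITY[input.modelDecision] >= SEVERITY[floor] ? input.modelDecision : floor;
  return { decision, reasonCodes };
}
```

Keeping this logic outside the prompt means a model error or prompt injection in a borrower document cannot turn a sanctions hit into a pass.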
Common Pitfalls
- •Letting the model decide without policy retrieval
  - •Bad outcome: inconsistent approvals across similar cases.
  - •Fix: always retrieve internal KYC policy first with VectorStoreIndex and ground the response in cited rules.
- •Accepting free-form output
  - •Bad outcome: parsing failures and silent compliance bugs.
  - •Fix: require JSON-only output with explicit keys like decision, reason_codes, and field-level matches.
- •Treating all mismatches as rejections
  - •Bad outcome: unnecessary loan drop-off from harmless OCR noise like King Street vs King St.
  - •Fix: define thresholds for manual review versus reject. In lending KYC, borderline cases should go to an analyst queue unless policy says otherwise.
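Harmless differences such as King Street vs King St. can be removed before comparison by normalizing both addresses. The sketch below is illustrative only: `normalizeAddress` and the abbreviation table are hypothetical, the table covers a handful of English street suffixes, and it would wrongly expand "St" where it means "Saint", so real deployments usually rely on a proper address validation service.

```typescript
// Small, assumed abbreviation table; extend or replace for your jurisdictions.
const ABBREVIATIONS: Record<string, string> = {
  st: "street",
  rd: "road",
  ave: "avenue",
  ln: "lane",
  dr: "drive",
};

function expandToken(token: string): string {
  // hasOwnProperty guards against inherited keys such as "constructor".
  return Object.prototype.hasOwnProperty.call(ABBREVIATIONS, token)
    ? ABBREVIATIONS[token]
    : token;
}

// Lowercase, strip dots and commas, collapse whitespace, expand abbreviations.
function normalizeAddress(address: string): string {
  return address
    .toLowerCase()
    .replace(/[.,]/g, " ")
    .split(/\s+/)
    .filter((t) => t.length > 0)
    .map(expandToken)
    .join(" ");
}

function addressesMatch(a: string, b: string): boolean {
  return normalizeAddress(a) === normalizeAddress(b);
}
```

With the sample borrower packet, `addressesMatch("18 King Street, London", "18 King St., London")` treats the two addresses as the same, so the case is not pushed to review on formatting alone.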
By Cyprian Aarons, AI Consultant at Topiax.