How to Build a KYC verification Agent Using LlamaIndex in TypeScript for lending

By Cyprian AaronsUpdated 2026-04-21

kyc-verificationllamaindextypescriptlending

A KYC verification agent for lending takes borrower documents, extracts the right identity signals, checks them against policy, and returns a decision package that a human or downstream system can trust. For lenders, this matters because onboarding speed, fraud control, and regulatory compliance all sit on the same workflow.

Architecture

Build this agent as a narrow workflow, not a general chatbot.

•
Document ingestion layer
- •Accept PDFs, scans, bank statements, utility bills, passports, and application forms.
- •Normalize files into text before any LLM call.
•
KYC policy index
- •Store your internal KYC rules, acceptable document lists, jurisdiction-specific requirements, and escalation thresholds.
- •Use LlamaIndex retrieval so the agent grounds every decision in policy text.
•
Extraction and validation layer
- •Pull structured fields like full name, DOB, address, ID number, expiry date, and document type.
- •Compare extracted fields across documents and against application data.
•
Decision engine
- •Produce one of: pass, review, or reject.
- •Keep the decision deterministic enough for audit trails.
•
Audit logging
- •Persist prompts, retrieved policy chunks, extracted fields, model output, and final decision.
- •This is non-negotiable in lending.
•
Human review handoff
- •Route mismatches, low-confidence OCR results, or policy exceptions to an analyst queue.

Implementation

1) Install dependencies and set up the LlamaIndex client

Use the TypeScript packages from LlamaIndex plus a PDF reader. The pattern below assumes Node.js with ESM support.

npm install llamaindex dotenv

Create a .env file with your model key:

OPENAI_API_KEY=your_key_here

2) Load policy documents into a vector index

For KYC you usually have a policy pack: acceptable IDs by country, proof-of-address rules, sanctions escalation steps, and manual review triggers. Index those documents so the agent can retrieve exact policy text before making a decision.

import "dotenv/config";
import {
  Document,
  VectorStoreIndex,
  Settings,
  OpenAI,
} from "llamaindex";

Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
});

const kycPolicyDocs = [
  new Document({
    text: `
KYC Policy v3:
- Government-issued photo ID required.
- Proof of address must be dated within 90 days.
- If name mismatch exceeds one token difference, route to manual review.
- If document expiry is within 30 days, route to manual review.
- For high-risk jurisdictions, require enhanced due diligence.
`,
    metadata: { source: "kyc-policy-v3" },
  }),
];

const index = await VectorStoreIndex.fromDocuments(kycPolicyDocs);
const retriever = index.asRetriever({ similarityTopK: 3 });

3) Build an extraction prompt that returns structured JSON

Do not ask the model for free-form prose. Ask for a strict schema so you can validate it downstream. LlamaIndex’s queryEngine gives you retrieval grounding; the prompt keeps output machine-readable.

import { QueryEngineTool } from "llamaindex";

const kycTool = QueryEngineTool.from({
  queryEngine: index.asQueryEngine(),
  metadata: {
    name: "kyc_policy_lookup",
    description: "Retrieves internal KYC policy text for lending decisions",
  },
});

const borrowerPacket = `
Application:
Full Name: Sarah A. Khan
DOB: 1991-04-12
Address: 18 King Street, London
Country: GB

Documents:
1) Passport OCR:
Name: Sarah Khan
DOB: 1991-04-12
Expiry Date: 2029-08-01

2) Utility Bill OCR:
Name: Sarah Khan
Address: 18 King St., London
Issue Date: 2026-01-10
`;

const prompt = `
You are a KYC verification engine for lending.

Return JSON only with keys:
{
  "decision": "pass" | "review" | "reject",
  "reason_codes": string[],
  "extracted_fields": {
    "name_match": boolean,
    "dob_match": boolean,
    "address_match": boolean,
    "id_expiry_within_30_days": boolean
  },
  "policy_citations": string[]
}

Use the retrieved policy text to justify every decision.
`;

4) Query the index and make the final decision

This is the actual runtime pattern: retrieve policy context first, then send borrower data plus instructions. Keep the result auditable.

import { QueryEngine } from "llamaindex";

const queryEngine = index.asQueryEngine();

const policyContext = await queryEngine.query({
  query: `
What are the lending KYC rules for identity match tolerance,
proof-of-address age limits, and expiry-based escalation?
`,
});

const agentInput = `
${prompt}

Policy Context:
${policyContext.response}

Borrower Packet:
${borrowerPacket}
`;

const result = await Settings.llm.complete(agentInput);

console.log(result.text);

If you want stricter orchestration around multiple tools later—OCR tool, sanctions screening tool, address validation tool—wrap them in an OpenAIAgent. For most lending workflows though, keep KYC as a deterministic pipeline with retrieval plus structured output.

Production Considerations

•
Auditability
- •Store every retrieved policy chunk and final JSON output alongside an immutable request ID.
- •Regulators will ask why a loan was delayed or rejected.
•
Data residency
- •Keep borrower PII in-region if your lending book spans multiple jurisdictions.
- •If your compliance team requires EU-only processing, do not ship raw identity docs to out-of-region services.
•
Monitoring
- •Track manual-review rate by country, document type failure rate, OCR confidence distribution, and false positive mismatch rate.
- •A spike in review rates often means your extraction logic drifted or a new document template appeared.
•
Guardrails
- •Hard-block decisions when required fields are missing instead of letting the model guess.
- •Never allow the model to override sanctions screening or AML escalation rules.

Common Pitfalls

•
Letting the model decide without policy retrieval
- •Bad outcome: inconsistent approvals across similar cases.
- •Fix: always retrieve internal KYC policy first with VectorStoreIndex and ground the response in cited rules.
•
Accepting free-form output
- •Bad outcome: parsing failures and silent compliance bugs.
- •Fix: require JSON-only output with explicit keys like decision, reason_codes, and field-level matches.
•
Treating all mismatches as rejections
- •Bad outcome: unnecessary loan drop-off from harmless OCR noise like King Street vs King St..
- •Fix: define thresholds for manual review versus reject. In lending KYC, borderline cases should go to an analyst queue unless policy says otherwise.

Keep learning

•The complete AI Agents Roadmap — my full 8-step breakdown
•Free: The AI Agent Starter Kit — PDF checklist + starter code
•Work with me — I build AI for banks and insurance companies

By Cyprian Aarons, AI Consultant at Topiax.

ShareX / Twitter LinkedIn

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

Get the Starter Kit