How to Build a KYC Verification Agent Using LlamaIndex in TypeScript for Lending

By Cyprian Aarons, updated 2026-04-21
Tags: kyc-verification, llamaindex, typescript, lending

A KYC verification agent for lending takes borrower documents, extracts the right identity signals, checks them against policy, and returns a decision package that a human or downstream system can trust. For lenders, this matters because onboarding speed, fraud control, and regulatory compliance all sit on the same workflow.

Architecture

Build this agent as a narrow workflow, not a general chatbot.

  • Document ingestion layer

    • Accept PDFs, scans, bank statements, utility bills, passports, and application forms.
    • Normalize files into text before any LLM call.
  • KYC policy index

    • Store your internal KYC rules, acceptable document lists, jurisdiction-specific requirements, and escalation thresholds.
    • Use LlamaIndex retrieval so the agent grounds every decision in policy text.
  • Extraction and validation layer

    • Pull structured fields like full name, DOB, address, ID number, expiry date, and document type.
    • Compare extracted fields across documents and against application data.
  • Decision engine

    • Produce one of: pass, review, or reject.
    • Keep the decision reproducible: the same inputs and policy version should produce the same outcome, so it holds up in an audit trail.
  • Audit logging

    • Persist prompts, retrieved policy chunks, extracted fields, model output, and final decision.
    • This is non-negotiable in lending.
  • Human review handoff

    • Route mismatches, low-confidence OCR results, or policy exceptions to an analyst queue.
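The decision package and audit log described above can be pinned down as plain TypeScript types before any LLM code is written. This is a minimal sketch under assumptions: the field names, the hypothetical buildAuditRecord helper, and the SHA-256 content hash are illustrative choices, not part of LlamaIndex.

```typescript
import { createHash, randomUUID } from "node:crypto";

// Hypothetical shapes for the decision package and its audit record.
type Decision = "pass" | "review" | "reject";

interface DecisionPackage {
  decision: Decision;
  reasonCodes: string[];
  policyCitations: string[];
}

interface AuditRecord {
  requestId: string; // immutable ID for this verification run
  createdAt: string; // ISO-8601 timestamp
  prompt: string; // exact prompt sent to the model
  retrievedPolicy: string[]; // policy chunks used as grounding
  modelOutput: string; // raw model text, before parsing
  decision: DecisionPackage; // parsed, validated decision
  contentHash: string; // SHA-256 over all fields above
}

function buildAuditRecord(
  prompt: string,
  retrievedPolicy: string[],
  modelOutput: string,
  decision: DecisionPackage,
  now: Date = new Date(),
): AuditRecord {
  const requestId = randomUUID();
  const createdAt = now.toISOString();
  // Hash the record contents so a later recomputation can reveal edits.
  const contentHash = createHash("sha256")
    .update(
      JSON.stringify({ requestId, createdAt, prompt, retrievedPolicy, modelOutput, decision }),
    )
    .digest("hex");
  return { requestId, createdAt, prompt, retrievedPolicy, modelOutput, decision, contentHash };
}
```

Write each record to append-only storage; recomputing the hash over the stored fields later shows whether a record was altered after the decision was made.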

Implementation

1) Install dependencies and set up the LlamaIndex client

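Use the LlamaIndex TypeScript packages. In recent LlamaIndex.TS releases the OpenAI classes ship in the separate @llamaindex/openai package. The snippets below use top-level await, so they assume Node.js with ESM enabled (for example, "type": "module" in package.json). Document parsing (a PDF reader or OCR service) is not part of these snippets; plug in your own ingestion step.

npm install llamaindex @llamaindex/openai dotenv

Create a .env file with your model key:

OPENAI_API_KEY=your_key_here

2) Load policy documents into a vector index

For KYC you usually have a policy pack: acceptable IDs by country, proof-of-address rules, sanctions escalation steps, and manual review triggers. Index those documents so the agent can retrieve exact policy text before making a decision.

import "dotenv/config";
import { Document, VectorStoreIndex, Settings } from "llamaindex";
import { OpenAI, OpenAIEmbedding } from "@llamaindex/openai";

// Decision LLM; temperature 0 reduces run-to-run variation in the output.
Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

// Embedding model used to index and retrieve policy chunks.
Settings.embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small",
});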

const kycPolicyDocs = [
  new Document({
    text: `
KYC Policy v3:
- Government-issued photo ID required.
- Proof of address must be dated within 90 days.
- If name mismatch exceeds one token difference, route to manual review.
- If document expiry is within 30 days, route to manual review.
- For high-risk jurisdictions, require enhanced due diligence.
`,
    metadata: { source: "kyc-policy-v3" },
  }),
];

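// Embeds the policy pack and builds an in-memory vector index.
const index = await VectorStoreIndex.fromDocuments(kycPolicyDocs);
// Returns the top 3 policy chunks verbatim; useful for audit logs.
const retriever = index.asRetriever({ similarityTopK: 3 });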
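The sample indexes the whole policy as a single Document, so one retrieved chunk can contain every rule and citations stay coarse. One option, sketched here under that assumption, is to split the pack into one entry per rule with a stable ID. The splitPolicyRules helper and the "#rN" ID format are hypothetical.

```typescript
// Hypothetical per-rule record, so each rule can be cited individually.
interface PolicyRule {
  ruleId: string;
  source: string;
  text: string;
}

// Keeps only "- " bullet lines and numbers them in order of appearance.
function splitPolicyRules(policyText: string, source: string): PolicyRule[] {
  return policyText
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("- "))
    .map((line, i) => ({
      ruleId: `${source}#r${i + 1}`,
      source,
      text: line.slice(2), // drop the leading "- "
    }));
}
```

Each entry can then become new Document({ text: rule.text, metadata: { source: rule.source, ruleId: rule.ruleId } }), letting the model cite kyc-policy-v3#r2 instead of the whole pack. Because these IDs follow line order, editing the policy shifts them; in production, IDs written into the policy text itself are more stable.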

3) Build an extraction prompt that returns structured JSON

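Do not ask the model for free-form prose. Ask for a strict schema so you can validate it downstream. LlamaIndex’s query engine gives you retrieval grounding; the prompt keeps output machine-readable. The kycTool wrapper below is not used by the step 4 pipeline; it exposes the policy index as a named tool in case you later move to a tool-calling agent.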

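import { QueryEngineTool } from "llamaindex";

// Exposes the policy index as a named tool for a future tool-calling agent.
// The step 4 pipeline calls the query engine directly and does not use it.
const kycTool = new QueryEngineTool({
  queryEngine: index.asQueryEngine(),
  metadata: {
    name: "kyc_policy_lookup",
    description: "Retrieves internal KYC policy text for lending decisions",
  },
});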

const borrowerPacket = `
Application:
Full Name: Sarah A. Khan
DOB: 1991-04-12
Address: 18 King Street, London
Country: GB

Documents:
1) Passport OCR:
Name: Sarah Khan
DOB: 1991-04-12
Expiry Date: 2029-08-01

2) Utility Bill OCR:
Name: Sarah Khan
Address: 18 King St., London
Issue Date: 2026-01-10
`;

const prompt = `
You are a KYC verification engine for lending.

Return JSON only with keys:
{
  "decision": "pass" | "review" | "reject",
  "reason_codes": string[],
  "extracted_fields": {
    "name_match": boolean,
    "dob_match": boolean,
    "address_match": boolean,
    "id_expiry_within_30_days": boolean
  },
  "policy_citations": string[]
}

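Use the retrieved policy text to justify every decision and list the policy source in policy_citations.
Output raw JSON only, with no markdown code fences or commentary.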
`;
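Because the prompt defines a JSON contract, the application should check the response against it before acting on it. A minimal sketch, assuming the model returns plain JSON as requested; parseKycResult is a hypothetical helper, and a schema library such as Zod could do the same job.

```typescript
// Hypothetical validator for the JSON contract defined in the prompt.
type Decision = "pass" | "review" | "reject";

interface KycResult {
  decision: Decision;
  reason_codes: string[];
  extracted_fields: {
    name_match: boolean;
    dob_match: boolean;
    address_match: boolean;
    id_expiry_within_30_days: boolean;
  };
  policy_citations: string[];
}

const DECISIONS: readonly string[] = ["pass", "review", "reject"];
const FIELD_KEYS = ["name_match", "dob_match", "address_match", "id_expiry_within_30_days"] as const;

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Returns the parsed result, or null if the text is not JSON or breaks the contract.
function parseKycResult(raw: string): KycResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // free-form prose or truncated output
  }
  if (typeof data !== "object" || data === null) return null;
  const obj = data as Record<string, unknown>;

  const decision = obj["decision"];
  if (typeof decision !== "string" || !DECISIONS.includes(decision)) return null;
  if (!isStringArray(obj["reason_codes"])) return null;

  const citations = obj["policy_citations"];
  // Every decision must cite at least one policy source.
  if (!isStringArray(citations) || citations.length === 0) return null;

  const fields = obj["extracted_fields"];
  if (typeof fields !== "object" || fields === null) return null;
  const flags = fields as Record<string, unknown>;
  if (!FIELD_KEYS.every((key) => typeof flags[key] === "boolean")) return null;

  return data as KycResult;
}
```

Treat a null result as review. A parsing failure should never default to pass.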

4) Query the index and make the final decision

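This is the runtime pattern: retrieve policy context first, then send the borrower data plus instructions to the LLM. Note that the query engine returns an LLM-synthesized answer about the matching policy, not the verbatim rules; for the audit trail, also store the raw chunks returned by the retriever from step 2.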

// Query engine = retriever + LLM answer synthesis over the policy index.
const queryEngine = index.asQueryEngine();

const policyContext = await queryEngine.query({
  query: `
What are the lending KYC rules for identity match tolerance,
proof-of-address age limits, and expiry-based escalation?
`,
});

const agentInput = `
${prompt}

Policy Context:
${policyContext.response}

Borrower Packet:
${borrowerPacket}
`;

// complete() takes a params object with a prompt field; the reply is in result.text.
const result = await Settings.llm.complete({ prompt: agentInput });

console.log(result.text);

If you later want stricter orchestration across multiple tools (an OCR tool, a sanctions screening tool, an address validation tool), wrap them in an agent such as LlamaIndex's OpenAIAgent; agent APIs differ between LlamaIndex.TS versions, so check the one you install. For most lending workflows, though, keep KYC as a deterministic pipeline: retrieval plus structured output.
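Date rules like the 30-day expiry trigger and the 90-day proof-of-address window are cheap to compute in code, and doing so keeps them reproducible instead of relying on the model's date arithmetic. A sketch, assuming the OCR step yields ISO YYYY-MM-DD dates; checkDates and its explicit asOf parameter are illustrative, and the thresholds mirror the sample policy.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days from `from` to `to`, parsed as UTC midnight to avoid timezone drift.
function daysBetween(from: string, to: string): number {
  const a = Date.parse(`${from}T00:00:00Z`);
  const b = Date.parse(`${to}T00:00:00Z`);
  if (Number.isNaN(a) || Number.isNaN(b)) {
    throw new Error(`Invalid date: ${from} or ${to}`);
  }
  return Math.round((b - a) / MS_PER_DAY);
}

interface DateChecks {
  idExpired: boolean;
  idExpiryWithin30Days: boolean;
  proofOfAddressWithin90Days: boolean;
}

// asOf is passed in explicitly so a past decision can be replayed exactly.
function checkDates(asOf: string, idExpiry: string, poaIssueDate: string): DateChecks {
  const daysToExpiry = daysBetween(asOf, idExpiry);
  const poaAgeDays = daysBetween(poaIssueDate, asOf);
  return {
    idExpired: daysToExpiry < 0,
    idExpiryWithin30Days: daysToExpiry >= 0 && daysToExpiry <= 30,
    proofOfAddressWithin90Days: poaAgeDays >= 0 && poaAgeDays <= 90,
  };
}
```

Reading the decision date from a stored value rather than the system clock lets an auditor rerun the check with the date the decision was made. The resulting booleans can be included in the prompt or used to override the model's id_expiry_within_30_days flag.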

Production Considerations

  • Auditability

    • Store every retrieved policy chunk and final JSON output alongside an immutable request ID.
    • Regulators will ask why a loan was delayed or rejected.
  • Data residency

    • Keep borrower PII in-region if your lending book spans multiple jurisdictions.
    • If your compliance team requires EU-only processing, do not ship raw identity docs to out-of-region services.
  • Monitoring

    • Track manual-review rate by country, document type failure rate, OCR confidence distribution, and false positive mismatch rate.
    • A spike in review rates often means your extraction logic drifted or a new document template appeared.
  • Guardrails

    • Hard-block decisions when required fields are missing instead of letting the model guess.
    • Never allow the model to override sanctions screening or AML escalation rules.
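The guardrails above can be enforced outside the model by treating hard rules as a floor on severity: the model may escalate a case but never downgrade it. A sketch under assumptions: sanctionsHit and amlEscalation stand in for results from your own screening services, and routing them to review rather than reject is an assumed policy choice that your rules may override.

```typescript
type Decision = "pass" | "review" | "reject";

const SEVERITY: Record<Decision, number> = { pass: 0, review: 1, reject: 2 };

interface GuardrailInputs {
  modelDecision: Decision | null; // null when the model output failed validation
  missingRequiredFields: string[]; // e.g. ["dob"], from your extraction step
  sanctionsHit: boolean; // hypothetical: from a separate sanctions screening service
  amlEscalation: boolean; // hypothetical: from your AML rules engine
}

interface FinalDecision {
  decision: Decision;
  reasonCodes: string[];
}

function finalizeDecision(input: GuardrailInputs): FinalDecision {
  const reasonCodes: string[] = [];
  let floor: Decision = "pass"; // minimum severity imposed by hard rules

  if (input.modelDecision === null) {
    floor = "review";
    reasonCodes.push("MODEL_OUTPUT_INVALID");
  }
  for (const field of input.missingRequiredFields) {
    floor = "review"; // hard-block: never let the model guess a missing field
    reasonCodes.push(`MISSING_${field.toUpperCase()}`);
  }
  if (input.sanctionsHit) {
    floor = "review";
    reasonCodes.push("SANCTIONS_HIT");
  }
  if (input.amlEscalation) {
    floor = "review";
    reasonCodes.push("AML_ESCALATION");
  }

  // Take whichever is more severe: the model's suggestion or the rule floor.
  const model: Decision = input.modelDecision ?? "pass";
  const decision: Decision = SEVERITY[model] >= SEVERITY[floor] ? model : floor;
  return { decision, reasonCodes };
}
```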

Common Pitfalls

  1. Letting the model decide without policy retrieval

    • Bad outcome: inconsistent approvals across similar cases.
    • Fix: always retrieve internal KYC policy first with VectorStoreIndex and ground the response in cited rules.
  2. Accepting free-form output

    • Bad outcome: parsing failures and silent compliance bugs.
    • Fix: require JSON-only output with explicit keys like decision, reason_codes, and field-level matches.
  3. Treating all mismatches as rejections

    • Bad outcome: unnecessary applicant drop-off from harmless formatting noise, such as "King St." versus "King Street".
    • Fix: define thresholds for manual review versus reject. In lending KYC, borderline cases should go to an analyst queue unless policy says otherwise.
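Pitfall 3 can be handled with light normalization before comparing fields. A sketch with a deliberately small, hypothetical abbreviation table; "st" can also mean "Saint", so a production version needs a proper address parser or validation service.

```typescript
// Illustrative sample only, not a complete address standard.
const ADDRESS_ABBREVIATIONS: Record<string, string> = {
  st: "street",
  rd: "road",
  ave: "avenue",
  ln: "lane",
};

// Lowercase, treat periods and commas as separators, split on whitespace.
function tokens(value: string): string[] {
  return value
    .toLowerCase()
    .replace(/[.,]/g, " ")
    .split(/\s+/)
    .filter((t) => t.length > 0);
}

function normalizeAddress(address: string): string {
  return tokens(address)
    .map((t) => ADDRESS_ABBREVIATIONS[t] ?? t)
    .join(" ");
}

// Number of tokens present in one name but not the other (order-insensitive).
function nameTokenDifference(a: string, b: string): number {
  const ta = new Set(tokens(a));
  const tb = new Set(tokens(b));
  const onlyA = Array.from(ta).filter((t) => !tb.has(t)).length;
  const onlyB = Array.from(tb).filter((t) => !ta.has(t)).length;
  return onlyA + onlyB;
}
```

With the sample packet, "Sarah A. Khan" and "Sarah Khan" differ by one token, which does not exceed the sample policy's one-token tolerance, and both spellings of 18 King Street normalize to the same string.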


By Cyprian Aarons, AI Consultant at Topiax.
