How to Build a KYC Verification Agent Using LlamaIndex in TypeScript for Investment Banking
A KYC verification agent for investment banking ingests client documents, extracts identity and ownership data, checks it against policy and external sources, and produces a review-ready decision with evidence. It matters because onboarding delays, incomplete due diligence, and weak audit trails create direct compliance risk, slow revenue recognition, and frustrate front-office teams.
Architecture
- Document ingestion layer
  - Accepts passports, incorporation docs, proof of address, shareholder registers, and sanctions screening outputs.
  - Normalizes PDFs, scans, and text into indexed nodes.
- Policy retrieval layer
  - Stores KYC policy, jurisdiction rules, beneficial ownership thresholds, and escalation criteria.
  - Uses retrieval to ground the agent in firm-specific procedures.
- Extraction and verification layer
  - Pulls structured fields like legal name, registration number, UBO percentages, address history, and document dates.
  - Cross-checks extracted data against expected formats and completeness rules.
- Decisioning layer
  - Produces one of three outcomes: approve, reject, or escalate to human review.
  - Applies bank-specific thresholds for missing data, expired documents, or adverse findings.
- Audit trail layer
  - Persists prompts, retrieved context, model outputs, source document references, and final decisions.
  - Supports regulatory review and internal model governance.
- Integration layer
  - Pushes results into onboarding systems, case management tools, or CRM workflows.
  - Emits structured JSON for downstream controls.
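One way to pin down the contract between the decisioning, audit, and integration layers is a typed result object that is checked before it is serialized. The sketch below is an assumption about what such a contract could look like: `CaseResult`, `Evidence`, and `toDownstreamJson` are hypothetical names, not part of LlamaIndex or any onboarding system.

```typescript
// Hypothetical contract for what the agent hands to downstream systems.
type Outcome = "APPROVE" | "REJECT" | "ESCALATE";

interface Evidence {
  field: string;          // e.g. "legalName"
  sourceDocument: string; // reference to the document that supports the field
}

interface CaseResult {
  caseId: string;
  outcome: Outcome;
  reasons: string[];
  evidence: Evidence[];
  policyVersion: string;
}

// Serializes a result, refusing anything that is not review-ready.
function toDownstreamJson(result: CaseResult): string {
  if (result.outcome !== "APPROVE" && result.reasons.length === 0) {
    throw new Error("Non-approval outcomes must carry at least one reason");
  }
  if (result.evidence.length === 0) {
    throw new Error("A decision without evidence is not review-ready");
  }
  return JSON.stringify(result);
}
```

Rejecting incomplete results at the boundary keeps the audit requirement enforceable in code rather than by convention.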
Implementation
1) Install LlamaIndex for TypeScript and load your policy corpus
For investment banking KYC, start with your internal policy docs. The agent should answer from approved policy text first, not from model memory.
npm install llamaindex

import {
  Document,
  VectorStoreIndex,
  Settings,
} from "llamaindex";

// Set the chunk size before any index is built so it applies to the policy corpus.
Settings.chunkSize = 512;

async function buildPolicyIndex() {
  const docs = [
    new Document({
      text: `
KYC policy:
- Collect government ID for all natural persons.
- Collect certificate of incorporation for all entities.
- Verify beneficial owners at or above 25%.
- Escalate if any document is expired by more than 30 days.
- Escalate if jurisdiction is high-risk or sanctions hit is unresolved.
`,
      // The source tag travels with every retrieved chunk and supports attribution later.
      metadata: { source: "internal_kyc_policy_v1" },
    }),
  ];
  // Embeds the policy text with the configured embedding model and builds a vector index.
  return await VectorStoreIndex.fromDocuments(docs);
}
2) Create a retrieval-backed verifier that grounds every decision
Use asQueryEngine() to retrieve policy context before making a recommendation. In banking workflows, this is the difference between a controlled decision and a black-box answer.
import { QueryEngineTool } from "llamaindex";

async function verifyKycCase(caseText: string) {
  // For brevity the index is rebuilt on every call; in practice, build it once and reuse it.
  const index = await buildPolicyIndex();
  const queryEngine = index.asQueryEngine();

  const tool = new QueryEngineTool({
    queryEngine,
    metadata: {
      name: "kyc_policy_lookup",
      description: "Retrieves internal KYC policy rules for onboarding decisions",
    },
  });

  // QueryEngineTool takes its input under the `query` key.
  const response = await tool.call({
    query: `Review this case against policy:\n${caseText}`,
  });
  return String(response);
}

const result = await verifyKycCase(`
Client: Acme Capital Ltd
Entity type: corporation
Documents:
- Certificate of incorporation present
- UBO register missing
- Passport expired on one director
Jurisdiction: Cayman Islands
`);
console.log(result);
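Grounding only helps if you can show which policy text was retrieved. The sketch below turns retrieved nodes into compact citations that can be stored next to the answer. The `RetrievedNode` shape is an assumption modelled on what a query response typically exposes (text, metadata, and a similarity score); check the actual field names against the LlamaIndex version you run. `toCitations` is a hypothetical helper.

```typescript
// Assumed minimal shape of a retrieved chunk; not an official LlamaIndex type.
interface RetrievedNode {
  text: string;
  metadata: Record<string, unknown>;
  score?: number;
}

interface Citation {
  source: string;        // value of metadata.source, or "unknown" if absent
  score: number | null;  // similarity score, if the retriever reported one
  excerpt: string;       // leading slice of the chunk, for human review
}

// Maps retrieved chunks to citations suitable for an audit record.
function toCitations(nodes: RetrievedNode[], excerptLength = 120): Citation[] {
  return nodes.map((n) => ({
    source: typeof n.metadata.source === "string" ? n.metadata.source : "unknown",
    score: n.score ?? null,
    excerpt: n.text.trim().slice(0, excerptLength),
  }));
}
```

A chunk that comes back with source "unknown" is itself a finding: it means a document was indexed without the metadata needed for attribution.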
3) Add structured extraction for fields compliance teams actually need
The agent should not only summarize. It should extract fields that can be stored in a case system and audited later.
// Depending on the llamaindex version, OpenAI may need to be imported from "@llamaindex/openai" instead.
import { OpenAI } from "llamaindex";

type KycExtraction = {
  legalName: string;
  entityType: string;
  jurisdiction: string;
  beneficialOwnersMissing: boolean;
  expiredDocuments: string[];
};

async function extractKycFields(rawText: string): Promise<KycExtraction> {
  const llm = new OpenAI({ model: "gpt-4o-mini" });

  const prompt = `
Extract KYC fields from the text below as strict JSON with keys:
legalName, entityType, jurisdiction, beneficialOwnersMissing, expiredDocuments.
Text:
${rawText}
`;

  const response = await llm.complete(prompt);
  // JSON.parse throws if the model wraps the JSON in prose or a code fence,
  // and the cast does not verify the field types at runtime.
  return JSON.parse(response.text) as KycExtraction;
}
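Calling `JSON.parse(response.text)` and casting the result trusts the model to return exactly the requested shape. A more defensive variant, sketched below, extracts the outermost JSON object from the response and checks each field before the data reaches the case system. `parseKycExtraction` is a hypothetical helper; the field names follow the `KycExtraction` type above.

```typescript
type KycExtraction = {
  legalName: string;
  entityType: string;
  jurisdiction: string;
  beneficialOwnersMissing: boolean;
  expiredDocuments: string[];
};

// Parses model output into a validated KycExtraction, or throws with a specific reason.
function parseKycExtraction(raw: string): KycExtraction {
  // Tolerate surrounding prose or code fences by taking the outermost {...} span.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }
  const data: unknown = JSON.parse(raw.slice(start, end + 1));
  if (typeof data !== "object" || data === null) {
    throw new Error("Expected a JSON object");
  }
  const o = data as Record<string, unknown>;

  for (const key of ["legalName", "entityType", "jurisdiction"]) {
    if (typeof o[key] !== "string" || o[key] === "") {
      throw new Error(`Missing or invalid field: ${key}`);
    }
  }
  if (typeof o.beneficialOwnersMissing !== "boolean") {
    throw new Error("Missing or invalid field: beneficialOwnersMissing");
  }
  const expired = o.expiredDocuments;
  if (!Array.isArray(expired) || !expired.every((d) => typeof d === "string")) {
    throw new Error("Missing or invalid field: expiredDocuments");
  }

  return {
    legalName: o.legalName as string,
    entityType: o.entityType as string,
    jurisdiction: o.jurisdiction as string,
    beneficialOwnersMissing: o.beneficialOwnersMissing,
    expiredDocuments: expired as string[],
  };
}
```

A thrown error here is a natural trigger for escalation: a case whose fields cannot be extracted cleanly should go to a human rather than be retried until it parses.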
4) Combine extraction with a deterministic decision rule
Do not let the model make the final call alone. Use it to extract facts; then apply bank-controlled logic for approve/escalate/reject.
function decideKyc(extracted: KycExtraction) {
  if (extracted.beneficialOwnersMissing) {
    return { decision: "ESCALATE", reason: "UBO information missing" };
  }

  // The policy excerpt allows a 30-day grace period; this simplified rule
  // escalates on any expired document.
  if (extracted.expiredDocuments.length > 0) {
    return {
      decision: "ESCALATE",
      reason: `Expired documents found: ${extracted.expiredDocuments.join(", ")}`,
    };
  }

  return { decision: "APPROVE", reason: "No blocking issues detected" };
}

async function runCase(rawText: string) {
  const extracted = await extractKycFields(rawText);
  const decision = decideKyc(extracted);

  // Return the facts alongside the decision so both can be stored and audited.
  return {
    extracted,
    decision,
    auditTag: "kyc_agent_v1",
    timestamp: new Date().toISOString(),
  };
}
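The rule above covers only two of the escalation conditions in the policy excerpt from step 1. A fuller sketch might encode the 30-day expiry grace period, the high-risk jurisdiction check, and unresolved sanctions hits, and collect every reason instead of returning on the first. Everything here is illustrative: `CaseFacts`, `decideWithPolicy`, and the placeholder jurisdiction codes are assumptions, and a real jurisdiction list would come from the bank's country-risk policy. Because the policy excerpt defines no rejection rules, the sketch only approves or escalates.

```typescript
type Decision = { decision: "APPROVE" | "ESCALATE"; reasons: string[] };

interface CaseFacts {
  jurisdiction: string;
  beneficialOwnersMissing: boolean;
  unresolvedSanctionsHit: boolean;
  documentExpiryDates: Record<string, string>; // document name -> ISO date (YYYY-MM-DD)
}

// Placeholder codes only; not a real risk list.
const HIGH_RISK_JURISDICTIONS = new Set(["XX", "YY"]);
const EXPIRY_GRACE_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days between the expiry date and the evaluation date (negative if not yet expired).
function daysExpired(expiryIso: string, asOf: Date): number {
  return Math.floor((asOf.getTime() - new Date(expiryIso).getTime()) / MS_PER_DAY);
}

function decideWithPolicy(facts: CaseFacts, asOf: Date = new Date()): Decision {
  const reasons: string[] = [];

  if (facts.beneficialOwnersMissing) reasons.push("UBO information missing");
  if (facts.unresolvedSanctionsHit) reasons.push("Unresolved sanctions hit");
  if (HIGH_RISK_JURISDICTIONS.has(facts.jurisdiction)) {
    reasons.push(`High-risk jurisdiction: ${facts.jurisdiction}`);
  }

  for (const [name, expiry] of Object.entries(facts.documentExpiryDates)) {
    const days = daysExpired(expiry, asOf);
    if (Number.isNaN(days)) {
      // An unreadable date must never pass silently.
      reasons.push(`${name} has an unreadable expiry date`);
    } else if (days > EXPIRY_GRACE_DAYS) {
      reasons.push(`${name} expired ${days} days ago`);
    }
  }

  return reasons.length > 0
    ? { decision: "ESCALATE", reasons }
    : { decision: "APPROVE", reasons: ["No blocking issues detected"] };
}
```

Passing the evaluation date as a parameter keeps the rule reproducible: an auditor can re-run a historical case with the date on which it was originally decided.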
Production Considerations
- Keep data residency explicit
  - Route EU client data to EU-hosted inference and storage paths.
  - Do not send raw PII to non-approved regions or unmanaged SaaS endpoints.
- Log every retrieval step
  - Persist retrieved chunks from asQueryEngine(), model responses, extracted fields, and final decisions.
  - Regulators care about why the system escalated a case.
- Use human-in-the-loop escalation
  - Any sanctions ambiguity, ownership chain uncertainty, or high-risk jurisdiction should default to review.
  - Investment banking KYC is not a place for autonomous approval on edge cases.
- Separate policy from prompt
  - Keep onboarding rules in versioned documents.
  - When policy changes, re-index the corpus instead of editing prompts in code.
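The logging and policy-versioning points above can be combined into a single audit record per decision. The sketch below uses an in-memory, append-only log as a stand-in for a real store. `AuditRecord` and `AuditLog` are hypothetical names, and the field set is an assumption about what a reviewer would need to reconstruct a decision.

```typescript
interface AuditRecord {
  caseId: string;
  policyVersion: string;       // which indexed policy corpus was in force
  retrievedSources: string[];  // source tags of the chunks shown to the model
  modelOutput: string;         // raw model response, before parsing
  extracted: Record<string, unknown>;
  decision: string;
  reasons: string[];
  recordedAt: string;          // ISO timestamp, set by the log
}

// In-memory stand-in for an append-only audit store (hypothetical).
class AuditLog {
  private readonly records: AuditRecord[] = [];

  append(entry: Omit<AuditRecord, "recordedAt">, now: Date = new Date()): AuditRecord {
    if (entry.retrievedSources.length === 0) {
      throw new Error("Refusing to record a decision with no retrieved policy sources");
    }
    // Freeze the record so later code cannot alter what was logged.
    const record = Object.freeze({ ...entry, recordedAt: now.toISOString() });
    this.records.push(record);
    return record;
  }

  forCase(caseId: string): readonly AuditRecord[] {
    return this.records.filter((r) => r.caseId === caseId);
  }
}
```

Storing the policy version on every record is what makes re-indexing safe: after a policy change, older decisions still point at the rules that applied when they were made.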
Common Pitfalls
- Letting the LLM decide approval directly
  This creates inconsistent outcomes and weak auditability. Use the model for extraction and explanation; use deterministic business rules for final decisions.
- Skipping source attribution
  If you cannot show which document supported each field or escalation reason, compliance will reject the workflow. Store metadata on every Document and preserve retrieval context.
- Ignoring ownership complexity
  Corporate structures in investment banking often span trusts, nominees, SPVs, and multiple jurisdictions. Build explicit logic for UBO thresholds and escalation when ownership cannot be resolved cleanly.
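To make the ownership point concrete, here is a minimal sketch of UBO resolution through layered entities. It assumes the common multiplicative approach, where indirect stakes are multiplied along each chain and summed across chains, and it uses the 25% threshold from the policy excerpt. Control-based tests (voting rights, trustees, nominees) are out of scope. `Holding`, `effectiveOwnership`, and `ubosAtOrAbove` are hypothetical names.

```typescript
interface Holding {
  owner: string;
  owned: string;
  fraction: number; // direct stake, 0..1
}

interface UboResult {
  owners: Record<string, number>; // natural person -> effective stake in the client
  unresolved: string[];           // reasons the chain could not be fully resolved
}

function effectiveOwnership(
  client: string,
  holdings: Holding[],
  naturalPersons: Set<string>,
): UboResult {
  const owners: Record<string, number> = {};
  const unresolved: string[] = [];

  const walk = (entity: string, share: number, path: string[]): void => {
    if (path.includes(entity)) {
      unresolved.push(`circular ownership via ${entity}`);
      return;
    }
    const parents = holdings.filter((h) => h.owned === entity);
    if (parents.length === 0) {
      // The chain ends at an entity with no recorded owner.
      unresolved.push(`no owner on record for ${entity}`);
      return;
    }
    for (const h of parents) {
      const stake = share * h.fraction;
      if (naturalPersons.has(h.owner)) {
        owners[h.owner] = (owners[h.owner] ?? 0) + stake;
      } else {
        walk(h.owner, stake, [...path, entity]);
      }
    }
  };

  walk(client, 1, []);
  return { owners, unresolved };
}

// Names of natural persons at or above the threshold, sorted for stable output.
function ubosAtOrAbove(result: UboResult, threshold = 0.25): string[] {
  return Object.entries(result.owners)
    .filter(([, stake]) => stake >= threshold - 1e-9) // tolerance for float rounding
    .map(([name]) => name)
    .sort();
}
```

Any entry in `unresolved` should route the case to human review, whatever the computed percentages say, since a gap in the chain can hide an owner above the threshold.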
By Cyprian Aarons, AI Consultant at Topiax.