How to Build a KYC Verification Agent Using LlamaIndex in TypeScript for Payments

By Cyprian Aarons | Updated 2026-04-21

Tags: kyc-verification, llamaindex, typescript, payments

A KYC verification agent for payments takes customer identity data, supporting documents, and policy rules, then decides whether the onboarding packet is complete, is risky, or needs manual review. It matters because bad KYC creates fraud exposure, failed compliance audits, account freezes, and downstream payment disruption.

Architecture

A production KYC agent for payments needs these components:

• Document ingestion layer
  - Pulls passports, utility bills, company registration docs, sanctions lists, and internal policy PDFs.
  - Normalizes text before indexing.
• LlamaIndex retrieval layer
  - Stores policy docs and KYC playbooks in a VectorStoreIndex.
  - Uses RetrieverQueryEngine to answer verification questions against approved sources.
• Verification orchestrator
  - Applies deterministic checks first: document presence, expiry dates, country mismatch, name mismatch.
  - Sends ambiguous cases to the LLM-backed reasoning layer.
• Decision engine
  - Produces structured outputs like APPROVE, REJECT, or ESCALATE.
  - Attaches evidence snippets for auditability.
• Audit and case logging
  - Persists every input, retrieved chunk, model response, and final decision.
  - Required for compliance review and dispute handling.
• Policy and residency controls
  - Keeps regulated data in approved regions.
  - Prevents PII from leaving your boundary unless explicitly allowed.
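
To make the hand-off between the orchestrator and the decision engine concrete, here is a minimal sketch of how hard-check issues and a model opinion could be merged into one structured decision. Every name in it (Outcome, ModelOpinion, Decision, finalizeDecision) is hypothetical and not part of LlamaIndex; the rule that a failed hard check can never end in APPROVE is one reasonable policy, not the only one.

```typescript
// Hypothetical types for illustration; none of these come from LlamaIndex.
type Outcome = "APPROVE" | "REJECT" | "ESCALATE";

interface ModelOpinion {
  outcome: Outcome;
  reason: string;
  evidence: string[]; // policy snippets or source IDs the answer relied on
}

interface Decision {
  outcome: Outcome;
  reasons: string[];
  evidence: string[];
}

// Merge deterministic issues with the model's opinion.
// A null opinion means the reasoning layer was unavailable or was skipped.
function finalizeDecision(issues: string[], opinion: ModelOpinion | null): Decision {
  if (opinion === null) {
    return {
      outcome: "ESCALATE",
      reasons: [...issues, "No policy review available."],
      evidence: [],
    };
  }

  // The model may not approve a case that failed a hard check.
  const blockedApproval = issues.length > 0 && opinion.outcome === "APPROVE";

  return {
    outcome: blockedApproval ? "ESCALATE" : opinion.outcome,
    reasons: [...issues, opinion.reason],
    evidence: opinion.evidence,
  };
}
```

Keeping this step as a pure function means it can be unit tested without calling a model, and the resulting Decision object is what the audit log would persist.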

Implementation

1) Install dependencies and set up the index

You want your KYC policy docs indexed separately from customer PII. In practice that means policies go into LlamaIndex; raw identity data stays in your application store and is only passed into the agent at decision time.

npm install llamaindex zod

import {
  Document,
  VectorStoreIndex,
  Settings,
} from "llamaindex";

Settings.chunkSize = 512;

const policyDocs = [
  new Document({
    text: `
      KYC Policy:
      - Government ID must be valid and unexpired.
      - Proof of address must be issued within the last 90 days.
      - Business accounts require beneficial ownership disclosure.
      - Sanctions hits must be escalated immediately.
    `,
    metadata: { source: "kyc-policy-v1" },
  }),
];

const index = await VectorStoreIndex.fromDocuments(policyDocs);

2) Build a query engine for policy-backed decisions

Use retrieval to ground every decision in your own compliance material. For payments teams, this is the difference between explainable decisions and random model output.

import { QueryEngineTool } from "llamaindex";

const queryEngine = index.asQueryEngine({
  similarityTopK: 3,
});

const kycPolicyTool = new QueryEngineTool({
  queryEngine,
  metadata: {
    name: "kyc_policy_lookup",
    description: "Answers questions about internal KYC policy and escalation rules",
  },
});

3) Add deterministic checks before LLM reasoning

Do not ask the model to infer obvious facts like expiry dates or missing fields. Use code for that. Then let LlamaIndex handle policy interpretation and exception handling.

type KycCase = {
  fullName: string;
  dob: string;
  country: string;
  idExpiry: string;
  proofOfAddressDate: string;
};

// Hard validation that should never be delegated to the model.
function runDeterministicChecks(input: KycCase): string[] {
  const now = new Date();
  const expiry = new Date(input.idExpiry);
  const poaDate = new Date(input.proofOfAddressDate);
  const ninetyDaysMs = 90 * 24 * 60 * 60 * 1000;

  const issues: string[] = [];

  // An unparseable date is itself a finding, not something to guess around.
  if (Number.isNaN(expiry.getTime())) {
    issues.push("Government ID expiry date is missing or invalid.");
  } else if (expiry < now) {
    issues.push("Government ID is expired.");
  }

  if (Number.isNaN(poaDate.getTime())) {
    issues.push("Proof of address date is missing or invalid.");
  } else if (now.getTime() - poaDate.getTime() > ninetyDaysMs) {
    issues.push("Proof of address is older than 90 days.");
  }

  return issues;
}
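
The architecture above also lists name and country mismatches as deterministic checks. A minimal sketch of those two, assuming the application form and the ID document each yield a name and an ISO country code, could look like this. The interface and function names are hypothetical, and the normalization only handles Latin-script names; a name that normalizes to an empty string is reported as a mismatch so it lands in review rather than passing silently.

```typescript
// Hypothetical field names; adapt them to your own case model.
interface IdentityFields {
  applicantName: string;   // name typed on the application form
  documentName: string;    // name extracted from the ID document
  declaredCountry: string; // country code from the application
  documentCountry: string; // country code from the ID document
}

// Normalize a name so that accents, case, punctuation, and word order
// do not cause false mismatches (e.g. "OKAFOR, Ada" vs "Ada Okafor").
function normalizeName(name: string): string {
  return name
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accent marks
    .toLowerCase()
    .replace(/[^a-z\s]/g, " ")       // punctuation and hyphens become spaces
    .split(/\s+/)
    .filter(Boolean)
    .sort()
    .join(" ");
}

function runIdentityChecks(input: IdentityFields): string[] {
  const issues: string[] = [];

  const applicant = normalizeName(input.applicantName);
  const onDocument = normalizeName(input.documentName);
  if (applicant.length === 0 || applicant !== onDocument) {
    issues.push("Name on application does not match the ID document.");
  }

  const declared = input.declaredCountry.trim().toUpperCase();
  const documented = input.documentCountry.trim().toUpperCase();
  if (declared !== documented) {
    issues.push("Declared country does not match the ID document.");
  }

  return issues;
}
```

Exact matching after normalization is deliberately strict. Fuzzy matching for nicknames or transliteration variants is a policy decision and is better handled as an explicit escalation path than hidden inside this function.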

4) Continue with agent-style reasoning

The actual agent pattern is to pass the case summary plus retrieved policy context into an LLM-powered query engine. In TypeScript with LlamaIndex, you can do this by sending the case summary as the query: the engine retrieves the most relevant policy chunks and the LLM answers against them.

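// Example case; in production this comes from your application store.
const kycCase: KycCase = {
  fullName: "Ada Okafor",
  dob: "1992-04-11",
  country: "NG",
  idExpiry: "2027-05-01",
  proofOfAddressDate: "2026-02-01",
};

// Run the hard checks first and hand the findings to the model as facts.
const issues = runDeterministicChecks(kycCase);
const findings = issues.length > 0 ? issues.join("; ") : "none";

const caseSummary = `
Customer:
- Name: ${kycCase.fullName}
- DOB: ${kycCase.dob}
- Country: ${kycCase.country}
- ID expiry: ${kycCase.idExpiry}
- Proof of address date: ${kycCase.proofOfAddressDate}

Deterministic check findings: ${findings}

Task:
Decide whether this case should be APPROVE, REJECT, or ESCALATE for payment onboarding.
Return a short reason and cite policy constraints.
`;

const response = await queryEngine.query({
  query: caseSummary,
});

console.log(String(response));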
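
The query above returns free text, while downstream systems need a structured outcome. One possible way to bridge that gap, sketched here under the assumption that the model was asked to answer with exactly one of APPROVE, REJECT, or ESCALATE, is to parse the text defensively and fall back to ESCALATE whenever the answer is missing or ambiguous. The names parseDecision and ParsedDecision are hypothetical.

```typescript
type Outcome = "APPROVE" | "REJECT" | "ESCALATE";

const OUTCOMES: Outcome[] = ["APPROVE", "REJECT", "ESCALATE"];

interface ParsedDecision {
  outcome: Outcome;
  rationale: string;      // the raw model text, kept for the audit trail
  parsedCleanly: boolean; // false when we had to fall back to ESCALATE
}

// Turn free-text model output into a structured decision.
// Exactly one outcome keyword must appear as a whole word; anything else escalates.
function parseDecision(raw: string): ParsedDecision {
  const text = raw.trim();
  const upper = text.toUpperCase();

  const found = OUTCOMES.filter((o) => new RegExp(`\\b${o}\\b`).test(upper));
  const [only] = found;

  if (found.length === 1 && only !== undefined) {
    return { outcome: only, rationale: text, parsedCleanly: true };
  }

  // Zero or several keywords: do not guess, send to manual review.
  return { outcome: "ESCALATE", rationale: text, parsedCleanly: false };
}

// Usage with the response from the query engine would look like:
// const decision = parseDecision(String(response));
```

Whole-word matching means a reply such as "APPROVED" is treated as unparseable and escalated. That is intentionally conservative; a stricter prompt or a JSON response format would make the parsing step simpler.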

Why this pattern works

The important part is not “chatting with documents.” It is combining:

•hard validation in application code,
•retrieval over approved policy sources,
•structured case decisions for downstream systems.

For payments onboarding, that gives you traceability when compliance asks why a merchant was escalated or blocked.

Production Considerations

• Keep PII out of broad logs
  Log case IDs, not raw passport numbers or full addresses. If you need evidence for audit, encrypt it at rest and restrict access by role.
• Separate regional data stores
  If your payment program has EU or UK residency requirements, keep customer documents and vector indexes inside that region. Do not send regulated identity data to a cross-region hosted model without legal approval.
• Use decision thresholds
  Auto-approve only when deterministic checks pass and the retrieved policy chunks score above a similarity threshold you have validated. Anything ambiguous should become ESCALATE, not a guessed answer.
• Track every retrieval
  Store which chunks were retrieved, their source document IDs, and the final output. Auditors care about provenance more than model elegance.
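
The threshold and retrieval-tracking points can be combined into a small gate. The sketch below assumes each retrieved chunk has been reduced to a source ID and an optional similarity score. The RetrievedSource shape, the resolveOutcome name, and the 0.75 default are all hypothetical; a usable threshold has to be calibrated against your own embedding model and policy corpus.

```typescript
type Outcome = "APPROVE" | "REJECT" | "ESCALATE";

// Hypothetical, simplified view of a retrieved policy chunk.
interface RetrievedSource {
  sourceId: string;          // e.g. the "source" metadata of the policy document
  score: number | undefined; // similarity score, if the retriever reports one
}

interface GateInput {
  issues: string[];           // output of the deterministic checks
  modelOutcome: Outcome;      // what the LLM-backed layer proposed
  sources: RetrievedSource[]; // chunks the answer was grounded on
}

// Only an APPROVE needs gating: REJECT and ESCALATE pass through unchanged.
// The 0.75 default is a placeholder, not a recommended value.
function resolveOutcome(input: GateInput, minScore = 0.75): Outcome {
  if (input.modelOutcome !== "APPROVE") return input.modelOutcome;

  const scores = input.sources
    .map((s) => s.score)
    .filter((s): s is number => typeof s === "number");

  // No scored sources means the approval is not grounded in any policy text.
  const bestScore = scores.length > 0 ? Math.max(...scores) : 0;
  const grounded = bestScore >= minScore;

  return input.issues.length === 0 && grounded ? "APPROVE" : "ESCALATE";
}
```

The same sources array is what the audit log should store next to the final outcome, so a reviewer can see which policy documents backed an automatic approval.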

Common Pitfalls

• Letting the model decide basic validation
  - Mistake: asking the LLM whether an ID is expired or a document date is valid.
  - Fix: compute those checks in TypeScript first; reserve LlamaIndex for policy interpretation and explanation.
• Mixing customer data with policy indexes
  - Mistake: embedding raw PII into the same vector store as compliance docs.
  - Fix: keep policy knowledge separate from live customer records. Pass only minimal case summaries into the agent.
• Skipping audit trails
  - Mistake: storing only the final APPROVE or REJECT.
  - Fix: persist input snapshot hashes, retrieved sources, prompt text, model output, and operator overrides. That’s what makes the system defensible in a payments review.
• Ignoring sanctions and escalation rules
  - Mistake: treating all KYC failures as simple rejects.
  - Fix: build explicit branches for sanctions hits, politically exposed persons, beneficial ownership gaps, and document fraud suspicion. Those usually require escalation rather than automatic rejection.
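
As an illustration of the audit-trail fix, the following sketch builds a record that stores a SHA-256 hash of the input snapshot instead of the raw input, alongside the retrieved sources, prompt, model output, and any operator override. The AuditRecord shape and helper names are hypothetical. Note that the prompt text may itself contain PII, so a record like this still belongs in encrypted, access-controlled storage.

```typescript
import { createHash } from "node:crypto";

// Hypothetical audit record shape; extend it to match your compliance needs.
interface AuditRecord {
  caseId: string;
  inputHash: string;          // SHA-256 of the input snapshot, not the input itself
  retrievedSources: string[]; // IDs of the policy chunks that were retrieved
  prompt: string;
  modelOutput: string;
  decision: string;
  operatorOverride: string | null;
  recordedAt: string;         // ISO 8601 timestamp
}

// Serialize with sorted keys so the same snapshot always yields the same hash.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${JSON.stringify(k)}:${stableStringify(v)}`);
    return `{${entries.join(",")}}`;
  }
  return value === undefined ? "null" : JSON.stringify(value);
}

function hashSnapshot(snapshot: unknown): string {
  return createHash("sha256").update(stableStringify(snapshot)).digest("hex");
}

function buildAuditRecord(params: {
  caseId: string;
  snapshot: unknown;
  retrievedSources: string[];
  prompt: string;
  modelOutput: string;
  decision: string;
  operatorOverride?: string;
}): AuditRecord {
  return {
    caseId: params.caseId,
    inputHash: hashSnapshot(params.snapshot),
    retrievedSources: params.retrievedSources,
    prompt: params.prompt,
    modelOutput: params.modelOutput,
    decision: params.decision,
    operatorOverride: params.operatorOverride ?? null,
    recordedAt: new Date().toISOString(),
  };
}
```

Hashing the snapshot lets you later prove which exact input produced a decision, provided the original input is retained somewhere it can be re-hashed and compared.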

By Cyprian Aarons, AI Consultant at Topiax.
