How to Build a KYC Verification Agent Using LlamaIndex in TypeScript for Wealth Management

By Cyprian Aarons | Updated 2026-04-21
Tags: kyc-verification, llamaindex, typescript, wealth-management

A KYC verification agent for wealth management takes client-submitted identity data, supporting documents, and internal policy rules, then turns that into a structured verification decision with an audit trail. It matters because wealth firms need faster onboarding without weakening compliance: you want to catch missing documents, sanctions risk, beneficial ownership issues, and residency constraints before a human reviewer ever sees the case.

Architecture

Build this agent as a small pipeline, not a single prompt.

  • Document ingestion layer

    • Pull PDFs, scans, passports, proof of address, tax forms, and source-of-funds documents.
    • Normalize them into text chunks with metadata like clientId, jurisdiction, docType, and uploadTimestamp.
  • Policy knowledge base

    • Store your KYC policy manuals, jurisdiction-specific onboarding rules, and escalation criteria in a vector index.
    • Use retrieval to ground every decision in the firm’s actual controls.
  • Verification orchestrator

    • Coordinates extraction, policy lookup, risk scoring, and final decision formatting.
    • In practice this is where LlamaIndex query engines do the heavy lifting.
  • Audit log writer

    • Persist every input document reference, retrieved policy snippet, model output, and final recommendation.
    • Wealth management reviewers will ask “why was this approved?” You need stored evidence, because re-running the model later will not necessarily reproduce the same answer.
  • Human review gate

    • Route low-confidence or high-risk cases to compliance ops.
    • Never let the agent auto-approve cases involving politically exposed persons (PEPs), sanctioned jurisdictions, or incomplete beneficial ownership.
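A minimal sketch of what the ingestion layer could emit, assuming text has already been extracted from PDFs and scans upstream. The CaseChunk type and chunkDocument helper are hypothetical, and the fixed-size character window with overlap is a simple stand-in for whichever text splitter you actually use. What matters is that every chunk carries the clientId, jurisdiction, docType, and uploadTimestamp metadata listed above, so later steps can filter and cite by them.

```typescript
// Hypothetical document types for a wealth onboarding case.
type DocType = "passport" | "proof_of_address" | "tax_form" | "source_of_funds";

// Hypothetical chunk shape: extracted text plus the metadata every later step relies on.
interface CaseChunk {
  text: string;
  metadata: {
    clientId: string;
    jurisdiction: string;
    docType: DocType;
    uploadTimestamp: string; // ISO 8601
    chunkIndex: number;
  };
}

// Split extracted text into fixed-size character windows with overlap,
// copying the document-level metadata onto each chunk.
function chunkDocument(
  text: string,
  meta: Omit<CaseChunk["metadata"], "chunkIndex">,
  chunkSize = 500,
  overlap = 50,
): CaseChunk[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const chunks: CaseChunk[] = [];
  const step = chunkSize - overlap;
  for (let start = 0, i = 0; start < text.length; start += step, i++) {
    chunks.push({
      text: text.slice(start, start + chunkSize),
      metadata: { ...meta, chunkIndex: i },
    });
    // Stop once this window reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const chunks = chunkDocument("A".repeat(1200), {
  clientId: "client-123",
  jurisdiction: "UK",
  docType: "proof_of_address",
  uploadTimestamp: "2026-04-01T09:00:00Z",
});
console.log(chunks.length, chunks[2].metadata.chunkIndex); // prints: 3 2
```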

Implementation

1) Install the TypeScript packages

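Install the LlamaIndex TypeScript package and Zod (used in step 4 for schema validation). You also need an LLM and an embedding model that you are permitted to run in your environment.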

npm install llamaindex zod

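Put provider credentials in environment variables; the OpenAI classes used below read OPENAI_API_KEY by default. In newer llamaindex releases the OpenAI integration ships as a separate package (@llamaindex/openai), so depending on your version you may need to install it and import OpenAI and OpenAIEmbedding from there instead. For production wealth workflows, choose providers and regions that match your data residency requirements.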
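A minimal sketch of a fail-fast residency check to run before any document or embedding call, assuming you maintain an allowlist of deployment regions per client jurisdiction. APPROVED_REGIONS, its region names, and assertApprovedRegion are hypothetical placeholders, not settings exposed by llamaindex or any provider.

```typescript
// Hypothetical allowlist of deployment regions per client jurisdiction.
// Region names are placeholders; use your provider's real identifiers.
const APPROVED_REGIONS: Record<string, readonly string[]> = {
  EU: ["eu-region-1", "eu-region-2"],
  UK: ["uk-region-1"],
  US: ["us-region-1"],
};

// Throw before any client data is sent to a region that is not approved.
function assertApprovedRegion(jurisdiction: string, region: string): void {
  const allowed = APPROVED_REGIONS[jurisdiction];
  if (!allowed) {
    throw new Error(`No residency policy configured for jurisdiction ${jurisdiction}`);
  }
  if (allowed.indexOf(region) === -1) {
    throw new Error(`Region ${region} is not approved for ${jurisdiction} client data`);
  }
}

assertApprovedRegion("EU", "eu-region-1"); // passes silently
```

Failing closed on an unknown jurisdiction is deliberate here: a missing policy entry should stop onboarding rather than default to a global region.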

2) Index your KYC policy documents

This example builds a vector index from internal KYC policy text. The same pattern works for PDFs after you extract text upstream.

import {
  Document,
  VectorStoreIndex,
  Settings,
  OpenAI,
  OpenAIEmbedding,
} from "llamaindex";

Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
});

Settings.embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small",
});

async function buildPolicyIndex() {
  const docs = [
    new Document({
      text: `
        KYC Policy:
        - Verify government-issued ID for all clients.
        - Require proof of address dated within 90 days.
        - Escalate any PEP match to compliance review.
        - Reject onboarding if beneficial owner information is incomplete.
        - Sanctions screening must be completed before account opening.
      `,
      metadata: { source: "internal-kyc-policy", jurisdiction: "global" },
    }),
    new Document({
      text: `
        EU Wealth Management Addendum:
        - Retain onboarding evidence for at least 5 years.
        - Store client data in approved EU regions only.
        - Enhanced due diligence required for cross-border trusts.
      `,
      metadata: { source: "eu-addendum", jurisdiction: "EU" },
    }),
  ];

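  // Splits the documents into nodes, embeds them with Settings.embedModel,
  // and stores them in an in-memory vector store by default.
  return await VectorStoreIndex.fromDocuments(docs);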
}
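Both policy documents above carry a jurisdiction metadata field. A minimal sketch of using it to build one index per jurisdiction: select the firm-wide ("global") controls plus that jurisdiction's own addenda, then pass only those to VectorStoreIndex.fromDocuments. PolicyDoc and policyDocsFor are hypothetical; PolicyDoc is a plain-object stand-in for Document so the example runs without the library.

```typescript
// Plain-object stand-in for a policy Document's text and metadata.
interface PolicyDoc {
  text: string;
  metadata: { source: string; jurisdiction: string };
}

// Documents that belong in a given jurisdiction's index:
// firm-wide controls plus that jurisdiction's addenda.
function policyDocsFor(docs: PolicyDoc[], jurisdiction: string): PolicyDoc[] {
  return docs.filter(
    (d) => d.metadata.jurisdiction === "global" || d.metadata.jurisdiction === jurisdiction,
  );
}

const policyDocs: PolicyDoc[] = [
  { text: "(global KYC policy text)", metadata: { source: "internal-kyc-policy", jurisdiction: "global" } },
  { text: "(EU wealth management addendum text)", metadata: { source: "eu-addendum", jurisdiction: "EU" } },
];

// A UK case sees only the global policy; an EU case also sees the EU addendum.
console.log(policyDocsFor(policyDocs, "UK").map((d) => d.metadata.source));
console.log(policyDocsFor(policyDocs, "EU").map((d) => d.metadata.source));
```

In the real pipeline you would wrap each selected entry in a Document and build a separate index per jurisdiction, which also keeps EU material in its own store.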

3) Query the policy index with a case file

The agent should answer from retrieved policy context, not from the model's general knowledge. That keeps outputs auditable: each decision can be traced back to specific policy text.

// buildPolicyIndex() is defined in step 2.

async function verifyClientCase() {
  const index = await buildPolicyIndex();
  const queryEngine = index.asQueryEngine();

  const caseSummary = `
    Client submitted passport, utility bill older than 120 days,
    and no beneficial ownership declaration. Client is resident in UK,
    onboarding into discretionary portfolio management.
  `;

  const response = await queryEngine.query({
    query: `
      Based on the firm's KYC policy, determine whether this case is:
      APPROVE, REJECT, or ESCALATE.
      Include reasons and cite the relevant control gaps.
      
      Case:
      ${caseSummary}
    `,
  });

  console.log(String(response));
}

verifyClientCase().catch(console.error);
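The case above turns on a fact the policy states numerically: proof of address must be dated within 90 days. Checks like this are cheap to compute deterministically and can be passed to the query engine as facts instead of leaving date arithmetic to the LLM. A minimal sketch; the 90-day limit comes from the example policy text, while ageInDays, addressProofIsCurrent, and addressProofFact are hypothetical helpers.

```typescript
// Limit taken from the example policy text: proof of address within 90 days.
const MAX_ADDRESS_PROOF_AGE_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days between two ISO dates. Date-only strings such as "2026-04-21"
// are parsed as UTC midnight, so the difference is an exact number of days.
function ageInDays(documentDate: string, asOf: string): number {
  const diff = Date.parse(asOf) - Date.parse(documentDate);
  if (isNaN(diff)) throw new Error(`Invalid date: ${documentDate} or ${asOf}`);
  return Math.floor(diff / MS_PER_DAY);
}

function addressProofIsCurrent(documentDate: string, asOf: string): boolean {
  return ageInDays(documentDate, asOf) <= MAX_ADDRESS_PROOF_AGE_DAYS;
}

// Turn the check into a plain-language fact to include in the case summary.
function addressProofFact(documentDate: string, asOf: string): string {
  const age = ageInDays(documentDate, asOf);
  const status = addressProofIsCurrent(documentDate, asOf) ? "within" : "exceeds";
  return `Proof of address is ${age} days old (${status} the ${MAX_ADDRESS_PROOF_AGE_DAYS}-day limit).`;
}

console.log(addressProofFact("2025-12-01", "2026-04-21"));
// prints: Proof of address is 141 days old (exceeds the 90-day limit).
```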

4) Add structured output for downstream workflow routing

For wealth management operations, you want machine-readable decisions that can feed case management systems. Use a schema so compliance teams get consistent fields.

import { z } from "zod";

const KycDecisionSchema = z.object({
  decision: z.enum(["APPROVE", "REJECT", "ESCALATE"]),
  riskLevel: z.enum(["LOW", "MEDIUM", "HIGH"]),
  reasons: z.array(z.string()),
});

type KycDecision = z.infer<typeof KycDecisionSchema>;

async function runKycAgent(): Promise<KycDecision> {
  const index = await buildPolicyIndex();
  const qe = index.asQueryEngine();

  const result = await qe.query({
    query: `
      Return JSON only with fields:
      decision one of APPROVE|REJECT|ESCALATE,
      riskLevel one of LOW|MEDIUM|HIGH,
      reasons as an array of strings.

      Client has expired address proof and missing beneficial owner declaration.
    `,
  });

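  // Models sometimes wrap JSON in markdown code fences; strip them before parsing.
  const raw = String(result).trim().replace(/^`{3}(?:json)?\s*|\s*`{3}$/g, "");
  // parse() throws a ZodError on invalid output, so a malformed decision
  // never reaches downstream workflow systems.
  return KycDecisionSchema.parse(JSON.parse(raw));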
}
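Validation in runKycAgent throws on malformed output. A minimal sketch of one way to handle that: retry a bounded number of times, then fall back to ESCALATE so the case lands with a human instead of disappearing. decideWithFallback and parseDecision are hypothetical; parseDecision mirrors KycDecisionSchema by hand so the sketch has no dependencies, whereas in your own code you would call KycDecisionSchema.safeParse. The generate callback stands in for a function that runs qe.query(...) and returns String(result).

```typescript
type Decision = "APPROVE" | "REJECT" | "ESCALATE";
type RiskLevel = "LOW" | "MEDIUM" | "HIGH";

interface KycDecision {
  decision: Decision;
  riskLevel: RiskLevel;
  reasons: string[];
}

const DECISIONS: readonly string[] = ["APPROVE", "REJECT", "ESCALATE"];
const RISK_LEVELS: readonly string[] = ["LOW", "MEDIUM", "HIGH"];

// Hand-rolled equivalent of KycDecisionSchema: returns null on any mismatch.
function parseDecision(raw: string): KycDecision | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const v = value as Record<string, unknown>;
  if (typeof v.decision !== "string" || DECISIONS.indexOf(v.decision) === -1) return null;
  if (typeof v.riskLevel !== "string" || RISK_LEVELS.indexOf(v.riskLevel) === -1) return null;
  if (!Array.isArray(v.reasons) || !v.reasons.every((r) => typeof r === "string")) return null;
  return {
    decision: v.decision as Decision,
    riskLevel: v.riskLevel as RiskLevel,
    reasons: v.reasons as string[],
  };
}

// Ask for a decision up to maxAttempts times; if the model never returns a
// valid decision, fall back to ESCALATE so a human reviews the case.
async function decideWithFallback(
  generate: () => Promise<string>,
  maxAttempts = 2,
): Promise<KycDecision> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const parsed = parseDecision(await generate());
    if (parsed) return parsed;
  }
  return {
    decision: "ESCALATE",
    riskLevel: "HIGH",
    reasons: [`Model output failed validation after ${maxAttempts} attempts`],
  };
}

decideWithFallback(async () => "not json").then((d) => console.log(d.decision)); // prints: ESCALATE
```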

A clean production pattern is:

  1. Ingest client docs into your document store.
  2. Retrieve policy controls with VectorStoreIndex.
  3. Generate a structured recommendation.
  4. Persist the raw retrieval context plus final JSON into your audit log.
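A minimal sketch of step 4, assuming one JSON record per decision appended to whatever write-once store your firm uses. The AuditRecord shape and the buildAuditRecord and toAuditLine helpers are hypothetical; the fields follow the Auditability list under Production Considerations. In recent LlamaIndex.TS versions the query response exposes the retrieved chunks as sourceNodes, which is where retrievedPolicyChunks would come from; check the shape in your installed version.

```typescript
// Hypothetical audit record: enough to answer "why was this decided?" later.
interface AuditRecord {
  caseId: string;
  timestamp: string; // ISO 8601
  modelVersion: string;
  promptTemplateVersion: string;
  inputDocumentRefs: string[];
  retrievedPolicyChunks: { source: string; text: string; score?: number }[];
  rawModelOutput: string;
  finalDecision: { decision: string; riskLevel: string; reasons: string[] };
}

// Stamp the record with a timestamp; `now` is injectable for reproducible tests.
function buildAuditRecord(input: Omit<AuditRecord, "timestamp">, now: Date = new Date()): AuditRecord {
  return { ...input, timestamp: now.toISOString() };
}

// Serialize as a single JSON line, suitable for appending to a log.
function toAuditLine(record: AuditRecord): string {
  return JSON.stringify(record);
}

const record = buildAuditRecord(
  {
    caseId: "case-001",
    modelVersion: "gpt-4o-mini",
    promptTemplateVersion: "kyc-decision-v1",
    inputDocumentRefs: ["doc-passport-1", "doc-utility-bill-1"],
    retrievedPolicyChunks: [
      { source: "internal-kyc-policy", text: "Require proof of address dated within 90 days." },
    ],
    rawModelOutput: '{"decision":"REJECT"}',
    finalDecision: { decision: "REJECT", riskLevel: "HIGH", reasons: ["Proof of address older than 90 days"] },
  },
  new Date("2026-04-21T10:00:00Z"),
);
console.log(toAuditLine(record));
```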

Production Considerations

  • Data residency

    • Keep client documents and embeddings in approved regions only.
    • If your firm operates across EMEA/US/APAC, isolate indexes by jurisdiction instead of mixing everything into one global store.
  • Auditability

    • Log retrieved chunks, model version, prompt template version, timestamp, and case ID.
    • Regulators care about traceability more than clever prompting.
  • Monitoring

    • Track escalation rate, false approvals caught by humans, retrieval miss rate, and average time-to-decision.
    • A sudden drop in escalations can mean a prompt or model change made the agent too permissive.
  • Guardrails

Risk                          | Control
Sanctions exposure            | Hard block on approval until screening is complete
PEP / adverse media           | Mandatory human review
Missing beneficial ownership  | Auto-escalate
Cross-border trust structures | Jurisdiction-specific rule path
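A minimal sketch of computing three of the monitoring metrics above from decision events, assuming you record whether a human reviewer later overturned an agent APPROVE. DecisionEvent, computeMetrics, escalationDropAlert, and the 50% drop threshold are hypothetical. Retrieval miss rate is left out because it needs labelled examples of which policy chunks should have been retrieved.

```typescript
interface DecisionEvent {
  decision: "APPROVE" | "REJECT" | "ESCALATE";
  humanOverturnedApproval: boolean; // a reviewer reversed an agent APPROVE
  submittedAt: number; // epoch ms
  decidedAt: number; // epoch ms
}

interface KycMetrics {
  escalationRate: number; // escalations / all decisions
  falseApprovalRate: number; // overturned approvals / approvals
  avgTimeToDecisionMs: number;
}

function computeMetrics(events: DecisionEvent[]): KycMetrics {
  if (events.length === 0) {
    return { escalationRate: 0, falseApprovalRate: 0, avgTimeToDecisionMs: 0 };
  }
  const escalations = events.filter((e) => e.decision === "ESCALATE").length;
  const approvals = events.filter((e) => e.decision === "APPROVE");
  const overturned = approvals.filter((e) => e.humanOverturnedApproval).length;
  const totalMs = events.reduce((sum, e) => sum + (e.decidedAt - e.submittedAt), 0);
  return {
    escalationRate: escalations / events.length,
    falseApprovalRate: approvals.length > 0 ? overturned / approvals.length : 0,
    avgTimeToDecisionMs: totalMs / events.length,
  };
}

// Alert when the escalation rate falls well below its baseline,
// which can signal that the agent has become too permissive.
function escalationDropAlert(current: number, baseline: number, maxRelativeDrop = 0.5): boolean {
  return baseline > 0 && current < baseline * (1 - maxRelativeDrop);
}
```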

Common Pitfalls

  1. Treating the LLM as the source of truth
    Don’t ask it to “decide based on common sense.” Always ground decisions in indexed policy documents and deterministic business rules.

  2. Mixing jurisdictions in one undifferentiated index
    Wealth management KYC varies by region. If EU retention rules or local onboarding requirements differ, split indexes or filter by metadata like jurisdiction.

  3. Returning prose instead of structured outcomes
    Compliance ops needs decision, riskLevel, and reasons, not paragraphs. Validate every output against a schema (for example with Zod) and reject invalid outputs before they hit workflow systems.

  4. Skipping human review thresholds
    An agent should not close edge cases involving trusts, PEPs, sanctions ambiguity, or missing ownership data. Route those cases into a manual queue with full evidence attached.
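Pitfalls 1 and 4, together with the guardrails table, suggest keeping hard rules outside the model. A minimal sketch, assuming the boolean case facts come from deterministic upstream checks (screening results, ownership declarations) rather than from the LLM; CaseFacts, hardRuleViolations, and finalizeDecision are hypothetical names. Any rule hit forces ESCALATE, so the model can neither approve nor close these cases on its own, and its recommendation is kept as evidence for the reviewer. As a simplification, cross-border trusts are routed to a human rather than given a separate rule path. Note that the example policy text says to reject incomplete beneficial ownership while the guardrails table says auto-escalate; this sketch follows the table.

```typescript
type Decision = "APPROVE" | "REJECT" | "ESCALATE";

// Hypothetical case facts from deterministic upstream checks, not from the LLM.
interface CaseFacts {
  sanctionsScreeningComplete: boolean;
  pepOrAdverseMediaHit: boolean;
  beneficialOwnershipComplete: boolean;
  crossBorderTrust: boolean;
}

interface FinalDecision {
  decision: Decision;
  reasons: string[];
  llmRecommendation: Decision; // kept as evidence for the reviewer
}

// One check per row of the guardrails table.
function hardRuleViolations(f: CaseFacts): string[] {
  const reasons: string[] = [];
  if (!f.sanctionsScreeningComplete) reasons.push("Sanctions screening incomplete");
  if (f.pepOrAdverseMediaHit) reasons.push("PEP or adverse media hit requires human review");
  if (!f.beneficialOwnershipComplete) reasons.push("Beneficial ownership information incomplete");
  if (f.crossBorderTrust) reasons.push("Cross-border trust requires jurisdiction-specific review");
  return reasons;
}

// Hard rules win: any violation sends the case to a human, whatever the model said.
function finalizeDecision(llmDecision: Decision, llmReasons: string[], facts: CaseFacts): FinalDecision {
  const violations = hardRuleViolations(facts);
  if (violations.length > 0) {
    return { decision: "ESCALATE", reasons: [...violations, ...llmReasons], llmRecommendation: llmDecision };
  }
  return { decision: llmDecision, reasons: llmReasons, llmRecommendation: llmDecision };
}
```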



By Cyprian Aarons, AI Consultant at Topiax.
