How to Build a KYC Verification Agent Using LangChain in TypeScript for Pension Funds

By Cyprian Aarons · Updated 2026-04-21

Tags: kyc-verification, langchain, typescript, pension-funds

A KYC verification agent for pension funds checks identity data, validates documents, flags missing information, and routes risky cases for human review. It matters because pension funds handle long-lived, regulated customer relationships where weak onboarding creates compliance risk, delayed contributions, and audit failures.

Architecture

  • Document ingestion layer

    • Accepts ID scans, proof of address, tax forms, and beneficiary records.
    • Normalizes file text using OCR or upstream extraction before the LLM sees it.
  • KYC rules engine

    • Encodes pension-fund-specific checks like name matching, address recency, sanctions screening status, and completeness of mandatory fields.
    • Keeps deterministic logic outside the model.
  • LangChain agent

    • Uses ChatOpenAI plus tool calling to decide which checks to run.
    • Produces structured outputs instead of free-form summaries.
  • Audit trail store

    • Persists every input, tool call, model output, and reviewer override.
    • Needed for regulator queries and internal compliance reviews.
  • Human review queue

    • Handles exceptions such as mismatched names, expired documents, or politically exposed person flags.
    • Prevents automatic approval on ambiguous cases.
  • Data residency boundary

    • Ensures PII stays in the required region and only approved providers are used.
    • Important when pension schemes operate under local privacy and retention rules.
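
As a sketch of the audit trail store described above, each decision can be persisted as a single append-only record. The `AuditEntry` shape and field names here are illustrative, not part of LangChain:

```typescript
// Illustrative shape for one audit trail record; adapt field names to your store.
interface AuditEntry {
  caseId: string;
  timestamp: string;          // ISO 8601
  inputPayloadHash: string;   // hash of the raw input, not another copy of the PII
  toolCalls: { name: string; output: unknown }[];
  modelVersion: string;
  promptVersion: string;
  decision: "approve" | "reject" | "manual_review";
  reviewerOverride?: string;  // set only when a human changes the outcome
}

// Builds a minimal entry; tool calls and versions are appended as the run proceeds.
function buildAuditEntry(
  caseId: string,
  inputPayloadHash: string,
  decision: AuditEntry["decision"]
): AuditEntry {
  return {
    caseId,
    timestamp: new Date().toISOString(),
    inputPayloadHash,
    toolCalls: [],
    modelVersion: "unknown",
    promptVersion: "unknown",
    decision,
  };
}
```

Keeping the record flat and append-only makes it straightforward to export for regulator queries later.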

Implementation

1) Define the KYC result schema

Use Zod so the agent returns a predictable structure. For regulated workflows, this is non-negotiable.

import { z } from "zod";

export const KycResultSchema = z.object({
  decision: z.enum(["approve", "reject", "manual_review"]),
  confidence: z.number().min(0).max(1),
  reasons: z.array(z.string()),
  missingFields: z.array(z.string()),
  riskFlags: z.array(z.enum([
    "name_mismatch",
    "expired_document",
    "address_unverified",
    "sanctions_hit",
    "pep_hit"
  ])),
});

export type KycResult = z.infer<typeof KycResultSchema>;

2) Build tools for deterministic checks

Keep hard checks in code. The model should orchestrate them, not invent them.

import { tool } from "@langchain/core/tools";
import { z } from "zod";

const kycProfileSchema = z.object({
  fullName: z.string(),
  dateOfBirth: z.string(),
  address: z.string(),
  idNumber: z.string().optional(),
  documentExpiry: z.string().optional(),
});

export const checkCompleteness = tool(
  async (input) => {
    const parsed = kycProfileSchema.parse(input);
    const missingFields = Object.entries(parsed)
      .filter(([, value]) => value === undefined || value === "")
      .map(([key]) => key);

    return {
      complete: missingFields.length === 0,
      missingFields,
    };
  },
  {
    name: "check_completeness",
    description: "Checks whether required KYC fields are present.",
    schema: kycProfileSchema,
  }
);

export const checkDocumentExpiry = tool(
  async ({ documentExpiry }: { documentExpiry?: string }) => {
    if (!documentExpiry) return { expired: true };
    const expiry = new Date(documentExpiry);
    // Treat unparseable dates as expired so they route to review, not auto-approval.
    if (Number.isNaN(expiry.getTime())) return { expired: true };
    return { expired: expiry.getTime() < Date.now() };
  },
  {
    name: "check_document_expiry",
    description: "Checks whether the uploaded identity document is expired.",
    schema: z.object({ documentExpiry: z.string().optional() }),
  }
);
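
Because `new Date()` silently yields `Invalid Date` for unparseable strings, the expiry logic is worth pinning down in a plain function first. A sketch, where `isExpired` is a helper name introduced here rather than part of the tool API:

```typescript
// Treat missing or unparseable expiry dates as expired so they route to review.
function isExpired(documentExpiry?: string): boolean {
  if (!documentExpiry) return true;
  const expiry = new Date(documentExpiry);
  if (Number.isNaN(expiry.getTime())) return true;
  return expiry.getTime() < Date.now();
}
```

Extracting the check this way also lets you unit-test edge cases (missing, malformed, and past dates) without constructing a LangChain tool at all.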

3) Create the LangChain agent with structured output

For this use case, I prefer a simple runnable pipeline over a complex autonomous loop. It is easier to audit and easier to keep within policy.

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";
import { StructuredOutputParser } from "@langchain/core/output_parsers";
import { KycResultSchema } from "./schemas.js";
import { checkCompleteness, checkDocumentExpiry } from "./tools.js";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

const parser = StructuredOutputParser.fromZodSchema(KycResultSchema);

const prompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    `You are a KYC verification agent for a pension fund.
Use only the provided facts.
Do not approve if mandatory fields are missing or if a document is expired.
Return structured output matching the schema.`,
  ],
  [
    "human",
    `KYC profile:
{profile}

Completeness check:
{completeness}

Expiry check:
{expiry}

Format instructions:
{format_instructions}`,
  ],
]);

export const kycChain = RunnableSequence.from([
  async (input: { profile: string }) => {
    const profileObj = JSON.parse(input.profile);
    const completeness = await checkCompleteness.invoke(profileObj);
    const expiry = await checkDocumentExpiry.invoke({
      documentExpiry: profileObj.documentExpiry,
    });

    return {
      profile: input.profile,
      completeness: JSON.stringify(completeness),
      expiry: JSON.stringify(expiry),
      format_instructions: parser.getFormatInstructions(),
    };
  },
  prompt,
  llm,
]);

4) Parse the result and enforce policy before approval

The model can recommend approval, but your policy layer makes the final call.

import { KycResultSchema } from "./schemas.js";

export async function verifyKyc(profileJsonString: string) {
  const raw = await kycChain.invoke({ profile: profileJsonString });
  const text = typeof raw.content === "string" ? raw.content : JSON.stringify(raw.content);

  // Models often wrap JSON in markdown fences; strip them before parsing.
  const jsonText = text.replace(/```(?:json)?/g, "").trim();
  const parsed = KycResultSchema.parse(JSON.parse(jsonText));

  // Policy gate for pension funds
  if (parsed.riskFlags.includes("sanctions_hit") || parsed.riskFlags.includes("pep_hit")) {
    return { ...parsed, decision: "manual_review" as const };
  }

  // Route to review if any mandatory field is missing, not only when many are.
  const mandatoryFields = ["fullName", "dateOfBirth", "address"];
  if (parsed.missingFields.some((field) => mandatoryFields.includes(field))) {
    return { ...parsed, decision: "manual_review" as const };
  }

  return parsed;
}
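
The policy gate itself has no LangChain dependency, so it can be factored out and unit-tested without ever calling the model. A sketch, where `applyPensionPolicy` and the local `PolicyInput` type are names introduced here:

```typescript
type Decision = "approve" | "reject" | "manual_review";

interface PolicyInput {
  decision: Decision;
  riskFlags: string[];
  missingFields: string[];
}

const MANDATORY_FIELDS = ["fullName", "dateOfBirth", "address"];

// Downgrades the model's recommendation whenever a hard rule is violated.
function applyPensionPolicy(result: PolicyInput): PolicyInput {
  if (result.riskFlags.includes("sanctions_hit") || result.riskFlags.includes("pep_hit")) {
    return { ...result, decision: "manual_review" };
  }
  if (result.missingFields.some((f) => MANDATORY_FIELDS.includes(f))) {
    return { ...result, decision: "manual_review" };
  }
  return result;
}
```

Separating the gate from the chain means a prompt or model change can never silently weaken the rules.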

Production Considerations

  • Keep PII inside your residency boundary

    Use regional model endpoints and region-scoped storage. Pension funds often have strict data localization requirements for member records and identity documents.

  • Log every decision path

    Persist the input payload hash, tool results, model version, prompt version, and final decision. Auditors will ask why a member was routed to manual review or rejected.

  • Add hard guardrails before any approval

    Never let the LLM override sanctions hits, expired IDs, or incomplete mandatory fields. The agent should classify and explain; policy code should decide.

  • Monitor drift in exception rates

    If manual reviews spike after a prompt change or model upgrade, treat it as a release incident. In pension operations, onboarding delays directly affect contribution setup and member service SLAs.
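
For the decision log above, hashing the raw payload lets you prove what was processed without storing an extra copy of the PII. A minimal sketch using Node's built-in `crypto` module:

```typescript
import { createHash } from "node:crypto";

// Deterministic SHA-256 hex digest of the raw input payload.
// Persist the hash alongside tool results, model version, and prompt version.
function hashPayload(payload: string): string {
  return createHash("sha256").update(payload, "utf8").digest("hex");
}
```

The same digest recomputed from an archived payload confirms the audit record matches what the agent actually saw.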

Common Pitfalls

  • Using the LLM as the source of truth

    The agent should not infer identity validity from narrative text alone. Extract fields upstream and validate them deterministically with code.

  • Skipping an audit trail

    Storing only the final answer is not enough. Keep intermediate tool outputs and prompt versions so compliance can reconstruct each decision.

  • Letting free-form output into downstream systems

    Never pass raw model text to case management or core admin systems. Parse into a Zod schema first, then map only approved fields into your workflow.
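
One way to enforce that last point is an explicit allow-list when mapping parsed output into downstream systems. A sketch, where `pickApproved` is a helper name introduced here:

```typescript
// Copies only explicitly approved keys; anything else the model emits is dropped.
function pickApproved<T extends Record<string, unknown>>(
  obj: T,
  approvedKeys: string[]
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of approvedKeys) {
    if (key in obj) out[key] = obj[key];
  }
  return out;
}
```

Combined with the Zod parse, this gives two gates: the schema rejects malformed output, and the allow-list keeps unexpected fields out of case management.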


By Cyprian Aarons, AI Consultant at Topiax.
