How to Build a KYC Verification Agent Using LangChain in TypeScript for Pension Funds
A KYC verification agent for pension funds checks identity data, validates documents, flags missing information, and routes risky cases for human review. It matters because pension funds handle long-lived, regulated customer relationships where weak onboarding creates compliance risk, delayed contributions, and audit failures.
Architecture
- Document ingestion layer
  - Accepts ID scans, proof of address, tax forms, and beneficiary records.
  - Normalizes file text using OCR or upstream extraction before the LLM sees it.
- KYC rules engine
  - Encodes pension-fund-specific checks like name matching, address recency, sanctions screening status, and completeness of mandatory fields.
  - Keeps deterministic logic outside the model.
- LangChain agent
  - Uses ChatOpenAI plus tool calling to decide which checks to run.
  - Produces structured outputs instead of free-form summaries.
- Audit trail store
  - Persists every input, tool call, model output, and reviewer override.
  - Needed for regulator queries and internal compliance reviews.
- Human review queue
  - Handles exceptions such as mismatched names, expired documents, or politically exposed person flags.
  - Prevents automatic approval on ambiguous cases.
- Data residency boundary
  - Ensures PII stays in the required region and only approved providers are used.
  - Important when pension schemes operate under local privacy and retention rules.
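To make the audit trail store concrete, here is a minimal in-memory sketch of an append-only log that records the event kinds listed above. The names `AuditEvent`, `AuditEntry`, and `InMemoryAuditLog` are hypothetical, and a real deployment would presumably write to durable, region-scoped storage rather than an array.

```typescript
// Hypothetical event shapes for illustration; adapt the fields to your own store.
type AuditEvent =
  | { kind: "input"; payloadRef: string }
  | { kind: "tool_call"; tool: string; result: unknown }
  | { kind: "model_output"; modelVersion: string; promptVersion: string; output: unknown }
  | { kind: "reviewer_override"; reviewerId: string; decision: string; reason: string };

interface AuditEntry {
  caseId: string;
  seq: number; // per-case sequence number, starting at 1
  recordedAt: string; // ISO 8601 timestamp
  event: AuditEvent;
}

class InMemoryAuditLog {
  private entries: AuditEntry[] = [];

  // Appends an event for a case; existing entries are never mutated or removed.
  append(caseId: string, event: AuditEvent, now: Date = new Date()): AuditEntry {
    const seq = this.entries.filter((e) => e.caseId === caseId).length + 1;
    const entry: AuditEntry = { caseId, seq, recordedAt: now.toISOString(), event };
    this.entries.push(entry);
    return entry;
  }

  // Returns the ordered decision path for one case.
  trail(caseId: string): AuditEntry[] {
    return this.entries.filter((e) => e.caseId === caseId);
  }
}
```

Storing a reference to the payload (`payloadRef`) instead of the payload itself is one way to keep raw PII out of the log; whether that fits depends on your retention rules.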
Implementation
1) Define the KYC result schema
Use Zod so the agent returns a predictable structure. For regulated workflows, this is non-negotiable.
import { z } from "zod";

export const KycResultSchema = z.object({
  decision: z.enum(["approve", "reject", "manual_review"]),
  confidence: z.number().min(0).max(1),
  reasons: z.array(z.string()),
  missingFields: z.array(z.string()),
  riskFlags: z.array(z.enum([
    "name_mismatch",
    "expired_document",
    "address_unverified",
    "sanctions_hit",
    "pep_hit"
  ])),
});

export type KycResult = z.infer<typeof KycResultSchema>;
2) Build tools for deterministic checks
Keep hard checks in code. The model should orchestrate them, not invent them.
import { tool } from "@langchain/core/tools";
import { z } from "zod";

// Every field is optional at the schema level so that an incomplete profile
// reaches the completeness check instead of failing input validation first.
const kycProfileSchema = z.object({
  fullName: z.string().optional(),
  dateOfBirth: z.string().optional(),
  address: z.string().optional(),
  idNumber: z.string().optional(),
  documentExpiry: z.string().optional(),
});

// Fields that must be present and non-empty.
const requiredFields = ["fullName", "dateOfBirth", "address"] as const;

export const checkCompleteness = tool(
  async (input) => {
    const parsed = kycProfileSchema.parse(input);
    const missingFields = requiredFields.filter((key) => {
      const value = parsed[key];
      return value === undefined || value.trim() === "";
    });
    return {
      complete: missingFields.length === 0,
      missingFields,
    };
  },
  {
    name: "check_completeness",
    description: "Checks whether required KYC fields are present.",
    schema: kycProfileSchema,
  }
);

export const checkDocumentExpiry = tool(
  async ({ documentExpiry }: { documentExpiry?: string }) => {
    // A missing or unparseable expiry date counts as expired,
    // so the case cannot be approved automatically.
    if (!documentExpiry) return { expired: true };
    const expiry = new Date(documentExpiry);
    if (Number.isNaN(expiry.getTime())) return { expired: true };
    return { expired: expiry.getTime() < Date.now() };
  },
  {
    name: "check_document_expiry",
    description: "Checks whether the uploaded identity document is expired.",
    schema: z.object({ documentExpiry: z.string().optional() }),
  }
);
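The architecture also lists name matching and address recency as rules-engine checks. A rough sketch of both as plain functions follows; they could be wrapped with `tool()` in the same way as the checks above. The function names, the Latin-alphabet normalization, and the 90-day window are assumptions for illustration, not regulatory values.

```typescript
// Normalizes a name for comparison: strips diacritics, punctuation, case, and token order.
// Assumes Latin-alphabet names; other scripts normalize to "" and are reported as a mismatch.
function normalizeName(name: string): string {
  return name
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining diacritical marks
    .toLowerCase()
    .replace(/[^a-z\s]/g, " ")
    .split(/\s+/)
    .filter(Boolean)
    .sort()
    .join(" ");
}

// Exact match after normalization; anything else should be routed to a human.
function namesMatch(applicationName: string, documentName: string): boolean {
  const a = normalizeName(applicationName);
  const b = normalizeName(documentName);
  return a.length > 0 && a === b;
}

// True when the proof of address was issued within the last maxAgeDays.
function isAddressProofRecent(
  issuedOn: string,
  maxAgeDays: number = 90, // assumed window; set it from your own policy
  now: Date = new Date()
): boolean {
  const issued = new Date(issuedOn);
  if (Number.isNaN(issued.getTime())) return false; // unparseable date: not recent
  const ageDays = (now.getTime() - issued.getTime()) / (24 * 60 * 60 * 1000);
  return ageDays >= 0 && ageDays <= maxAgeDays;
}
```

The matching is deliberately strict: a near miss such as a spelling variant returns `false`, which fits the `name_mismatch` flag and a manual review rather than an automatic decision.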
3) Create the LangChain agent with structured output
For this use case, I prefer a simple runnable pipeline over a complex autonomous loop. It is easier to audit and easier to keep within policy.
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";
import { StructuredOutputParser } from "@langchain/core/output_parsers";
import { KycResultSchema } from "./schemas.js";
import { checkCompleteness, checkDocumentExpiry } from "./tools.js";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

const parser = StructuredOutputParser.fromZodSchema(KycResultSchema);

const prompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    `You are a KYC verification agent for a pension fund.
Use only the provided facts.
Do not approve if mandatory fields are missing or if a document is expired.
Return structured output matching the schema.`,
  ],
  [
    "human",
    `KYC profile:
{profile}
Completeness check:
{completeness}
Expiry check:
{expiry}
Format instructions:
{format_instructions}`,
  ],
]);

export const kycChain = RunnableSequence.from([
  // Step 1: run the deterministic checks and build the prompt variables.
  async (input: { profile: string }) => {
    const profileObj = JSON.parse(input.profile);
    const completeness = await checkCompleteness.invoke(profileObj);
    const expiry = await checkDocumentExpiry.invoke({
      documentExpiry: profileObj.documentExpiry,
    });
    return {
      profile: input.profile,
      completeness: JSON.stringify(completeness),
      expiry: JSON.stringify(expiry),
      format_instructions: parser.getFormatInstructions(),
      profileObj,
      completenessObj: completeness,
      expiryObj: expiry,
    };
  },
  // Step 2: fill the prompt. Step 3: call the model.
  prompt,
  llm,
]);
4) Parse the result and enforce policy before approval
The model can recommend approval, but your policy layer makes the final call.
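import { KycResultSchema } from "./schemas.js";
import { kycChain } from "./chain.js"; // assumes the chain from step 3 is saved as chain.js

const MANDATORY_FIELDS = ["fullName", "dateOfBirth", "address"];

export async function verifyKyc(profileJsonString: string) {
  const raw = await kycChain.invoke({ profile: profileJsonString });
  const text = typeof raw.content === "string" ? raw.content : JSON.stringify(raw.content);
  // Models sometimes wrap JSON in a Markdown code fence; strip it before parsing.
  const json = text.replace(/^\s*```(?:json)?\s*|\s*```\s*$/g, "");
  const parsed = KycResultSchema.parse(JSON.parse(json));

  // Policy gate for pension funds: screening hits always go to a human.
  if (parsed.riskFlags.includes("sanctions_hit") || parsed.riskFlags.includes("pep_hit")) {
    return { ...parsed, decision: "manual_review" as const };
  }

  // Any missing mandatory field or an expired document also forces manual review.
  const missingMandatory = parsed.missingFields.some((field) => MANDATORY_FIELDS.includes(field));
  if (missingMandatory || parsed.riskFlags.includes("expired_document")) {
    return { ...parsed, decision: "manual_review" as const };
  }

  return parsed;
}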
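The gate above still depends on flags the model reports about itself. A stricter variant, sketched here under assumptions, feeds the deterministic check results straight into the policy layer so the model's recommendation can only be downgraded. The `DeterministicChecks` shape and `applyPolicyGate` name are hypothetical, and the screening fields assume an upstream sanctions/PEP provider that this article does not cover.

```typescript
type Decision = "approve" | "reject" | "manual_review";

interface ModelRecommendation {
  decision: Decision;
}

// Hypothetical shape for results gathered in code before or alongside the model call.
interface DeterministicChecks {
  missingFields: string[];
  documentExpired: boolean;
  sanctionsHit: boolean;
  pepHit: boolean;
}

const MANDATORY_FIELDS = ["fullName", "dateOfBirth", "address"];

function applyPolicyGate(
  rec: ModelRecommendation,
  checks: DeterministicChecks
): { decision: Decision; overriddenBy: string[] } {
  const blockers: string[] = [];
  if (checks.sanctionsHit) blockers.push("sanctions_hit");
  if (checks.pepHit) blockers.push("pep_hit");
  if (checks.documentExpired) blockers.push("expired_document");
  if (checks.missingFields.some((f) => MANDATORY_FIELDS.includes(f))) {
    blockers.push("missing_mandatory_field");
  }

  // Screening hits always go to a human; other blockers only stop an approval.
  const screeningHit = checks.sanctionsHit || checks.pepHit;
  if (screeningHit || (rec.decision === "approve" && blockers.length > 0)) {
    return { decision: "manual_review", overriddenBy: blockers };
  }
  return { decision: rec.decision, overriddenBy: [] };
}
```

Returning `overriddenBy` alongside the decision gives the audit trail a record of which rule changed the outcome.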
Production Considerations
- Keep PII inside your residency boundary
  Use regional model endpoints and region-scoped storage. Pension funds often have strict data localization requirements for member records and identity documents.
- Log every decision path
  Persist the input payload hash, tool results, model version, prompt version, and final decision. Auditors will ask why a member was routed to manual review or rejected.
- Add hard guardrails before any approval
  Never let the LLM override sanctions hits, expired IDs, or incomplete mandatory fields. The agent should classify and explain; policy code should decide.
- Monitor drift in exception rates
  If manual reviews spike after a prompt change or model upgrade, treat it as a release incident. In pension operations, onboarding delays directly affect contribution setup and member service SLAs.
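As a rough illustration of the drift check, the sketch below compares the manual-review rate for a current window against a baseline. The names, the 5-point threshold, and the minimum sample of 50 cases are assumptions chosen for the example; tune them against your own volumes.

```typescript
interface DecisionCounts {
  total: number;
  manualReview: number;
}

// Share of cases routed to manual review; 0 when there are no cases.
function manualReviewRate(counts: DecisionCounts): number {
  return counts.total === 0 ? 0 : counts.manualReview / counts.total;
}

// Flags drift when the current rate exceeds the baseline by more than
// maxIncrease (absolute, so 0.05 means five percentage points).
function hasReviewRateDrift(
  baseline: DecisionCounts,
  current: DecisionCounts,
  maxIncrease: number = 0.05, // assumed threshold
  minSample: number = 50 // assumed minimum cases before judging
): boolean {
  if (current.total < minSample) return false; // too few cases to judge
  return manualReviewRate(current) - manualReviewRate(baseline) > maxIncrease;
}
```

Running this per prompt version and model version, rather than only per day, makes it easier to tie a spike to the release that caused it.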
Common Pitfalls
- Using the LLM as the source of truth
  The agent should not infer identity validity from narrative text alone. Extract fields upstream and validate them deterministically with code.
- Skipping an audit trail
  Storing only the final answer is not enough. Keep intermediate tool outputs and prompt versions so compliance can reconstruct each decision.
- Letting free-form output into downstream systems
  Never pass raw model text to case management or core admin systems. Parse into a Zod schema first, then map only approved fields into your workflow.
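For the last pitfall, mapping "only approved fields" can be as simple as an explicit allowlist function. The sketch below assumes a hypothetical `CaseUpdate` payload for a case-management system; the status codes and field names are illustrative, and `ValidatedKycResult` stands in for a result that has already passed schema validation.

```typescript
type Decision = "approve" | "reject" | "manual_review";

// Stand-in for a result that has already been parsed against the KYC schema.
interface ValidatedKycResult {
  decision: Decision;
  confidence: number;
  reasons: string[];
  missingFields: string[];
  riskFlags: string[];
}

// Hypothetical downstream payload; note that free-text reasons are not part of it.
interface CaseUpdate {
  memberId: string;
  status: "APPROVED" | "REJECTED" | "PENDING_REVIEW";
  riskFlags: string[];
}

const STATUS_BY_DECISION = {
  approve: "APPROVED",
  reject: "REJECTED",
  manual_review: "PENDING_REVIEW",
} as const;

// Copies only allowlisted fields; anything else the model produced is dropped.
function toCaseUpdate(memberId: string, result: ValidatedKycResult): CaseUpdate {
  return {
    memberId,
    status: STATUS_BY_DECISION[result.decision],
    riskFlags: [...result.riskFlags],
  };
}
```

Building the payload field by field, instead of spreading the parsed result, means a new schema field never reaches downstream systems by accident.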
By Cyprian Aarons, AI Consultant at Topiax.