How to Build a KYC Verification Agent Using LangChain in TypeScript for Payments

By Cyprian Aarons · Updated 2026-04-21
Tags: kyc-verification, langchain, typescript, payments

A KYC verification agent for payments takes customer identity data, extracts the right fields from documents and forms, checks them against policy, and returns a decision you can audit. It matters because payment flows are high-risk: bad identity checks create fraud exposure, failed compliance reviews, and painful manual ops.

Architecture

  • Ingress API
    • Receives the applicant payload: name, DOB, address, country, document metadata, and consent flags.
  • Document extraction layer
    • Uses an LLM to normalize raw OCR text or uploaded document text into structured KYC fields.
  • Policy engine
    • Applies deterministic rules for jurisdiction, sanctions flags, age thresholds, document freshness, and required fields.
  • LangChain agent
    • Orchestrates extraction, validation, and decisioning with tool calls and structured outputs.
  • Audit store
    • Persists inputs, outputs, model version, prompt version, and decision rationale for compliance review.
  • Human review queue
    • Routes edge cases and low-confidence cases to an analyst instead of auto-approving.
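The audit store is the piece most teams under-specify, so it helps to pin down its record shape early. The sketch below is illustrative, not from the original architecture; field names like `promptVersion` and `decidedAt` are assumptions:

```typescript
// Illustrative audit record for the audit store; field names are assumptions.
interface AuditRecord {
  applicantId: string;
  rawInput: string;          // exact payload received by the ingress API
  extracted: unknown;        // normalized KYC fields from the extraction layer
  policyResult: unknown;     // deterministic policy engine output
  decision: "approved" | "declined" | "manual_review";
  modelVersion: string;
  promptVersion: string;
  decidedAt: string;         // ISO timestamp
}

function buildAuditRecord(
  applicantId: string,
  rawInput: string,
  extracted: unknown,
  policyResult: unknown,
  decision: AuditRecord["decision"],
): AuditRecord {
  return {
    applicantId,
    rawInput,
    extracted,
    policyResult,
    decision,
    modelVersion: "gpt-4o-mini",
    promptVersion: "kyc-v1",
    decidedAt: new Date().toISOString(),
  };
}
```

Storing the prompt version alongside the model version is what makes a decision replayable months later.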

Implementation

1) Define the KYC schema and tools

For payments, keep the model on a short leash. Use structured output for the extracted KYC payload and deterministic tools for policy checks.

import { z } from "zod";
import { ChatOpenAI } from "@langchain/openai";
import { DynamicStructuredTool } from "@langchain/core/tools";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { createOpenAIFunctionsAgent, AgentExecutor } from "langchain/agents";

const KycSchema = z.object({
  fullName: z.string(),
  dateOfBirth: z.string(), // ISO date string
  countryOfResidence: z.string(),
  documentType: z.enum(["passport", "national_id", "drivers_license"]),
  documentNumber: z.string(),
  address: z.string().optional(),
  riskFlags: z.array(z.string()).default([]),
});

type KycRecord = z.infer<typeof KycSchema>;

const policyTool = new DynamicStructuredTool({
  name: "kyc_policy_check",
  description: "Checks whether a KYC record meets minimum payment onboarding requirements.",
  schema: KycSchema,
  func: async (input) => {
    const addressRequiredCountries = ["US", "GB", "DE"];
    const needsAddress =
      addressRequiredCountries.includes(input.countryOfResidence.toUpperCase()) && !input.address;

    // Year difference alone overstates age before the birthday; adjust for month and day.
    const [dobYear, dobMonth, dobDay] = input.dateOfBirth.split("-").map(Number);
    const now = new Date();
    let age = now.getFullYear() - dobYear;
    if (now.getMonth() + 1 < dobMonth || (now.getMonth() + 1 === dobMonth && now.getDate() < dobDay)) {
      age -= 1;
    }
    const underage = age < 18;

    return JSON.stringify({
      approved: !needsAddress && !underage,
      reasons: [
        ...(needsAddress ? ["address_required_for_jurisdiction"] : []),
        ...(underage ? ["customer_under_18"] : []),
      ],
      auditTags: {
        jurisdiction: input.countryOfResidence,
        documentType: input.documentType,
      },
    });
  },
});
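Because these rules are compliance-critical, it pays to keep them as plain functions you can unit-test without spinning up the agent. This sketch mirrors the checks in the tool above (names and the country list are illustrative), with an age calculation that avoids ISO-string timezone pitfalls by parsing the date parts directly:

```typescript
// Pure policy helpers, decoupled from LangChain so they can be unit-tested directly.
// Names and the country list are illustrative, mirroring the tool above.
const ADDRESS_REQUIRED = ["US", "GB", "DE"];

function requiresAddress(country: string, address?: string): boolean {
  return ADDRESS_REQUIRED.includes(country.toUpperCase()) && !address;
}

// Compute age from an ISO date string (YYYY-MM-DD), adjusting for whether the
// birthday has occurred yet in the current year.
function computeAge(isoDob: string, today: Date = new Date()): number {
  const [y, m, d] = isoDob.split("-").map(Number);
  let age = today.getFullYear() - y;
  const hadBirthday =
    today.getMonth() + 1 > m ||
    (today.getMonth() + 1 === m && today.getDate() >= d);
  return hadBirthday ? age : age - 1;
}
```

The tool's `func` can then call these helpers, and your test suite exercises them with fixed dates instead of `new Date()`.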

2) Build the LangChain agent with a strict prompt

Use a prompt that tells the model it is not allowed to invent facts. In payments, hallucinated identity data is a compliance bug.

const llm = new ChatOpenAI({
  modelName: "gpt-4o-mini",
  temperature: 0,
});

const prompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    `You are a KYC verification agent for a payment platform.
Only use the provided customer data.
Do not infer missing identity facts.
If required fields are missing or unclear, route to manual review.`,
  ],
  [
    "human",
    `Customer payload:
{payload}

Run policy checks and return a decision.`,
  ],
]);

const agent = await createOpenAIFunctionsAgent({
  llm,
  tools: [policyTool],
  prompt,
});

const executor = new AgentExecutor({
  agent,
  tools: [policyTool],
});

3) Execute the workflow and persist an audit record

The key pattern is to separate extraction from decisioning. First normalize the incoming text into your schema; then run policy evaluation; then store everything with traceability.

async function verifyKyc(payloadText: string) {
  const extractionPrompt = ChatPromptTemplate.fromMessages([
    ["system", "Extract KYC fields into the exact schema. Return only valid JSON."],
    ["human", "{text}"],
  ]);

  const extractionChain = extractionPrompt.pipe(
    llm.withStructuredOutput(KycSchema)
  );

  const kycRecord = await extractionChain.invoke({ text: payloadText });

  const result = await executor.invoke({
    payload: JSON.stringify(kycRecord),
  });

  return {
    kycRecord,
    decision: result.output,
    model: "gpt-4o-mini",
    timestamp: new Date().toISOString(),
  };
}

verifyKyc(`
Full name: Amina Yusuf
DOB: 1996-03-14
Country of residence: KE
Document type: passport
Document number: P1234567
Address: Nairobi
`)
.then(console.log)
.catch(console.error);

4) Add a manual-review fallback

For payments, declining or escalating to a human is safer than auto-approving when confidence is low or required evidence is incomplete. Keep this logic outside the LLM so it stays deterministic.

function needsManualReview(record: KycRecord): boolean {
  return (
    !record.fullName ||
    !record.dateOfBirth ||
    !record.documentNumber ||
    record.riskFlags.includes("possible_forgery")
  );
}
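Combining the deterministic policy result with the review check gives a single routing function. This is a sketch; the decision labels and the standalone `ReviewInput` shape are assumptions made so the example is self-contained:

```typescript
type KycDecision = "approve" | "decline" | "manual_review";

// Minimal record shape covering the fields the review check needs (illustrative).
interface ReviewInput {
  fullName: string;
  dateOfBirth: string;
  documentNumber: string;
  riskFlags: string[];
}

// Deterministic routing: manual review wins over any automated outcome, and a
// failed policy check declines rather than approving on incomplete evidence.
function routeDecision(record: ReviewInput, policyApproved: boolean): KycDecision {
  const needsReview =
    !record.fullName ||
    !record.dateOfBirth ||
    !record.documentNumber ||
    record.riskFlags.includes("possible_forgery");
  if (needsReview) return "manual_review";
  return policyApproved ? "approve" : "decline";
}
```

Note the ordering: the review check runs first, so a forged-looking document never reaches the auto-approve branch even if policy checks pass.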

Production Considerations

  • Auditability
    • Persist raw input, normalized output, policy result, model name, prompt version, and timestamp. Compliance teams will ask for a replayable trail.
  • Data residency
    • Keep PII in-region where required by your regulator or processor contract. If you use hosted LLMs, verify region support and retention settings.
  • Guardrails
    • Never let the agent approve based on free-form reasoning alone. Final approval should come from deterministic policy checks plus confidence thresholds.
  • Monitoring
    • Track approval rate by country, manual-review rate, false accept rate, false reject rate, and tool-call failures. Spikes usually mean either prompt drift or upstream OCR issues.
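A minimal in-process tally of the per-country rates above might look like the sketch below. It is a stand-in under the assumption that production systems would emit these counts to a real metrics backend rather than hold them in memory:

```typescript
type Outcome = "approve" | "decline" | "manual_review";

// In-memory tallies keyed by country; a stand-in for a metrics pipeline.
const tallies = new Map<string, Record<Outcome, number>>();

function recordOutcome(country: string, outcome: Outcome): void {
  const t = tallies.get(country) ?? { approve: 0, decline: 0, manual_review: 0 };
  t[outcome] += 1;
  tallies.set(country, t);
}

// Approval rate per country: the kind of ratio to alert on when it spikes or drops.
function approvalRate(country: string): number {
  const t = tallies.get(country);
  if (!t) return 0;
  const total = t.approve + t.decline + t.manual_review;
  return total === 0 ? 0 : t.approve / total;
}
```

Segmenting by country matters because a global approval rate can stay flat while one jurisdiction's OCR quietly degrades.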

Common Pitfalls

  • Letting the model invent missing identity data

    If a passport number is missing or unreadable, route to review. Do not ask the model to “fill in” gaps from context.

  • Mixing business policy with LLM output

    Age limits, jurisdiction rules, and required-document logic should live in code or rules config. The LLM should extract and explain; it should not be your source of truth.

  • Ignoring compliance artifacts

    If you cannot show what data was used and why a decision was made, your KYC flow will fail audits. Store prompts, outputs, tool results, and reviewer overrides.

  • Sending unnecessary PII to external services

    Minimize payloads before calling the model. Redact account numbers, national IDs where possible, and anything not needed for verification.
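Payload minimization from the last pitfall can start as simple masking of identifier-like strings before the text reaches the model. The masking rule below is illustrative (a crude pattern for document/account numbers), not a complete redaction policy:

```typescript
// Mask runs that look like document or account numbers (optional letter prefix
// plus six or more digits), keeping the last two characters for traceability.
// The pattern is illustrative, not a complete redaction policy.
function redactIdentifiers(text: string): string {
  return text.replace(/\b([A-Z]{0,2}\d{6,})\b/g, (m) =>
    "*".repeat(m.length - 2) + m.slice(-2)
  );
}
```

Run this on the raw payload before extraction when the full identifier is not needed for the decision itself, and keep the unmasked value only in your access-controlled audit store.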


By Cyprian Aarons, AI Consultant at Topiax.