How to Build a KYC Verification Agent Using LangChain in TypeScript for Banking

By Cyprian Aarons · Updated 2026-04-21
Tags: kyc-verification, langchain, typescript, banking

A KYC verification agent automates the first pass of customer due diligence: it collects identity data, checks it against policy, flags missing or inconsistent fields, and routes risky cases for manual review. In banking, that matters because onboarding speed is useful only if you can prove compliance, preserve an audit trail, and keep sensitive data inside the right controls.

Architecture

A production KYC agent in banking needs these components:

  • Input normalization layer

    • Accepts customer-submitted data from forms, PDFs, OCR output, or API payloads.
    • Converts everything into a strict internal schema before any LLM call.
  • Policy engine

    • Encodes bank-specific rules like required fields, document freshness, sanctions screening triggers, and jurisdiction-specific checks.
    • Keeps deterministic decisions out of the model.
  • LangChain reasoning layer

    • Uses ChatOpenAI with structured output to classify issues and summarize findings.
    • Only handles ambiguous interpretation, not final compliance decisions.
  • Evidence store

    • Persists extracted facts, model outputs, timestamps, and source references.
    • Supports auditability and later regulator review.
  • Escalation workflow

    • Routes suspicious or incomplete cases to human analysts.
    • Produces a clear reason code and supporting evidence.
  • Security and residency controls

    • Redacts PII before prompts where possible.
    • Ensures logs, traces, and vector stores stay in approved regions.
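The components above can be wired together as a thin pipeline: deterministic rules run first, and the model is consulted only when the rules are silent. The interface names below (PolicyEngine, TriageModel, and so on) are illustrative, not part of LangChain:

```typescript
// Illustrative component contracts for the KYC pipeline.
type NormalizedApplication = {
  fullName: string;
  dateOfBirth: string; // ISO 8601 date
  countryOfResidence: string;
};

type Decision = {
  status: "approve" | "reject" | "manual_review";
  reasons: string[];
};

interface PolicyEngine {
  // null means the deterministic rules reached no verdict
  check(app: NormalizedApplication): Decision | null;
}

interface TriageModel {
  review(app: NormalizedApplication): Promise<Decision>;
}

// Deterministic rules first; LLM triage only when rules are silent.
async function runKycPipeline(
  app: NormalizedApplication,
  policy: PolicyEngine,
  triage: TriageModel
): Promise<Decision> {
  const ruled = policy.check(app);
  if (ruled) return ruled; // auditable, rule-based outcome
  return triage.review(app); // ambiguous cases go to the model
}
```

Keeping the reasoning layer behind an interface like TriageModel also makes it easy to stub the LLM out in tests.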

Implementation

1) Define the KYC schema and validation boundary

Start with a strict schema. Do not let free-form text flow into your decision logic.

import { z } from "zod";

export const KycInputSchema = z.object({
  fullName: z.string().min(1),
  dateOfBirth: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
  countryOfResidence: z.string().min(2),
  idType: z.enum(["passport", "national_id", "drivers_license"]),
  idNumber: z.string().min(3),
  sourceOfFunds: z.string().optional(),
});

export type KycInput = z.infer<typeof KycInputSchema>;

export const KycReviewSchema = z.object({
  status: z.enum(["approve", "reject", "manual_review"]),
  riskLevel: z.enum(["low", "medium", "high"]),
  reasons: z.array(z.string()).min(1),
  missingFields: z.array(z.string()),
});

This boundary is where you enforce format rules before any LLM call. If the payload fails validation, return a deterministic rejection or route to manual intake.

2) Build the LangChain agent with structured output

Use ChatOpenAI and withStructuredOutput so the model returns machine-readable decisions. For banking workflows, this is much safer than parsing plain text.

import "dotenv/config";
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";
import { KycInputSchema, KycReviewSchema } from "./schemas";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

export async function reviewKyc(input: unknown) {
  const kyc = KycInputSchema.parse(input);

  const reviewer = llm.withStructuredOutput(KycReviewSchema);

  const messages = [
    new SystemMessage(
      [
        "You are a KYC triage assistant for a bank.",
        "Do not make final compliance decisions.",
        "Only assess completeness and obvious risk indicators.",
        "If critical information is missing or inconsistent, recommend manual_review.",
        "Return concise reasons suitable for audit logs."
      ].join(" ")
    ),
    new HumanMessage(
      JSON.stringify({
        applicant: kyc,
        policyHints: {
          requireManualReviewIf:
            ["sanctions_match_possible", "document_expired", "address_mismatch"],
        },
      })
    ),
  ];

  return await reviewer.invoke(messages);
}

This pattern keeps the model constrained. The system message sets scope; the schema enforces output shape; temperature: 0 reduces variance for repeatable reviews.
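Because every review must be reconstructable later, it helps to wrap each call in an audit record that pins the model and prompt version alongside a hash of the input. The record shape and version tags below are assumptions, not a LangChain feature:

```typescript
import { createHash } from "node:crypto";

// An illustrative audit record: enough to reconstruct why a decision was made.
type AuditRecord = {
  inputHash: string; // hash of the payload, not raw PII
  model: string;
  promptVersion: string;
  output: unknown;
  reviewedAt: string;
};

function buildAuditRecord(input: unknown, output: unknown): AuditRecord {
  return {
    inputHash: createHash("sha256")
      .update(JSON.stringify(input))
      .digest("hex"),
    model: "gpt-4o-mini",           // pin the model actually used
    promptVersion: "kyc-triage-v1", // hypothetical prompt template tag
    output,
    reviewedAt: new Date().toISOString(),
  };
}
```

Persisting this record next to the reviewer's structured output gives you the "why" trail regulators ask for, without storing raw PII in logs.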

3) Add deterministic policy checks before the model

The model should not decide things your rules engine can decide. Check required fields and obvious policy violations first.

import { reviewKyc } from "./reviewKyc";

function deterministicChecks(input: {
  fullName?: string;
  dateOfBirth?: string;
  countryOfResidence?: string;
}) {
  const missingFields = ["fullName", "dateOfBirth", "countryOfResidence"].filter(
    (field) => !input[field as keyof typeof input]
  );

  if (missingFields.length > 0) {
    return {
      status: "manual_review" as const,
      riskLevel: "medium" as const,
      reasons: [`Missing required fields: ${missingFields.join(", ")}`],
      missingFields,
    };
  }

  return null;
}

async function main() {
  const input = {
    fullName: "Jane Doe",
    dateOfBirth: "1990-04-12",
    countryOfResidence: "GB",
    idType: "passport",
    idNumber: "123456789",
    sourceOfFunds: "salary",
  };

  const ruleResult = deterministicChecks(input);
  if (ruleResult) {
    console.log(ruleResult);
    return;
  }

  const result = await reviewKyc(input);
  console.log(result);
}

main();

This split matters. Deterministic checks are explainable and easy to audit. The LLM handles triage language and ambiguous patterns only.

Table — what goes where

| Concern                        | Deterministic code       | LangChain / LLM |
| ------------------------------ | ------------------------ | --------------- |
| Required field validation      | Yes                      | No              |
| Date format checks             | Yes                      | No              |
| Policy thresholds              | Yes                      | No              |
| Document summary               | No                       | Yes             |
| Risk wording for analyst notes | No                       | Yes             |
| Final compliance approval      | Yes, via workflow rules  | No              |

Production Considerations

  • Keep PII out of prompts when possible

    • Redact account numbers, passport numbers, and addresses unless they are needed for a specific check.
    • Use tokenization or partial masking before calling the model.
  • Store an audit trail

    • Persist input hashes, validation results, model version, prompt template version, output JSON, and reviewer overrides.
    • Regulators will ask why a case was approved or escalated.
  • Enforce data residency

    • Run inference in approved regions only.
    • Make sure traces from LangChain callbacks do not leak regulated data into unsupported observability tools.
  • Add human-in-the-loop gates

    • Any sanctions ambiguity, document mismatch, or high-risk geography should go to an analyst queue.
    • The agent should recommend actions; it should not close high-risk cases autonomously.
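The PII guidance above can be enforced with a small masking step that runs before any prompt is assembled. The masking rule here (keep only the last three characters of an identifier) is an illustrative policy, not a standard:

```typescript
// Mask an identifier, keeping only the last few characters for analyst context.
function maskId(value: string, visible = 3): string {
  if (value.length <= visible) return "*".repeat(value.length);
  return "*".repeat(value.length - visible) + value.slice(-visible);
}

// Build a prompt-safe view of the applicant: mask or drop fields the
// current check does not need in clear text.
function toPromptSafe(applicant: {
  fullName: string;
  idNumber: string;
  countryOfResidence: string;
}) {
  return {
    fullName: applicant.fullName, // needed for name-consistency checks
    idNumber: maskId(applicant.idNumber),
    countryOfResidence: applicant.countryOfResidence,
  };
}
```

Passing only the toPromptSafe view into the LangChain messages keeps full identifiers out of prompts, traces, and any downstream observability tools.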

Common Pitfalls

  1. Letting the model make final compliance decisions

    • Avoid this by using the LLM only for triage and narrative extraction.
    • Final approve/reject logic should sit in deterministic workflow code or analyst review tooling.
  2. Sending raw sensitive data into every prompt

    • Mask unnecessary PII before invoking LangChain.
    • Pass only fields needed for the specific check being performed.
  3. Skipping structured outputs

    • Free-form text breaks downstream automation and audit logging.
    • Use withStructuredOutput() with a Zod schema so every response has stable fields like status, riskLevel, and reasons.
  4. Ignoring jurisdiction-specific policy differences

    • A UK retail onboarding flow is not the same as an EU SME onboarding flow.
    • Keep country rules externalized so compliance can update them without redeploying the agent.
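One way to externalize jurisdiction rules, as pitfall 4 suggests, is a plain data structure loaded at runtime so compliance can edit rules without a redeploy. The rule shape and the specific GB/DE requirements below are assumptions for illustration only:

```typescript
// Country rules kept as data, not code. In production this would be loaded
// from a config store or database rather than hard-coded.
type CountryRules = {
  requiredFields: string[];
  requireSourceOfFunds: boolean;
};

const rulesByCountry: Record<string, CountryRules> = {
  GB: {
    requiredFields: ["fullName", "dateOfBirth", "idNumber"],
    requireSourceOfFunds: false,
  },
  DE: {
    requiredFields: ["fullName", "dateOfBirth", "idNumber"],
    requireSourceOfFunds: true,
  },
};

function missingForCountry(
  country: string,
  applicant: Record<string, unknown>
): string[] {
  const rules = rulesByCountry[country];
  if (!rules) return ["unsupported_jurisdiction"];
  const missing = rules.requiredFields.filter((f) => !applicant[f]);
  if (rules.requireSourceOfFunds && !applicant["sourceOfFunds"]) {
    missing.push("sourceOfFunds");
  }
  return missing;
}
```

Because the rules are data, a compliance team can add a jurisdiction or tighten a requirement by editing configuration, while the agent code stays unchanged.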


By Cyprian Aarons, AI Consultant at Topiax.
