How to Build a KYC Verification Agent Using LangGraph in TypeScript for Investment Banking
A KYC verification agent automates the first pass of client onboarding: it ingests identity documents, extracts and validates key fields, checks them against policy rules, and routes suspicious cases to a human analyst. In investment banking, that matters because onboarding delays cost revenue, but weak KYC creates regulatory exposure, audit findings, and reputational damage.
Architecture
- Document intake layer
  - Accepts passports, national IDs, utility bills, corporate registration docs, and beneficial ownership forms.
  - Normalizes uploads into a consistent internal document object.
- Extraction node
  - Uses OCR or a document parser to pull out name, DOB, address, document number, expiration date, entity registration data, and UBO details.
  - Returns structured JSON for downstream validation.
- Policy and sanctions check node
  - Validates extracted data against KYC rules.
  - Checks watchlists, sanctions lists, PEP flags, jurisdiction risk, and document expiry.
- Risk scoring node
  - Assigns a risk tier based on mismatches, missing fields, geography, entity type, and adverse signals.
  - Produces an explainable score for auditability.
- Human review routing
  - Sends low-confidence or high-risk cases to compliance analysts.
  - Preserves the full decision trail for review and sign-off.
- Audit log store
  - Persists every state transition, rule outcome, and reviewer action.
  - Supports regulatory evidence requests and internal model governance.
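The intake layer's normalization step can be sketched as a small pure function. This is only an illustration: `RawUpload`, `TYPE_ALIASES`, and `normalizeUpload` are hypothetical names, and the alias table is an assumption about how uploaders might label documents. Unrecognized labels return `null` so the caller can escalate instead of guessing.

```typescript
// Hypothetical shape of a raw upload as it might arrive from an upload service.
type RawUpload = {
  uploadId: string;
  declaredType: string; // free-text label chosen by the uploader
  text: string; // OCR or parser output, assumed to exist already
};

type DocumentType = "passport" | "id_card" | "utility_bill" | "company_registry";

type NormalizedDocument = { id: string; type: DocumentType; content: string };

// Illustrative alias table (an assumption); a real deployment would maintain its own.
const TYPE_ALIASES = new Map<string, DocumentType>([
  ["passport", "passport"],
  ["national id", "id_card"],
  ["id card", "id_card"],
  ["utility bill", "utility_bill"],
  ["company registry", "company_registry"],
  ["corporate registration", "company_registry"],
]);

// Returns a normalized document, or null when the label is not recognized.
function normalizeUpload(raw: RawUpload): NormalizedDocument | null {
  // Lowercase the label and treat spaces, underscores, and hyphens alike.
  const key = raw.declaredType.trim().toLowerCase().replace(/[\s_-]+/g, " ");
  const type = TYPE_ALIASES.get(key);
  if (!type) return null;
  return { id: raw.uploadId, type, content: raw.text.trim() };
}
```

Returning `null` rather than a default type keeps the "unknown document" case visible to whatever routing logic comes next.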
Implementation
1) Define the agent state and graph shape
For KYC in banking, keep the state explicit. You want every field that influenced the decision to be visible in the graph state so you can reconstruct the path later.
import { StateGraph, Annotation } from "@langchain/langgraph";

// A single uploaded document after intake normalization.
type DocumentInput = {
  id: string;
  type: "passport" | "id_card" | "utility_bill" | "company_registry";
  content: string;
};

// Fields produced by extraction. All are optional because real
// submissions are often incomplete.
type ExtractedKyc = {
  fullName?: string;
  dob?: string;
  address?: string;
  docNumber?: string;
  expiryDate?: string;
  entityName?: string;
  beneficialOwners?: Array<{ name: string; pct: number }>;
};

// Graph state: every value that influences the decision lives here.
// Without a reducer, each channel keeps the last value a node returned.
const KycState = Annotation.Root({
  documents: Annotation<DocumentInput[]>(),
  extracted: Annotation<ExtractedKyc>(),
  sanctionsHit: Annotation<boolean>(),
  riskScore: Annotation<number>(),
  decision: Annotation<"approve" | "reject" | "manual_review">(),
  auditTrail: Annotation<string[]>(),
});

type KycStateType = typeof KycState.State;
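One optional variation on the state above: `Annotation` also accepts an options object with `reducer` and `default`, so the audit trail could be made append-only at the state level instead of relying on each node to spread the previous entries. A minimal sketch of such a reducer follows; `appendAudit` is a hypothetical name, and the wiring is shown only as a comment.

```typescript
// Append-only merge: keeps existing entries and adds the node's new ones.
// Neither input array is mutated.
function appendAudit(existing: string[], update: string[]): string[] {
  return [...existing, ...update];
}

// Possible wiring (an assumption about how you might adopt it):
//   auditTrail: Annotation<string[]>({ reducer: appendAudit, default: () => [] }),

const trail = appendAudit(["extractNode: parsed 2 documents"], ["policyNode: riskScore=10"]);
console.log(trail.length); // 2
```

If you adopt a reducer like this, nodes should return only their new entries. Nodes that keep spreading `state.auditTrail` would write every earlier entry twice.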
2) Add deterministic nodes for extraction and policy checks
Do not put your compliance logic inside an LLM prompt. Use deterministic code for policy decisions and keep the model only for extraction or summarization where needed.
const extractNode = async (state: KycStateType): Promise<Partial<KycStateType>> => {
  const passport = state.documents.find((d) => d.type === "passport");
  const utilityBill = state.documents.find((d) => d.type === "utility_bill");

  // Replace this with OCR / document parsing output.
  // The values below are hard-coded placeholders.
  const extracted: ExtractedKyc = {
    fullName: "Jane Doe",
    dob: "1988-04-12",
    address: utilityBill ? "10 Bishopsgate, London" : undefined,
    docNumber: passport ? "P12345678" : undefined,
    expiryDate: "2030-01-01",
    beneficialOwners: [],
    entityName: undefined,
  };

  return {
    extracted,
    // First node in the graph, so it starts the audit trail.
    auditTrail: [`extractNode: parsed ${state.documents.length} documents`],
  };
};

const policyNode = async (state: KycStateType): Promise<Partial<KycStateType>> => {
  // Placeholder check only. Replace with a call to a real
  // sanctions / PEP screening service.
  const sanctionsHit = Boolean(state.extracted.fullName?.toLowerCase().includes("sanction"));

  // A document counts as expired when its expiry date is in the past.
  const expired =
    !!state.extracted.expiryDate && new Date(state.extracted.expiryDate) < new Date();

  // Additive score: a baseline plus a fixed penalty per adverse signal.
  let riskScore = 10;
  if (sanctionsHit) riskScore += 80;
  if (expired) riskScore += 40;
  if (!state.extracted.address) riskScore += 15;

  return {
    sanctionsHit,
    riskScore,
    auditTrail: [
      ...state.auditTrail,
      `policyNode: sanctionsHit=${sanctionsHit}, expired=${expired}, riskScore=${riskScore}`,
    ],
  };
};
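To make the score explainable rather than a bare number, the same weights can be returned as a list of contributions. The sketch below reuses the weights from the policy node above; the function name and rule labels are hypothetical and chosen for illustration.

```typescript
type ScoreInput = { sanctionsHit: boolean; expired: boolean; hasAddress: boolean };
type Contribution = { rule: string; points: number };

// Same weights as the policy node; each applied rule is recorded
// so a reviewer can see how the total was reached.
function scoreWithBreakdown(input: ScoreInput): { total: number; contributions: Contribution[] } {
  const contributions: Contribution[] = [{ rule: "baseline", points: 10 }];
  if (input.sanctionsHit) contributions.push({ rule: "sanctions_hit", points: 80 });
  if (input.expired) contributions.push({ rule: "document_expired", points: 40 });
  if (!input.hasAddress) contributions.push({ rule: "address_missing", points: 15 });
  const total = contributions.reduce((sum, c) => sum + c.points, 0);
  return { total, contributions };
}

const breakdown = scoreWithBreakdown({ sanctionsHit: false, expired: true, hasAddress: false });
console.log(breakdown.total); // 65 (10 baseline + 40 expired + 15 missing address)
```

Storing the contributions alongside the total gives compliance a per-rule explanation without re-running the policy code.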
3) Route to approval or manual review with StateGraph
This is where LangGraph fits well. You model the workflow as a graph with conditional routing instead of a single opaque chain. That gives you better control over escalation paths and easier audit reconstruction.
// Returns the name of the next node. Sanctions hits are rejected outright;
// anything at or above the review threshold goes to a human.
const routeByRisk = (state: KycStateType) => {
  // Investment banking thresholds are usually conservative
  if (state.sanctionsHit) return "reject";
  if (state.riskScore >= 50) return "manual_review";
  return "approve";
};

// Terminal nodes: each records the decision and appends to the audit trail.
const approveNode = async (state: KycStateType): Promise<Partial<KycStateType>> => ({
  decision: "approve",
  auditTrail: [...state.auditTrail, "approveNode: approved automatically"],
});

const rejectNode = async (state: KycStateType): Promise<Partial<KycStateType>> => ({
  decision: "reject",
  auditTrail: [...state.auditTrail, "rejectNode: rejected due to policy breach"],
});

const manualReviewNode = async (state: KycStateType): Promise<Partial<KycStateType>> => ({
  decision: "manual_review",
  auditTrail: [
    ...state.auditTrail,
    "manualReviewNode: routed to compliance analyst queue",
  ],
});

const graph = new StateGraph(KycState)
  .addNode("extract", extractNode)
  .addNode("policy", policyNode)
  .addNode("approve", approveNode)
  .addNode("reject", rejectNode)
  .addNode("manual_review", manualReviewNode)
  .addEdge("__start__", "extract")
  .addEdge("extract", "policy")
  // Listing the possible targets keeps the graph shape explicit.
  .addConditionalEdges("policy", routeByRisk, ["approve", "reject", "manual_review"])
  .addEdge("approve", "__end__")
  .addEdge("reject", "__end__")
  .addEdge("manual_review", "__end__");

export const kycApp = graph.compile();
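Because the routing function is deterministic, its thresholds can be checked with a small table of boundary cases, independent of the graph. The sketch below restates the same rules on a minimal input so it runs standalone; `routeDecision` and the case table are illustrative names, not part of LangGraph.

```typescript
type Route = "approve" | "reject" | "manual_review";
type RoutingInput = { sanctionsHit: boolean; riskScore: number };

// Same rules as routeByRisk, restated on a minimal input type.
function routeDecision(input: RoutingInput): Route {
  if (input.sanctionsHit) return "reject";
  if (input.riskScore >= 50) return "manual_review";
  return "approve";
}

// Boundary cases: just below the threshold, exactly on it,
// and a sanctions hit that must win even with a low score.
const routingCases: Array<{ input: RoutingInput; expected: Route }> = [
  { input: { sanctionsHit: false, riskScore: 49 }, expected: "approve" },
  { input: { sanctionsHit: false, riskScore: 50 }, expected: "manual_review" },
  { input: { sanctionsHit: true, riskScore: 10 }, expected: "reject" },
];

const routingFailures = routingCases.filter((c) => routeDecision(c.input) !== c.expected);
console.log(`${routingCases.length - routingFailures.length}/${routingCases.length} routing cases passed`);
```

Keeping a table like this next to the routing code makes a threshold change show up as a failing case instead of a silent behavior shift.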
4) Invoke the agent and persist the audit trail
In production you should write the final state to an immutable store. The key requirement is that compliance can replay why a client was approved or escalated.
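async function runKyc() {
  // The decision channel is left unset here; a terminal node fills it in.
  const result = await kycApp.invoke({
    documents: [
      { id: "doc_1", type: "passport", content: "<binary-or-text>" },
      { id: "doc_2", type: "utility_bill", content: "<binary-or-text>" },
    ],
    extracted: {},
    sanctionsHit: false,
    riskScore: 0,
    auditTrail: [],
  });

  // Summarize the outcome; in production, persist the full state instead.
  console.log({
    decision: result.decision,
    riskScore: result.riskScore,
    sanctionsHit: result.sanctionsHit,
    auditTrailLength: result.auditTrail.length,
  });
}

// Surface failures instead of leaving an unhandled promise rejection.
runKyc().catch((err) => {
  console.error("KYC run failed:", err);
  process.exitCode = 1;
});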
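One way to make a persisted audit trail tamper-evident is to chain each entry to the hash of the previous one before writing it. The sketch below uses Node's built-in `crypto` module; the record shape and function names are hypothetical. It does not replace an immutable store, and it cannot detect records removed from the end of the chain.

```typescript
import { createHash } from "node:crypto";

type AuditRecord = {
  seq: number;
  entry: string;
  prevHash: string;
  hash: string;
};

// Fixed starting value for the first record's prevHash.
const GENESIS_HASH = "0".repeat(64);

function hashRecord(seq: number, entry: string, prevHash: string): string {
  return createHash("sha256").update(`${seq}|${prevHash}|${entry}`).digest("hex");
}

// Turns a plain audit trail into records where each hash covers the previous hash.
function chainAuditTrail(entries: string[]): AuditRecord[] {
  const records: AuditRecord[] = [];
  let prevHash = GENESIS_HASH;
  entries.forEach((entry, seq) => {
    const hash = hashRecord(seq, entry, prevHash);
    records.push({ seq, entry, prevHash, hash });
    prevHash = hash;
  });
  return records;
}

// Recomputes every hash; returns false if any record was altered or reordered.
function verifyChain(records: AuditRecord[]): boolean {
  let prevHash = GENESIS_HASH;
  for (const r of records) {
    if (r.prevHash !== prevHash || r.hash !== hashRecord(r.seq, r.entry, prevHash)) {
      return false;
    }
    prevHash = r.hash;
  }
  return true;
}
```

A caller would pass `result.auditTrail` to `chainAuditTrail` before writing, and run `verifyChain` when the records are read back for a review.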
Production Considerations
- Data residency
  Keep document processing inside the required region. For cross-border banking groups, separate EU/UK/US deployments so customer PII does not leave approved jurisdictions.
- Auditability
  Persist raw inputs, extracted fields, rule outcomes, timestamps, and reviewer overrides. Regulators will care less about your model choice than whether you can reproduce a decision months later.
- Monitoring
  Track manual review rate, false positive sanctions hits, average onboarding latency, and escalation spikes by region or product. A sudden increase in manual reviews usually means your extraction quality degraded or a policy rule changed.
- Guardrails
  Hard-block approvals when sanctions screening fails or mandatory fields are missing. Do not let an LLM override deterministic compliance logic; use it only where uncertainty is acceptable.
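The guardrail item can be sketched as a final check that runs after routing and can only downgrade an approval. Everything here is illustrative: `applyGuardrails`, the `screeningCompleted` flag, and the mandatory field list are assumptions, and a real policy would define its own mandatory set per client type.

```typescript
type Decision = "approve" | "reject" | "manual_review";

type GuardrailInput = {
  proposed: Decision;
  screeningCompleted: boolean; // false if the sanctions call errored or timed out
  extracted: Record<string, unknown>;
};

// Hypothetical mandatory set for an individual client; real policies vary.
const MANDATORY_FIELDS = ["fullName", "dob", "address", "docNumber", "expiryDate"];

function applyGuardrails(input: GuardrailInput): { decision: Decision; blockedBy: string[] } {
  const blockedBy: string[] = [];
  if (!input.screeningCompleted) blockedBy.push("sanctions_screening_incomplete");
  for (const field of MANDATORY_FIELDS) {
    const value = input.extracted[field];
    if (value === undefined || value === null || value === "") {
      blockedBy.push(`missing:${field}`);
    }
  }
  // Guardrails only downgrade approvals; rejections and reviews pass through.
  const decision: Decision =
    input.proposed === "approve" && blockedBy.length > 0 ? "manual_review" : input.proposed;
  return { decision, blockedBy };
}
```

With the scoring weights used earlier, a submission missing only its address scores 25 and would be approved by the risk threshold alone, which is the kind of case a check like this is meant to catch.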
Common Pitfalls
- Using the LLM as the final decision-maker
  That is a bad pattern for regulated onboarding. Keep approvals and rejections driven by explicit rules plus human review thresholds.
- Not versioning policy rules
  If your thresholds change without version tags, you cannot explain why one client was approved last quarter and rejected this quarter. Store rule versions alongside each decision record.
- Ignoring partial-document cases
  Many real submissions are incomplete or blurry. Route missing fields to manual review instead of forcing a guess; bad data in KYC becomes bad data everywhere else in the bank.
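The versioning pitfall can be illustrated with a decision record that carries the policy version it was made under. The sketch below is hypothetical throughout: the version tags, thresholds, client ID, and dates are made up for the example.

```typescript
type PolicyVersion = {
  version: string; // hypothetical tag format
  manualReviewThreshold: number;
};

type DecisionRecord = {
  clientId: string;
  riskScore: number;
  decision: "approve" | "manual_review";
  policyVersion: string;
  decidedAt: string;
};

// Applies one policy version and stamps the record with it.
function decideWithPolicy(
  clientId: string,
  riskScore: number,
  policy: PolicyVersion,
  now: Date,
): DecisionRecord {
  return {
    clientId,
    riskScore,
    decision: riskScore >= policy.manualReviewThreshold ? "manual_review" : "approve",
    policyVersion: policy.version,
    decidedAt: now.toISOString(),
  };
}

const policyV1: PolicyVersion = { version: "kyc-policy-1", manualReviewThreshold: 50 };
const policyV2: PolicyVersion = { version: "kyc-policy-2", manualReviewThreshold: 40 };

// Same client, same score, different outcome: the version tag explains why.
const lastQuarter = decideWithPolicy("client-1", 45, policyV1, new Date("2024-01-15T00:00:00Z"));
const thisQuarter = decideWithPolicy("client-1", 45, policyV2, new Date("2024-04-15T00:00:00Z"));
console.log(lastQuarter.decision, lastQuarter.policyVersion); // approve kyc-policy-1
console.log(thisQuarter.decision, thisQuarter.policyVersion); // manual_review kyc-policy-2
```

Passing the clock in as a parameter keeps the function deterministic, which also makes old decisions easier to replay.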
By Cyprian Aarons, AI Consultant at Topiax.