How to Build a Document Extraction Agent Using LangGraph in TypeScript for Wealth Management

By Cyprian Aarons · Updated 2026-04-21
Tags: document-extraction, langgraph, typescript, wealth-management

A document extraction agent in wealth management takes client PDFs, statements, KYC packs, trust deeds, and account opening forms, then turns them into structured data your downstream systems can trust. It matters because much of the operational risk in wealth workflows comes from manual rekeying, missed fields, and inconsistent interpretation of sensitive documents.

Architecture

  • Document ingress

    • Accept PDFs, scans, and image-based attachments from secure storage or case management.
    • Enforce file type checks, size limits, and tenant-level routing before extraction starts.
  • Text extraction layer

    • Use OCR for scanned documents and native text extraction for digital PDFs.
    • Normalize page order, headers, footers, and table fragments before sending content to the model.
  • LangGraph orchestration

    • Model the flow as a state machine with StateGraph.
    • Split work into nodes like classify document, extract fields, validate output, and route for human review.
  • Structured extraction model

    • Use an LLM with schema-constrained output for fields like client name, account number, beneficial owner details, tax residency, and signatures.
    • Keep the schema strict so downstream compliance systems do not ingest free-form text.
  • Validation and compliance guardrails

    • Validate against business rules: required fields present, dates valid, jurisdiction allowed, names consistent across documents.
    • Flag exceptions for operations or compliance review instead of auto-approving ambiguous cases.
  • Audit trail and persistence

    • Store source document hashes, model version, prompt version, extracted JSON, and reviewer actions.
    • Wealth management teams need a defensible record for regulators and internal audit.
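
The document ingress checks listed above can be sketched as a small gate that runs before any OCR or model call. This is a minimal illustration, not a complete upload policy: `checkIngress`, the result shape, the tenant allowlist, and the 25 MiB cap are all assumptions. Only the file signatures ("magic bytes") for PDF, PNG, and JPEG are standard.

```typescript
// Hypothetical ingress gate: names, limits, and result shape are illustrative.
type FileKind = "pdf" | "png" | "jpeg";
type IngressFile = { name: string; bytes: Uint8Array; tenantId: string };
type IngressResult =
  | { ok: true; kind: FileKind; tenantId: string }
  | { ok: false; reason: string };

const MAX_BYTES = 25 * 1024 * 1024; // assumed 25 MiB cap; tune per intake channel

// Leading "magic bytes" that identify each accepted format.
const SIGNATURES: { kind: FileKind; magic: number[] }[] = [
  { kind: "pdf", magic: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
  { kind: "png", magic: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { kind: "jpeg", magic: [0xff, 0xd8, 0xff] },
];

function checkIngress(file: IngressFile, allowedTenants: Set<string>): IngressResult {
  if (!allowedTenants.has(file.tenantId)) {
    return { ok: false, reason: `Unknown tenant: ${file.tenantId}` };
  }
  if (file.bytes.length === 0) {
    return { ok: false, reason: "Empty file" };
  }
  if (file.bytes.length > MAX_BYTES) {
    return { ok: false, reason: "File exceeds size limit" };
  }

  // Trust the content, not the file extension in `file.name`.
  const match = SIGNATURES.find((sig) =>
    sig.magic.every((byte, i) => file.bytes[i] === byte),
  );
  if (!match) {
    return { ok: false, reason: "Unsupported or mislabeled file type" };
  }

  return { ok: true, kind: match.kind, tenantId: file.tenantId };
}
```

Checking the leading bytes rather than the extension means a renamed executable or an HTML error page saved as `statement.pdf` is rejected before it reaches the extraction layer.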

Implementation

  1. Define the graph state and schema

    Start with a typed state that carries the raw document text, extracted result, validation status, and escalation flags. In wealth management you want this state to be explicit because every branch may need to be audited later.

import { z } from "zod";
import { StateGraph, START, END } from "@langchain/langgraph";

const ExtractedFieldsSchema = z.object({
  clientName: z.string().optional(),
  accountNumber: z.string().optional(),
  taxResidency: z.string().optional(),
  beneficialOwner: z.string().optional(),
  kycStatus: z.enum(["complete", "incomplete", "needs_review"]).optional(),
});

// Define the state schema once and infer the TypeScript type from it,
// so the runtime schema and the compile-time type cannot drift apart.
const StateSchema = z.object({
  documentText: z.string(),
  extracted: ExtractedFieldsSchema.optional(),
  validationErrors: z.array(z.string()).optional(),
  needsReview: z.boolean().optional(),
});

type AgentState = z.infer<typeof StateSchema>;
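
    To see why an explicit state helps with auditing, it can be useful to picture how each node's partial return value is folded into the state. The sketch below is a simplified, dependency-free model of last-write-wins merging, which is the behavior to expect for state keys that have no custom reducer. `SketchState` and `applyUpdate` are hypothetical names for illustration, not LangGraph APIs.

```typescript
// Simplified stand-in for the article's AgentState (the extracted payload is trimmed for brevity).
type SketchState = {
  documentText: string;
  extracted?: { clientName?: string; accountNumber?: string };
  validationErrors?: string[];
  needsReview?: boolean;
};

// Hypothetical helper: keys present in the update replace the old value; all others are kept.
function applyUpdate(state: SketchState, update: Partial<SketchState>): SketchState {
  return { ...state, ...update };
}

const initial: SketchState = { documentText: "Client Name: Jane Doe" };

// What an extraction node's return value would contribute.
const afterExtract = applyUpdate(initial, { extracted: { clientName: "Jane Doe" } });

// What a validation node's return value would contribute.
const afterValidate = applyUpdate(afterExtract, {
  validationErrors: ["Missing account number"],
  needsReview: true,
});
```

    Because each step produces a new snapshot, the sequence `initial`, `afterExtract`, `afterValidate` is itself a record of which node changed which field.
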
  2. Build extraction and validation nodes

    The extraction node should call your LLM with a strict schema. The validation node should apply deterministic rules that reflect wealth ops reality: missing beneficial owner data on an entity account is not a soft failure.

import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";

const llm = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });

async function extractNode(state: AgentState): Promise<Partial<AgentState>> {
  const prompt = `
Extract structured fields from this wealth management document.
Return only valid JSON matching:
{
  "clientName"?: string,
  "accountNumber"?: string,
  "taxResidency"?: string,
  "beneficialOwner"?: string,
  "kycStatus"?: "complete" | "incomplete" | "needs_review"
}

Document:
${state.documentText}
`;

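  const response = await llm.invoke([new HumanMessage(prompt)]);
  // Models sometimes wrap JSON in Markdown code fences; strip them before parsing.
  const raw = (response.content as string).trim().replace(/^`{3}(?:json)?\s*|\s*`{3}$/g, "");
  // Throws on malformed JSON or a schema mismatch, so bad output never passes silently.
  const parsed = ExtractedFieldsSchema.parse(JSON.parse(raw));
  return { extracted: parsed };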
}

async function validateNode(state: AgentState): Promise<Partial<AgentState>> {
  const errors: string[] = [];
  const extracted = state.extracted;

  if (!extracted?.clientName) errors.push("Missing client name");
  if (!extracted?.accountNumber) errors.push("Missing account number");

  if (extracted?.kycStatus === "incomplete") {
    errors.push("KYC marked incomplete");
  }

  if (errors.length > 0) {
    return { validationErrors: errors, needsReview: true };
  }

  return { validationErrors: [], needsReview: false };
}
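
    The validation node above checks only for missing fields and KYC status. The stricter rules mentioned in this step and in the architecture section (beneficial owner on entity accounts, allowed jurisdictions, name consistency) could be sketched as a pure function like the one below. `CaseContext`, `accountType`, `nameOnFile`, and the three-country allowlist are assumptions made for illustration; they are not part of the extraction schema defined earlier.

```typescript
// Illustrative shapes: `accountType` and `nameOnFile` are assumed to come from
// the case record, not from the model's extraction output.
type Extracted = {
  clientName?: string;
  accountNumber?: string;
  taxResidency?: string;
  beneficialOwner?: string;
};
type CaseContext = { accountType: "individual" | "entity"; nameOnFile: string };

// Hypothetical allowlist of ISO 3166-1 alpha-2 codes; a real list would come from compliance policy.
const ALLOWED_TAX_RESIDENCIES = new Set(["SG", "GB", "CH"]);

// Compare names case-insensitively and ignore repeated whitespace.
const normalizeName = (s: string): string => s.trim().replace(/\s+/g, " ").toLowerCase();

function checkBusinessRules(extracted: Extracted, ctx: CaseContext): string[] {
  const errors: string[] = [];

  // Hard failure: an entity account must name its beneficial owner.
  if (ctx.accountType === "entity" && !extracted.beneficialOwner?.trim()) {
    errors.push("Entity account is missing beneficial owner details");
  }

  // Jurisdiction check runs only when a residency was extracted.
  if (
    extracted.taxResidency &&
    !ALLOWED_TAX_RESIDENCIES.has(extracted.taxResidency.toUpperCase())
  ) {
    errors.push(`Tax residency not on the allowlist: ${extracted.taxResidency}`);
  }

  // Name consistency between the document and the existing case record.
  if (
    extracted.clientName &&
    normalizeName(extracted.clientName) !== normalizeName(ctx.nameOnFile)
  ) {
    errors.push("Client name does not match the name on file");
  }

  return errors;
}
```

    Keeping these rules in a plain function, separate from the LLM call, means they can be unit tested and reviewed by compliance without running the graph.
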
  3. Wire routing logic with StateGraph

    This is where LangGraph earns its keep. You can branch to human review when the document is incomplete or when compliance-sensitive fields are missing.

function routeAfterValidation(state: AgentState): "review" | "done" {
  return state.needsReview ? "review" : "done";
}

async function reviewNode(state: AgentState): Promise<Partial<AgentState>> {
  // In production this would create a case in your workflow system.
  return {
    validationErrors: [
      ...(state.validationErrors ?? []),
      "Escalated to operations/compliance review",
    ],
    needsReview: true,
  };
}

const graph = new StateGraph(StateSchema)
  .addNode("extract", extractNode)
  .addNode("validate", validateNode)
  .addNode("review", reviewNode)
  .addEdge(START, "extract")
  .addEdge("extract", "validate")
  .addConditionalEdges("validate", routeAfterValidation, {
    review: "review",
    done: END,
  })
  .addEdge("review", END);

const app = graph.compile();
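
    The two-way router above sends every failure to the same review node. If compliance-sensitive gaps need a different queue from routine data gaps, the routing decision could be made on structured issue codes instead. The following is a sketch under that assumption: the `ValidationIssue` codes, the `Route` values, and the policy of which codes count as compliance-sensitive are all hypothetical.

```typescript
// Hypothetical issue codes; the validation node above uses free-text messages instead.
type IssueCode =
  | "MISSING_FIELD"
  | "KYC_INCOMPLETE"
  | "BENEFICIAL_OWNER_MISSING"
  | "JURISDICTION_BLOCKED";
type ValidationIssue = { code: IssueCode; detail: string };
type Route = "compliance_review" | "ops_review" | "done";

// Assumed policy: these codes must never be resolved by operations staff alone.
const COMPLIANCE_CODES = new Set<IssueCode>([
  "KYC_INCOMPLETE",
  "BENEFICIAL_OWNER_MISSING",
  "JURISDICTION_BLOCKED",
]);

function routeByIssue(issues: ValidationIssue[]): Route {
  if (issues.length === 0) return "done";
  // A single compliance-sensitive issue outranks any number of routine gaps.
  return issues.some((issue) => COMPLIANCE_CODES.has(issue.code))
    ? "compliance_review"
    : "ops_review";
}
```

    If you adopt a split like this, each `Route` value would map to its own node in the conditional-edge mapping, in the same way `review` and `done` are mapped in the graph above.
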
  4. Run the workflow and persist the result

    The compiled graph returns the final state. Persist both the extracted payload and the full trace metadata so you can answer regulator questions later about what was extracted, by which model version, and why it was routed to review.

async function main() {
  const result = await app.invoke({
    documentText:
      "Client Name: Jane Doe\nAccount Number: WM-10293\nTax Residency: SG\nBeneficial Owner: Jane Doe\nKYC Status: complete",
    validationErrors: [],
    needsReview: false,
    extracted: undefined,
  });

  console.log(JSON.stringify(result, null, 2));
}

main().catch(console.error);
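
    The persistence step described above could start from an audit record like the one below, built from the original document bytes and the final graph state. This is a sketch, not a regulatory template: `AuditRecord`, `buildAuditRecord`, and the `promptVersion` value are hypothetical, and the storage layer is deliberately left out. Hashing uses Node's built-in `node:crypto` module.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; field names are illustrative, not a regulatory standard.
type AuditRecord = {
  documentSha256: string;
  outputSha256: string;
  modelName: string;
  promptVersion: string;
  routedToReview: boolean;
  validationErrors: string[];
  recordedAt: string; // ISO 8601 timestamp
};

const sha256Hex = (input: string | Uint8Array): string =>
  createHash("sha256").update(input).digest("hex");

function buildAuditRecord(args: {
  documentBytes: Uint8Array;
  finalState: { extracted?: unknown; validationErrors?: string[]; needsReview?: boolean };
  modelName: string;
  promptVersion: string;
  now?: Date;
}): AuditRecord {
  return {
    documentSha256: sha256Hex(args.documentBytes),
    // Hash of the serialized payload. Note that JSON.stringify is key-order sensitive,
    // so the same fields in a different order produce a different hash.
    outputSha256: sha256Hex(JSON.stringify(args.finalState.extracted ?? null)),
    modelName: args.modelName,
    promptVersion: args.promptVersion,
    routedToReview: args.finalState.needsReview ?? false,
    validationErrors: args.finalState.validationErrors ?? [],
    recordedAt: (args.now ?? new Date()).toISOString(),
  };
}
```

    Storing hashes alongside the extracted JSON lets a reviewer later confirm that a given output was produced from a given source file without keeping a second copy of the document in the audit store.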

Production Considerations

  • Data residency

    • Keep EU/UK/APAC client data in-region if your firm has residency commitments.
    • Route documents to region-specific models or private deployments rather than sending everything to one global endpoint.
  • Auditability

    • Log input hash, output hash, node transitions in the graph, prompt version, model name, and reviewer overrides.
    • For wealth management cases involving suitability or onboarding decisions, you need a clear chain of custody.
  • Guardrails
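
The data residency point above can be sketched as a lookup that resolves a model endpoint from the client's region and refuses to fall back to a global one. Everything named here is an assumption for illustration: the region keys mirror the EU/UK/APAC split mentioned above, and the hostnames are invented placeholders.

```typescript
// Placeholder endpoints: these hostnames are invented for illustration only.
const REGION_ENDPOINTS = new Map<string, string>([
  ["EU", "https://llm.eu.example.internal/v1"],
  ["UK", "https://llm.uk.example.internal/v1"],
  ["APAC", "https://llm.apac.example.internal/v1"],
]);

function resolveModelEndpoint(clientRegion: string): string {
  const endpoint = REGION_ENDPOINTS.get(clientRegion);
  // Fail closed: an unmapped region raises an error instead of
  // silently routing the document to an out-of-region endpoint.
  if (!endpoint) {
    throw new Error(`No in-region model endpoint configured for "${clientRegion}"`);
  }
  return endpoint;
}
```

The resolved URL would then be passed to whatever base-URL setting your model client exposes. Failing closed matters more than the lookup itself, since a silent default is how documents end up leaving their region.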


By Cyprian Aarons, AI Consultant at Topiax.
