How to Build a Document Extraction Agent Using LangGraph in TypeScript for Insurance
A document extraction agent for insurance takes inbound PDFs, scans, images, and email attachments, then pulls out structured fields like policy number, claimant name, loss date, coverage details, and totals. It matters because insurance operations still run on messy documents, and the difference between a usable intake pipeline and a manual queue is usually how reliably you can extract, validate, and audit that data.
Architecture
Document ingestion layer
- Accepts PDFs, TIFFs, JPEGs, and scanned forms from S3, blob storage, or an internal upload API.
- Normalizes file metadata so every run has a traceable source document ID.

OCR / text extraction node
- Converts image-based documents into text before any downstream extraction.
- For insurance claims and underwriting files, this is where you handle poor scans and multi-page forms.

LLM extraction node
- Uses a structured prompt to extract fields into a strict schema.
- Returns typed output for policy data, claimant data, dates, amounts, and confidence markers.

Validation and rules node
- Checks extracted values against insurance-specific constraints.
- Example: loss date cannot be after submission date; policy number format must match carrier rules.

Human review fallback
- Routes low-confidence or high-risk cases to an adjuster or ops reviewer.
- This is mandatory when the document impacts coverage decisions or claim payment.

Audit and persistence layer
- Stores raw input references, extracted output, validation results, model version, and timestamps.
- Needed for compliance reviews, dispute handling, and model governance.
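As a sketch of the ingestion layer, metadata normalization might look like the following. `SourceDocument` and `normalizeMetadata` are illustrative names for this sketch, not part of LangGraph; deriving the ID from a content hash is one assumption for making reruns of the same file map to the same audit record:

```typescript
import { createHash } from "node:crypto";

// Illustrative shape for a normalized inbound document; field names are
// assumptions for this sketch, not a LangGraph API.
interface SourceDocument {
  documentId: string; // stable, content-derived ID for the audit trail
  fileName: string;
  mimeType: string;
  receivedAt: string; // ISO timestamp
}

// Hash the raw bytes so the same file always produces the same document ID,
// which keeps audit records stable across retries and re-uploads.
function normalizeMetadata(
  fileName: string,
  mimeType: string,
  bytes: Buffer
): SourceDocument {
  const documentId = createHash("sha256").update(bytes).digest("hex").slice(0, 16);
  return { documentId, fileName, mimeType, receivedAt: new Date().toISOString() };
}
```

Every node downstream can then log against `documentId` instead of a filename, which matters once the same claim arrives twice from different channels.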
Implementation
1) Define the extraction schema
Use Zod so LangGraph can produce typed structured output. Keep the schema narrow: insurance extraction quality degrades when you ask for too many fields in one pass.
```typescript
import { z } from "zod";

export const InsuranceExtractionSchema = z.object({
  policyNumber: z.string().optional(),
  claimantName: z.string().optional(),
  insuredName: z.string().optional(),
  lossDate: z.string().optional(),
  submissionDate: z.string().optional(),
  claimAmount: z.number().optional(),
  carrierName: z.string().optional(),
  confidence: z.number().min(0).max(1),
});

export type InsuranceExtraction = z.infer<typeof InsuranceExtractionSchema>;
```
2) Build the LangGraph workflow
This example uses StateGraph, Annotation, START, END, and MessagesAnnotation. The pattern is simple: ingest text, extract structured data with an LLM node, validate it, then route to review if needed.
```typescript
import {
  Annotation,
  END,
  MessagesAnnotation,
  START,
  StateGraph,
} from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";
import { InsuranceExtractionSchema, type InsuranceExtraction } from "./schema.js";

const llm = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });

const GraphState = Annotation.Root({
  ...MessagesAnnotation.spec,
  documentText: Annotation<string>(),
  extraction: Annotation<InsuranceExtraction>(),
  needsReview: Annotation<boolean>(),
});

// Extract structured fields from the raw document text.
async function extractNode(state: typeof GraphState.State) {
  const prompt = [
    {
      role: "system",
      content:
        "Extract insurance document fields into the provided schema. Return only valid JSON.",
    },
    { role: "user", content: state.documentText },
  ];
  const result = await llm.withStructuredOutput(InsuranceExtractionSchema).invoke(prompt);
  return { extraction: result };
}

// Apply insurance-specific sanity checks to the extracted fields.
async function validateNode(state: typeof GraphState.State) {
  const e = state.extraction;
  let needsReview = false;
  if (!e.policyNumber || !e.claimantName || !e.lossDate) needsReview = true;
  if (typeof e.confidence === "number" && e.confidence < 0.85) needsReview = true;
  if (e.lossDate && e.submissionDate && new Date(e.lossDate) > new Date(e.submissionDate)) {
    needsReview = true;
  }
  return { needsReview };
}

// Flag the document for a human reviewer.
async function reviewNode(state: typeof GraphState.State) {
  return {
    messages: [
      new HumanMessage(
        `Manual review required for document. Extracted data: ${JSON.stringify(state.extraction)}`
      ),
    ],
    needsReview: true,
  };
}

const graph = new StateGraph(GraphState)
  .addNode("extract", extractNode)
  .addNode("validate", validateNode)
  .addNode("review", reviewNode)
  .addEdge(START, "extract")
  .addEdge("extract", "validate")
  // The router returns either a node name or END; keep the mapping explicit.
  .addConditionalEdges("validate", (state) => (state.needsReview ? "review" : END), {
    review: "review",
    [END]: END,
  })
  .addEdge("review", END)
  .compile();
```
Note on routing
The exact conditional edge shape depends on your LangGraph version. The production pattern is what matters:
- one node extracts
- one node validates
- one branch routes to human review
- one branch ends cleanly
If you prefer less ambiguity in TypeScript projects, keep the router return values as literal node names or END.
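The carrier-format rule mentioned in the architecture section can sit next to the validate node as a plain helper. The regex below is an illustrative assumption for this sketch; real carriers publish their own policy number formats, so treat the pattern as a placeholder:

```typescript
// Illustrative carrier rule: "POL-" followed by 5-6 digits. Real carrier
// formats vary; this pattern is an assumption for the sketch.
const POLICY_FORMAT = /^POL-\d{5,6}$/;

// Returns false for missing or malformed policy numbers, so the validate
// node can fold the result directly into its needsReview decision.
function policyNumberLooksValid(policyNumber?: string): boolean {
  return policyNumber !== undefined && POLICY_FORMAT.test(policyNumber);
}
```

Keeping rules like this in plain functions, outside the graph definition, makes them unit-testable without spinning up the LLM or the graph runtime.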
I’ve also collapsed the same flow into a cleaner runnable pattern with explicit state, useful when you want the whole graph in one file:

```typescript
import { Annotation, END, START, StateGraph } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import { z } from "zod";

const ExtractionSchema = z.object({
  policyNumber: z.string().optional(),
  confidence: z.number().min(0).max(1),
});

const State = Annotation.Root({
  documentText: Annotation<string>(),
  extraction: Annotation<z.infer<typeof ExtractionSchema>>(),
  needsReview: Annotation<boolean>(),
});

const llm = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });

const app = new StateGraph(State)
  .addNode("extract", async (s) => ({
    extraction: await llm.withStructuredOutput(ExtractionSchema).invoke(s.documentText),
  }))
  .addNode("validate", async (s) => ({
    needsReview: !s.extraction.policyNumber || s.extraction.confidence < 0.85,
  }))
  .addNode("review", async () => ({ needsReview: true }))
  .addEdge(START, "extract")
  .addEdge("extract", "validate")
  .addConditionalEdges("validate", (s) => (s.needsReview ? "review" : END))
  .addEdge("review", END)
  .compile();
```

From here, `await app.invoke({ documentText })` runs the whole pipeline and returns the final state, including the `needsReview` flag.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.