How to Build a Document Extraction Agent Using CrewAI in TypeScript for Insurance
A document extraction agent for insurance takes inbound files like claims forms, ACORD packets, medical bills, police reports, and policy schedules, then turns them into structured data your downstream systems can trust. That matters because insurance ops is still full of manual rekeying, and every missed field means slower claims handling, bad reserving, or compliance risk.
Architecture
- •Document intake layer
  - •Pulls PDFs, scans, email attachments, or S3 objects.
  - •Normalizes file metadata like policy number, claim number, source channel, and received timestamp.
- •OCR and text extraction
  - •Converts scanned pages into text before the LLM sees them.
  - •Keeps page-level references so extracted fields can be traced back to source evidence.
- •Extraction agent
  - •Uses a CrewAI Agent to identify insurance fields such as insured name, loss date, coverage type, claimant details, invoice totals, and reserve-impacting facts.
  - •Produces structured JSON only.
- •Validation and rules layer
  - •Checks required fields, date formats, currency values, and policy-specific constraints.
  - •Flags missing evidence or contradictory values for human review.
- •Human review queue
  - •Routes low-confidence or high-risk extractions to an adjuster or ops analyst.
  - •Keeps an audit trail of edits and approvals.
- •Persistence and audit store
  - •Saves raw document text, extracted JSON, confidence scores, model version, prompt version, and reviewer actions.
  - •Supports retention and data residency requirements.
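The page-level traceability mentioned in the OCR layer can be sketched in plain TypeScript. This is a minimal illustration, not part of CrewAI or any OCR SDK: the `PageText` and `FieldEvidence` shapes and the `traceEvidence` helper are hypothetical names, and the matching is a simple case-insensitive substring search that real pipelines would likely need to make more tolerant of OCR noise.

```typescript
// Hypothetical shapes: none of these names come from CrewAI or an OCR SDK.
interface PageText {
  pageNumber: number; // 1-based page index reported by the OCR step
  text: string; // raw text recognized on that page
}

interface FieldEvidence {
  field: string;
  value: string;
  pages: number[]; // pages where the value appears verbatim
}

// Collapse whitespace and case so OCR line breaks do not hide a match.
function normalize(s: string): string {
  return s.replace(/\s+/g, " ").trim().toLowerCase();
}

// For each extracted field, list the pages whose text contains the value.
function traceEvidence(
  pages: PageText[],
  fields: Record<string, string | undefined>,
): FieldEvidence[] {
  const evidence: FieldEvidence[] = [];
  for (const [field, value] of Object.entries(fields)) {
    if (!value) continue; // nothing extracted, nothing to trace
    const needle = normalize(value);
    const hits = pages
      .filter((p) => normalize(p.text).includes(needle))
      .map((p) => p.pageNumber);
    evidence.push({ field, value, pages: hits });
  }
  return evidence;
}
```

A field that comes back with an empty `pages` array has no verbatim support in the source text, which makes it a reasonable candidate for the human review queue.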
Implementation
1) Install CrewAI for TypeScript and define your schema
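You want a strict output shape from day one. In insurance workflows, free-form summaries are useless unless they map cleanly into claims or policy admin systems. CrewAI is distributed officially as a Python framework, so treat the @crewai/crewai package name and the Agent, Task, and Crew classes used below as assumptions about a TypeScript port, and check them against the package you actually install.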
npm install @crewai/crewai zod

import { z } from "zod";

export const InsuranceExtractionSchema = z.object({
  documentType: z.enum(["claim_form", "invoice", "medical_bill", "police_report", "policy_declaration"]),
  insuredName: z.string().optional(),
  claimantName: z.string().optional(),
  policyNumber: z.string().optional(),
  claimNumber: z.string().optional(),
  lossDate: z.string().optional(), // ISO date string
  totalAmount: z.number().optional(),
  currency: z.string().default("USD"),
  coverageType: z.string().optional(),
  keyFacts: z.array(z.string()).default([]),
  missingFields: z.array(z.string()).default([]),
});
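The schema carries a missingFields array, and one option is to compute it deterministically instead of relying on the model to report its own gaps. The sketch below assumes a hypothetical `REQUIRED_FIELDS` mapping per document type; the specific requirements are illustrative and do not come from any carrier's rules. It uses plain TypeScript types that mirror the schema so it runs without zod.

```typescript
// Assumption: which fields count as required depends on the document type.
// The mapping below is illustrative only.
type DocumentType =
  | "claim_form"
  | "invoice"
  | "medical_bill"
  | "police_report"
  | "policy_declaration";

type Extraction = {
  documentType: DocumentType;
  insuredName?: string;
  claimantName?: string;
  policyNumber?: string;
  claimNumber?: string;
  lossDate?: string;
  totalAmount?: number;
  coverageType?: string;
};

type FieldName = Exclude<keyof Extraction, "documentType">;

const REQUIRED_FIELDS: Record<DocumentType, FieldName[]> = {
  claim_form: ["insuredName", "policyNumber", "claimNumber", "lossDate"],
  invoice: ["claimNumber", "totalAmount"],
  medical_bill: ["claimantName", "claimNumber", "totalAmount"],
  police_report: ["lossDate"],
  policy_declaration: ["insuredName", "policyNumber", "coverageType"],
};

// Return the required fields that are absent or empty for this document type.
// A totalAmount of 0 counts as present; only undefined and "" count as missing.
function computeMissingFields(extraction: Extraction): FieldName[] {
  return REQUIRED_FIELDS[extraction.documentType].filter((name) => {
    const value = extraction[name];
    return value === undefined || value === "";
  });
}
```

Overwriting the model's own missingFields with this result keeps the field list consistent across documents and makes the rule set unit-testable.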
2) Create the extraction agent with a strict role
The agent should behave like an insurance operations analyst who only extracts what is visible in the document. Don’t let it infer missing values.
import { Agent } from "@crewai/crewai";

export const extractionAgent = new Agent({
  name: "Insurance Document Extractor",
  role: "Extract structured insurance fields from documents with evidence-based precision.",
  goal:
    "Return validated JSON containing only fields supported by the source document.",
  backstory:
    "You work in claims operations. You never guess values. You preserve traceability for audit and compliance.",
});
3) Build the task and run a crew
Use a single extraction task first. Add a second validation task later if you need cross-checking against policy data or claim intake rules.
import { Crew, Task } from "@crewai/crewai";
import { InsuranceExtractionSchema } from "./schema";
import { extractionAgent } from "./agent";

const documentText = `
ACORD CLAIM FORM
Insured Name: Jordan Mitchell
Policy Number: P-88412091
Claim Number: CLM-55219
Loss Date: 2025-01-14
Total Amount Requested: $12,450.00
Coverage Type: Commercial Property
`;

const extractionTask = new Task({
  description: `
Extract insurance fields from the document text.
Rules:
- Do not infer missing values.
- Preserve exact names and numbers where present.
- Return ISO date format for lossDate.
- Return only valid JSON matching the schema.
Document:
${documentText}
`,
  expectedOutput:
    "A JSON object with documentType, insuredName, policyNumber, claimNumber, lossDate, totalAmount, currency, coverageType, keyFacts, and missingFields.",
});

const crew = new Crew({
  agents: [extractionAgent],
  tasks: [extractionTask],
});

async function main() {
  const result = await crew.kickoff();
  // Validate after the model responds
  const parsed = InsuranceExtractionSchema.parse(JSON.parse(String(result)));
  console.log(parsed);
}

main().catch(console.error);
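Calling JSON.parse directly on the crew result throws if the model wraps its JSON in a Markdown code fence or adds a sentence around it, which models often do even when told not to. A small pre-processing step can make that failure mode explicit. The `extractJsonObject` helper below is a hypothetical name, and it assumes the result can be stringified as in the step above.

```typescript
// Three backticks, built at runtime so this snippet contains no literal fence.
const FENCE = "`".repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE, "i");

// Pull a JSON object out of a model response that may be wrapped in a
// Markdown code fence or surrounded by prose. Returns null if nothing parses.
function extractJsonObject(raw: string): Record<string, unknown> | null {
  const fenced = raw.match(FENCED_BLOCK);
  const candidate = fenced?.[1] ?? raw;

  // Take the span from the first "{" to the last "}" in the candidate text.
  const start = candidate.indexOf("{");
  const end = candidate.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  try {
    const value: unknown = JSON.parse(candidate.slice(start, end + 1));
    if (typeof value === "object" && value !== null) {
      return value as Record<string, unknown>;
    }
    return null;
  } catch {
    return null; // malformed JSON: let the caller route to human review
  }
}
```

A null return is a signal to retry or route the document to review. The object it returns is still untrusted and should go through the schema validation shown above.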
4) Add guardrails before persisting results
For insurance workloads you need deterministic checks around dates, money amounts, and required identifiers. Put these checks outside the model so they’re testable.
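type ExtractionResult = {
  policyNumber?: string;
  claimNumber?: string;
  lossDate?: string;
};

// Deterministic checks that run outside the model so they can be unit tested.
function validateExtraction(result: ExtractionResult): string[] {
  const errors: string[] = [];
  if (!result.policyNumber) errors.push("policyNumber is required");
  if (!result.claimNumber) errors.push("claimNumber is required");
  // Date.parse alone accepts non-ISO strings, so check the YYYY-MM-DD shape first.
  if (
    result.lossDate &&
    (!/^\d{4}-\d{2}-\d{2}$/.test(result.lossDate) || Number.isNaN(Date.parse(result.lossDate)))
  ) {
    errors.push("lossDate must be a valid ISO date");
  }
  return errors;
}

// Call this inside main() once `parsed` exists; `parsed` is not visible at module scope.
function reviewIfInvalid(parsed: ExtractionResult): boolean {
  const errors = validateExtraction({
    policyNumber: parsed.policyNumber,
    claimNumber: parsed.claimNumber,
    lossDate: parsed.lossDate,
  });
  if (errors.length > 0) {
    console.log({ routeToHumanReview: true, errors });
  }
  return errors.length > 0;
}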
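Money amounts deserve the same deterministic treatment as dates and identifiers. The sample document carries the amount as "$12,450.00", and one way to avoid a silently wrong number is to parse that string with a strict pattern and reject anything else. The `parseUsdToCents` helper is a hypothetical name, and the sketch assumes US-style formatting (comma thousands separators, optional two-digit cents).

```typescript
// Parse a US-style currency string such as "$12,450.00" into integer cents.
// Returns null for anything that does not match, so the caller can route the
// document to human review instead of persisting a guessed amount.
function parseUsdToCents(input: string): number | null {
  // Either properly grouped thousands ("12,450") or plain digits ("12450"),
  // followed by an optional two-digit decimal part.
  const pattern = /^\$?\s*(\d{1,3}(?:,\d{3})*|\d+)(?:\.(\d{2}))?$/;
  const match = input.trim().match(pattern);
  if (!match) return null;

  const dollars = Number((match[1] ?? "0").replace(/,/g, ""));
  const cents = match[2] ? Number(match[2]) : 0;
  const total = dollars * 100 + cents;

  // Guard against values too large to represent exactly.
  return Number.isSafeInteger(total) ? total : null;
}
```

Working in integer cents avoids floating-point rounding when amounts are later summed or compared. Whether to store cents or a decimal value in totalAmount is a design choice for your downstream systems.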
Production Considerations
- •Deploy in-region
  - •Keep OCR output, prompts, model calls, and storage in the same jurisdiction when handling regulated claims data.
  - •If your carrier requires US-only processing or EU-only processing for certain lines of business, enforce that at the infrastructure layer.
- •Log for audit without leaking PHI/PII
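One possible approach to the logging point is to mask identifier-like values before a log line is written. The sketch below is an assumption-heavy illustration: `REDACTION_RULES` and `redactForLog` are hypothetical names, the policy and claim patterns mirror the sample document in this article, and the SSN pattern is the common NNN-NN-NNNN layout. Real formats vary by carrier, and pattern matching alone will not catch names or free-text medical details.

```typescript
// Illustrative patterns only; adjust them to the identifier formats you handle.
const REDACTION_RULES: Array<{ label: string; pattern: RegExp }> = [
  { label: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { label: "POLICY", pattern: /\bP-\d{6,}\b/g },
  { label: "CLAIM", pattern: /\bCLM-\d{4,}\b/g },
  { label: "EMAIL", pattern: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g },
];

// Replace every match of every rule with a bracketed label such as [SSN].
function redactForLog(message: string): string {
  return REDACTION_RULES.reduce(
    (text, rule) => text.replace(rule.pattern, `[${rule.label}]`),
    message,
  );
}
```

Applying this at the logger boundary, rather than at each call site, reduces the chance that a new log statement bypasses it. The unredacted values can still live in the access-controlled audit store described in the architecture.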
By Cyprian Aarons, AI Consultant at Topiax.