How to Build a Document Extraction Agent Using CrewAI in TypeScript for Retail Banking
A document extraction agent for retail banking reads incoming customer documents, identifies the document type, pulls out the fields you care about, and returns structured data your downstream systems can trust. That matters because onboarding, lending, dispute handling, and KYC all depend on getting accurate data out of PDFs, scans, and images without pushing every file through a manual ops queue.
Architecture
A production-grade retail banking extraction agent usually needs these components:
- Document intake layer
  - Accepts PDFs, images, and scanned documents from channels like branch upload, mobile app, or back-office inboxes.
  - Normalizes file metadata such as customer ID, case ID, jurisdiction, and source system (see the intake sketch after this list).
- OCR / text acquisition step
  - Converts scanned documents into text before extraction.
  - For image-heavy forms, this is where quality issues like skew, blur, and low contrast get handled.
- CrewAI extraction crew
  - Uses a Crew with specialized Agents for classification and field extraction.
  - One agent identifies the document type; another extracts fields into a strict schema.
- Validation and policy guardrails
  - Checks extracted values against banking rules: mandatory fields, format constraints, date logic, and residency requirements.
  - Rejects or flags anything that fails confidence thresholds.
- Audit logging layer
  - Stores input hashes, model version, prompt version, extracted output, and reviewer overrides.
  - This is non-negotiable for compliance teams and internal audit.
- Downstream integration
  - Pushes validated output into KYC systems, CRM, LOS, or case management platforms.
  - Keeps human review in the loop when confidence is low or document quality is poor.
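The intake layer's output is essentially a normalized envelope around the raw file. Here is a minimal sketch of what that record might look like in TypeScript; the field names are illustrative assumptions, not a fixed contract.

```typescript
// Hypothetical intake envelope: the shape your ingestion layer hands to OCR
// and, later, to the CrewAI crew. Field names are illustrative.
export interface IntakeEnvelope {
  fileId: string;            // internal identifier for the stored file
  customerId: string;        // customer the document belongs to
  caseId: string;            // onboarding / lending / dispute case reference
  jurisdiction: string;      // e.g. "DE", "GB"; drives residency handling
  sourceSystem: "branch_upload" | "mobile_app" | "backoffice_inbox";
  mimeType: string;          // "application/pdf", "image/jpeg", ...
  receivedAt: string;        // ISO timestamp when the file arrived
  sha256: string;            // input hash, reused later in the audit record
}
```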
Implementation
1) Install dependencies and define your extraction schema
Use CrewAI from TypeScript through its JS runtime package. Keep your output contract strict; retail banking workflows break when extraction returns “almost correct” JSON.
```bash
npm install @crewaiinc/crewai zod dotenv
```
Define the target shape with zod so you can validate outputs before they hit core systems.
```typescript
import { z } from "zod";

export const BankingDocumentSchema = z.object({
  documentType: z.enum(["bank_statement", "utility_bill", "passport", "id_card", "pay_stub"]),
  fullName: z.string(),
  documentNumber: z.string().optional(),
  issueDate: z.string().optional(),
  expiryDate: z.string().optional(),
  address: z.string().optional(),
  accountNumber: z.string().optional(),
  bankName: z.string().optional(),
  currency: z.string().optional(),
});

export type BankingDocument = z.infer<typeof BankingDocumentSchema>;
```
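If you also want the parser to reject unexpected keys rather than silently stripping them, zod's .strict() modifier is one option. This is a small optional tweak on the schema above, not something the original contract requires:

```typescript
// Optional: reject outputs that contain keys outside the declared contract.
export const StrictBankingDocumentSchema = BankingDocumentSchema.strict();
```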
2) Create agents with explicit responsibilities
Keep one agent focused on classification and another on extraction. That separation gives you better prompts, cleaner audits, and easier tuning later.
import "dotenv/config";
import { Agent } from "@crewaiinc/crewai";
export const classifierAgent = new Agent({
role: "Document Classifier",
goal: "Identify the banking document type with high precision.",
backstory:
"You work in retail banking operations. You classify incoming customer documents before any sensitive field extraction happens.",
});
export const extractorAgent = new Agent({
role: "Field Extraction Specialist",
goal: "Extract structured banking fields from the classified document text.",
backstory:
"You extract only verified fields needed for KYC and onboarding. You never invent values.",
});
3) Build tasks and run them in a crew
The pattern below assumes you already have OCR text from your ingestion layer. In production you should pass the OCR output plus metadata like jurisdiction and source channel into the task context.
```typescript
import { Crew, Task } from "@crewaiinc/crewai";
import { BankingDocumentSchema } from "./schema";
import { classifierAgent, extractorAgent } from "./agents";

export async function runExtraction(documentText: string) {
  const classifyTask = new Task({
    description: `
      Classify this retail banking document.
      Return only one of: bank_statement, utility_bill, passport, id_card, pay_stub.
      Document text:
      ${documentText}
    `,
    expectedOutput: "A single valid document type.",
    agent: classifierAgent,
    asyncExecution: false,
  });

  // Note: ZodObject has no useful toString(), so list the expected keys explicitly.
  const extractTask = new Task({
    description: `
      Extract fields from the document into JSON with these keys:
      ${Object.keys(BankingDocumentSchema.shape).join(", ")}
      Rules:
      - Do not guess missing values
      - Return ISO dates where possible
      - If a field is absent or unreadable, omit it
      - Use only evidence present in the text
    `,
    expectedOutput: "Structured JSON matching the schema.",
    agent: extractorAgent,
    context: [classifyTask],
    asyncExecution: false,
  });

  const crew = new Crew({
    agents: [classifierAgent, extractorAgent],
    tasks: [classifyTask, extractTask],
    verbose: true,
    process: "sequential",
  });

  const result = await crew.kickoff();
  return result;
}
```
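A usage sketch under the assumption that OCR runs upstream. ocrPdf is a hypothetical helper, and here the jurisdiction and source-channel metadata are simply prepended to the text handed to the crew; passing them through the task context is an equally valid option.

```typescript
import { runExtraction } from "./crew"; // module path is illustrative

// Hypothetical helper: stands in for whatever OCR service you run upstream.
declare function ocrPdf(path: string): Promise<string>;

const ocrText = await ocrPdf("uploads/acct-4711/statement-2024-03.pdf");

// Prepend jurisdiction and channel metadata so the agents can see it.
const rawResult = await runExtraction(
  `Jurisdiction: DE\nSource channel: mobile_app\n\n${ocrText}`
);
```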
4) Validate output before persistence
Never write raw model output directly to your operational store. Validate it first and route failures to manual review.
```typescript
import { BankingDocumentSchema } from "./schema";

export async function persistIfValid(rawOutput: unknown) {
  const parsed = BankingDocumentSchema.safeParse(rawOutput);

  if (!parsed.success) {
    return {
      status: "needs_review",
      reasons: parsed.error.flatten(),
    };
  }

  // Send parsed.data to LOS/KYC/CRM here.
  return {
    status: "accepted",
    data: parsed.data,
  };
}
```
A practical flow (sketched in code below) is:
- OCR the document.
- Run CrewAI classification plus extraction.
- Validate against schema and business rules.
- Store an audit record with hashes and model metadata.
- Persist only accepted records to downstream systems.
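Tying those steps together might look roughly like this. ocrPdf and writeAuditRecord are hypothetical placeholders for your OCR step and audit store, and the sketch assumes runExtraction returns the extractor's JSON as a string.

```typescript
import { createHash } from "node:crypto";
import { runExtraction } from "./crew";        // module paths are illustrative
import { persistIfValid } from "./persist";

// Hypothetical integrations you would supply yourself.
declare function ocrPdf(path: string): Promise<string>;
declare function writeAuditRecord(record: Record<string, unknown>): Promise<void>;

async function processDocument(filePath: string) {
  const ocrText = await ocrPdf(filePath);                      // 1. OCR
  const raw = String(await runExtraction(ocrText));            // 2. classify + extract

  let candidate: unknown = null;
  try {
    candidate = JSON.parse(raw);                               // crew output -> JSON
  } catch {
    // Unparseable output stays null and fails schema validation below.
  }

  const outcome = await persistIfValid(candidate);             // 3. schema validation

  await writeAuditRecord({                                     // 4. audit trail
    inputHash: createHash("sha256").update(ocrText).digest("hex"),
    outputHash: createHash("sha256").update(raw).digest("hex"),
    status: outcome.status,
    recordedAt: new Date().toISOString(),
  });

  return outcome;                                              // 5. only "accepted" flows downstream
}
```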
Production Considerations
- Data residency
  - Keep OCR text and extracted payloads in-region if your retail bank operates under country-specific residency rules.
  - If your model provider crosses borders by default, block that path at the network layer.
- Auditability
  - Log task.description, prompt template version, model name, timestamp, input hash, output hash, and human override reason (see the audit record sketch after this list).
  - Regulators will ask how a field was derived; “the model said so” is not an answer.
- Guardrails
  - Enforce confidence thresholds per document type.
  - Route passports with unreadable MRZ zones or statements missing account numbers to manual review instead of auto-accepting partial data.
- Monitoring
  - Track extraction accuracy by doc class, branch channel, scanner source, and geography.
  - Watch for drift when templates change or new statement layouts appear.
  - Alert on spikes in review rate or validation failures; those usually mean OCR degradation or prompt regression.
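A sketch of that audit record as a TypeScript type, mirroring the fields listed above; the names are illustrative, and where you store it depends on your audit log.

```typescript
// Illustrative audit record: one row per extraction attempt.
export interface AuditRecord {
  taskDescription: string;        // the rendered task.description sent to the model
  promptTemplateVersion: string;  // e.g. "extract-v7"
  modelName: string;              // provider + model identifier
  timestamp: string;              // ISO timestamp of the run
  inputHash: string;              // sha256 of the OCR text
  outputHash: string;             // sha256 of the raw model output
  humanOverrideReason?: string;   // filled in when a reviewer corrects the result
}
```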
Common Pitfalls
- Letting the model invent missing fields
  - This is the fastest way to contaminate KYC records.
  - Fix it by telling the extractor to omit unreadable values and validating with zod before persistence.
- Using one generic agent for everything
  - A single agent that classifies and extracts tends to be noisier and harder to debug.
  - Split responsibilities into classifier and extractor agents so failures are isolated.
- Skipping human review for borderline documents
  - Retail banking has too much compliance risk to auto-approve low-confidence outputs.
  - Use a review queue for low-quality scans, mismatched names/addresses, expired IDs, or ambiguous statements (a routing sketch follows this list).
- Ignoring prompt/version traceability
  - When an audit team asks why an address was extracted incorrectly six months ago, you need exact lineage.
  - Store prompt versions alongside extracted records so you can reproduce behavior later.
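A minimal routing sketch for that review queue, assuming you track a per-document confidence score and a few quality flags upstream; the thresholds and field names here are illustrative, not recommendations.

```typescript
// Illustrative review routing: thresholds and flags are placeholders you would
// tune per document type and channel.
interface ExtractionSignal {
  documentType: string;
  confidence: number;        // 0..1, however your pipeline scores it
  expiredId?: boolean;
  nameMismatch?: boolean;
  lowScanQuality?: boolean;
}

const MIN_CONFIDENCE: Record<string, number> = {
  passport: 0.9,
  bank_statement: 0.8,
  utility_bill: 0.75,
};

function routeDocument(signal: ExtractionSignal): "auto_accept" | "manual_review" {
  const threshold = MIN_CONFIDENCE[signal.documentType] ?? 0.85;
  const hardFlags = signal.expiredId || signal.nameMismatch || signal.lowScanQuality;

  return signal.confidence >= threshold && !hardFlags
    ? "auto_accept"
    : "manual_review";
}
```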
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit