How to Build a Document Extraction Agent Using AutoGen in TypeScript for Retail Banking
A document extraction agent for retail banking takes incoming PDFs, scans, images, and email attachments, then extracts structured fields like customer name, account number, income, address, statement dates, and transaction totals. It matters because onboarding, loan origination, dispute handling, and KYC teams still waste time rekeying data from unstructured documents into core banking systems.
Architecture
- Ingestion layer
  - Accepts PDFs, TIFFs, JPEGs, and email attachments from secure upload endpoints or internal queues.
  - Normalizes files before they reach the agent.
- Document classification step
  - Detects document type: bank statement, payslip, utility bill, ID card, tax form.
  - Routes to the right extraction schema.
- Extraction agent
  - Uses AutoGen AssistantAgent to turn OCR text into structured JSON.
  - Enforces a strict output schema for downstream systems.
- Validation layer
  - Checks required fields, formats, confidence thresholds, and cross-field consistency.
  - Flags low-confidence records for human review.
- Audit and persistence
  - Stores extracted fields, source document hash, model prompt version, and validation decisions.
  - Supports regulatory audit trails and replay.
- Human-in-the-loop review queue
  - Handles exceptions for ambiguous or incomplete documents.
  - Keeps a reviewer in the loop for compliance-sensitive cases.
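The stages above can be sketched as a typed pipeline. This is a minimal illustration: the type and function names (`IngestedDocument`, `classifyDocument`) and the keyword heuristics are assumptions for this sketch, not part of AutoGen or any OCR service; a production classifier would use a model or a trained layout classifier.

```typescript
// Illustrative types for the pipeline stages described above.
type DocumentType =
  | "bank_statement"
  | "payslip"
  | "utility_bill"
  | "id_card"
  | "tax_form"
  | "unknown";

interface IngestedDocument {
  id: string;
  mimeType: string; // e.g. "application/pdf"
  ocrText: string;  // produced by the ingestion/OCR layer
}

// Naive keyword classifier standing in for the classification step.
function classifyDocument(doc: IngestedDocument): DocumentType {
  const text = doc.ocrText.toLowerCase();
  if (text.includes("opening balance") || text.includes("statement period")) {
    return "bank_statement";
  }
  if (text.includes("gross pay") || text.includes("net pay")) {
    return "payslip";
  }
  if (text.includes("kwh") || text.includes("meter reading")) {
    return "utility_bill";
  }
  return "unknown";
}

const doc: IngestedDocument = {
  id: "doc-001",
  mimeType: "application/pdf",
  ocrText: "Statement period: 2024-01-01 to 2024-01-31. Opening balance: 100.00",
};
console.log(classifyDocument(doc)); // "bank_statement"
```

The point of the sketch is the routing contract: classification returns a known type, and each type maps to its own extraction schema downstream.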
Implementation
1. Install AutoGen and define the extraction contract
For TypeScript in Node.js, use the AutoGen package that exposes AssistantAgent, UserProxyAgent, and OpenAIChatCompletionClient. The key pattern is simple: keep the agent focused on extraction only, then validate its JSON output outside the model.
```shell
npm install @autogenai/autogen openai zod
```
```typescript
import { AssistantAgent } from "@autogenai/autogen";
import { OpenAIChatCompletionClient } from "@autogenai/autogen-ext/models/openai";
import { z } from "zod";

const StatementSchema = z.object({
  customerName: z.string(),
  accountNumber: z.string(),
  statementPeriodStart: z.string(),
  statementPeriodEnd: z.string(),
  openingBalance: z.number(),
  closingBalance: z.number(),
  currency: z.string().length(3),
});

type StatementExtraction = z.infer<typeof StatementSchema>;

const client = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini",
});

const extractor = new AssistantAgent({
  name: "bank_document_extractor",
  modelClient: client,
  systemMessage:
    "Extract only the requested banking fields from OCR text. Return valid JSON only.",
});
```
2. Feed OCR text into AutoGen and force structured output
In retail banking, you usually do OCR first with a separate service like Azure Document Intelligence or Textract. AutoGen should receive clean text plus document metadata; don’t make the LLM do image parsing unless you have to.
```typescript
async function extractStatement(ocrText: string): Promise<StatementExtraction> {
  const result = await extractor.run([
    {
      role: "user",
      content: `
Extract these fields from the document text below:
- customerName
- accountNumber
- statementPeriodStart
- statementPeriodEnd
- openingBalance
- closingBalance
- currency

Rules:
- Return JSON only.
- Use ISO dates (YYYY-MM-DD).
- Use numbers for balances.
- If a field is missing, set it to null.

OCR TEXT:
${ocrText}
`.trim(),
    },
  ]);

  const raw = result.messages[result.messages.length - 1]?.content ?? "";
  // JSON.parse throws on malformed output, and StatementSchema.parse throws
  // on missing or null fields. Either failure should route the document to
  // the human review queue rather than crash the pipeline.
  const parsed = JSON.parse(raw);
  return StatementSchema.parse(parsed);
}
```
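In practice, raw model output is not always bare JSON: models sometimes wrap it in markdown code fences or prepend a sentence of prose despite a "JSON only" instruction. A defensive parsing helper (the name `parseModelJson` is illustrative, not an AutoGen API) makes the happy path above more robust before the schema check:

```typescript
// Defensively extract a JSON object from raw model output, which may be
// wrapped in markdown code fences or surrounded by stray prose.
function parseModelJson(raw: string): unknown {
  const fence = "`".repeat(3); // built dynamically to avoid a literal fence in this snippet
  let candidate = raw.trim();
  if (candidate.startsWith(fence)) {
    // Drop the opening fence line (e.g. a "json" fence) and the closing fence.
    candidate = candidate
      .split("\n")
      .filter((line) => !line.trim().startsWith(fence))
      .join("\n")
      .trim();
  }
  try {
    return JSON.parse(candidate);
  } catch {
    // Fall back to the first {...} span in the text.
    const start = candidate.indexOf("{");
    const end = candidate.lastIndexOf("}");
    if (start === -1 || end <= start) {
      throw new Error("No JSON object found in model output");
    }
    return JSON.parse(candidate.slice(start, end + 1));
  }
}
```

You would call `parseModelJson(raw)` in place of `JSON.parse(raw)` before handing the result to `StatementSchema.parse`.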
3. Add a validation pass before persistence
This is where banking-specific controls belong. Validate format, compare balances where possible, and reject outputs that don’t meet minimum confidence or schema rules.
```typescript
function validateExtraction(data: StatementExtraction) {
  if (!/^\d{8,20}$/.test(data.accountNumber)) {
    throw new Error("Invalid account number format");
  }
  // Note: relax this check for products where overdrafts make
  // legitimately negative balances possible.
  if (data.openingBalance < 0 || data.closingBalance < 0) {
    throw new Error("Balances cannot be negative");
  }
  if (!["USD", "EUR", "GBP", "ZAR"].includes(data.currency)) {
    throw new Error("Unsupported currency");
  }
}
```
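Cross-field consistency is the check that catches most silent extraction errors. As one example (the function name and tolerance are illustrative assumptions, not from the article's pipeline), the statement period should be a correctly ordered date range, and the opening balance plus the net of extracted transaction totals should reconcile with the closing balance:

```typescript
// Example cross-field check: validate the date range and reconcile balances
// against the net of signed transaction amounts, within a rounding tolerance.
function checkPeriodAndBalance(
  periodStart: string,
  periodEnd: string,
  openingBalance: number,
  closingBalance: number,
  transactionTotals: number[] // signed amounts parsed from the statement
): string[] {
  const errors: string[] = [];
  const isoDate = /^\d{4}-\d{2}-\d{2}$/;
  if (!isoDate.test(periodStart) || !isoDate.test(periodEnd)) {
    errors.push("Dates must be ISO YYYY-MM-DD");
  } else if (periodStart >= periodEnd) {
    // ISO dates compare correctly as strings.
    errors.push("Statement period start must precede end");
  }
  const net = transactionTotals.reduce((sum, t) => sum + t, 0);
  if (Math.abs(openingBalance + net - closingBalance) > 0.01) {
    errors.push("Balances do not reconcile with transaction totals");
  }
  return errors;
}
```

Returning a list of errors rather than throwing lets the validation layer collect every problem on a document before deciding whether to route it for review.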
4. Wire it into a controlled workflow with UserProxyAgent
Use UserProxyAgent when you want deterministic orchestration around tool calls or manual approval steps. For retail banking document processing, that usually means routing failed validations into a review queue instead of auto-posting results.
```typescript
import { UserProxyAgent } from "@autogenai/autogen";

const reviewer = new UserProxyAgent({
  name: "bank_ops_reviewer",
});

async function routeForReview(reason: string) {
  return reviewer.run([
    {
      role: "user",
      content: `Review required for retail banking document extraction. Reason: ${reason}`,
    },
  ]);
}

async function processDocument(ocrText: string) {
  try {
    const extracted = await extractStatement(ocrText);
    validateExtraction(extracted);
    return {
      status: "approved",
      data: extracted,
      audit: {
        model: "gpt-4o-mini",
        agent: "bank_document_extractor",
        timestamp: new Date().toISOString(),
      },
    };
  } catch (err) {
    // Failed parsing or validation goes to the review queue, never auto-posted.
    const reason = err instanceof Error ? err.message : String(err);
    await routeForReview(reason);
    return { status: "review", reason };
  }
}
```
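Review routing does not have to be all-or-nothing. A common policy, sketched below with illustrative names (`FieldConfidence`, `needsHumanReview` are assumptions, not AutoGen APIs), is to forward per-field confidence scores from the OCR engine and queue only the fields that fall below a threshold:

```typescript
// Illustrative review-routing policy: flag fields whose OCR confidence
// falls below a threshold, so reviewers only re-check those fields.
interface FieldConfidence {
  field: string;
  confidence: number; // 0..1, as reported by the OCR engine
}

function needsHumanReview(
  confidences: FieldConfidence[],
  threshold = 0.85
): string[] {
  return confidences
    .filter((c) => c.confidence < threshold)
    .map((c) => c.field);
}
```

If the returned list is empty the record can proceed automatically; otherwise the listed fields go into the reviewer's work item alongside the source document.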
Production Considerations
- Data residency
  - Keep OCR text and extracted fields inside approved regions.
  - Pin model endpoints to region-specific deployments where possible.
- Auditability
  - Log document hash, prompt version, model version, validation errors, and reviewer actions.
  - Store immutable records for regulatory review and dispute resolution.
- Guardrails
  - Never let the agent directly write to core banking systems.
  - Require schema validation plus business-rule checks before any downstream action.
- Monitoring
  - Track extraction accuracy by document type: bank statements often fail on tables, utility bills on address parsing, and payslips on gross/net salary detection.
  - Alert on drift when field-level confidence drops or reviewer overrides spike.
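The override-spike alert can be as simple as comparing a per-document-type override rate against a baseline. A minimal sketch, assuming outcomes are logged with a `docType` and an `overridden` flag (both names are illustrative):

```typescript
// Sketch of drift monitoring: alert when the reviewer-override rate for a
// document type exceeds a rolling baseline by more than a tolerance.
interface ExtractionOutcome {
  docType: string;
  overridden: boolean; // did a reviewer correct the agent's output?
}

function overrideRate(outcomes: ExtractionOutcome[], docType: string): number {
  const subset = outcomes.filter((o) => o.docType === docType);
  if (subset.length === 0) return 0;
  return subset.filter((o) => o.overridden).length / subset.length;
}

function shouldAlert(
  outcomes: ExtractionOutcome[],
  docType: string,
  baseline: number,
  tolerance = 0.05
): boolean {
  return overrideRate(outcomes, docType) > baseline + tolerance;
}
```

In production you would compute the baseline over a trailing window per document type, but the comparison logic is the same.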
Common Pitfalls
- Letting the LLM parse raw images directly
  - Don't rely on multimodal inference as your primary extraction path unless you've tested it heavily.
  - Use OCR first so your pipeline is deterministic and auditable.
- Skipping schema validation
  - Model output will occasionally be malformed JSON or contain missing fields.
  - Always parse with something like Zod before persisting anything.
- Treating all documents the same
  - A bank statement is not a payslip is not an ID card.
  - Classify first, then use doc-specific prompts and schemas so your extraction stays accurate enough for production use.
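The classify-first pitfall is easiest to avoid with a per-document-type registry mapping each type to its own required fields and prompt. A minimal sketch, with illustrative names (`ExtractionProfile`, `profileFor`, and the field lists are assumptions, not from any library):

```typescript
// Sketch of a per-document-type extraction registry, so each type gets its
// own prompt hint and required-field list instead of one generic prompt.
interface ExtractionProfile {
  requiredFields: string[];
  promptHint: string;
}

const profiles: Record<string, ExtractionProfile> = {
  bank_statement: {
    requiredFields: ["customerName", "accountNumber", "openingBalance", "closingBalance"],
    promptHint: "Pay attention to the transaction table and balance rows.",
  },
  payslip: {
    requiredFields: ["customerName", "grossPay", "netPay", "payDate"],
    promptHint: "Distinguish gross from net salary lines.",
  },
};

function profileFor(docType: string): ExtractionProfile {
  const profile = profiles[docType];
  if (!profile) {
    // Unknown types fail fast and should be routed to human review.
    throw new Error(`No extraction profile for document type: ${docType}`);
  }
  return profile;
}
```

The extraction agent then builds its prompt from `profileFor(docType)`, and the validation layer checks `requiredFields` for that type only.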
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.