How to Build a Document Extraction Agent Using AutoGen in TypeScript for Payments
A document extraction agent for payments reads invoices, remittance advices, bank statements, purchase orders, and payment confirmations, then turns them into structured fields your downstream systems can trust. For payments teams, this matters because manual extraction is slow, error-prone, and expensive; one bad invoice number or beneficiary account can create reconciliation breaks, duplicate payouts, or compliance issues.
Architecture
- Document intake layer
  - Accepts PDFs, images, and text from S3, blob storage, email ingestion, or an internal API.
  - Normalizes files before the agent sees them.
- OCR / text extraction layer
  - Converts scanned documents into text.
  - For production payments workflows, keep OCR outside the LLM when possible so you can audit the raw extracted text.
- AutoGen extraction agent
  - Uses an AssistantAgent to extract structured payment fields from the normalized text.
  - Returns JSON aligned to a strict schema.
- Validation and policy layer
  - Checks required fields like invoice number, amount, currency, supplier name, IBAN/account number, and due date.
  - Rejects low-confidence or malformed outputs before posting to ERP or payment rails.
- Human review queue
  - Handles exceptions such as missing tax IDs, ambiguous totals, or mismatched beneficiary details.
  - Critical for compliance and segregation of duties.
- Audit and storage layer
  - Persists input document hash, extracted text hash, model output, validation result, reviewer action, and timestamps.
  - Needed for traceability in payments operations.
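The audit and storage layer above can be sketched with Node's built-in crypto module. The record shape and helper names here are illustrative assumptions, not an AutoGen API:

```typescript
import { createHash } from "node:crypto";

// Illustrative audit record; field names are assumptions for this sketch.
interface AuditRecord {
  documentHash: string;      // hash of the raw input file
  extractedTextHash: string; // hash of the OCR output
  modelOutput: string;       // raw JSON string returned by the agent
  validationResult: "passed" | "failed" | "manual_review";
  reviewerAction?: string;
  timestamp: string;
}

export function sha256Hex(data: string | Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

export function buildAuditRecord(
  rawDocument: Buffer,
  extractedText: string,
  modelOutput: string,
  validationResult: AuditRecord["validationResult"],
): AuditRecord {
  return {
    documentHash: sha256Hex(rawDocument),
    extractedTextHash: sha256Hex(extractedText),
    modelOutput,
    validationResult,
    timestamp: new Date().toISOString(),
  };
}
```

Hashing both the raw document and the OCR text lets you prove, months later, exactly which bytes produced a given extraction.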
Implementation
1) Install dependencies and define the schema
Use AutoGen’s TypeScript packages plus a runtime validator. In payments workflows I prefer strict schemas because “best effort” extraction is how reconciliation problems start.
```
npm install @autogen-agentchat/agents @autogen-core/openai-client zod
```
Define the shape of what you want back from the agent:
```typescript
import { z } from "zod";

export const PaymentDocumentSchema = z.object({
  documentType: z.enum(["invoice", "remittance_advice", "bank_statement", "purchase_order"]),
  supplierName: z.string(),
  invoiceNumber: z.string().optional(),
  currency: z.string().length(3),
  totalAmount: z.number(),
  dueDate: z.string().optional(),
  ibanOrAccount: z.string().optional(),
  reference: z.string().optional(),
});

export type PaymentDocument = z.infer<typeof PaymentDocumentSchema>;
```
2) Create an AutoGen assistant for extraction
AutoGen’s AssistantAgent is enough here. Keep the system message narrow so the model extracts fields instead of explaining them.
```typescript
import { AssistantAgent } from "@autogen-agentchat/agents";
import { OpenAIChatCompletionClient } from "@autogen-core/openai-client";

const modelClient = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY!,
});

export const extractor = new AssistantAgent({
  name: "payment_document_extractor",
  modelClient,
  systemMessage: `
You extract structured payment data from documents.
Return only valid JSON matching this schema:
{
  "documentType": "invoice|remittance_advice|bank_statement|purchase_order",
  "supplierName": string,
  "invoiceNumber"?: string,
  "currency": string,
  "totalAmount": number,
  "dueDate"?: string,
  "ibanOrAccount"?: string,
  "reference"?: string
}
If a field is missing, omit it. Do not invent values.
`,
});
```
3) Run extraction and validate output
The important part is not just calling the agent; it is validating the response before it enters your payment workflow. In production I treat the LLM output as untrusted input.
```typescript
import { PaymentDocumentSchema } from "./schema";
import { extractor } from "./agent";

export async function extractPaymentDocument(rawText: string) {
  const result = await extractor.run([
    {
      role: "user",
      content: `Extract payment fields from this document:\n\n${rawText}`,
    },
  ]);

  const lastMessage = result.messages[result.messages.length - 1];
  if (!lastMessage || typeof lastMessage.content !== "string") {
    throw new Error("Agent did not return text content");
  }

  // Models sometimes wrap JSON in markdown fences; strip them before parsing.
  const jsonText = lastMessage.content.replace(/^```(?:json)?\s*|\s*```$/g, "").trim();
  const parsedJson = JSON.parse(jsonText);
  const validated = PaymentDocumentSchema.parse(parsedJson);

  if (validated.totalAmount <= 0) {
    throw new Error("Invalid amount");
  }

  return validated;
}
```
That pattern gives you three control points:
- AutoGen handles reasoning over messy document text.
- Zod enforces a strict contract.
- Your application decides whether to auto-post or send to review.
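If you want parse failures themselves to route to review rather than throw, the gate might look like the dependency-free sketch below. The `routeExtraction` name and outcome shape are hypothetical; in the real pipeline you would call `PaymentDocumentSchema.safeParse` instead of the hand-rolled checks:

```typescript
type ExtractionOutcome =
  | { status: "ok"; document: Record<string, unknown> }
  | { status: "manual_review"; reason: string; raw: string };

// Parse the agent's reply and route anything malformed to human review
// instead of throwing. The field checks mirror what the Zod schema enforces.
export function routeExtraction(raw: string): ExtractionOutcome {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { status: "manual_review", reason: "Response was not valid JSON", raw };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { status: "manual_review", reason: "Response was not a JSON object", raw };
  }
  const doc = parsed as Record<string, unknown>;
  for (const field of ["documentType", "supplierName", "currency", "totalAmount"]) {
    if (!(field in doc)) {
      return { status: "manual_review", reason: `Missing required field: ${field}`, raw };
    }
  }
  if (typeof doc.totalAmount !== "number" || doc.totalAmount <= 0) {
    return { status: "manual_review", reason: "Invalid totalAmount", raw };
  }
  return { status: "ok", document: doc };
}
```

The point is that a bad response becomes a queue item with a reason attached, not an exception that stops the batch.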
4) Add a payment-specific decision gate
Don’t auto-release every extracted record. Route exceptions based on risk signals like missing beneficiary data or currency mismatches.
```typescript
import { extractPaymentDocument } from "./extractor";

export async function processDocument(rawText: string) {
  const doc = await extractPaymentDocument(rawText);

  if (!doc.invoiceNumber || !doc.ibanOrAccount) {
    return {
      status: "manual_review",
      reason: "Missing invoice number or beneficiary account",
      document: doc,
    };
  }

  if (doc.currency !== "USD" && doc.currency !== "EUR") {
    return {
      status: "manual_review",
      reason: `Unsupported currency ${doc.currency}`,
      document: doc,
    };
  }

  return {
    status: "approved_for_reconciliation",
    document: doc,
  };
}
```
In real systems this decision gate usually sits before ERP posting or payment initiation. That keeps your agent useful without letting it become a single point of failure.
Production Considerations
- Keep raw documents and extracted output in separate stores
  - Store original files in your compliant object store and derived JSON in a controlled application database.
  - This helps with audit trails and data residency requirements.
- Log every transformation step
  - Persist the OCR text hash, prompt version, model version, output JSON, validation errors, and reviewer actions.
  - Payments auditors will ask how a specific field was produced months later.
- Set hard guardrails on downstream actions
  - Never let the agent directly trigger settlement for high-value payments.
  - Require validation rules plus human approval for threshold breaches, new beneficiaries, or altered bank details.
- Monitor extraction quality by document type
  - Track field-level accuracy for invoices vs. remittances vs. statements.
For payments teams this matters because failure modes differ by document class. A remittance advice missing one reference number can break matching; an invoice with a wrong IBAN can cause loss events.
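Per-type quality monitoring can start as a simple counter keyed by document type and field, compared against reviewer-corrected ground truth. The class and method names below are illustrative:

```typescript
// Minimal per-document-type accuracy tracker; a real system would persist
// these counters and feed them from reviewer corrections.
class ExtractionQualityTracker {
  private counts = new Map<string, { correct: number; total: number }>();

  // Record one field-level comparison between extracted and verified values.
  record(docType: string, field: string, correct: boolean): void {
    const key = `${docType}:${field}`;
    const entry = this.counts.get(key) ?? { correct: 0, total: 0 };
    entry.total += 1;
    if (correct) entry.correct += 1;
    this.counts.set(key, entry);
  }

  // Returns accuracy in [0, 1], or undefined if no samples exist yet.
  accuracy(docType: string, field: string): number | undefined {
    const entry = this.counts.get(`${docType}:${field}`);
    return entry ? entry.correct / entry.total : undefined;
  }
}
```

Alerting when, say, invoice IBAN accuracy drops below a threshold catches model or prompt regressions before they reach payment rails.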
Common Pitfalls
- Treating LLM output as final truth
  Avoid this by parsing and validating every response with a schema library like Zod. If parsing fails, route to review instead of retrying blindly.
- Using one prompt for every document type
  Invoices and bank statements have different structures. Either classify first or use separate prompts/system messages per document type so you get better field precision.
- Skipping audit metadata
  If you don’t store source hashes, model versions, prompt versions, and reviewer actions, you won’t be able to explain extraction decisions during audits or disputes.
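For the one-prompt pitfall, classify first and pick a per-type system message. The keyword classifier below is a deliberately naive stand-in; in production you would use a dedicated classifier model or an upstream document-type signal, and the abbreviated prompts are placeholders:

```typescript
type DocType = "invoice" | "remittance_advice" | "bank_statement" | "purchase_order";

// One narrow system message per document class (abbreviated for illustration).
const systemMessages: Record<DocType, string> = {
  invoice: "Extract invoice fields: supplierName, invoiceNumber, currency, totalAmount, dueDate.",
  remittance_advice: "Extract remittance fields: supplierName, reference, currency, totalAmount.",
  bank_statement: "Extract statement fields: currency, totalAmount, ibanOrAccount.",
  purchase_order: "Extract PO fields: supplierName, reference, currency, totalAmount.",
};

// Naive keyword-based routing; good enough to show the shape of the pattern.
export function classifyDocument(text: string): DocType {
  const lower = text.toLowerCase();
  if (lower.includes("remittance")) return "remittance_advice";
  if (lower.includes("purchase order")) return "purchase_order";
  if (lower.includes("statement")) return "bank_statement";
  return "invoice";
}

export function systemMessageFor(text: string): string {
  return systemMessages[classifyDocument(text)];
}
```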
Payments automation fails when it optimizes for convenience over control. Build the agent so it extracts cleanly, validates aggressively, and escalates anything that could impact compliance, reconciliation, or funds movement.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.