How to Build a Document Extraction Agent for Payments Using AutoGen in Python
A document extraction agent for payments reads invoices, remittance advice, bank statements, and payment instructions, then turns them into structured fields your system can trust. That matters because a bad extraction can trigger a duplicate payout, a rejected transfer, or a compliance issue that takes hours to unwind.
Architecture
- Document ingress
  - Accept PDFs, images, email attachments, or scanned files from a controlled upload service.
  - Store the raw document in an immutable location for audit and replay.
- OCR / text normalization
  - Convert scanned pages into text before the LLM sees them.
  - Keep page numbers and bounding-box metadata if you need traceability back to source.
- Extraction agent
  - Use an autogen.AssistantAgent to extract payment-relevant fields such as invoice number, amount, currency, beneficiary name, IBAN, routing number, due date, and remittance references.
- Validation agent
  - Use a second agent to check schema validity, payment rules, and cross-field consistency.
  - This is where you catch mismatched totals, invalid account formats, or missing tax identifiers.
- Human review queue
  - Route low-confidence or high-risk documents to an operator.
  - Payment flows need deterministic escalation paths when confidence falls below a threshold.
- Persistence and audit trail
  - Write the extracted JSON plus the original source hash to your database.
  - Keep every agent response versioned for compliance and dispute handling.
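The persistence step can be sketched in a few lines. This is a minimal illustration using SQLite; the table name `payment_documents` and the column layout are assumptions for the sketch, not part of AutoGen:

```python
import hashlib
import json
import sqlite3

def persist_extraction(db: sqlite3.Connection, raw_bytes: bytes,
                       extracted: dict, model_version: str) -> str:
    """Store extracted JSON alongside a hash of the original document for audit."""
    source_hash = hashlib.sha256(raw_bytes).hexdigest()
    db.execute(
        "CREATE TABLE IF NOT EXISTS payment_documents "
        "(source_hash TEXT, model_version TEXT, extracted_json TEXT)"
    )
    db.execute(
        "INSERT INTO payment_documents VALUES (?, ?, ?)",
        (source_hash, model_version, json.dumps(extracted)),
    )
    db.commit()
    return source_hash

db = sqlite3.connect(":memory:")
h = persist_extraction(db, b"%PDF-1.7 ...", {"amount": 12450.0}, "gpt-4o-mini/2024-07")
```

Because the hash is computed over the raw bytes, any later dispute can be traced back to exactly the document that was processed.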
Implementation
1) Install AutoGen and define the extraction schema
For payments work, do not let the model free-form its output. Force a structured response contract and validate it before anything reaches downstream payment rails.
```
pip install pyautogen pydantic
```

```python
from typing import Optional

from pydantic import BaseModel, Field

class PaymentDocument(BaseModel):
    document_type: str = Field(..., description="invoice|remittance|bank_statement|payment_instruction")
    invoice_number: Optional[str] = None
    amount: float
    currency: str
    beneficiary_name: Optional[str] = None
    iban: Optional[str] = None
    swift_bic: Optional[str] = None
    routing_number: Optional[str] = None
    due_date: Optional[str] = None
    remittance_reference: Optional[str] = None
```
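A quick sanity check shows the contract doing its job: a payload missing a required field is rejected before any business action. The model is trimmed to a few fields here so the snippet runs standalone:

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

class PaymentDocument(BaseModel):  # trimmed copy for illustration
    document_type: str
    amount: float
    currency: str
    iban: Optional[str] = None

payload = {"document_type": "invoice", "currency": "USD"}  # "amount" is missing
try:
    PaymentDocument(**payload)
    errors = []
except ValidationError as exc:
    errors = exc.errors()
    print(f"rejected with {len(errors)} error(s)")  # rejected with 1 error(s)
```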
2) Create the AutoGen agents
Use one agent to extract and another to verify. In AutoGen terms, AssistantAgent does the reasoning work; UserProxyAgent is useful as the orchestrator that initiates chat and can execute code if needed.
```python
import autogen

# Classic AutoGen (pyautogen) reads model credentials from a config_list.
llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": "YOUR_OPENAI_API_KEY",
        }
    ]
}

extractor = autogen.AssistantAgent(
    name="extractor",
    llm_config=llm_config,
    system_message=(
        "You extract structured payment data from documents. "
        "Return only valid JSON matching the provided schema."
    ),
)

validator = autogen.AssistantAgent(
    name="validator",
    llm_config=llm_config,
    system_message=(
        "You validate payment extraction results. "
        "Check field completeness, formatting, and consistency."
    ),
)

user_proxy = autogen.UserProxyAgent(
    name="orchestrator",
    human_input_mode="NEVER",
    code_execution_config=False,  # orchestration only; no local code execution
)
```
3) Run extraction and validation with a real chat pattern
The simplest production pattern is: send OCR text to the extractor, parse JSON locally with Python, then send the parsed object to the validator for rule checks. If validation fails, route to human review instead of auto-posting into payments.
```python
import json
from pydantic import ValidationError

ocr_text = """
Invoice No: INV-2024-8831
Amount Due: USD 12,450.00
Beneficiary: Acme Logistics Ltd
IBAN: GB29NWBK60161331926819
SWIFT/BIC: NWBKGB2L
Due Date: 2024-11-30
Reference: PAY-77821
"""

extract_prompt = f"""
Extract payment data from this document text.
Return JSON with these keys:
document_type, invoice_number, amount, currency,
beneficiary_name, iban, swift_bic, routing_number,
due_date, remittance_reference
Document:
{ocr_text}
"""

chat_result = user_proxy.initiate_chat(
    recipient=extractor,
    message=extract_prompt,
    max_turns=1,  # one request/response round; we parse the reply ourselves
)

raw_output = chat_result.chat_history[-1]["content"]
# Models sometimes wrap JSON in markdown fences; strip them before parsing.
cleaned = raw_output.strip().removeprefix("```json").removeprefix("```").removesuffix("```").strip()
payload = json.loads(cleaned)
doc = PaymentDocument(**payload)
```
```python
validation_prompt = f"""
Validate this extracted payment JSON for banking/payment use:
{doc.model_dump_json(indent=2)}
Return:
- PASS if consistent and complete enough for processing
- FAIL with specific reasons otherwise
"""

validation_result = user_proxy.initiate_chat(
    recipient=validator,
    message=validation_prompt,
    max_turns=1,
)

print("EXTRACTED:", doc.model_dump())
print("VALIDATION:", validation_result.chat_history[-1]["content"])
```
4) Add hard stops for payments risk
The model should never be the final authority on whether money moves. Wrap extraction with deterministic checks: currency format, IBAN checksum validation (via a library if your stack has one), duplicate invoice detection, and sanctions screening before any payout event is created.
```python
def should_route_to_human(doc: PaymentDocument) -> bool:
    required_fields = [doc.amount, doc.currency]
    if any(v is None for v in required_fields):
        return True
    if doc.amount <= 0:
        return True
    if doc.document_type not in {"invoice", "remittance", "bank_statement", "payment_instruction"}:
        return True
    return False

if should_route_to_human(doc):
    print("ROUTE_TO_HUMAN_REVIEW")
else:
    print("READY_FOR_DOWNSTREAM_CHECKS")
```
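The IBAN checksum mentioned above is one of the cheapest deterministic checks to add. Here is a minimal sketch of the ISO 13616 mod-97 validation; in production, prefer a maintained library rather than rolling your own:

```python
def iban_checksum_ok(iban: str) -> bool:
    """ISO 13616 mod-97 check: rearrange, convert letters to digits, mod 97 == 1."""
    s = iban.replace(" ", "").upper()
    if len(s) < 15 or not s.isalnum():
        return False
    rearranged = s[4:] + s[:4]  # move country code + check digits to the end
    digits = "".join(str(int(c, 36)) for c in rearranged)  # A=10 ... Z=35
    return int(digits) % 97 == 1

print(iban_checksum_ok("GB29NWBK60161331926819"))  # True: the sample invoice IBAN
print(iban_checksum_ok("GB00NWBK60161331926819"))  # False: corrupted check digits
```

A check like this catches OCR digit swaps deterministically, before the validator agent or any human ever sees the document.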
Production Considerations
- Data residency
  - Keep OCR text and extracted payloads in-region if you process regulated payment data across jurisdictions.
  - If your policy forbids cross-border processing of bank details or invoices containing personal data, pin model endpoints and storage accordingly.
- Auditability
  - Persist the original document hash, OCR output version, prompt version, model version, and final JSON.
  - When finance or compliance asks why a payout was released or blocked, you need a full chain of evidence.
- Monitoring
  - Track extraction accuracy by field type: invoice-number accuracy is not the same as IBAN accuracy.
  - Alert on spikes in the human-review rate; that usually means OCR quality degraded or document templates changed.
- Guardrails
  - Enforce allowlisted document types and reject anything outside payments scope.
  - Never let extracted data trigger automatic settlement without deterministic validation plus policy checks.
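Per-field accuracy tracking can start as something this simple, comparing extractions against operator-reviewed ground truth. The field names and sample values here are illustrative:

```python
from collections import defaultdict

class FieldAccuracyTracker:
    """Track per-field agreement between extraction output and reviewed ground truth."""

    def __init__(self):
        self.correct = defaultdict(int)
        self.total = defaultdict(int)

    def record(self, field: str, extracted, reviewed) -> None:
        self.total[field] += 1
        if extracted == reviewed:
            self.correct[field] += 1

    def accuracy(self, field: str) -> float:
        return self.correct[field] / self.total[field] if self.total[field] else 0.0

tracker = FieldAccuracyTracker()
tracker.record("iban", "GB29NWBK60161331926819", "GB29NWBK60161331926819")
tracker.record("invoice_number", "INV-2024-8831", "INV-2024-8831")
tracker.record("invoice_number", "INV-2024-8B31", "INV-2024-8831")  # OCR confusion: B vs 8
print(tracker.accuracy("iban"))            # 1.0
print(tracker.accuracy("invoice_number"))  # 0.5
```

Feeding these per-field rates into your alerting is what turns "OCR quality degraded" from a hunch into a measurable signal.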
Common Pitfalls
- Letting the model output free-form text
  - This breaks downstream parsing fast.
  - Fix it by forcing JSON-only output and validating with Pydantic before any business action.
- Skipping human review thresholds
  - Low-confidence extractions will eventually hit a high-value payment.
  - Fix it by routing ambiguous documents to an operator when required fields are missing or inconsistent.
- Ignoring compliance boundaries
  - Payments data often includes account numbers, tax IDs, names, addresses, and sometimes personal data.
  - Fix it by defining retention rules, access controls, region-specific storage policies, and an audit log from day one.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.