How to Build a Transaction Monitoring Agent Using AutoGen in Python for Retail Banking

By Cyprian Aarons. Updated 2026-04-21

Tags: transaction-monitoring, autogen, python, retail-banking

A transaction monitoring agent watches payment and transfer activity, scores it against risk rules and behavioral patterns, and escalates suspicious cases for review. In retail banking, that matters because you need to catch fraud, mule activity, structuring, and account takeover early without flooding analysts with false positives.

Architecture

A production transaction monitoring agent for retail banking usually needs these pieces:

• Transaction ingestion layer
  - Pulls card payments, ACH, wire transfers, internal transfers, and account events from Kafka, a queue, or a batch feed.
• Risk scoring service
  - Applies deterministic rules first: velocity checks, amount thresholds, geography mismatches, beneficiary changes, and device anomalies.
• AutoGen multi-agent workflow
  - Uses specialized agents for triage, investigation summary, and compliance classification.
• Case management adapter
  - Writes alerts to your case system with evidence, scores, and explanation.
• Audit and evidence store
  - Persists every prompt, model response, rule hit, and analyst override for regulatory review.
• Policy and guardrail layer
  - Redacts PII where possible, enforces data residency constraints, and blocks unsupported actions.
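To make the order of these pieces concrete, here is a minimal sketch of how the stages might hand off to each other. The Alert class, the run_pipeline function, and the three callables are hypothetical names for illustration; they are not part of AutoGen, and a real deployment would add ingestion, persistence, and error handling.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Alert:
    transaction_id: str
    rule_score: int
    rule_hits: list
    disposition: str = "REVIEW"  # advisory label from the triage stage
    audit_trail: list = field(default_factory=list)


def run_pipeline(
    txn: dict,
    score: Callable[[dict], dict],        # risk scoring service
    triage: Callable[[dict], str],        # stand-in for the AutoGen workflow
    write_case: Callable[[Alert], None],  # case management adapter
) -> Optional[Alert]:
    screen = score(txn)
    if not screen["hits"]:
        return None  # no rule hit: nothing to triage, no model call

    alert = Alert(txn["transaction_id"], screen["score"], screen["hits"])
    alert.audit_trail.append(("rules", screen))

    # Only the compact rule result is handed to the triage stage.
    alert.disposition = triage({"transaction_id": alert.transaction_id, **screen})
    alert.audit_trail.append(("triage", alert.disposition))

    write_case(alert)
    return alert
```

Passing the stages in as callables keeps each one replaceable and lets you test the flow with stubs before any model is involved.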

Implementation

1) Set up the AutoGen agents

Use autogen.AssistantAgent for the triage logic and autogen.UserProxyAgent as the orchestrator that starts the chat and collects the reply. The imports below follow the AutoGen 0.2-style API (from autogen import ...). For retail banking, keep the model-facing prompt narrow: it should classify risk and explain why, not make autonomous account decisions.

import os
import json
from autogen import AssistantAgent, UserProxyAgent

llm_config = {
    "model": "gpt-4o-mini",
    "api_key": os.environ["OPENAI_API_KEY"],
    "temperature": 0,
}

transaction_monitor = AssistantAgent(
    name="transaction_monitor",
    llm_config=llm_config,
    system_message=(
        "You are a retail banking transaction monitoring analyst. "
        "Classify transactions as LOW_RISK, REVIEW, or ESCALATE. "
        "Use only the provided facts. Do not invent missing data. "
        "Return concise reasoning suitable for audit."
    ),
)

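orchestrator = UserProxyAgent(
    name="orchestrator",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3,
    # The orchestrator only relays messages; never execute model-generated code.
    code_execution_config=False,
)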

2) Add deterministic pre-screening before the LLM

In banking, rules should do the first pass. That keeps costs down and gives you explainable triggers before any model call.

def rule_screen(txn: dict) -> dict:
    hits = []

    if txn["amount"] >= 10000:
        hits.append("high_amount")

    if txn.get("velocity_24h", 0) >= 5:
        hits.append("high_velocity")

    if txn.get("country") not in {"US", "CA", "GB"}:
        hits.append("cross_border")

    if txn.get("new_beneficiary", False):
        hits.append("new_beneficiary")

    score = min(100, len(hits) * 25)
    return {"score": score, "hits": hits}

txn = {
    "transaction_id": "TXN-88421",
    "customer_id": "CUST-1029",
    "amount": 14500,
    "country": "NG",
    "velocity_24h": 6,
    "new_beneficiary": True,
}

screen = rule_screen(txn)
print(screen)
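For the sample transaction above, all four rules fire, so the screen returns a score of 100. One way to turn that score into the cost saving described in this step is to gate the model call on it. The sketch below is an assumption, not part of the original workflow: the route function, its labels, and both thresholds are hypothetical and would need tuning against your own alert volumes.

```python
def route(screen: dict, llm_threshold: int = 25, priority_threshold: int = 75) -> str:
    """Decide the next step from the rule screen alone (thresholds are illustrative)."""
    if screen["score"] < llm_threshold:
        return "auto_clear"           # no rule hits: log the result, skip the model call
    if screen["score"] >= priority_threshold:
        return "llm_triage_priority"  # three or more hits: triage ahead of the queue
    return "llm_triage"


print(route({"score": 0, "hits": []}))
print(route({"score": 50, "hits": ["high_amount", "new_beneficiary"]}))
```

With the 25-points-per-hit scoring used in rule_screen, a single hit is already enough to send the case to the model.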

3) Pass only the necessary context into AutoGen

Do not send raw customer profiles or full statements unless you have a clear legal basis and residency controls. Send a compact case packet with masked identifiers and rule hits.

case_packet = {
    "transaction_id": txn["transaction_id"],
    "customer_id": "***" + txn["customer_id"][-4:],  # masked before it reaches the model
    "amount": txn["amount"],
    "country": txn["country"],
    "rule_score": screen["score"],
    "rule_hits": screen["hits"],
    "masked_account": "***4821",
    "recent_pattern": {
        "velocity_24h": txn["velocity_24h"],
        "new_beneficiary": txn["new_beneficiary"],
    },
}

prompt = f"""
Review this retail banking transaction case.

Case:
{json.dumps(case_packet, indent=2)}

Output JSON with:
- disposition: LOW_RISK | REVIEW | ESCALATE
- reason: short explanation
- next_action: what an analyst should do next
"""

result = orchestrator.initiate_chat(
    transaction_monitor,
    message=prompt,
    max_turns=1,  # one request, one reply; keeps the agents from looping on auto-replies
)

print(result.summary)
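The case packet above hard-codes a masked account string. A minimal sketch of how masked and tokenized identifiers could be produced follows, using only the standard library. The helper names tokenize_id and mask_account are hypothetical, and the key shown is a placeholder; in practice the key would come from your secrets manager.

```python
import hashlib
import hmac


def tokenize_id(raw_id: str, key: bytes) -> str:
    """Return a stable, non-reversible token for an identifier (keyed HMAC-SHA256)."""
    digest = hmac.new(key, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def mask_account(account_number: str, visible: int = 4) -> str:
    """Keep only the last few digits so analysts can still recognize the account."""
    return "***" + account_number[-visible:]


demo_key = b"demo-key-not-for-production"  # placeholder; load a real key from a secret store
print(tokenize_id("CUST-1029", demo_key))
print(mask_account("0012344821"))
```

Because the token is deterministic for a given key, the same customer maps to the same token across cases, so analysts can link alerts without the model ever seeing the raw identifier.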

4) Parse the response and write it to your case system

Treat the agent output as advisory. The final decision stays with policy engines or human analysts depending on your operating model.

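def normalize_agent_output(text: str) -> dict:
    # Models often wrap JSON in prose or Markdown fences, so parse only the
    # span between the first "{" and the last "}".
    start, end = text.find("{"), text.rfind("}")
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        parsed = None
    # Accept only a JSON object with a known disposition; otherwise fail safe.
    if isinstance(parsed, dict) and parsed.get("disposition") in {"LOW_RISK", "REVIEW", "ESCALATE"}:
        return parsed
    return {
        "disposition": "REVIEW",
        "reason": text[:500],
        "next_action": "Manual review required",
    }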

agent_output = normalize_agent_output(result.summary)

case_record = {
    **case_packet,
    **agent_output,
}

print(json.dumps(case_record, indent=2))
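To illustrate what "advisory" can mean in code, here is one possible policy, sketched under stated assumptions: the deterministic rule score sets a floor, and the agent may raise the disposition above that floor but never lower it. The final_disposition function and its thresholds are hypothetical; your policy engine may combine the two signals differently.

```python
SEVERITY = {"LOW_RISK": 0, "REVIEW": 1, "ESCALATE": 2}


def final_disposition(rule_score: int, agent_disposition: str) -> str:
    """Combine the rule score and the agent's label, keeping the stricter of the two."""
    # Floor derived from deterministic rules (illustrative thresholds).
    if rule_score >= 75:
        floor = "ESCALATE"
    elif rule_score >= 25:
        floor = "REVIEW"
    else:
        floor = "LOW_RISK"

    # Unknown labels from the model are treated as REVIEW rather than trusted.
    agent = agent_disposition if agent_disposition in SEVERITY else "REVIEW"

    return max(floor, agent, key=SEVERITY.__getitem__)


print(final_disposition(100, "LOW_RISK"))
```

Under this policy a model reply of LOW_RISK cannot clear a case that tripped three or more rules, which keeps the decision traceable to the rule hits.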

Production Considerations

• Keep decision authority outside the model
  - In retail banking, the LLM should recommend; your rules engine or analyst workflow should decide. This is important for auditability under AML/KYC controls.
• Log every artifact
  - Persist inputs, prompts, outputs, timestamps, rule hits, model version, and analyst overrides. You need this for investigations and regulator requests.
• Enforce residency and access controls
  - If customer data must stay in-region, pin inference to approved infrastructure and avoid sending sensitive fields to external endpoints without approval.
• Add hard guardrails
  - Block instructions that request account closure, fund movement, or customer outreach without downstream approval. The agent should never take irreversible action on its own.
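The hard-guardrail point can be sketched as a post-processing check on the agent's next_action field. This is a deliberately simple example: the apply_guardrail function and the BLOCKED_PHRASES list are hypothetical, and substring matching is crude. A stricter design would have the model choose next_action from a fixed allow-list instead of free text.

```python
# Phrases that imply an irreversible or customer-facing action (illustrative list).
BLOCKED_PHRASES = (
    "close account",
    "freeze account",
    "move funds",
    "transfer funds",
    "contact customer",
)


def apply_guardrail(agent_output: dict) -> dict:
    """Replace any blocked next_action with a manual-review instruction."""
    action = str(agent_output.get("next_action", "")).lower()
    hits = [phrase for phrase in BLOCKED_PHRASES if phrase in action]
    if not hits:
        return agent_output
    # Keep the model's disposition and reason, but neutralize the action
    # and record which phrases triggered the guardrail for the audit trail.
    return {
        **agent_output,
        "next_action": "Manual review required",
        "guardrail_hits": hits,
    }


print(apply_guardrail({"disposition": "ESCALATE", "next_action": "Close account immediately"}))
```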

Common Pitfalls

• Sending too much raw PII to the model
  - Avoid passing full names, addresses, or other field-level PII unless required. Mask identifiers and use tokenized case packets instead.
• Using the LLM as the first-line detector
  - Don’t ask AutoGen to find fraud from scratch on raw feeds. Use deterministic rules or anomaly models first; let AutoGen explain and triage.
• No audit trail for model decisions
  - If you can’t reconstruct why a case was escalated six months later, you’re not ready for production. Store prompts, responses, versions, and evidence hashes.
• Letting prompts drift into open-ended analysis
  - Keep output schema fixed: disposition, reason, next_action. Loose prompts create inconsistent case notes and poor analyst trust.
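As a rough illustration of the audit-trail point, the sketch below builds one record per model call with content hashes, so a later reviewer can check that the stored prompt, response, and evidence were not altered. The function build_audit_record and its field names are assumptions for this example; where the record is stored, and for how long, depends on your retention policy.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_audit_record(case_packet: dict, prompt: str, response: str,
                       model: str, prompt_version: str) -> dict:
    """Assemble an audit entry for a single model call."""
    # Sorting keys makes the evidence hash independent of dict ordering.
    evidence = json.dumps(case_packet, sort_keys=True)
    return {
        "transaction_id": case_packet["transaction_id"],
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt_version": prompt_version,
        "prompt": prompt,
        "response": response,
        "prompt_sha256": sha256_hex(prompt),
        "response_sha256": sha256_hex(response),
        "evidence_sha256": sha256_hex(evidence),
    }


record = build_audit_record(
    {"transaction_id": "TXN-88421", "rule_score": 100},
    prompt="Review this retail banking transaction case.",
    response='{"disposition": "ESCALATE"}',
    model="gpt-4o-mini",
    prompt_version="v1",  # hypothetical version label for the prompt template
)
print(json.dumps(record, indent=2))
```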


By Cyprian Aarons, AI Consultant at Topiax.
