How to Build a Fraud Detection Agent Using AutoGen in Python for Retail Banking

By Cyprian Aarons · Updated 2026-04-21
Tags: fraud-detection, autogen, python, retail-banking

A fraud detection agent in retail banking watches transaction streams, customer context, and policy rules, then decides whether a payment should be approved, held for review, or escalated. It matters because the difference between a good and bad decision is not just money lost — it’s customer trust, regulatory exposure, and operational load on your fraud team.

Architecture

  • Transaction intake layer

    • Receives card payments, ACH transfers, wire requests, login events, and beneficiary changes.
    • Normalizes raw payloads into a consistent fraud-case schema.
  • Risk feature builder

    • Enriches each event with velocity checks, device fingerprinting, geo distance, account age, beneficiary history, and prior disputes.
    • Pulls from internal services only; keep PII handling explicit.
  • AutoGen decision agent

    • Uses AssistantAgent to reason over the transaction context and produce a structured recommendation.
    • Calls tools for policy lookup, case history retrieval, and threshold checks.
  • Policy enforcement layer

    • Applies hard rules outside the LLM: sanctions hits, KYC status, resident-country restrictions, and amount thresholds.
    • The agent advises; policy code decides.
  • Audit logger

    • Stores prompts, tool outputs, model response, final action, and reason codes.
    • Needed for model governance and post-incident review.
  • Case management output

    • Sends approve, step_up_auth, hold_for_review, or block to your fraud ops queue or core banking workflow.
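Before writing any AutoGen code, it helps to see how these layers compose. The sketch below stubs every layer as a plain function; all names and rules here are illustrative stand-ins, not a production implementation.

```python
def normalize(raw: dict) -> dict:
    """Transaction intake: map a raw payload onto the fraud-case schema."""
    return {"transaction_id": raw["id"], "amount": float(raw["amount"])}

def build_features(case: dict) -> dict:
    """Risk feature builder: enrich with velocity, device trust, etc. (stubbed)."""
    return {**case, "velocity_1h": 1, "device_trust_score": 0.9}

def agent_decide(features: dict) -> dict:
    """Decision agent: in production this is the AutoGen agent call (stubbed)."""
    return {"action": "approve", "reasons": []}

def enforce_policy(features: dict, recommendation: dict) -> str:
    """Policy layer: deterministic rules always override the agent's advice."""
    if features["amount"] >= 5000:
        return "hold_for_review"
    return recommendation["action"]

def process_event(raw: dict) -> str:
    case = normalize(raw)
    features = build_features(case)
    recommendation = agent_decide(features)
    final_action = enforce_policy(features, recommendation)
    # Audit logging of case, features, recommendation, and final_action goes here.
    return final_action

print(process_event({"id": "tx_1", "amount": "120.00"}))   # approve
print(process_event({"id": "tx_2", "amount": "9000.00"}))  # hold_for_review
```

Note the shape: the agent only ever returns a recommendation, and `enforce_policy` has the last word.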

Implementation

  1. Install AutoGen and define the transaction schema

Use the current AutoGen package and keep the case object explicit. In retail banking, vague prompts lead to vague decisions.

pip install pyautogen pydantic

from pydantic import BaseModel
from typing import Literal

class FraudCase(BaseModel):
    transaction_id: str
    customer_id: str
    amount: float
    currency: str
    merchant_category: str
    country: str
    device_trust_score: float
    velocity_1h: int
    account_age_days: int
    prior_disputes_90d: int
    kyc_status: Literal["verified", "pending", "failed"]
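Because FraudCase is a pydantic model, malformed intake payloads fail fast instead of reaching the agent. A quick sanity check (repeating the schema so the snippet runs standalone):

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

class FraudCase(BaseModel):
    transaction_id: str
    customer_id: str
    amount: float
    currency: str
    merchant_category: str
    country: str
    device_trust_score: float
    velocity_1h: int
    account_age_days: int
    prior_disputes_90d: int
    kyc_status: Literal["verified", "pending", "failed"]

raw = {
    "transaction_id": "tx_1", "customer_id": "cus_1", "amount": "49.99",
    "currency": "USD", "merchant_category": "grocery", "country": "US",
    "device_trust_score": 0.9, "velocity_1h": 1, "account_age_days": 900,
    "prior_disputes_90d": 0, "kyc_status": "verified",
}
case = FraudCase(**raw)  # the string "49.99" is coerced to a float
print(case.amount)

try:
    FraudCase(**{**raw, "kyc_status": "unknown"})  # not in the Literal
except ValidationError as e:
    print("rejected with", len(e.errors()), "error(s)")
```

The `Literal` type on `kyc_status` is what rejects the bad value; anything outside the three allowed states never enters the pipeline.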
  2. Create tools for policy checks and case history

Keep deterministic checks in Python functions. The agent can call them through AutoGen’s tool-calling path via register_function.


def policy_check(case: dict) -> dict:
    reasons = []
    if case["kyc_status"] != "verified":
        reasons.append("KYC_NOT_VERIFIED")
    if case["amount"] >= 5000:
        reasons.append("HIGH_VALUE")
    if case["velocity_1h"] >= 5:
        reasons.append("VELOCITY_SPIKE")
    if case["device_trust_score"] < 0.35:
        reasons.append("LOW_DEVICE_TRUST")
    action = "approve" if not reasons else "hold_for_review"
    return {"action": action, "reasons": reasons}

def fetch_case_history(customer_id: str) -> dict:
    # Replace with real data access layer; keep residency constraints in mind.
    return {
        "customer_id": customer_id,
        "chargebacks_12m": 1,
        "alerts_30d": 2,
        "recent_beneficiary_additions": 0,
    }
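A single transaction can trip several reason codes at once, which is worth pinning down in a quick regression test (policy_check repeated here so the snippet runs standalone):

```python
def policy_check(case: dict) -> dict:
    reasons = []
    if case["kyc_status"] != "verified":
        reasons.append("KYC_NOT_VERIFIED")
    if case["amount"] >= 5000:
        reasons.append("HIGH_VALUE")
    if case["velocity_1h"] >= 5:
        reasons.append("VELOCITY_SPIKE")
    if case["device_trust_score"] < 0.35:
        reasons.append("LOW_DEVICE_TRUST")
    action = "approve" if not reasons else "hold_for_review"
    return {"action": action, "reasons": reasons}

# A risky case should accumulate every matching reason code.
risky = {"kyc_status": "verified", "amount": 7400.0,
         "velocity_1h": 7, "device_trust_score": 0.22}
print(policy_check(risky))
# {'action': 'hold_for_review',
#  'reasons': ['HIGH_VALUE', 'VELOCITY_SPIKE', 'LOW_DEVICE_TRUST']}

# A clean case should approve with no reasons.
clean = {"kyc_status": "verified", "amount": 40.0,
         "velocity_1h": 1, "device_trust_score": 0.9}
print(policy_check(clean))
# {'action': 'approve', 'reasons': []}
```

Pinning these invariants in unit tests lets you change thresholds later without silently altering behavior.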
  3. Wire up an AssistantAgent that produces structured fraud decisions

The pattern below uses one assistant agent plus registered functions. This is enough for a production-style first pass when you want explainable recommendations without building a multi-agent swarm on day one.

import os

from autogen import AssistantAgent

# Temperature 0 for reproducible decisions; the API key comes from the
# environment rather than being hard-coded.
llm_config = {
    "config_list": [
        {"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}
    ],
    "temperature": 0,
}

fraud_agent = AssistantAgent(
    name="fraud_decision_agent",
    llm_config=llm_config,
    system_message=(
        "You are a fraud analyst for retail banking. "
        "Use only the provided transaction facts and tool outputs. "
        "Return JSON with keys: action, confidence, reason_codes, analyst_note. "
        "Actions allowed: approve, step_up_auth, hold_for_review, block."
    ),
)

# function_map makes the tools executable by this agent. For the model to
# actually emit tool calls, the functions must also be advertised to it,
# e.g. via register_for_llm/register_for_execution in a two-agent setup.
fraud_agent.register_function(
    function_map={
        "policy_check": policy_check,
        "fetch_case_history": fetch_case_history,
    }
)

case = FraudCase(
    transaction_id="tx_12345",
    customer_id="cus_7788",
    amount=7400.00,
    currency="USD",
    merchant_category="electronics",
    country="US",
    device_trust_score=0.22,
    velocity_1h=7,
    account_age_days=14,
    prior_disputes_90d=1,
    kyc_status="verified",
)

prompt = f"""
Analyze this retail banking transaction:

{case.model_dump_json(indent=2)}

First call fetch_case_history(customer_id), then call policy_check(case).
Then provide a final JSON decision based on all evidence.
"""

result = fraud_agent.generate_reply(messages=[{"role": "user", "content": prompt}])
print(result)
  4. Enforce the final decision outside the model

The LLM should not directly move money or freeze accounts without deterministic approval logic. Use the model output as a recommendation layer and map it to your bank’s workflow engine after validation.

import json

def validate_decision(raw_response: str) -> dict:
    # Models sometimes wrap JSON in markdown fences; strip any backticks
    # and a leading "json" language tag before parsing.
    cleaned = raw_response.strip().strip("`")
    if cleaned.startswith("json"):
        cleaned = cleaned[4:]
    decision = json.loads(cleaned.strip())

    allowed = {"approve", "step_up_auth", "hold_for_review", "block"}
    if decision["action"] not in allowed:
        raise ValueError(f"Invalid action from model: {decision['action']}")

    if not (0.0 <= float(decision["confidence"]) <= 1.0):
        raise ValueError("Invalid confidence score")

    return decision

# Example downstream usage:
# decision = validate_decision(result)
# route_to_case_queue(decision)
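route_to_case_queue above is left as a stub; a minimal version might just map validated actions onto workflow destinations. The queue names here are illustrative placeholders for your own systems.

```python
# Illustrative mapping from validated actions to downstream destinations.
ROUTES = {
    "approve": "core_banking.post_transaction",
    "step_up_auth": "auth.challenge_queue",
    "hold_for_review": "fraud_ops.review_queue",
    "block": "fraud_ops.block_queue",
}

def route_to_case_queue(decision: dict) -> str:
    # A KeyError here means validate_decision let a bad action through.
    return ROUTES[decision["action"]]

decision = {
    "action": "hold_for_review",
    "confidence": 0.82,
    "reason_codes": ["HIGH_VALUE", "VELOCITY_SPIKE"],
    "analyst_note": "High-value purchase on a low-trust device.",
}
print(route_to_case_queue(decision))  # fraud_ops.review_queue
```

Keeping the mapping in one table makes the set of possible outcomes auditable at a glance.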

Production Considerations

  • Keep PII out of prompts where possible

    • Tokenize account numbers, names, and addresses before sending context to the agent.
    • If you need reversibility for investigations, store mappings in a controlled vault.
  • Respect data residency

    • Run inference in-region when customer data cannot leave jurisdiction.
    • Make sure tool calls do not silently hit cross-border logs or third-party APIs.
  • Log everything needed for audit

    • Store prompt version, tool inputs/outputs, model version, final action, timestamp, analyst override status.
    • This is non-negotiable for fraud governance and regulator reviews.
  • Use hard guardrails before model output reaches operations

    • Sanctions screening failures, confirmed mule indicators, or failed KYC should bypass the agent.
    • The agent can recommend; it should not override mandatory controls.
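For the PII guidance above, tokenization before prompt assembly can be a small helper. This sketch uses an unsalted hash for brevity; production code should use an HMAC with a managed secret so tokens cannot be brute-forced, and keep any reversible mapping in a vault.

```python
import hashlib

def tokenize(value: str, field: str) -> str:
    """Replace a PII value with a stable pseudonymous token.

    Unsalted SHA-256 for illustration only; use HMAC with a secret key
    in production so tokens cannot be reversed by dictionary attack.
    """
    digest = hashlib.sha256(f"{field}:{value}".encode()).hexdigest()[:12]
    return f"{field}_{digest}"

# Only tokenized references and decision features reach the prompt.
prompt_context = {
    "customer_ref": tokenize("Jane Doe", "name"),
    "account_ref": tokenize("GB29NWBK60161331926819", "account"),
    "amount": 7400.0,
    "device_trust_score": 0.22,
}
print(prompt_context["customer_ref"])  # stable per input, no raw PII
```

Because the token is deterministic, the agent can still correlate repeated activity by the same customer without ever seeing the underlying identifier.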

Common Pitfalls

  • Letting the LLM make irreversible decisions alone

    • Don’t let AssistantAgent directly freeze cards or block accounts.
    • Put a deterministic policy layer between model output and execution.
  • Passing raw customer data into prompts

    • Developers often dump full profiles into messages because it’s convenient.
    • Mask sensitive fields and only send features needed for the decision.
  • Skipping replayable audit trails

    • If you can’t reconstruct why a transaction was flagged six months later, you have an operational problem.
    • Version prompts, tools, thresholds, and model config together so every decision is reproducible.
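One way to keep decisions reproducible is a single versioned audit record per decision. The field names below are illustrative; the point is that prompt, model, and threshold versions travel together with the outcome.

```python
import json
import time
from dataclasses import asdict, dataclass, field

@dataclass
class AuditRecord:
    """Everything needed to replay one fraud decision months later."""
    transaction_id: str
    prompt_version: str
    model_version: str
    threshold_config_version: str
    tool_outputs: dict
    final_action: str
    reason_codes: list
    analyst_override: bool = False
    timestamp: float = field(default_factory=time.time)

record = AuditRecord(
    transaction_id="tx_12345",
    prompt_version="fraud-prompt-v3",
    model_version="gpt-4o-mini-2024-07-18",
    threshold_config_version="thresholds-2026-04",
    tool_outputs={"policy_check": {"action": "hold_for_review",
                                   "reasons": ["HIGH_VALUE"]}},
    final_action="hold_for_review",
    reason_codes=["HIGH_VALUE", "VELOCITY_SPIKE"],
)
# Serialize and append to a write-once audit store.
print(json.dumps(asdict(record), sort_keys=True))
```

With all four version fields captured, "why was this flagged?" becomes a lookup rather than an archaeology project.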

By Cyprian Aarons, AI Consultant at Topiax.
