How to Build a Fraud Detection Agent Using AutoGen in Python for Fintech

By Cyprian Aarons · Updated 2026-04-21
fraud-detection · autogen · python · fintech

A fraud detection agent in fintech watches transactions, flags suspicious patterns, and escalates cases that need human review. The point is not to auto-decline everything that looks odd; it is to reduce manual review load, catch coordinated fraud earlier, and keep an audit trail that compliance can defend.

Architecture

  • Transaction intake layer

    • Receives payment events, account changes, device fingerprints, and velocity signals.
    • Normalizes payloads into a strict schema before the agent sees them.
  • Policy/rules engine

    • Applies hard business rules first: sanctioned geographies, blocked BIN ranges, impossible travel, rapid retries.
    • Prevents the LLM from being the first line of defense.
  • AutoGen analyst agent

    • Uses AssistantAgent to reason over structured evidence.
    • Produces a risk assessment, rationale, and recommended action.
  • Tooling layer

    • Exposes deterministic functions for feature lookup, case creation, customer history retrieval, and alert enrichment.
    • Keeps sensitive data access controlled and auditable.
  • Human review loop

    • Uses UserProxyAgent or a caseworker workflow for final approval on high-risk or ambiguous cases.
    • Important for compliance and model risk management.
  • Audit and storage layer

    • Persists inputs, outputs, tool calls, and final disposition.
    • Needed for disputes, regulator requests, and internal model governance.
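The layers above can be sketched as one control flow. A minimal, illustrative skeleton (the rules engine and agent are stand-ins; none of these names are AutoGen APIs):

```python
# Illustrative control flow for the architecture above. The rules engine
# and agent are stubbed out; names are placeholders, not AutoGen APIs.

SANCTIONED_COUNTRIES = {"XX"}  # placeholder sanctions list

def apply_hard_rules(event: dict):
    # Policy/rules engine: runs before any model call.
    if event["country"] in SANCTIONED_COUNTRIES:
        return "DECLINE"
    return None  # no hard rule fired; hand off to the analyst

def run_analyst_agent(event: dict) -> dict:
    # Stand-in for the AutoGen analyst built later in this guide.
    return {"decision": "REVIEW", "rationale": "elevated velocity"}

def process_event(event: dict) -> dict:
    decision = apply_hard_rules(event)
    if decision is not None:
        return {"decision": decision, "source": "rules"}
    assessment = run_analyst_agent(event)
    # REVIEW cases go to the human loop; everything lands in the audit log.
    return {"decision": assessment["decision"], "source": "agent"}
```

The point of the shape: the model is only ever the second opinion, and every path terminates in the same audit step.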

Implementation

1) Install AutoGen and define your transaction schema

Install AutoGen first (pip install pyautogen). For fintech work, do not pass raw unstructured JSON straight into the agent; validate it first so the model only sees fields you explicitly allow.

from dataclasses import dataclass
from typing import Literal

@dataclass
class TransactionEvent:
    transaction_id: str
    user_id: str
    amount: float
    currency: str
    country: str
    device_id: str
    ip_risk_score: int
    velocity_5m: int
    prior_chargebacks_90d: int
    merchant_category: str
    status: Literal["pending", "approved", "declined"]

2) Create deterministic tools for enrichment and decision support

AutoGen works best when the LLM reasons over facts pulled by tools. Keep these tools small and auditable.

import os
import json
from autogen import AssistantAgent, UserProxyAgent

def get_customer_risk_profile(user_id: str) -> dict:
    # Replace with database lookup or internal risk service call
    return {
        "user_id": user_id,
        "account_age_days": 420,
        "kyc_level": "enhanced",
        "historical_fraud_score": 0.12,
        "chargeback_rate": 0.03,
        "country_residency": "KE",
    }

def create_fraud_case(transaction_id: str, reason: str) -> dict:
    # Replace with case management API call
    return {"case_id": f"CASE-{transaction_id}", "status": "open", "reason": reason}

3) Build the AutoGen agents and register tools

This pattern uses AssistantAgent for analysis and UserProxyAgent to execute Python tool calls locally. In production you would usually route tool execution through your own service boundary instead of letting the agent run arbitrary code.

llm_config = {
    "model": os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
    "api_key": os.getenv("OPENAI_API_KEY"),
}

fraud_analyst = AssistantAgent(
    name="fraud_analyst",
    llm_config=llm_config,
    system_message=(
        "You are a fraud analyst for a fintech platform. "
        "Use only provided evidence. "
        "Return a concise risk assessment with one of: APPROVE, REVIEW, DECLINE. "
        "Explain which signals drove the decision."
    ),
)

executor = UserProxyAgent(
    name="executor",
    human_input_mode="NEVER",
    code_execution_config=False,  # tools only; no arbitrary code execution
)

from autogen import register_function

# register_function wires each tool to both sides: it adds the tool schema
# to the analyst's llm_config (so the model knows it can call it) and maps
# execution to the proxy. Registering a function_map on the executor alone
# would let it run calls but never tell the model the tools exist.
register_function(
    get_customer_risk_profile,
    caller=fraud_analyst,   # the analyst decides when to call the tool
    executor=executor,      # the proxy actually executes it
    description="Look up the stored risk profile for a user.",
)
register_function(
    create_fraud_case,
    caller=fraud_analyst,
    executor=executor,
    description="Open a fraud case for a transaction.",
)

4) Run a controlled analysis flow

The key pattern is to enrich first, then ask for a decision in a constrained format. That keeps the model from inventing missing facts and gives you something you can log.

event = TransactionEvent(
    transaction_id="tx_10091",
    user_id="user_44",
    amount=980.50,
    currency="USD",
    country="NG",
    device_id="dev_7781",
    ip_risk_score=87,
    velocity_5m=6,
    prior_chargebacks_90d=2,
    merchant_category="digital_goods",
    status="pending",
)

profile = get_customer_risk_profile(event.user_id)

prompt = f"""
Transaction:
{json.dumps(event.__dict__, indent=2)}

Customer profile:
{json.dumps(profile, indent=2)}

Decision rules:
- Decline if ip_risk_score > 80 AND velocity_5m >= 5 AND prior_chargebacks_90d >= 2
- Review if any two of those signals are elevated
- Approve otherwise

Return JSON with keys:
decision, confidence, rationale, next_action
"""

result = executor.initiate_chat(
    fraud_analyst,
    message=prompt,
)

print(result.summary)

If you want escalation on REVIEW, chain another deterministic action:

analysis = result.summary.lower()
if '"decision": "review"' in analysis:
    case = create_fraud_case(event.transaction_id, "High-risk transaction pattern")
    print(case)
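The substring check above is fine for a demo, but agents often wrap JSON in prose or a code fence, so production code should actually parse the reply. A hedged sketch of a tolerant parser (parse_decision is an illustrative helper, not an AutoGen API):

```python
import json
import re
from typing import Optional

VALID_DECISIONS = ("APPROVE", "REVIEW", "DECLINE")

def parse_decision(text: str) -> Optional[dict]:
    # Pull the first JSON object out of the model's reply; return None
    # rather than guessing when nothing valid parses.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if parsed.get("decision") not in VALID_DECISIONS:
        return None
    return parsed
```

A None result should route the case to manual review, never to an automatic approve.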

Production Considerations

  • Keep PII out of prompts unless necessary

    • Mask card numbers, national IDs, full addresses, and free-text notes.
    • Use tokenized identifiers so audit logs stay useful without exposing regulated data.
  • Enforce data residency at the tool layer

    • If your customer data must stay in-region, run enrichment services inside the required jurisdiction.
    • The LLM should receive only minimal derived features where possible.
  • Log every decision path

    • Store input features, tool outputs, prompt version, model version, decision text, and final disposition.
    • This matters for chargeback disputes and regulator reviews.
  • Add hard guardrails before model output

    • Decline obviously sanctioned or blocked cases with rules.
    • Use the agent for borderline cases where reasoning over multiple weak signals helps.
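That last guardrail can be a plain function that runs before any model call, reusing the same thresholds the prompt spells out. A sketch (pre_screen and the blocked list are illustrative; it returns a decision for fully determined cases and None for the borderline ones the agent should see):

```python
BLOCKED_COUNTRIES = {"XX"}  # placeholder sanctions list

def pre_screen(event: dict):
    # Hard blocks: these never reach the model.
    if event["country"] in BLOCKED_COUNTRIES:
        return "DECLINE"
    elevated = [
        event["ip_risk_score"] > 80,
        event["velocity_5m"] >= 5,
        event["prior_chargebacks_90d"] >= 2,
    ]
    if all(elevated):
        return "DECLINE"  # fully determined by rules; skip the LLM entirely
    return None  # borderline: route to the AutoGen analyst
```

This keeps the model off the critical path for the clear-cut cases and saves its reasoning budget for the weak-signal ones.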

Common Pitfalls

  • Letting the LLM make the first pass

    • Bad pattern: send raw transactions directly to AssistantAgent and trust its judgment.
    • Fix: apply deterministic rules and feature extraction before any model call.
  • Using free-form outputs in production

    • Bad pattern: parse vague prose like “this looks suspicious.”
    • Fix: force structured output with explicit labels like APPROVE, REVIEW, DECLINE.
  • Ignoring auditability

    • Bad pattern: only store the final decision.
    • Fix: persist prompts, tool results, model version, timestamps, and case outcomes so compliance can reconstruct the decision later.
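The persistence fix above is easiest to get right if every decision writes one record with a fixed shape. A sketch of such a record builder (field names are illustrative, not a standard schema):

```python
import datetime

def build_audit_record(event_id: str, prompt: str, prompt_version: str,
                       model_version: str, tool_outputs: list,
                       decision_text: str, disposition: str) -> dict:
    # One record per decision, capturing everything compliance needs to
    # reconstruct it: inputs, model identity, raw output, final outcome.
    return {
        "event_id": event_id,
        "prompt": prompt,
        "prompt_version": prompt_version,
        "model_version": model_version,
        "tool_outputs": tool_outputs,
        "decision_text": decision_text,
        "final_disposition": disposition,
        "logged_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
```

Write this record whether the rules engine, the agent, or a human made the call; a uniform shape is what makes later dispute queries tractable.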

If you build this way, AutoGen becomes a controlled reasoning layer on top of your fraud stack instead of a black box sitting on top of customer money. That is the right shape for fintech.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

