How to Build a Fraud Detection Agent Using CrewAI in Python for Payments

By Cyprian Aarons · Updated 2026-04-21
Tags: fraud-detection, crewai, python, payments

A fraud detection agent for payments watches transaction context, scores suspicious behavior, and routes high-risk cases for review or blocking. It matters because payment fraud is a latency-sensitive problem: you need fast decisions, consistent audit trails, and a clear reason for every action taken on a card, wallet, or bank transfer.

Architecture

  • Transaction intake service

    • Receives payment events from your gateway, PSP, or internal ledger.
    • Normalizes fields like amount, currency, merchant category, device fingerprint, IP, and customer history.
  • Risk analysis agent

    • Uses CrewAI Agent to inspect the transaction and call tools for enrichment.
    • Produces a structured fraud assessment with risk score and rationale.
  • Enrichment tools

    • Pulls velocity checks, geo/IP reputation, chargeback history, account age, and prior disputes.
    • Keeps the agent grounded in actual payment signals instead of free-form reasoning.
  • Decision policy layer

    • Converts the agent output into actions: approve, step-up auth, hold for review, or decline.
    • Enforces hard rules that should never be overridden by an LLM.
  • Audit and case store

    • Persists inputs, outputs, tool calls, timestamps, and final decisions.
    • Required for PCI-adjacent controls, internal investigations, and regulator reviews.
  • Monitoring and feedback loop

    • Tracks false positives, false negatives, manual review outcomes, and drift in transaction patterns.
    • Feeds labeled outcomes back into your rules and prompts.
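The decision policy layer described above can be sketched as a thin function in which hard rules run first and cannot be overridden by the agent's score. The thresholds, country set, and function name here are illustrative, not a recommended ruleset:

```python
# Illustrative sketch of a decision policy layer. Hard rules are
# evaluated before, and take precedence over, the agent's risk score.
HARD_BLOCK_COUNTRIES = {"KP", "IR"}  # example sanctioned geographies


def policy_decision(agent_score: int, country: str, amount: float) -> str:
    """Map an agent risk score to an action, with hard rules first."""
    # Hard rules: these never defer to the LLM.
    if country in HARD_BLOCK_COUNTRIES or amount <= 0:
        return "decline"
    # Score-based policy: thresholds are placeholders you would tune.
    if agent_score >= 80:
        return "decline"
    if agent_score >= 60:
        return "review"
    if agent_score >= 40:
        return "step_up"
    return "approve"
```

Keeping this mapping in plain code means the action taken for any given score is reproducible in an audit, regardless of how the model phrased its assessment.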

Implementation

1) Install CrewAI and define the risk tools

Use tools for deterministic checks. The agent should reason over evidence; it should not invent evidence.

from crewai.tools import BaseTool
from pydantic import BaseModel, Field
from typing import Dict


class TransactionInput(BaseModel):
    transaction_id: str = Field(..., description="Payment transaction identifier")
    amount: float = Field(..., description="Transaction amount")
    currency: str = Field(..., description="ISO currency code")
    customer_id: str = Field(..., description="Customer identifier")
    merchant_id: str = Field(..., description="Merchant identifier")
    ip_address: str = Field(..., description="Customer IP address")


class VelocityCheckTool(BaseTool):
    name: str = "velocity_check"
    description: str = "Checks recent transaction velocity for a customer"

    def _run(self, customer_id: str) -> Dict:
        # Replace with Redis/Postgres/feature store lookup
        recent_txn_count = 7
        return {"customer_id": customer_id, "last_10m_count": recent_txn_count}


class GeoRiskTool(BaseTool):
    name: str = "geo_risk_lookup"
    description: str = "Returns IP-based geo risk signal"

    def _run(self, ip_address: str) -> Dict:
        # Replace with MaxMind / internal risk service
        return {"ip_address": ip_address, "country": "NG", "risk_flag": True}

2) Create the fraud analyst agent

Keep the prompt narrow. In payments systems you want consistent outputs that downstream code can parse.

from crewai import Agent

fraud_agent = Agent(
    role="Payments Fraud Analyst",
    goal="Assess payment transactions for fraud risk using provided evidence only",
    backstory=(
        "You review payment events for fraud indicators such as velocity spikes,"
        " geo mismatch, unusual amount patterns, and merchant abuse."
        " You must return a concise risk assessment."
    ),
    tools=[VelocityCheckTool(), GeoRiskTool()],
    verbose=True,
)

3) Define a task that forces structured output

CrewAI tasks can carry explicit instructions. For production, require strict JSON output so your policy engine can parse it safely.

from crewai import Task

fraud_task = Task(
    description=(
        "Analyze this payment transaction for fraud risk.\n"
        "Return:\n"
        "- risk_level: low|medium|high\n"
        "- score: integer 0-100\n"
        "- reasons: list of short strings\n"
        "- action: approve|step_up|review|decline\n\n"
        "Transaction:\n{transaction}"
    ),
    expected_output="A structured fraud decision with reasons and action.",
    agent=fraud_agent,
)

4) Run the crew and apply a hard policy gate

The LLM proposes; your policy decides. That separation matters when you need deterministic behavior under compliance review.

from crewai import Crew
import json


def decide_action(result_text: str) -> dict:
    # In production parse strict JSON from the model output.
    # Keep this example simple but explicit.
    if "decline" in result_text.lower():
        return {"final_action": "decline"}
    if "review" in result_text.lower():
        return {"final_action": "review"}
    if "step_up" in result_text.lower():
        return {"final_action": "step_up"}
    return {"final_action": "approve"}


transaction = TransactionInput(
    transaction_id="txn_123",
    amount=499.99,
    currency="USD",
    customer_id="cus_456",
    merchant_id="m_789",
    ip_address="102.88.12.44",
)

crew = Crew(
    agents=[fraud_agent],
    tasks=[fraud_task],
)

result = crew.kickoff(inputs={"transaction": transaction.model_dump_json()})
decision = decide_action(str(result))

print({"agent_result": str(result), **decision})

Production Considerations

  • Put hard limits outside the model

    • Block sanctioned geographies, impossible amounts, BIN-country mismatches, or known stolen cards before the agent runs.
    • The model should not be your first line of defense.
  • Log every decision path

    • Store raw inputs, tool outputs, prompt version, model version, final action.
    • For payments audits you need traceability from alert to outcome.
  • Respect data residency and PCI boundaries

    • Do not send PANs or sensitive authentication data to the model.
    • Tokenize card data and keep regional processing aligned with residency requirements.
  • Monitor precision by payment segment

    • Separate metrics by card-not-present vs card-present, geography, merchant category code (MCC), and channel.
    • A single global false-positive rate hides real losses in specific segments.
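Computing precision per segment rather than globally is straightforward once review outcomes are labeled. A minimal sketch over (segment, flagged, confirmed_fraud) records, with illustrative segment keys:

```python
from collections import defaultdict


def precision_by_segment(outcomes):
    """Alert precision per segment from labeled review outcomes.

    `outcomes` is an iterable of (segment, was_flagged, was_fraud)
    tuples; segments with no flagged transactions are omitted.
    """
    flagged = defaultdict(int)
    true_pos = defaultdict(int)
    for segment, was_flagged, was_fraud in outcomes:
        if was_flagged:
            flagged[segment] += 1
            if was_fraud:
                true_pos[segment] += 1
    return {seg: true_pos[seg] / flagged[seg] for seg in flagged}
```

Breaking the same calculation out by MCC, geography, and channel will often surface a segment bleeding false positives that the global number hides.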

Common Pitfalls

  • Letting the agent make final authorization decisions

    • Avoid this by keeping approval/decline logic in a deterministic policy layer.
    • Use the agent for assessment; use code for enforcement.
  • Passing raw sensitive payment data into prompts

    • Never include full PANs, CVVs, or secrets.
    • Tokenize identifiers and pass only the minimum fields needed for analysis.
  • No feedback loop from chargebacks and manual reviews

    • Without labels you will drift fast.
    • Feed confirmed fraud and false positives back into your ruleset and tool signals so thresholds stay calibrated.
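Data minimization is easier to enforce in code than by convention. One pattern is an allowlist filter applied to every event before any prompt is built, so a PAN or CVV that leaks into an upstream payload can never reach the model. The field names here are illustrative:

```python
# Only these fields may ever reach the model; everything else is dropped.
PROMPT_ALLOWLIST = {
    "transaction_id",
    "amount",
    "currency",
    "customer_id",
    "merchant_id",
    "ip_address",
}


def minimize_for_prompt(event: dict) -> dict:
    """Drop any field not explicitly allowlisted (PAN, CVV, etc.)."""
    return {k: v for k, v in event.items() if k in PROMPT_ALLOWLIST}
```

An allowlist fails safe: a new sensitive field added upstream is excluded by default, whereas a denylist would pass it through until someone remembered to block it.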

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.
