How to Build a Fraud Detection Agent Using CrewAI in Python for Retail Banking

By Cyprian Aarons · Updated 2026-04-21
fraud-detection · crewai · python · retail-banking

A fraud detection agent in retail banking watches transaction streams, enriches them with customer context, and decides whether a case should be flagged, escalated, or cleared. The point is not just catching suspicious activity; it is reducing false positives, preserving auditability, and making sure every decision can survive compliance review.

Architecture

  • Transaction intake layer

    • Pulls card swipes, ACH transfers, wire activity, login events, and device signals.
    • Normalizes the payload into a single fraud case schema.
  • Risk enrichment layer

    • Adds customer profile data, account tenure, historical chargebacks, geo velocity, merchant risk, and device reputation.
    • This is where the agent gets enough context to avoid noisy alerts.
  • Fraud analysis agent

    • Uses CrewAI’s Agent to reason over the transaction plus enrichment data.
    • Produces a structured decision: approve, review, or escalate.
  • Investigation workflow

    • Uses Task objects to split work into scoring, explanation, and escalation summary.
    • Keeps the output readable for fraud ops and auditors.
  • Orchestration layer

    • Uses Crew to run the agent workflow deterministically.
    • Can be wired into an event-driven pipeline from Kafka, SQS, or a bank’s internal rules engine.
  • Audit and governance store

    • Persists inputs, outputs, model version, prompt version, timestamps, and reviewer actions.
    • Required for compliance evidence and post-incident review.

Implementation

  1. Install CrewAI and define your fraud case schema

You want a strict input contract before any agent sees live banking data. In production I usually keep this in Pydantic so the pipeline rejects malformed events early.

pip install crewai pydantic

from pydantic import BaseModel
from typing import Optional

class FraudCase(BaseModel):
    transaction_id: str
    customer_id: str
    amount: float
    currency: str
    merchant_name: str
    merchant_country: str
    customer_country: str
    channel: str
    device_id: Optional[str] = None
    ip_country: Optional[str] = None
    account_age_days: int
    prior_chargebacks_90d: int

  2. Create the fraud analyst agent with explicit instructions

Keep the agent narrow. Do not ask it to “detect fraud” in vague terms; tell it what signals matter in retail banking and what output format you expect.

from crewai import Agent

fraud_analyst = Agent(
    role="Retail Banking Fraud Analyst",
    goal=(
        "Assess transaction risk using customer behavior, geography, "
        "merchant patterns, and historical fraud indicators."
    ),
    backstory=(
        "You are a senior fraud analyst at a retail bank. "
        "You produce concise decisions that can be reviewed by operations "
        "and compliance teams."
    ),
    verbose=True,
)

  3. Define tasks for scoring and explanation

A practical pattern is to split detection from explanation. The first task produces the decision; the second turns that decision into an audit-friendly summary.

from crewai import Task

# The {case} placeholder below is filled in by crew.kickoff(inputs={"case": ...});
# without it, the agent never sees the transaction data.
score_task = Task(
    description=(
        "Review this transaction case and decide whether it should be "
        "'approve', 'review', or 'escalate': {case}. Consider amount spikes, "
        "cross-border activity, device mismatch, prior chargebacks, and "
        "account age. Return JSON with fields: decision, risk_score (0-100), "
        "reason_codes."
    ),
    expected_output="JSON object with decision, risk_score, reason_codes",
    agent=fraud_analyst,
)

explain_task = Task(
    description=(
        "Write a short investigator note explaining why the decision was made. "
        "Mention only factual signals from the case data."
    ),
    expected_output="Plain-English investigator note",
    agent=fraud_analyst,
)

  4. Assemble the crew and run it against a real case

This is the actual CrewAI execution pattern. In a bank pipeline you would call this from an API worker or stream processor after enrichment completes.

from crewai import Crew, Process

case = FraudCase(
    transaction_id="txn_983421",
    customer_id="cust_11209",
    amount=4850.00,
    currency="USD",
    merchant_name="ElectroHub Online",
    merchant_country="NG",
    customer_country="US",
    channel="card_not_present",
    device_id="device_77a1",
    ip_country="GB",
    account_age_days=14,
    prior_chargebacks_90d=2,
)

crew = Crew(
    agents=[fraud_analyst],
    tasks=[score_task, explain_task],
    process=Process.sequential,
)

result = crew.kickoff(inputs={"case": case.model_dump()})
print(result)

This is the standard CrewAI execution pattern: Crew, Agent, Task, Process.sequential, and kickoff() are the library's core primitives. The important part is that your inputs stay structured so downstream systems can log them cleanly.
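The worker wrapping that kickoff call can be sketched with the stdlib alone. Here `enrich` and `score_case` are illustrative stand-ins for your enrichment service and the crew above (e.g. `lambda c: crew.kickoff(inputs={"case": c})`):

```python
import queue
from typing import Callable


def run_worker(
    events: "queue.Queue[dict]",
    enrich: Callable[[dict], dict],
    score_case: Callable[[dict], str],
    sink: Callable[[dict], None],
) -> None:
    # Drain the event queue: enrich each raw event, score it,
    # then hand the full record to the audit sink so the input
    # and verdict are logged together.
    while not events.empty():
        raw = events.get()
        enriched = enrich(raw)
        verdict = score_case(enriched)
        sink({"case": enriched, "verdict": verdict})
```

The same shape works whether events arrive from Kafka, SQS, or an internal rules engine; only the queue adapter changes.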

Production Considerations

  • Compliance logging

    • Store the full request/response payloads with timestamps, model name, prompt version, and decision outcome.
    • For retail banking this matters for AML reviews, disputes handling, internal audit, and regulator requests.
  • Data residency

    • Keep PII inside approved regions and approved vendors only.
    • If your bank has country-specific residency rules, do not send raw customer data to external APIs without legal approval and encryption controls.
  • Guardrails on actioning

    • Let the agent recommend actions; do not let it auto-close accounts or block cards without policy checks.
    • Route high-risk decisions through deterministic thresholds or human review queues.
  • Monitoring

    • Track false positives by segment: card-not-present transactions are not the same as branch-originated transfers.
    • Watch drift in merchant categories, geography mismatches, average transaction size, and escalation rates over time.
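The guardrail point is worth making concrete: the agent recommends, but deterministic policy decides what actually happens. A minimal sketch (thresholds and queue names are illustrative):

```python
def route_decision(agent_decision: str, risk_score: int) -> str:
    """Apply deterministic policy on top of the agent's recommendation.

    The agent only ever recommends; hard thresholds and human review
    queues make the final call.
    """
    if risk_score >= 90:
        # Hard ceiling: always a human, regardless of what the agent said.
        return "human_review_queue"
    if agent_decision == "escalate":
        return "human_review_queue"
    if agent_decision == "review":
        return "fraud_ops_queue"
    return "auto_clear"
```

Keeping this logic outside the agent means compliance can audit the thresholds as plain code rather than prompt text.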

Common Pitfalls

  • Using free-form prompts instead of structured outputs

    • If your agent returns paragraphs only, your downstream systems will break.
    • Force JSON-like output for decisions and keep explanation separate from scoring.
  • Feeding raw sensitive data without minimization

    • Don’t pass full PANs, unnecessary identity fields, or unmasked account numbers into the agent.
    • Use tokenized identifiers and only include features needed for risk assessment.
  • Treating LLM output as final truth

    • An LLM can help triage cases; it should not replace deterministic rules for sanctions hits, velocity limits, or known-fraud fingerprints.
    • Combine CrewAI reasoning with policy engines and human review for edge cases.
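To enforce the structured-output rule, parse and validate the scorer's text before anything downstream consumes it. A stdlib sketch (in production you might validate with the same Pydantic models as the intake layer):

```python
import json

ALLOWED_DECISIONS = {"approve", "review", "escalate"}


def parse_decision(raw: str) -> dict:
    # Agents often wrap JSON in prose or code fences; pull out the
    # first {...} span before parsing.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in agent output")
    payload = json.loads(raw[start:end + 1])
    decision = payload.get("decision")
    score = payload.get("risk_score")
    if decision not in ALLOWED_DECISIONS:
        raise ValueError(f"unknown decision: {decision!r}")
    if not isinstance(score, (int, float)) or not 0 <= score <= 100:
        raise ValueError(f"risk_score out of range: {score!r}")
    return payload
```

Rejecting malformed output here, loudly, is far cheaper than discovering a free-text "decision" in the case management system later.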

By Cyprian Aarons, AI Consultant at Topiax.