How to Build a Fraud Detection Agent Using CrewAI in Python for Banking

By Cyprian Aarons · Updated 2026-04-21
Tags: fraud-detection, crewai, python, banking

A fraud detection agent in banking ingests transaction data, enriches it with customer and merchant context, scores risk, and escalates suspicious cases for human review. It matters because fraud losses are expensive, but so are false positives: every bad block creates customer friction, support load, and regulatory noise.

Architecture

A production-grade CrewAI fraud agent needs these components:

  • Transaction intake

    • Pulls events from Kafka, SQS, or a database queue
    • Normalizes fields like amount, currency, merchant category, geo, device ID
  • Risk enrichment

    • Looks up customer profile, account age, velocity metrics, historical chargebacks
    • Pulls merchant reputation and device fingerprint signals
  • Fraud analysis agent

    • Applies banking rules plus LLM-assisted reasoning on structured evidence
    • Produces a risk score and explanation tied to observed signals
  • Compliance and audit layer

    • Stores every decision input/output
    • Keeps immutable logs for model governance and regulator review
  • Escalation workflow

    • Opens a case in the fraud ops queue when risk exceeds threshold
    • Routes low-confidence or high-impact cases to human analysts
  • Policy guardrails

    • Prevents the agent from making unsupported claims
    • Blocks actions that violate data residency or PII handling rules
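As a sketch of the intake component, a normalization helper might map a raw queue event onto a canonical shape before anything downstream sees it. The field names and defaults here are illustrative, not a CrewAI requirement:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # Canonical fields the downstream agents expect.
    transaction_id: str
    account_id: str
    amount: float
    currency: str          # ISO 4217, upper-case
    merchant_category: str
    ip_country: str
    device_id: str


def normalize(event: dict) -> Transaction:
    """Map a raw intake event onto the canonical schema.

    Raises KeyError on missing required fields so malformed events
    are rejected at the boundary, not halfway through the pipeline.
    """
    return Transaction(
        transaction_id=event["transaction_id"],
        account_id=event["account_id"],
        amount=float(event["amount"]),
        currency=str(event["currency"]).upper(),
        merchant_category=event.get("merchant_category", "UNKNOWN"),
        ip_country=str(event.get("ip_country", "")).upper(),
        device_id=event.get("device_id", ""),
    )
```

Rejecting bad events at this boundary keeps every later component (enrichment, analysis, audit) working against one stable schema.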

Implementation

1) Define the tasks and agents

Use one agent for analysis and one for compliance review. Keep the LLM role narrow; the actual fraud logic should come from structured inputs, not free-form guessing.

from crewai import Agent, Task, Crew, Process
from crewai.llm import LLM

fraud_llm = LLM(
    model="gpt-4o-mini",
    temperature=0.0
)

fraud_analyst = Agent(
    role="Fraud Analyst",
    goal="Assess whether a banking transaction is likely fraudulent using provided evidence.",
    backstory=(
        "You analyze banking transactions with strict attention to "
        "velocity patterns, geo mismatch, merchant risk, and customer history."
    ),
    llm=fraud_llm,
    verbose=True,
)

compliance_reviewer = Agent(
    role="Compliance Reviewer",
    goal="Check that the fraud decision is explainable, auditable, and compliant.",
    backstory=(
        "You ensure decisions contain evidence references, avoid unsupported claims, "
        "and respect banking compliance requirements."
    ),
    llm=fraud_llm,
    verbose=True,
)

2) Build the task chain

The first task produces a structured assessment. The second task checks auditability and compliance language before anything is sent downstream.

transaction_payload = {
    "transaction_id": "tx_98341",
    "account_id": "acct_4421",
    "amount": 4200.75,
    "currency": "USD",
    "merchant": "electronics-store-17",
    "merchant_category": "Electronics",
    "country": "US",
    "customer_home_country": "GB",
    "device_id": "dev_8821",
    "ip_country": "RO",
    "velocity_24h_tx_count": 9,
    "chargeback_rate_90d": 0.08,
}

fraud_task = Task(
    description=(
        f"Analyze this transaction for fraud risk:\n{transaction_payload}\n\n"
        "Return:\n"
        "- risk_score from 0 to 100\n"
        "- decision: approve | review | decline\n"
        "- top_signals: list of evidence-based reasons\n"
        "- recommended_action\n"
        "- confidence from 0 to 1"
    ),
    expected_output="A concise fraud assessment with structured fields.",
    agent=fraud_analyst,
)

compliance_task = Task(
    description=(
        "Review the fraud assessment for auditability and compliance. "
        "Confirm that the result is based on provided evidence only, includes "
        "clear rationale, and avoids PII leakage."
    ),
    expected_output="A compliance-safe validation note.",
    agent=compliance_reviewer,
)

3) Run the crew and consume structured output

For banking workflows, keep execution deterministic where possible. Use Process.sequential so compliance review always happens after analysis.

crew = Crew(
    agents=[fraud_analyst, compliance_reviewer],
    tasks=[fraud_task, compliance_task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff()

print(result)

In production, wrap this in an API service that does three things:

  • validates incoming transaction schemas
  • redacts PII before sending context to the model
  • writes the full request/response trail to an immutable audit store
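A minimal sketch of the redaction step, assuming text context is assembled before prompting. The regex patterns here are illustrative only; a production bank would use a vetted tokenization or redaction service rather than ad-hoc patterns:

```python
import re

# Illustrative patterns only. Real deployments should rely on a
# dedicated tokenization/redaction service, not hand-rolled regexes.
PAN_RE = re.compile(r"\b\d{13,19}\b")  # card-number-length digit runs
CVV_RE = re.compile(r"\bcvv\s*[:=]?\s*\d{3,4}\b", re.IGNORECASE)


def redact(text: str) -> str:
    """Mask likely PANs and CVVs before text reaches a prompt."""
    text = PAN_RE.sub("[REDACTED_PAN]", text)
    text = CVV_RE.sub("[REDACTED_CVV]", text)
    return text
```

Running every prompt payload through a function like this gives you one auditable choke point for PII handling.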

4) Add a real scoring gate before case creation

Do not let the LLM directly decide money movement. Use its output as an advisory signal inside a policy engine.

def route_transaction(risk_score: int) -> str:
    if risk_score >= 85:
        return "decline_and_open_case"
    if risk_score >= 60:
        return "manual_review"
    return "approve"

# Example parsing logic depends on your output format.
risk_score = 87
action = route_transaction(risk_score)
print(action)

That pattern keeps control in your application layer. In banking, the model recommends; policy decides.
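Assuming you instruct the analyst agent to emit JSON (the schema below mirrors the fields requested in fraud_task), the parsing step can fold into the gate and fail closed, so a malformed model response is never silently approved:

```python
import json


def parse_and_route(raw_output: str) -> str:
    """Parse the analyst's JSON assessment and apply the policy gate.

    Any parse failure or out-of-range score fails closed to manual
    review rather than auto-approving.
    """
    try:
        assessment = json.loads(raw_output)
        risk_score = int(assessment["risk_score"])
        if not 0 <= risk_score <= 100:
            raise ValueError("risk_score out of range")
    except (ValueError, KeyError, TypeError):
        return "manual_review"  # fail closed on bad model output
    if risk_score >= 85:
        return "decline_and_open_case"
    if risk_score >= 60:
        return "manual_review"
    return "approve"
```

The fail-closed default is the point: the worst outcome of a garbled response is an extra human review, never an unexamined approval.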

Production Considerations

  • Deployment

    • Run the agent behind an internal service boundary with mTLS.
    • Keep model access inside approved regions if your bank has data residency constraints.
    • Separate inference traffic from analyst tooling so incident response stays clean.
  • Monitoring

    • Track false positives, false negatives, analyst override rate, and time-to-decision.
    • Log prompt version, model version, input feature set, and final action for audit.
    • Alert on drift in merchant categories, geographies, or device patterns.
  • Guardrails

    • Redact PANs, CVVs, full account numbers, and unnecessary PII before prompting.
    • Enforce allowlisted tools only; no free-form web access or external calls.
    • Require human approval for high-value transactions or sanctions-adjacent cases.
  • Governance

    • Store decision traces for retention periods required by your jurisdiction.
    • Document why each signal was used so model risk teams can review it later.
    • Make sure legal/compliance signs off on any automated decline policy.
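One way to make the governance bullets concrete: hash the exact decision inputs alongside the prompt and model versions, so a later review can prove what the model saw. The record fields here are illustrative; your audit store's schema will differ:

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(transaction: dict, prompt_version: str,
                 model_version: str, action: str) -> dict:
    """Build an append-only audit entry for one decision.

    The snapshot hash is computed over a canonical JSON encoding,
    so identical inputs always produce the same digest regardless
    of key order.
    """
    canonical = json.dumps(transaction, sort_keys=True, separators=(",", ":"))
    return {
        "snapshot_sha256": hashlib.sha256(canonical.encode()).hexdigest(),
        "prompt_version": prompt_version,
        "model_version": model_version,
        "final_action": action,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```

Writing these entries to an append-only store gives model risk teams a stable artifact to replay decisions against.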

Common Pitfalls

  1. Letting the LLM make final decisions

    • Bad pattern: “the model said decline.”
    • Fix: use CrewAI for analysis and explanation; use deterministic rules or a policy engine for final action.
  2. Sending raw sensitive data into prompts

    • Bad pattern: full card numbers, names, addresses in every task.
    • Fix: tokenize or redact sensitive fields first. Only pass what is needed for fraud reasoning.
  3. No audit trail

    • Bad pattern: storing only the final score.
    • Fix: persist transaction snapshot hash, prompt versioning, task outputs, reviewer notes, and downstream action.

If you build this right, CrewAI becomes a coordination layer for fraud operations rather than a black box. That is what banks need: explainable triage at speed without giving up control.


By Cyprian Aarons, AI Consultant at Topiax.
