How to Build a Transaction Monitoring Agent Using CrewAI in Python for Retail Banking

By Cyprian Aarons · Updated 2026-04-21

Tags: transaction-monitoring, crewai, python, retail-banking

A transaction monitoring agent reviews customer transactions, flags suspicious patterns, and routes cases for investigation before they become compliance or fraud problems. In retail banking, that matters because you need to detect money laundering, structuring, account takeover, and unusual behavior fast, while keeping a clean audit trail for regulators and internal risk teams.

Architecture

  • Transaction ingestion layer

    • Pulls transactions from a core banking feed, event bus, or batch file.
    • Normalizes fields like amount, merchant category, channel, country, and customer segment.
  • Risk scoring tool layer

    • Implements deterministic checks such as velocity rules, threshold breaches, geo anomalies, and sanctions screening hooks.
    • Exposes these checks to the agent as callable tools.
  • CrewAI agent layer

    • Uses a Crew with one or more Agent objects to reason over transactions and decide whether to escalate.
    • Produces structured outputs: low risk, review required, or SAR/AML escalation candidate.
  • Case management output layer

    • Writes alerts into your case management system or queue.
    • Stores the rationale used by the agent for auditability.
  • Compliance and audit store

    • Persists prompts, tool outputs, model responses, timestamps, and versioned rules.
    • Supports regulator review and internal model governance.
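
As a concrete sketch of the ingestion layer, a normalization step maps a raw core-banking record onto the fields the later layers expect. The raw-side field names here (txn_id, amt, ccy, and so on) are illustrative assumptions, not a real feed format:

```python
from datetime import datetime, timezone

def normalize_event(raw: dict) -> dict:
    """Map a raw feed record onto the normalized monitoring schema.

    Raw field names are hypothetical; adapt them to your core banking feed.
    """
    return {
        "transaction_id": raw["txn_id"],
        "customer_id": raw["cust_id"],
        "amount": float(raw["amt"]),
        "currency": raw.get("ccy", "USD").upper(),
        "merchant_category": raw.get("mcc_desc", "unknown").lower(),
        "channel": raw.get("chan", "unknown"),
        "country": raw.get("ctry", "US").upper(),
        "timestamp": raw.get("ts") or datetime.now(timezone.utc).isoformat(),
    }

event = {"txn_id": "tx_1", "cust_id": "c_9", "amt": "2500.00", "ccy": "usd", "chan": "mobile"}
print(normalize_event(event)["currency"])  # USD
```

Normalizing at the boundary keeps every downstream layer (rules, agent, case management) working against one schema instead of per-source quirks.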

Implementation

  1. Install CrewAI and define your transaction schema

Install the dependencies with pip install crewai pydantic. Then keep the transaction object explicit: retail banking monitoring fails when teams pass around untyped JSON with missing fields and inconsistent currency handling.

from pydantic import BaseModel

class Transaction(BaseModel):
    transaction_id: str
    customer_id: str
    amount: float
    currency: str
    merchant_category: str
    channel: str
    country: str
    timestamp: str
    prior_transactions_24h: int
    prior_amount_24h: float
    is_high_risk_country: bool = False
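
Pydantic rejects malformed feeds before they reach the agent. A quick illustration, with the model abridged so the snippet runs standalone:

```python
from pydantic import BaseModel, ValidationError

class Transaction(BaseModel):  # abridged redefinition so this snippet runs standalone
    transaction_id: str
    amount: float
    currency: str

try:
    Transaction(transaction_id="tx_1", amount="not-a-number", currency="USD")
except ValidationError as e:
    bad_field = e.errors()[0]["loc"][0]
    print(bad_field)  # amount
```

Failing fast here means a bad feed shows up as a validation error in your ingestion logs, not as a silent mis-scored transaction.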
  2. Create deterministic tools for the agent

CrewAI works best when the LLM does reasoning and your tools do factual checks. For banking use cases, keep the high-signal controls deterministic so they are explainable.

from crewai.tools import BaseTool

class RiskRulesTool(BaseTool):
    name: str = "risk_rules"
    description: str = "Apply retail banking transaction monitoring rules."

    def _run(self, transaction: dict) -> str:
        reasons = []

        if transaction["amount"] >= 10000:
            reasons.append("cash-like high-value transaction")

        if transaction["prior_transactions_24h"] >= 5 and transaction["prior_amount_24h"] > 15000:
            reasons.append("possible structuring / velocity pattern")

        if transaction["is_high_risk_country"]:
            reasons.append("high-risk jurisdiction")

        if transaction["channel"] in ["card-not-present", "mobile"] and transaction["merchant_category"] == "crypto":
            reasons.append("higher-risk digital asset exposure")

        if not reasons:
            return "No major rule breaches detected."
        return "; ".join(reasons)
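
Because the checks are deterministic, you can unit-test them without CrewAI in the loop. A sketch with the same logic factored into a plain function:

```python
def apply_risk_rules(transaction: dict) -> list[str]:
    """Same checks as RiskRulesTool._run, factored out for unit testing."""
    reasons = []
    if transaction["amount"] >= 10000:
        reasons.append("cash-like high-value transaction")
    if transaction["prior_transactions_24h"] >= 5 and transaction["prior_amount_24h"] > 15000:
        reasons.append("possible structuring / velocity pattern")
    if transaction["is_high_risk_country"]:
        reasons.append("high-risk jurisdiction")
    if transaction["channel"] in ("card-not-present", "mobile") and transaction["merchant_category"] == "crypto":
        reasons.append("higher-risk digital asset exposure")
    return reasons

txn = {
    "amount": 12500.0,
    "prior_transactions_24h": 6,
    "prior_amount_24h": 18450.0,
    "is_high_risk_country": False,
    "channel": "online_banking",
    "merchant_category": "wire_transfer",
}
print(apply_risk_rules(txn))
# ['cash-like high-value transaction', 'possible structuring / velocity pattern']
```

Keeping the rule logic in a plain function also lets compliance review and version it independently of the agent wiring.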
  3. Build the CrewAI agent and crew

Use one agent for triage if your first version is focused on alert generation. Add a second reviewer agent later if you want separation between detection and disposition.

from crewai import Agent, Task, Crew, Process

monitoring_agent = Agent(
    role="Transaction Monitoring Analyst",
    goal="Assess retail banking transactions for AML/fraud risk and produce an auditable disposition.",
    backstory=(
        "You work in a retail bank's financial crime team. "
        "You must be conservative, explain every decision clearly, "
        "and prefer escalation when evidence is ambiguous."
    ),
    tools=[RiskRulesTool()],
    verbose=True,
)

task = Task(
    description=(
        "Review this transaction using the risk_rules tool: {transaction}. "
        "Return a JSON-like summary with fields: disposition, reasons, severity."
    ),
    expected_output="A concise monitoring decision with explanation suitable for case notes.",
    agent=monitoring_agent,
)

crew = Crew(
    agents=[monitoring_agent],
    tasks=[task],
    process=Process.sequential,
    verbose=True,
)
  4. Run the workflow and persist the result

In production you would call this from a service worker after ingestion. The key is to store both input data and output rationale for audit purposes.

sample_txn = Transaction(
    transaction_id="tx_10001",
    customer_id="cust_8821",
    amount=12500.0,
    currency="USD",
    merchant_category="wire_transfer",
    channel="online_banking",
    country="US",
    timestamp="2026-04-21T10:12:00Z",
    prior_transactions_24h=6,
    prior_amount_24h=18450.0,
    is_high_risk_country=False,
)

result = crew.kickoff(inputs={"transaction": sample_txn.model_dump()})
print(result)

A practical output pattern is:

  • LOW_RISK when no rule breaches exist.
  • REVIEW when there are moderate signals but not enough evidence for escalation.
  • ESCALATE when multiple controls fire or the pattern matches known typologies.

That gives investigators something usable instead of a vague paragraph from an LLM.
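
That three-way split can be enforced as a schema rather than left to prose. A minimal sketch (the enum and model names here are illustrative, not part of the CrewAI API):

```python
from enum import Enum
from pydantic import BaseModel

class Disposition(str, Enum):
    LOW_RISK = "LOW_RISK"
    REVIEW = "REVIEW"
    ESCALATE = "ESCALATE"

class MonitoringDecision(BaseModel):
    disposition: Disposition
    reasons: list[str]
    severity: int  # e.g. 1 (low) to 5 (critical)

decision = MonitoringDecision(
    disposition="ESCALATE",
    reasons=["cash-like high-value transaction", "possible structuring / velocity pattern"],
    severity=4,
)
print(decision.disposition.value)  # ESCALATE
```

Recent CrewAI versions also let you attach a Pydantic model to a Task via its output_pydantic parameter, which pushes this enforcement into the framework itself; check the version you run before relying on it.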

Production Considerations

  • Data residency

    • Keep customer data in-region if your bank operates under local residency requirements.
    • If you use hosted models or external APIs, verify where prompts and logs are stored.
  • Auditability

    • Persist every input transaction snapshot, tool output, final disposition, model version, prompt version, and timestamp.
    • Regulators will ask why a case was flagged or missed; “the model said so” is not acceptable.
  • Guardrails

    • Force structured output with a strict schema before anything reaches case management.
    • Block free-form recommendations that can bypass policy thresholds or human review requirements.
  • Monitoring

    • Track alert rate by segment, false positive rate, investigator override rate, and time-to-disposition.
    • Watch for drift in customer behavior patterns after product changes or seasonality shifts.
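
An audit record covering the fields listed above might look like the following. This is a minimal sketch: the field names, model identifier, and storage approach are assumptions, and a real deployment would write to a WORM store or append-only audit table:

```python
import json
import hashlib
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    transaction_snapshot: dict
    tool_output: str
    disposition: str
    model_version: str
    prompt_version: str
    rules_version: str
    recorded_at: str

def write_audit_record(record: AuditRecord) -> str:
    """Serialize the record and return a content hash for tamper-evidence."""
    payload = json.dumps(asdict(record), sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    # In production, persist payload + digest together so later edits are detectable.
    return digest

record = AuditRecord(
    transaction_snapshot={"transaction_id": "tx_10001", "amount": 12500.0},
    tool_output="cash-like high-value transaction",
    disposition="ESCALATE",
    model_version="model-2026-01",   # illustrative identifier
    prompt_version="v3",
    rules_version="2026.04",
    recorded_at=datetime.now(timezone.utc).isoformat(),
)
print(len(write_audit_record(record)))  # 64
```

Hashing the serialized record gives reviewers a cheap integrity check without changing how the record itself is stored.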

Common Pitfalls

  1. Letting the LLM replace hard rules

    • Don’t ask the model to infer AML thresholds from scratch.
    • Use deterministic tools for policy checks and let CrewAI handle reasoning over those results.
  2. Ignoring explainability

    • If your output is just “suspicious,” investigators will reject it.
    • Return specific reasons tied to observable facts like amount spikes, velocity bursts, or risky geographies.
  3. Skipping schema enforcement

    • Free-form text breaks downstream case systems.
    • Use Pydantic models for inputs and require structured outputs from the agent so alerts stay machine-readable.
  4. Deploying without governance controls

    • Retail banking needs versioning for prompts, rules, models, and thresholds.
    • Treat every change like a policy change because that is how compliance will review it later.
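
One lightweight way to make thresholds versionable is to load them from a config object that carries its own version stamp, so every disposition can record exactly which rules produced it. A sketch under those assumptions:

```python
# Illustrative versioned rules config; values mirror the thresholds used earlier.
RULES_V2026_04 = {
    "version": "2026.04",
    "high_value_threshold": 10000,
    "velocity_count_24h": 5,
    "velocity_amount_24h": 15000,
}

def breaches_high_value(amount: float, rules: dict) -> bool:
    """Check one threshold against whichever rules version is in force."""
    return amount >= rules["high_value_threshold"]

print(RULES_V2026_04["version"], breaches_high_value(12500.0, RULES_V2026_04))
```

Changing a threshold then means shipping a new config version, which gives compliance a diff to review instead of a code change buried in a tool class.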

By Cyprian Aarons, AI Consultant at Topiax.