How to Build a Transaction Monitoring Agent Using CrewAI in Python for Payments

By Cyprian Aarons · Updated 2026-04-21
transaction-monitoring · crewai · python · payments

A transaction monitoring agent for payments watches payment events, scores them against risk rules and model outputs, and escalates suspicious activity for review. It matters because false negatives become fraud losses and false positives become operational drag, compliance noise, and bad customer experience.

Architecture

  • Payment event ingest

    • Pulls card, ACH, wallet, or RTP transactions from a queue, stream, or webhook.
    • Normalizes fields like amount, merchant_category, country, customer_id, and device_id.
  • Risk feature builder

    • Enriches each transaction with velocity counts, customer history, merchant risk tier, geo mismatch, and device reputation.
    • Keeps enrichment deterministic so auditors can reproduce decisions.
  • Monitoring crew

    • Uses CrewAI agents to inspect the transaction from different angles:
      • fraud analyst
      • compliance reviewer
      • escalation coordinator
    • Produces a structured decision: approve, hold, escalate, or block.
  • Policy and guardrail layer

    • Enforces hard rules before any LLM output is trusted.
    • Handles sanctions hits, jurisdiction restrictions, PII redaction, and thresholds that require mandatory escalation.
  • Case management sink

    • Writes alerts to your case system with reason codes, evidence, and model inputs.
    • Stores immutable audit logs for review and regulatory requests.
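The stages above can be sketched as a single processing function. Every helper below is a placeholder stub for the corresponding component (your real ingest, feature store, crew, and case system), and the sanctioned-country set and default values are illustrative assumptions:

```python
# Minimal, self-contained sketch of the pipeline described above.
# Each helper is a stand-in for a real service.

SANCTIONED = {"KP", "IR"}  # illustrative; use your sanctions screening service

def normalize(raw: dict) -> dict:
    # Payment event ingest: map raw event fields to a validated shape.
    return {
        "transaction_id": raw["id"],
        "amount": float(raw["amount"]),
        "country": raw.get("country", "US"),
    }

def enrich(tx: dict) -> dict:
    # Risk feature builder: deterministic enrichment only.
    return {"velocity_1h": 0, "geo_mismatch": tx["country"] != "US"}

def blocked_by_policy(tx: dict, features: dict) -> bool:
    # Policy and guardrail layer: hard rules run before any LLM call.
    return tx["country"] in SANCTIONED

def run_crew(tx: dict, features: dict) -> dict:
    # Monitoring crew: replace with crew.kickoff() from the sections below.
    return {"action": "approve", "reasons": []}

def write_case(tx: dict, features: dict, decision: dict) -> None:
    pass  # Case management sink: replace with your case/audit API.

def process_event(raw_event: dict) -> dict:
    tx = normalize(raw_event)
    features = enrich(tx)
    if blocked_by_policy(tx, features):
        return {"action": "block", "source": "policy"}
    decision = run_crew(tx, features)
    write_case(tx, features, decision)
    return decision
```

The key design point is ordering: deterministic policy checks sit ahead of the crew, so a sanctions hit never reaches a prompt.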

Implementation

1) Define the transaction schema and tools

Keep the agent’s inputs narrow. In payments, the agent should not “invent” missing facts; it should work from a validated payload plus explicit tools for enrichment and alerting.

from pydantic import BaseModel, Field
from crewai.tools import BaseTool

class Transaction(BaseModel):
    transaction_id: str
    customer_id: str
    amount: float
    currency: str
    merchant_id: str
    merchant_category: str
    country: str
    device_id: str | None = None
    timestamp: str

class RiskLookupTool(BaseTool):
    name: str = "risk_lookup"
    description: str = "Fetch deterministic risk signals for a payment transaction."

    def _run(self, transaction_id: str) -> dict:
        # Replace with DB/feature store lookup
        return {
            "velocity_1h": 7,
            "chargeback_rate_30d": 0.03,
            "merchant_risk_score": 82,
            "geo_mismatch": True,
        }

class CreateAlertTool(BaseTool):
    name: str = "create_alert"
    description: str = "Create a case management alert for suspicious transactions."

    def _run(self, transaction_id: str, reason: str) -> dict:
        # Replace with your case management API call
        return {"status": "created", "alert_id": f"alert_{transaction_id}", "reason": reason}

2) Create specialized agents

Use separate agents for analysis and escalation. That keeps responsibilities clear and makes audit trails easier to explain.

from crewai import Agent

fraud_analyst = Agent(
    role="Fraud Analyst",
    goal="Assess whether the payment shows fraud patterns using deterministic signals.",
    backstory="You review payment transactions for fraud indicators and explain your reasoning clearly.",
    tools=[RiskLookupTool()],
    verbose=True,
)

compliance_reviewer = Agent(
    role="Compliance Reviewer",
    goal="Check whether the transaction requires regulatory or policy escalation.",
    backstory="You focus on AML/KYC policy triggers, sanctions risk, jurisdiction constraints, and auditability.",
    tools=[],
    verbose=True,
)

escalation_agent = Agent(
    role="Escalation Coordinator",
    goal="Decide whether to approve, hold, or create an alert based on prior findings.",
    backstory="You convert analysis into an operational action with concise evidence.",
    tools=[CreateAlertTool()],
    verbose=True,
)

3) Wire tasks into a sequential Crew

For payments monitoring, start simple. A sequential process is easier to test than a complex autonomous loop.

from crewai import Task, Crew, Process

tx = Transaction(
    transaction_id="tx_1001",
    customer_id="cust_42",
    amount=1299.00,
    currency="USD",
    merchant_id="m_900",
    merchant_category="electronics",
    country="NG",
    device_id="dev_123",
    timestamp="2026-04-21T10:15:00Z",
)

fraud_task = Task(
    description=(
        f"Analyze this payment transaction for fraud risk:\n{tx.model_dump_json(indent=2)}\n"
        "Use the risk_lookup tool if needed. Return JSON with fields "
        "`risk_level`, `reasons`, `recommended_action`."
    ),
    expected_output="Structured JSON with risk assessment.",
    agent=fraud_analyst,
)

compliance_task = Task(
    description=(
        f"Review this payment for compliance concerns:\n{tx.model_dump_json(indent=2)}\n"
        "Focus on AML/KYC triggers, sanctions exposure, data residency concerns, "
        "and whether human review is required."
    ),
    expected_output="Structured JSON with compliance assessment.",
    agent=compliance_reviewer,
)

escalation_task = Task(
    description=(
        "Combine prior outputs and decide the operational action. "
        "If escalation is needed, use create_alert."
    ),
    expected_output="Final disposition with evidence.",
    agent=escalation_agent,
)

crew = Crew(
    agents=[fraud_analyst, compliance_reviewer, escalation_agent],
    tasks=[fraud_task, compliance_task, escalation_task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff()
print(result)

4) Add hard controls outside the LLM path

Do not let the agent be the only control point. Payments systems need deterministic gates before any model-driven decision is acted on.

  • Block sanctioned countries or entities before calling the crew.
  • Redact PANs, account numbers, and other sensitive fields.
  • Require human review above a threshold amount or when confidence is low.
  • Persist every input/output pair with timestamps for audit.

Production Considerations

  • Deployment

    • Run the agent as an async worker behind your payment event bus.
    • Keep feature lookup services in-region to satisfy data residency requirements.
    • Separate model runtime from PCI-scoped systems; never pass raw card data into prompts.
  • Monitoring

    • Track alert rate, precision proxy metrics, queue lag, tool failure rate, and average time-to-decision.
    • Log every crew run with transaction ID so investigators can replay decisions.
    • Sample outputs daily to catch drift in reasoning style or policy misses.
  • Guardrails

    • Enforce JSON-only outputs for downstream parsing.
    • Add rule-based overrides for sanctions hits, duplicate auth attempts, rapid velocity spikes, and high-risk MCCs.
    • Keep a denylist of fields the agent must never echo back in plain text.
  • Auditability

    • Store prompt versions alongside decision records.
    • Attach reason codes that map to internal policy language.
    • Make sure retention aligns with AML recordkeeping rules in your operating jurisdictions.

Common Pitfalls

  • Letting the agent make final block/approve decisions without deterministic checks

    • Fix it by placing rule engines ahead of CrewAI and using the agent only as an investigation layer.
  • Sending raw sensitive payment data into prompts

    • Fix it by tokenizing PANs/account numbers and stripping unnecessary PII before task creation.
  • Using one generic agent for fraud and compliance

    • Fix it by splitting responsibilities. Fraud scoring and regulatory review have different evidence requirements and different audit expectations.
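For the second pitfall, one common approach is deterministic tokenization, sketched here with a keyed HMAC. The key handling is illustrative only; in production, use your tokenization service or vault and rotate keys per its policy:

```python
import hashlib
import hmac

SECRET = b"rotate-me"  # illustrative; load from your secrets manager

def tokenize_pan(pan: str) -> str:
    """Deterministic, non-reversible token: the same PAN always maps to
    the same token, so velocity checks still work, but the raw PAN
    never enters a prompt."""
    digest = hmac.new(SECRET, pan.encode(), hashlib.sha256).hexdigest()
    return f"tok_{digest[:16]}"
```

Apply this during normalization, before `Transaction` objects are built, so no task description can ever contain a raw PAN.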

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

Get the Starter Kit
