How to Build a Transaction Monitoring Agent Using CrewAI in Python for Banking
A transaction monitoring agent watches payment activity, scores suspicious patterns, and routes cases for human review. In banking, that matters because you need fast detection of fraud, AML typologies, and policy breaches without turning every alert into an analyst bottleneck.
Architecture
A production-grade transaction monitoring agent needs these components:
- **Ingestion layer**
  - Pulls transactions from a queue, database, or event stream.
  - Normalizes fields like `amount`, `currency`, `counterparty`, `timestamp`, and `channel`.
- **Rules and enrichment tools**
  - Checks hard controls such as velocity limits, sanctioned geographies, or blacklisted counterparties.
  - Enriches transactions with customer risk tier, account age, historical behavior, and KYC status.
- **LLM reasoning layer**
  - Uses CrewAI agents to interpret alerts, explain patterns, and produce a case summary.
  - Should not make final compliance decisions without deterministic checks.
- **Case management output**
  - Writes structured findings to a case system or investigation table.
  - Includes evidence, rule hits, and a clear recommendation for analyst review.
- **Audit and observability**
  - Logs every input, tool call, model output, and decision path.
  - Required for regulator review, internal audit, and model governance.
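As a rough sketch, the first two layers can be wired as plain functions before any agent is involved. The function names, field names, and the `10_000` threshold below are illustrative assumptions, not part of CrewAI:

```python
from typing import Any, Dict, List

def ingest(raw: Dict[str, Any]) -> Dict[str, Any]:
    # Normalize upstream event fields into a consistent schema
    return {
        "amount": float(raw["amt"]),
        "currency": raw["ccy"].upper(),
        "counterparty": raw["cp"].strip(),
    }

def enrich(txn: Dict[str, Any], customer_risk_tier: str) -> Dict[str, Any]:
    # Attach deterministic enrichment before any LLM sees the record
    return {**txn, "customer_risk_tier": customer_risk_tier}

def rule_hits(txn: Dict[str, Any]) -> List[str]:
    # Hard controls run first and stay authoritative
    hits = []
    if txn["amount"] > 10_000:
        hits.append("velocity_threshold")
    return hits

raw_event = {"amt": "25000", "ccy": "usd", "cp": " Acme Front Ltd "}
txn = enrich(ingest(raw_event), customer_risk_tier="high")
```

Keeping ingestion and rules as plain, testable functions makes the later LLM layer strictly additive: the agent only interprets evidence these functions produce.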
Implementation
1) Define the transaction schema and deterministic tools
Keep the LLM away from raw policy logic. Use tools for scoring and enrichment so your controls stay auditable.
```python
from typing import Dict, Any

from pydantic import BaseModel
from crewai.tools import BaseTool


class Transaction(BaseModel):
    transaction_id: str
    customer_id: str
    amount: float
    currency: str
    country: str
    channel: str
    timestamp: str
    counterparty: str


class VelocityCheckTool(BaseTool):
    name: str = "velocity_check"
    description: str = "Checks whether a transaction exceeds simple velocity thresholds."

    def _run(self, customer_id: str, amount: float) -> Dict[str, Any]:
        threshold = 10000.0
        return {
            "customer_id": customer_id,
            "threshold": threshold,
            "breach": amount > threshold,
            "reason": "amount_above_threshold" if amount > threshold else "ok",
        }


class SanctionsCheckTool(BaseTool):
    name: str = "sanctions_check"
    description: str = "Checks whether the counterparty or country is high risk."

    def _run(self, country: str, counterparty: str) -> Dict[str, Any]:
        high_risk_countries = {"IR", "KP", "SY"}
        blocked_names = {"ACME FRONT LTD", "SHELL TRADING CO"}
        breach = country in high_risk_countries or counterparty.upper() in blocked_names
        return {
            "country": country,
            "counterparty": counterparty,
            "breach": breach,
            "reason": "high_risk_entity_or_country" if breach else "ok",
        }
```
2) Create the monitoring agent with CrewAI
Use one agent for investigation and one task for producing a case summary. The agent should explain findings, not invent facts.
```python
from crewai import Agent

monitoring_agent = Agent(
    role="Transaction Monitoring Analyst",
    goal="Review transaction alerts and produce an auditable suspicious activity assessment.",
    backstory=(
        "You are a banking compliance analyst focused on AML and fraud triage. "
        "You only use provided evidence and tool outputs."
    ),
    verbose=True,
    allow_delegation=False,
)
```
3) Wire the task and crew together
The task should ask for a structured output that an investigator can consume. Keep the prompt explicit about evidence use and escalation language.
```python
from crewai import Task, Crew

transaction_payload = Transaction(
    transaction_id="TXN-10001",
    customer_id="CUST-42",
    amount=25000.0,
    currency="USD",
    country="GB",
    channel="wire",
    timestamp="2026-04-21T10:15:00Z",
    counterparty="Acme Front Ltd",
)

task = Task(
    description=(
        f"Review this transaction for suspicious activity:\n"
        f"{transaction_payload.model_dump()}\n\n"
        f"Use the attached tool outputs to decide whether this requires analyst escalation. "
        f"Return a concise summary with risk level, reasons, and next action."
    ),
    expected_output=(
        "A short case note with fields: risk_level, reasons, evidence_used, "
        "recommended_action."
    ),
    agent=monitoring_agent,
)

crew = Crew(
    agents=[monitoring_agent],
    tasks=[task],
)
```
4) Execute with tool results included in the prompt
CrewAI will run the task through the agent. In banking systems I usually inject deterministic tool results into the task context before execution so the LLM sees verified evidence.
```python
velocity_tool = VelocityCheckTool()
sanctions_tool = SanctionsCheckTool()

velocity_result = velocity_tool._run(
    customer_id=transaction_payload.customer_id,
    amount=transaction_payload.amount,
)
sanctions_result = sanctions_tool._run(
    country=transaction_payload.country,
    counterparty=transaction_payload.counterparty,
)

task.description += (
    "\n\nDeterministic checks:\n"
    f"- Velocity check: {velocity_result}\n"
    f"- Sanctions check: {sanctions_result}\n"
)

result = crew.kickoff()
print(result)
```
A better production pattern is to persist `velocity_result`, `sanctions_result`, the prompt text, the model response, and timestamps into an audit store before you hand off to case management.
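As a minimal sketch of that audit store, the record can be written to a relational table before handoff. This uses stdlib `sqlite3` for illustration; the table layout and column names are assumptions, and a real deployment would use your bank's approved database:

```python
import json
import sqlite3
from datetime import datetime, timezone

# In-memory DB for illustration; production would use an approved, durable store
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS audit_log ("
    "transaction_id TEXT, tool_outputs TEXT, prompt TEXT, "
    "response TEXT, logged_at TEXT)"
)

def persist_audit(transaction_id: str, tool_outputs: dict,
                  prompt: str, response: str) -> None:
    # Serialize tool outputs deterministically so records are comparable
    conn.execute(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?)",
        (
            transaction_id,
            json.dumps(tool_outputs, sort_keys=True),
            prompt,
            response,
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()

persist_audit(
    "TXN-10001",
    {"velocity_check": {"breach": True}, "sanctions_check": {"breach": True}},
    "Review this transaction...",
    "risk_level: high",
)
row = conn.execute("SELECT transaction_id, tool_outputs FROM audit_log").fetchone()
```

Writing the audit row before handing off means a crash between the LLM call and case creation still leaves a reconstructable trail.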
Production Considerations
- **Deployment**
  - Run the agent as an async worker behind a queue like Kafka or SQS.
  - Keep it stateless; store case state in Postgres or your GRC platform.
  - Pin model versions and CrewAI versions so behavior does not drift between releases.
- **Monitoring**
  - Track alert volume, false positive rate, escalation rate, and average handling time.
  - Log every tool call separately from LLM output.
  - Add tracing so auditors can reconstruct why a case was escalated.
- **Guardrails**
  - Never let the model override sanctions hits or hard policy rules.
  - Force structured outputs with schema validation before writing to downstream systems.
  - Block prompts from containing raw PII unless your data handling policy allows it.
- **Banking controls**
  - Respect data residency by keeping processing inside approved regions.
  - Redact sensitive fields before sending anything to external APIs.
  - Retain prompt/response logs according to compliance retention rules.
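The structured-output guardrail above can be enforced with a Pydantic model that gates every write to downstream systems. The `CaseNote` schema below is an assumption that mirrors the task's `expected_output` fields, not a CrewAI type:

```python
from typing import List, Literal, Optional
from pydantic import BaseModel, ValidationError

class CaseNote(BaseModel):
    # Constrain risk_level to a closed vocabulary the case system understands
    risk_level: Literal["low", "medium", "high"]
    reasons: List[str]
    evidence_used: List[str]
    recommended_action: str

def validate_case_note(raw: dict) -> Optional[CaseNote]:
    # Reject malformed model output instead of writing it downstream
    try:
        return CaseNote.model_validate(raw)
    except ValidationError:
        return None

good = validate_case_note({
    "risk_level": "high",
    "reasons": ["sanctions hit on counterparty"],
    "evidence_used": ["sanctions_check", "velocity_check"],
    "recommended_action": "escalate to analyst",
})
bad = validate_case_note({"risk_level": "critical"})  # unknown level, missing fields
```

Anything that fails validation should go to a dead-letter queue for review rather than silently into the case system.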
Common Pitfalls
- **Letting the LLM make final compliance decisions**
  - Avoid this by making deterministic rules authoritative.
  - The agent should recommend escalation; humans or rule engines should decide disposition.
- **Passing raw transaction dumps into prompts**
  - This leaks unnecessary PII and increases noise.
  - Send only normalized fields plus relevant enrichment values.
- **Skipping audit trails**
  - If you cannot explain why an alert fired, regulators will treat it as weak control design.
  - Persist inputs, tool outputs, model version, prompt text, response text, and reviewer actions.
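The second pitfall is easiest to avoid with an explicit allow-list when building the prompt payload. Which fields count as safe is a policy decision for your data handling team; the set below is purely illustrative:

```python
# Illustrative allow-list; the actual set is a data-governance decision
ALLOWED_PROMPT_FIELDS = {"transaction_id", "amount", "currency", "country", "channel"}

def minimal_prompt_payload(txn: dict) -> dict:
    # Keep only normalized, approved fields; everything else never reaches the LLM
    return {k: v for k, v in txn.items() if k in ALLOWED_PROMPT_FIELDS}

payload = minimal_prompt_payload({
    "transaction_id": "TXN-10001",
    "amount": 25000.0,
    "currency": "USD",
    "country": "GB",
    "channel": "wire",
    "customer_name": "Jane Doe",    # PII: dropped by the allow-list
    "account_number": "12345678",   # PII: dropped by the allow-list
})
```

An allow-list fails closed: a new sensitive field added upstream is excluded by default, whereas a deny-list would leak it until someone notices.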
Building this well is mostly about discipline. CrewAI gives you orchestration; banking-grade monitoring comes from tight control over tools, evidence handling, and auditability.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.