# How to Build a Transaction Monitoring Agent Using AutoGen in Python for Payments
A transaction monitoring agent watches payment activity, scores risk, and escalates suspicious transfers for review before funds move or settle. For payments teams this matters because false negatives become fraud losses, and false positives become customer friction, ops load, and compliance noise.
## Architecture
- **Transaction event source**
  - Pulls payment events from Kafka, SQS, a webhook, or a database stream.
  - Normalizes fields like `amount`, `currency`, `merchant_id`, `customer_id`, `country`, and `device_id`.
- **Risk scoring agent**
  - Uses rules plus an LLM-backed analyst to classify the transaction.
  - Outputs a structured decision: `approve`, `review`, or `block`.
- **Policy / compliance layer**
  - Enforces hard rules for sanctions, velocity limits, KYC status, and geography restrictions.
  - Keeps the LLM from making final decisions outside policy.
- **Audit logger**
  - Writes every input, model output, tool call, and final decision to immutable storage.
  - Needed for PCI-adjacent controls, internal audit, and regulator review.
- **Case management handoff**
  - Creates a queue item in your ops system when the agent flags a transaction.
  - Includes evidence: rule hits, risk factors, and explanation.
- **Human review loop**
  - Lets analysts override or confirm decisions.
  - Feeds labeled outcomes back into tuning and evaluation.
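The event-source component can be sketched as a small normalization function. The raw payload keys below are hypothetical; adapt them to whatever your Kafka, SQS, or webhook payload actually contains:

```python
def normalize_event(raw: dict) -> dict:
    """Map a raw payment event into the flat schema downstream agents expect.

    Input field names here are assumptions; real payloads vary by source.
    """
    return {
        "tx_id": str(raw["id"]),
        "amount": float(raw["amount"]),  # cast string amounts to float
        "currency": raw.get("currency", "USD").upper(),
        "merchant_id": str(raw["merchant_id"]),
        "customer_id": str(raw["customer_id"]),
        "country": raw.get("country", "").upper(),
        "device_id": str(raw.get("device_id", "unknown")),
    }


event = {"id": 10001, "amount": "1250.00", "currency": "usd",
         "merchant_id": "m_12", "customer_id": "cus_77",
         "country": "us", "device_id": "dev_9"}
normalized = normalize_event(event)
print(normalized["currency"])  # USD
```

Doing this normalization once, at the edge, means every downstream component can assume one schema regardless of which payment rail the event came from.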
## Implementation
### 1) Define the transaction schema and policy checks
Start with deterministic checks before you involve an LLM. In payments, the agent should never “reason” its way around a blocked country or sanctions hit.
```python
from dataclasses import dataclass
from typing import Literal

Decision = Literal["approve", "review", "block"]


@dataclass
class Transaction:
    tx_id: str
    customer_id: str
    merchant_id: str
    amount: float
    currency: str
    country: str
    device_id: str
    kyc_status: str
    velocity_24h: int


BLOCKED_COUNTRIES = {"IR", "KP", "SY"}
HIGH_RISK_KYC = {"pending", "failed"}


def policy_check(tx: Transaction) -> tuple[Decision | None, list[str]]:
    reasons: list[str] = []
    if tx.country in BLOCKED_COUNTRIES:
        reasons.append(f"blocked_country:{tx.country}")
    if tx.kyc_status in HIGH_RISK_KYC:
        reasons.append(f"kyc_status:{tx.kyc_status}")
    if tx.amount > 10000:
        reasons.append("high_amount")
    if tx.velocity_24h > 20:
        reasons.append("high_velocity")
    # Mandatory blocks: sanctions/geography and failed KYC are non-negotiable.
    if any(r.startswith(("blocked_country", "kyc_status:failed")) for r in reasons):
        return "block", reasons
    if reasons:
        return "review", reasons
    return None, reasons
```
### 2) Build an AutoGen assistant that explains risk in structured form

Use AutoGen's `AssistantAgent` for analysis and keep the output constrained. The agent should produce JSON so your payment service can parse it reliably.
```python
import os
import json

from autogen import AssistantAgent

llm_config = {
    "config_list": [
        {
            "model": os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ],
    "temperature": 0,
}

risk_agent = AssistantAgent(
    name="risk_agent",
    llm_config=llm_config,
)


def analyze_transaction(tx: Transaction, policy_reasons: list[str]) -> dict:
    prompt = f"""
You are a payment transaction monitoring analyst.
Return ONLY valid JSON with keys:
  decision: one of ["approve", "review", "block"]
  reasons: array of short strings
  risk_score: integer from 0 to 100
  explanation: one short paragraph

Transaction:
{json.dumps(tx.__dict__, indent=2)}

Policy signals:
{json.dumps(policy_reasons)}
"""
    response = risk_agent.generate_reply(messages=[{"role": "user", "content": prompt}])
    # generate_reply may return a plain string or a message dict depending on version.
    content = response if isinstance(response, str) else response.get("content", "")
    return json.loads(content)
```
### 3) Orchestrate deterministic policy plus LLM judgment
The production pattern is simple: hard rules first, LLM second only when needed. That keeps compliance logic explicit and reduces bad model behavior.
```python
def monitor_transaction(tx: Transaction) -> dict:
    decision, policy_reasons = policy_check(tx)

    # Hard policy blocks never reach the model.
    if decision == "block":
        return {
            "tx_id": tx.tx_id,
            "decision": decision,
            "reasons": policy_reasons,
            "source": "policy",
        }

    llm_result = analyze_transaction(tx, policy_reasons)
    final_decision = llm_result["decision"]

    # The model may not downgrade a policy-flagged transaction to approve.
    if decision == "review" and final_decision == "approve":
        final_decision = "review"

    return {
        "tx_id": tx.tx_id,
        "decision": final_decision,
        "risk_score": llm_result["risk_score"],
        # sorted() keeps reason ordering deterministic for audit logs
        "reasons": sorted(set(policy_reasons + llm_result["reasons"])),
        "explanation": llm_result["explanation"],
        "source": "policy+llm",
    }


if __name__ == "__main__":
    tx = Transaction(
        tx_id="tx_10001",
        customer_id="cus_77",
        merchant_id="m_12",
        amount=1250.00,
        currency="USD",
        country="US",
        device_id="dev_9",
        kyc_status="verified",
        velocity_24h=3,
    )
    result = monitor_transaction(tx)
    print(json.dumps(result, indent=2))
```
### 4) Add an optional human review loop with AutoGen GroupChat

When you need analyst confirmation on edge cases, use `GroupChat` with a reviewer agent. This is useful for case triage workflows where the model drafts the case summary and a human approves it.
```python
from autogen import UserProxyAgent, GroupChat, GroupChatManager

user_proxy = UserProxyAgent(
    name="analyst",
    human_input_mode="NEVER",  # set to "ALWAYS" for interactive analyst review
)

review_agent = AssistantAgent(
    name="review_agent",
    llm_config=llm_config,
)

groupchat = GroupChat(
    agents=[user_proxy, risk_agent, review_agent],
    messages=[],
)

# The manager needs its own llm_config to select the next speaker.
manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)

# manager.run_chat(...) is typically used in interactive workflows;
# in production you'd wrap this behind your case management service.
```
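To connect this to the case management handoff described in the architecture, the flagged result can be packaged into a queue item. A minimal sketch, where the case fields and priority rule are assumptions rather than any AutoGen or ticketing API:

```python
def create_case(result: dict) -> dict:
    """Package a monitoring decision as a case queue item with its evidence.

    Field names and the priority rule are illustrative, not a real ops-system API.
    """
    return {
        "case_id": f"case_{result['tx_id']}",
        "priority": "high" if result["decision"] == "block" else "normal",
        "evidence": {
            "reasons": result.get("reasons", []),
            "risk_score": result.get("risk_score"),
            "explanation": result.get("explanation", ""),
        },
    }


case = create_case({"tx_id": "tx_10001", "decision": "review",
                    "reasons": ["high_amount"], "risk_score": 62})
print(case["priority"])  # normal
```

Shipping the rule hits and explanation alongside the case means analysts never have to re-derive why the agent flagged a transaction.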
## Production Considerations
- **Keep data residency explicit**
  - Route EU payment data to EU-hosted inference endpoints.
  - Don't send PANs or sensitive auth data to the model; tokenize or redact before prompting.
- **Log everything needed for audit**
  - Store transaction input hashes, policy hits, model version, prompt version, and final decision.
  - Make logs append-only so compliance can reconstruct why a payment was held or blocked.
- **Put guardrails before the model**
  - Sanctions screening, KYC state, velocity limits, and country restrictions must be deterministic.
  - The agent can recommend review; it should not override mandatory blocks.
- **Monitor drift by segment**
  - Track approval rate, manual review rate, chargeback rate, and false positive rate by merchant vertical and region.
  - Payments fraud patterns change fast; segment-level metrics matter more than global averages.
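The audit-logging points above can be made concrete. A minimal sketch of one append-only audit entry that hashes the transaction input instead of storing raw PII (the field names are illustrative):

```python
import hashlib
import json
import time


def audit_record(tx: dict, decision: dict, model: str, prompt_version: str) -> dict:
    """Build one append-only audit entry: hash the input, record the decision path."""
    tx_hash = hashlib.sha256(json.dumps(tx, sort_keys=True).encode()).hexdigest()
    return {
        "ts": time.time(),
        "tx_hash": tx_hash,  # reproducible fingerprint without storing raw PII
        "model": model,
        "prompt_version": prompt_version,
        "decision": decision["decision"],
        "reasons": decision.get("reasons", []),
    }


record = audit_record({"tx_id": "tx_10001", "amount": 1250.0},
                      {"decision": "review", "reasons": ["high_amount"]},
                      model="gpt-4o-mini", prompt_version="v3")
```

Because the hash is computed over a sorted-key serialization, the same transaction always yields the same fingerprint, which is what lets compliance match a held payment back to its inputs months later.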
## Common Pitfalls
- **Letting the LLM make final compliance decisions**
  - Fix it by enforcing hard blocks in code before any model call.
  - Use the LLM only for explanation and borderline risk assessment.
- **Sending raw sensitive payment data into prompts**
  - Fix it by redacting PANs, keeping CVVs out of the system entirely, and masking PII where possible.
  - Use tokens or internal IDs instead of full customer records.
- **Accepting free-form text outputs**
  - Fix it by forcing JSON output and parsing it immediately with `json.loads`.
  - If parsing fails, fall back to `review` rather than guessing.
- **Ignoring auditability**
  - Fix it by storing prompt versions, model name, decision path, and reviewer overrides.
  - In payments investigations you need reproducibility more than cleverness.
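The JSON-parsing pitfall can be closed with a small wrapper that fails closed to `review`; this is a sketch you would plug in wherever the orchestration currently calls `json.loads` directly:

```python
import json

VALID_DECISIONS = {"approve", "review", "block"}


def safe_parse(content: str) -> dict:
    """Parse model output; on any failure, fail closed to manual review."""
    try:
        result = json.loads(content)
        if result.get("decision") not in VALID_DECISIONS:
            raise ValueError("invalid or missing decision")
        return result
    except (json.JSONDecodeError, ValueError, AttributeError):
        # AttributeError covers valid JSON that isn't an object (e.g. a bare list).
        return {
            "decision": "review",
            "reasons": ["unparseable_model_output"],
            "risk_score": None,
            "explanation": "Model output could not be parsed.",
        }


print(safe_parse("not json")["decision"])                               # review
print(safe_parse('{"decision": "approve", "reasons": []}')["decision"])  # approve
```

Failing closed to `review` means a malformed model response costs you one analyst touch instead of an unexplained approval.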
## Keep learning

- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.