How to Build a Transaction Monitoring Agent for Payments Using LlamaIndex in Python
A transaction monitoring agent for payments watches incoming payment events, enriches them with policy and customer context, and flags patterns that look suspicious, non-compliant, or operationally risky. In practice, it helps teams catch structuring, velocity abuse, mule activity, duplicate payments, and policy breaches before they become chargebacks, losses, or regulatory problems.
Architecture
- **Event ingestion layer**
  - Pulls transactions from Kafka, SQS, webhooks, or batch files.
  - Normalizes fields like `transaction_id`, `customer_id`, `amount`, `currency`, `merchant_id`, `country`, and `timestamp`.
- **Policy and controls corpus**
  - Stores AML rules, KYC policies, sanctions guidance, payment network rules, and internal SOPs.
  - Indexed with LlamaIndex so the agent can retrieve the right control text during triage.
- **Transaction context store**
  - Holds customer profile data, historical velocity metrics, device fingerprints, chargeback history, and case outcomes.
  - Exposed to the agent as structured tools.
- **Retrieval and reasoning layer**
  - Uses LlamaIndex retrievers to pull relevant policy snippets and similar historical cases.
  - Combines retrieval with an LLM to produce a risk assessment and recommended action.
- **Decisioning and case output**
  - Produces a structured result: risk score band, reason codes, evidence citations, and next action.
  - Sends output to a case management system or human review queue.
- **Audit logging**
  - Persists every retrieved source chunk, model response, tool call, and final decision.
  - Required for compliance review and post-incident analysis.
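The normalization step in the ingestion layer can be sketched as a plain mapping function. The raw keys below (`txn_id`, `cust`, `amt`, `ccy`, `ts`) are placeholders for whatever your Kafka or SQS message schema actually uses:

```python
from datetime import datetime, timezone

# Canonical field set from the architecture above.
CANONICAL_FIELDS = (
    "transaction_id", "customer_id", "amount", "currency",
    "merchant_id", "country", "timestamp",
)

def normalize_event(raw: dict) -> dict:
    """Map a raw payment message onto the canonical field set."""
    return {
        "transaction_id": str(raw["txn_id"]),
        "customer_id": str(raw["cust"]),
        "amount": float(raw["amt"]),               # major units, as float
        "currency": raw.get("ccy", "USD").upper(),
        "merchant_id": str(raw.get("merchant", "")),
        "country": raw.get("country", "").upper(),
        "timestamp": datetime.fromtimestamp(raw["ts"], tz=timezone.utc).isoformat(),
    }
```

Keeping one canonical shape at the boundary means every downstream tool, score, and audit record sees the same field names regardless of source.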
Implementation
1. Install dependencies and load your policy corpus
Use LlamaIndex to index your payment controls. For production systems, keep the corpus in a controlled document store and version it by policy date.
```bash
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai
```

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Load policy documents: AML rules, chargeback SOPs, fraud playbooks
documents = SimpleDirectoryReader("./payment_policies").load_data()

# Build the index used by the monitoring agent
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
)

# Query engine for policy retrieval
query_engine = index.as_query_engine(
    llm=OpenAI(model="gpt-4o-mini", temperature=0),
    similarity_top_k=3,
)
```
2. Define the transaction payload and a risk scoring tool
Keep the transaction event structured. Don’t pass raw logs into the model when you can pass typed fields.
```python
from pydantic import BaseModel
from typing import Literal

class TransactionEvent(BaseModel):
    transaction_id: str
    customer_id: str
    amount: float
    currency: str
    country: str
    merchant_category: str
    channel: Literal["card", "bank_transfer", "wallet"]
    is_first_party: bool

def score_transaction(event: TransactionEvent) -> dict:
    """Deterministic rule-based score; thresholds here are illustrative."""
    score = 0
    if event.amount >= 5000:
        score += 25
    if event.country not in {"US", "GB", "DE", "FR"}:
        score += 15
    if event.merchant_category in {"crypto", "gaming", "gift_cards"}:
        score += 20
    if event.channel == "bank_transfer" and event.is_first_party is False:
        score += 10
    return {
        "risk_score": min(score, 100),
        "band": "high" if score >= 50 else "medium" if score >= 25 else "low",
    }
```
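The rubric above only looks at static fields. Velocity metrics from the transaction context store are a natural complement; here is a minimal in-memory sketch (the sliding window and in-process storage are assumptions — a production system would back this with Redis or a feature store):

```python
from collections import deque
from datetime import datetime, timedelta

class VelocityTracker:
    """Counts a customer's transactions inside a sliding time window."""

    def __init__(self, window: timedelta = timedelta(hours=24)):
        self.window = window
        self._events: dict[str, deque] = {}

    def record(self, customer_id: str, ts: datetime) -> int:
        """Record one transaction; return the count in the current window."""
        q = self._events.setdefault(customer_id, deque())
        q.append(ts)
        cutoff = ts - self.window
        while q and q[0] < cutoff:   # expire events that fell out of the window
            q.popleft()
        return len(q)                # includes the transaction just recorded
```

The returned count can feed straight into `score_transaction` as another rule input, or be exposed to the agent as its own tool.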
3. Build an agent that retrieves policy evidence before deciding
This pattern matters because payments teams need explainability. The agent should cite control text when it recommends escalation.
```python
import asyncio

from llama_index.core.tools import FunctionTool
from llama_index.core.agent.workflow import FunctionAgent

score_tool = FunctionTool.from_defaults(fn=score_transaction)

def retrieve_policy_context(query: str) -> str:
    """Fetch the most relevant policy snippets for the agent to cite."""
    response = query_engine.query(query)
    return str(response)

policy_tool = FunctionTool.from_defaults(fn=retrieve_policy_context)

agent = FunctionAgent(
    tools=[score_tool, policy_tool],
    llm=OpenAI(model="gpt-4o-mini", temperature=0),
    system_prompt=(
        "You are a payment transaction monitoring agent. "
        "Assess the transaction using policy evidence. "
        "Return a concise decision with reason codes and next action."
    ),
)

event = TransactionEvent(
    transaction_id="txn_123",
    customer_id="cus_456",
    amount=7800,
    currency="USD",
    country="NG",
    merchant_category="gift_cards",
    channel="card",
    is_first_party=True,
)

# FunctionAgent.run is async, so drive it with asyncio from synchronous code.
result = asyncio.run(
    agent.run(
        f"""
        Transaction:
        {event.model_dump()}

        Tasks:
        1) Score this transaction.
        2) Retrieve relevant payment policy guidance.
        3) Recommend allow, review, or block.
        """
    )
)
print(result)
```
4. Add audit output for compliance teams
Every decision needs traceability. Store the input event, retrieved sources, model response, and final outcome in your case database or object store.
```python
import json
import os
from datetime import datetime, timezone

def write_audit_record(event: TransactionEvent, decision_text: str):
    """Persist everything a reviewer needs to reconstruct the decision."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "transaction": event.model_dump(),
        "decision": decision_text,
        "model": "gpt-4o-mini",
        "policy_version": "2026-04",
        "region": "eu-west-1",
    }
    os.makedirs("./audit", exist_ok=True)  # ensure the audit directory exists
    with open(f"./audit/{event.transaction_id}.json", "w") as f:
        json.dump(record, f, indent=2)

write_audit_record(event, str(result))
```
Production Considerations
- **Deploy close to your data boundary**
  - Payment data often has residency constraints. Keep EU transactions in EU-hosted infrastructure and avoid sending sensitive fields across regions.
- **Log everything needed for audit**
  - Persist retrieved policy chunks with document IDs and timestamps.
  - Keep model prompts minimal but reproducible so reviewers can reconstruct why a case was escalated.
- **Add guardrails around outputs**
  - Force structured decisions like `allow`, `review`, or `block`.
  - Never let the agent directly move funds or freeze accounts without deterministic rule checks plus human approval.
- **Monitor drift in both behavior and fraud patterns**
  - Track false positives by merchant segment, country corridor, channel type, and time of day.
  - Recalibrate thresholds when new scam patterns or regulatory changes appear.
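One way to enforce the structured-decision guardrail is to validate the agent's free-text output against a closed schema before anything downstream acts on it. `MonitoringDecision`, `parse_decision`, and the reason codes below are illustrative, not part of LlamaIndex:

```python
from typing import List, Literal
from pydantic import BaseModel, field_validator

class MonitoringDecision(BaseModel):
    action: Literal["allow", "review", "block"]
    reason_codes: List[str]

    @field_validator("reason_codes")
    @classmethod
    def require_reasons_for_escalation(cls, v, info):
        # review/block must carry at least one reason code for auditability
        if info.data.get("action") != "allow" and not v:
            raise ValueError("review/block decisions need a reason code")
        return v

def parse_decision(text: str) -> MonitoringDecision:
    """Extract the first recognized action keyword; default to human review."""
    lowered = text.lower()
    for action in ("block", "review", "allow"):
        if action in lowered:
            codes = [] if action == "allow" else ["R_AGENT_FLAG"]
            return MonitoringDecision(action=action, reason_codes=codes)
    # unparseable output never auto-allows; it goes to a human
    return MonitoringDecision(action="review", reason_codes=["R_UNPARSEABLE"])
```

The important property is the failure mode: an output the parser cannot understand falls through to `review`, never to `allow`.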
Common Pitfalls
- •
Feeding raw PII into prompts
- •Don’t dump full cardholder profiles into the LLM.
- •Redact PANs, account numbers, addresses where possible. Pass only what’s needed for the decision.
- •
Using retrieval without version control
- •If your policy corpus changes weekly but you don’t version documents, you can’t explain why a transaction was blocked last month.
- •Store policy version IDs alongside each decision record.
- •
Letting the model make unbounded decisions
- •A transaction monitoring agent should recommend; your rules engine should decide on hard blocks for sanctions hits or confirmed fraud patterns.
- •Use deterministic checks first for high-severity controls like sanctions screening and legal holds.
- •
Ignoring regional compliance requirements
- •Payments systems have to respect AML obligations, retention rules, PCI scope boundaries, and local data residency laws.
- •Design the agent around those constraints from day one instead of bolting them on after launch.
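A minimal redaction sketch for the raw-PII pitfall: mask anything that looks like a card number before the text reaches a prompt. Treating any 12-19 digit run as a PAN is a deliberately blunt assumption; production systems usually tokenize at ingestion instead:

```python
import re

# Blunt heuristic: any standalone 12-19 digit run is treated as a PAN.
PAN_RE = re.compile(r"\b\d{12,19}\b")

def redact(text: str) -> str:
    """Mask all but the last four digits of card-like numbers."""
    return PAN_RE.sub(
        lambda m: "*" * (len(m.group()) - 4) + m.group()[-4:],
        text,
    )
```

Run every free-text field through a filter like this before prompt assembly and before audit logging, so neither the model nor the audit store ever holds a clear-text PAN.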
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.