How to Build a Transaction Monitoring Agent Using LlamaIndex in Python for Fintech

By Cyprian Aarons · Updated 2026-04-21
Tags: transaction-monitoring, llamaindex, python, fintech

A transaction monitoring agent watches payment events, flags suspicious patterns, and explains why a transaction should be reviewed. In fintech, that matters because you need faster anti-money-laundering (AML) triage, better fraud detection, and an audit trail that compliance teams can defend.

Architecture

Build this agent as a small pipeline, not a single prompt.

  • Transaction ingestion layer

    • Pulls card payments, ACH, wire transfers, or wallet events from Kafka, S3, Postgres, or an API.
    • Normalizes fields like customer_id, merchant_id, amount, country, timestamp, and channel.
  • Risk feature store

    • Computes velocity checks, geo mismatch, device mismatch, structuring signals, and historical customer baselines (a velocity sketch follows this list).
    • Keeps deterministic rules separate from LLM reasoning.
  • LlamaIndex retrieval layer

    • Stores policy docs, AML typologies, internal playbooks, and previous SAR (suspicious activity report) narratives.
    • Uses VectorStoreIndex plus a retriever to ground the agent in bank-specific policy.
  • Agent orchestration layer

    • Uses LlamaIndex agents/tools to combine rules, retrieval, and explanation generation.
    • Produces a decision like clear, review, or escalate.
  • Audit and case management layer

    • Persists every input, retrieved chunk, tool call, and final rationale.
    • Required for model governance and regulator review.
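
Most of the risk feature store is plain deterministic code. As one minimal sketch of the velocity signal (the 24-hour window and five-event threshold are illustrative assumptions, not regulatory values):

from datetime import datetime, timedelta
from typing import List

def velocity_flag(recent: List[datetime],
                  window: timedelta = timedelta(hours=24),
                  threshold: int = 5) -> bool:
    # Flag rapid repeat activity: `threshold` or more events inside `window`.
    if not recent:
        return False
    cutoff = max(recent) - window
    return sum(1 for ts in recent if ts >= cutoff) >= threshold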

Implementation

1) Install dependencies and define your data model

Use LlamaIndex for retrieval plus a simple rule engine for hard controls. Keep the transaction schema explicit so the agent does not infer missing fields.

# pip install llama-index llama-index-llms-openai

from dataclasses import dataclass
from typing import List

@dataclass
class Transaction:
    transaction_id: str
    customer_id: str
    merchant_id: str
    amount: float
    currency: str
    country: str
    channel: str
    timestamp: str
    risk_flags: List[str]
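
At the ingestion boundary, pair the schema with a normalizer that rejects malformed events instead of letting anything downstream guess. A minimal sketch, assuming the upstream event already arrives as a dict with these keys (real providers need per-source field mapping):

REQUIRED_FIELDS = ("transaction_id", "customer_id", "merchant_id",
                   "amount", "currency", "country", "channel", "timestamp")

def normalize_event(raw: dict) -> Transaction:
    # Fail loudly on missing fields rather than letting the agent infer them.
    missing = [f for f in REQUIRED_FIELDS if f not in raw]
    if missing:
        raise ValueError(f"Event missing required fields: {missing}")
    return Transaction(
        transaction_id=str(raw["transaction_id"]),
        customer_id=str(raw["customer_id"]),
        merchant_id=str(raw["merchant_id"]),
        amount=float(raw["amount"]),
        currency=raw["currency"],
        country=raw["country"],
        channel=raw["channel"],
        timestamp=raw["timestamp"],
        risk_flags=list(raw.get("risk_flags", [])),
    )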

2) Index your compliance and typology documents

This is the part most teams skip. The agent needs grounded context from AML policies, sanctions escalation rules, and internal investigation playbooks.

from llama_index.core import VectorStoreIndex, Document

policy_docs = [
    Document(text="Escalate transactions over $10k with rapid repeat activity within 24 hours."),
    Document(text="Transactions involving sanctioned jurisdictions require enhanced due diligence."),
    Document(text="File SAR when activity is inconsistent with expected customer profile and cannot be explained."),
]

index = VectorStoreIndex.from_documents(policy_docs)
retriever = index.as_retriever(similarity_top_k=2)
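
In a real deployment the policy corpus comes from files, not inline strings, and you persist the index so every restart does not re-embed everything. A sketch using SimpleDirectoryReader; the ./policies and ./storage paths are placeholders:

from llama_index.core import SimpleDirectoryReader, StorageContext, load_index_from_storage

# One-time build: read policy files and persist the index to disk.
documents = SimpleDirectoryReader("./policies").load_data()
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="./storage")

# Later runs: reload instead of re-embedding.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
retriever = index.as_retriever(similarity_top_k=2)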

3) Build a tool-driven monitoring function

For production fintech systems, keep deterministic checks outside the LLM. Then let the model explain the result using retrieved policy context.

from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

def rule_based_risk(tx: dict) -> dict:
    """Deterministic scoring: returns a 0-100 score plus human-readable reasons."""
    score = 0
    reasons = []

    if tx["amount"] >= 10000:
        score += 40
        reasons.append("High-value transaction")

    # Illustrative high-risk set; in production, drive this from your sanctions data.
    if tx["country"] in {"IR", "KP", "SY"}:
        score += 60
        reasons.append("High-risk jurisdiction")

    if "velocity" in tx.get("risk_flags", []):
        score += 25
        reasons.append("Rapid repeat activity")

    return {"score": min(score, 100), "reasons": reasons}

risk_tool = FunctionTool.from_defaults(fn=rule_based_risk)

llm = OpenAI(model="gpt-4o-mini", temperature=0)

agent = ReActAgent.from_tools(
    tools=[risk_tool],
    llm=llm,
    verbose=True,
)
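
It is worth smoke-testing the deterministic layer on its own before involving the agent. The payload below is made up:

sample = {"amount": 12000, "country": "US", "risk_flags": ["velocity"]}
print(rule_based_risk(sample))
# {'score': 65, 'reasons': ['High-value transaction', 'Rapid repeat activity']}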

4) Combine retrieval + reasoning into a review decision

The pattern here is: compute risk first, retrieve policy second, then ask the agent to produce an auditable recommendation. That keeps the LLM in an explanation role instead of letting it invent risk logic.

from dataclasses import asdict

def monitor_transaction(tx: Transaction):
    tx_dict = asdict(tx)
    risk_result = rule_based_risk(tx_dict)

    query = (
        f"Transaction details: {tx_dict}\n"
        f"Rule-based risk result: {risk_result}\n"
        "Using the bank's AML policy context, decide whether this should be cleared,"
        " reviewed manually, or escalated. Return a concise rationale."
    )

    nodes = retriever.retrieve(query)
    context = "\n\n".join([node.node.get_content() for node in nodes])

    response = agent.chat(
        f"{query}\n\nRelevant policy context:\n{context}"
    )

    return {
        "transaction_id": tx.transaction_id,
        "risk_score": risk_result["score"],
        "reasons": risk_result["reasons"],
        "agent_decision": str(response),
        "policy_context_used": [node.node.get_content() for node in nodes],
    }
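
Calling it end to end looks like this; the transaction values are illustrative:

tx = Transaction(
    transaction_id="tx-001",
    customer_id="cust-42",
    merchant_id="merch-9",
    amount=15000.0,
    currency="USD",
    country="SY",
    channel="wire",
    timestamp="2026-04-21T10:15:00Z",
    risk_flags=["velocity"],
)
result = monitor_transaction(tx)
print(result["risk_score"], result["agent_decision"])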

In practice you would wrap this in an API endpoint or stream processor. The important part is that every decision includes both machine rules and retrieved policy text.
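
As one option for that wrapper, here is a minimal FastAPI sketch; FastAPI is an assumption for illustration, and it reuses the normalize_event helper sketched earlier:

from fastapi import FastAPI

app = FastAPI()

@app.post("/monitor")
def monitor_endpoint(event: dict):
    # Validate and normalize before anything reaches the rules or the LLM.
    tx = normalize_event(event)
    return monitor_transaction(tx)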

Production Considerations

  • Separate inference from decisioning

    • Use the LLM for triage explanations and ambiguous cases.
    • Keep hard thresholds like sanctions blocks and country restrictions in deterministic code.
  • Log everything needed for audit

    • Store transaction payloads, rule outputs, retrieved chunks, prompt text, model version, and final decision.
    • Regulators care about reproducibility more than clever prompts.
  • Control data residency

    • If you process EU or UK customer data, keep embeddings and logs in-region.
    • Do not send raw PII to external endpoints unless your legal basis and vendor contracts allow it.
  • Add guardrails before production

    • Redact PANs (card numbers), account numbers, and names where possible; a redaction and audit-record sketch follows this list.
    • Limit tool access so the agent can only read approved policy sources and case records.
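
Two of those points translate directly into code. A minimal sketch of PAN redaction plus a structured audit record; the regex, field names, and JSONL sink are assumptions to adapt to your logging stack:

import json
import re
from datetime import datetime, timezone

PAN_RE = re.compile(r"\b\d{13,19}\b")  # crude card-number pattern; tune for your data

def redact(text: str) -> str:
    return PAN_RE.sub("[REDACTED_PAN]", text)

def write_audit_record(result: dict, model_version: str, path: str = "audit.jsonl") -> None:
    # One append-only line per decision: enough to replay and defend it later.
    record = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "transaction_id": result["transaction_id"],
        "risk_score": result["risk_score"],
        "reasons": result["reasons"],
        "decision": redact(result["agent_decision"]),
        "policy_context_used": result["policy_context_used"],
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")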

Common Pitfalls

  1. Letting the LLM decide risk from scratch

    • Bad pattern: “Here is a transaction; tell me if it is suspicious.”
    • Fix: compute deterministic features first and use the model only to interpret them with policy context.
  2. Indexing noisy operational data without governance

    • If you dump tickets, chat logs, or analyst notes into one index without filtering access control tags, you will leak sensitive information across cases.
    • Fix: partition indexes by region, business unit, or clearance level; a metadata-filter sketch follows this list.
  3. Ignoring evaluation on historical cases

    • A transaction monitoring agent is useless if it looks good on synthetic examples but misses real fraud patterns.
    • Fix: replay closed alerts through the pipeline and measure precision at top-k review queues plus false negative rates on confirmed SARs.
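
For the second pitfall, one concrete option is LlamaIndex metadata filters: tag documents at index time, then scope each retriever to a partition. A sketch, with region as an assumed partitioning key:

from llama_index.core import Document, VectorStoreIndex
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

docs = [
    Document(text="EU escalation playbook text here.", metadata={"region": "EU"}),
    Document(text="US escalation playbook text here.", metadata={"region": "US"}),
]
partitioned_index = VectorStoreIndex.from_documents(docs)

# Retrieval scoped to one region so context never leaks across cases.
eu_retriever = partitioned_index.as_retriever(
    similarity_top_k=2,
    filters=MetadataFilters(filters=[ExactMatchFilter(key="region", value="EU")]),
)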

If you build this with strict rules outside the model and LlamaIndex handling grounded retrieval plus explanation generation inside controlled boundaries, you get something finance teams can actually operate. That is the difference between an AI demo and a monitoring system that survives compliance review.


By Cyprian Aarons, AI Consultant at Topiax.