How to Build a Transaction Monitoring Agent Using LlamaIndex in Python for Retail Banking

By Cyprian Aarons · Updated 2026-04-21
transaction-monitoring · llamaindex · python · retail-banking

A transaction monitoring agent for retail banking watches account activity, scores suspicious patterns, and routes cases that need human review. It matters because banks need to catch fraud and AML risks early without drowning analysts in false positives, while still preserving auditability, explainability, and regulatory controls.

Architecture

  • Transaction ingestion layer

    • Pulls card swipes, ACH transfers, wire events, cash deposits, and login signals from your core banking or event stream.
    • Normalizes them into a consistent schema before any LLM or retrieval step touches the data.
  • Risk rules and feature extractor

    • Computes deterministic signals like velocity spikes, round-dollar behavior, new payee usage, geo-distance anomalies, and structuring patterns.
    • These features should be available to both the agent and your downstream case management system; a minimal extraction sketch follows this list.
  • LlamaIndex retrieval layer

    • Stores policy documents, KYC notes, historical SAR narratives, typology guidance, and internal AML procedures.
    • Uses VectorStoreIndex plus a retriever so the agent can ground decisions in bank policy instead of free-form guessing.
  • Decision orchestration layer

    • Uses an LLM-backed QueryEngine or ReActAgent to combine transaction facts with retrieved policy context.
    • Produces a structured output: risk score, reason codes, recommended action, and escalation path.
  • Audit and case export layer

    • Persists every input, retrieval result, prompt, model output, and final decision.
    • This is non-negotiable for retail banking because compliance teams need a defensible trail.
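
To make the risk rules layer concrete, here is a minimal deterministic feature extractor. The function name, thresholds, and feature labels are illustrative, not policy values; it assumes the Transaction model defined in step 1 below.

from datetime import timedelta

def extract_risk_features(txn, recent_txns):
    """Deterministic signals computed outside the LLM.
    txn and recent_txns are Transaction objects (defined in step 1)."""
    features = []
    # Velocity spike: several transactions within the past hour
    last_hour = [t for t in recent_txns
                 if timedelta(0) <= txn.timestamp - t.timestamp <= timedelta(hours=1)]
    if len(last_hour) >= 5:
        features.append("velocity_spike_1h")
    # Round-dollar behavior on larger amounts
    if txn.amount >= 1000 and txn.amount % 100 == 0:
        features.append("round_dollar_amount")
    # Possible structuring: cash just under the $10,000 reporting threshold
    if txn.channel == "cash" and 9000 <= txn.amount < 10000:
        features.append("near_reporting_threshold_cash")
    return features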

Implementation

1) Install dependencies and define the transaction schema

Start with a minimal stack: LlamaIndex for retrieval/orchestration and Pydantic for typed transaction payloads. Keep the transaction object strict so downstream logic does not depend on messy dictionaries.
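
A typical install for this stack (llama-index-llms-openai provides the OpenAI LLM used in step 3; pin versions in production):

pip install llama-index llama-index-llms-openai pydantic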

from pydantic import BaseModel, Field
from datetime import datetime
from typing import List

class Transaction(BaseModel):
    transaction_id: str
    customer_id: str
    amount: float = Field(gt=0)   # reject zero/negative amounts at the boundary
    currency: str                 # ISO 4217 code, e.g. "USD"
    channel: str                  # card, ach, wire, cash
    timestamp: datetime
    merchant_name: str | None = None
    country: str | None = None
    risk_features: List[str] = Field(default_factory=list)
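
For example, validating a raw event at the ingestion boundary (the field values here are made up):

raw_event = {
    "transaction_id": "txn-100045",
    "customer_id": "cust-8821",
    "amount": 9850.00,
    "currency": "USD",
    "channel": "cash",
    "timestamp": "2026-04-20T14:32:00Z",
    "country": "US",
}

txn = Transaction.model_validate(raw_event)  # raises ValidationError on a bad payload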

2) Load policy documents into a LlamaIndex vector index

This is where you ground the agent in bank-specific policy. In practice these documents come from AML procedures, fraud playbooks, and escalation SOPs stored in an approved internal repository.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

docs = SimpleDirectoryReader("./bank_policies").load_data()
index = VectorStoreIndex.from_documents(docs)

retriever = index.as_retriever(similarity_top_k=3)

Use small chunks of authoritative content here. For retail banking workloads, you want policy retrieval to be precise enough that the model can cite why it flagged a transaction.
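
If you need tighter control than the defaults, pass an explicit splitter when building the index. The sizes below are illustrative starting points, not tuned values:

from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=256, chunk_overlap=32)
index = VectorStoreIndex.from_documents(docs, transformations=[splitter])
retriever = index.as_retriever(similarity_top_k=3)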

3) Build the monitoring query engine

The pattern below combines raw transaction facts with retrieved policy context. The output should be structured enough for a case workflow system to consume without post-processing hacks.

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.core.query_engine import RetrieverQueryEngine

Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)

query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
)

def build_prompt(txn: Transaction) -> str:
    return f"""
You are a retail banking transaction monitoring analyst.
Assess this transaction against AML/fraud policy.

Transaction:
- id: {txn.transaction_id}
- customer_id: {txn.customer_id}
- amount: {txn.amount} {txn.currency}
- channel: {txn.channel}
- timestamp: {txn.timestamp.isoformat()}
- merchant_name: {txn.merchant_name}
- country: {txn.country}
- risk_features: {", ".join(txn.risk_features)}

Return:
1. risk_level (low/medium/high)
2. reason_codes (list)
3. recommended_action (monitor/review/escalate)
4. short_audit_summary
""".strip()

def assess_transaction(txn: Transaction) -> str:
    prompt = build_prompt(txn)
    response = query_engine.query(prompt)
    # response.response is plain text; parse it into a schema before acting on it
    return response.response

VectorStoreIndex, as_retriever(), and RetrieverQueryEngine.from_args() are all standard LlamaIndex APIs. In production you would usually wrap the response in a Pydantic model or parse it into JSON before sending it to your case management tool.
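
A minimal parsing sketch, assuming you instruct the model to return JSON matching this schema. MonitoringDecision and parse_decision are illustrative names, not LlamaIndex APIs:

from typing import List, Literal
from pydantic import BaseModel

class MonitoringDecision(BaseModel):
    risk_level: Literal["low", "medium", "high"]
    reason_codes: List[str]
    recommended_action: Literal["monitor", "review", "escalate"]
    short_audit_summary: str

def parse_decision(raw: str) -> MonitoringDecision:
    # Fails loudly on malformed output, which is what you want
    # before anything reaches the case management system.
    return MonitoringDecision.model_validate_json(raw)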

4) Add an agent wrapper for analyst workflows

If you want the system to answer follow-up questions like “why was this escalated?” or “show similar cases,” use an agent on top of retrieval. That gives investigators a conversational interface without losing access to grounded evidence.

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

tools = [
    QueryEngineTool.from_defaults(
        query_engine=query_engine,
        name="transaction_policy_lookup",
        description="Look up AML/fraud policy context for retail banking transactions.",
    )
]

agent = ReActAgent.from_tools(
    tools=tools,
    llm=Settings.llm,
    verbose=True,
)

result = agent.chat("Review a $9,850 cash deposit followed by three outgoing ACH transfers.")
print(result)

For banks, I prefer this split:

  • deterministic feature extraction outside the model
  • retrieval over approved internal content inside LlamaIndex
  • final disposition stored as structured JSON in your case system

Production Considerations

  • Data residency

    • Keep customer data and embeddings inside the required jurisdiction.
    • If your bank operates across regions, separate indexes by region so EU data does not leak into US-hosted infrastructure.
  • Auditability

    • Persist the full chain: input transaction features, retrieved document IDs, prompt text, model version, and output decision (see the sketch after this list).
    • This is what lets compliance teams reconstruct why a case was escalated six months later.
  • Guardrails

    • Never let the model make unsupervised account actions.
    • Restrict outputs to recommend/monitor/escalate; actual holds or closures should require explicit rule-based approval or human sign-off.
  • Monitoring

    • Track false positive rate by segment: new-to-bank customers, high-net-worth accounts, cash-heavy merchants.
    • Also track retrieval quality; if the agent starts citing irrelevant policies, your decisions will drift fast.
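
A sketch of the audit record referenced above, reusing txn and the query engine from step 3. The field names are illustrative, not a fixed schema; source_nodes is the standard way a LlamaIndex Response exposes its retrieval evidence.

from datetime import datetime, timezone
from typing import List
from pydantic import BaseModel

class AuditRecord(BaseModel):
    transaction_id: str
    risk_features: List[str]
    retrieved_doc_ids: List[str]
    prompt_text: str
    model_version: str
    decision: str
    decided_at: datetime

prompt = build_prompt(txn)
response = query_engine.query(prompt)
record = AuditRecord(
    transaction_id=txn.transaction_id,
    risk_features=txn.risk_features,
    retrieved_doc_ids=[n.node_id for n in response.source_nodes],  # grounding evidence
    prompt_text=prompt,
    model_version="gpt-4o-mini",
    decision=response.response,
    decided_at=datetime.now(timezone.utc),
)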

Common Pitfalls

  1. Using raw LLM output as the final decision

    • Don’t do this.
    • Parse into a strict schema and keep deterministic thresholds outside the model for high-risk actions like SAR referral or account restriction.
  2. Stuffing too much into one prompt

    • Long prompts with full customer history tend to degrade precision.
    • Extract only relevant features first, then retrieve only the top policy snippets needed for that case.
  3. Ignoring compliance controls around embeddings and logs

    • Embeddings can still contain sensitive financial information.
    • Apply retention policies, encryption at rest, access controls, and regional isolation just like you would for core banking records.

A good retail banking monitoring agent is not “an LLM that flags fraud.” It is a controlled workflow that combines rules, retrieval over approved policy content, structured outputs, and audit logging. Build it that way and you get something compliance can review instead of something they have to shut down.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
