How to Build a Transaction Monitoring Agent Using LlamaIndex in Python for Investment Banking

By Cyprian Aarons · Updated 2026-04-21
transaction-monitoring · llamaindex · python · investment-banking

A transaction monitoring agent watches trade and payment activity, scores events against policy and historical patterns, and escalates suspicious cases with evidence. In investment banking, that matters because you need faster detection of market abuse, sanctions exposure, and unusual counterparty behavior without burying compliance teams in false positives.

Architecture

  • Transaction ingestion layer

    • Pulls trades, payments, SWIFT messages, and case notes from approved internal systems.
    • Normalizes records into a common schema with fields like counterparty, instrument, amount, jurisdiction, and timestamp (a minimal sketch follows this list).
  • Policy and controls store

    • Holds AML rules, surveillance policies, escalation thresholds, and desk-specific exceptions.
    • Should be versioned so every alert can be traced back to the exact rule set used.
  • LlamaIndex retrieval layer

    • Indexes policy documents, prior cases, typologies, SAR guidance, and control procedures.
    • Uses retrieval to ground the agent in firm-approved context instead of raw model memory.
  • Reasoning and scoring agent

    • Combines retrieved policy context with transaction facts.
    • Produces a risk assessment, rationale, and recommended action: clear, review, or escalate.
  • Audit trail store

    • Persists every input, retrieved chunk, score, response, and human override.
    • This is non-negotiable for model governance and regulator review.
  • Case management integration

    • Pushes high-risk events into the bank’s workflow system for analyst review.
    • Keeps the agent advisory; the final disposition stays with compliance or surveillance staff.
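
For reference, here is a minimal sketch of the normalized record described in the ingestion layer above. The field names are illustrative assumptions, not a standard schema; map them to your firm's canonical model.

from dataclasses import dataclass
from datetime import datetime

# Illustrative normalized record for the ingestion layer.
# Field names are assumptions; align them with your canonical schema.
@dataclass
class NormalizedTransaction:
    txn_id: str
    counterparty: str
    instrument: str
    amount: float
    currency: str
    jurisdiction: str
    timestamp: datetime
    source_system: str  # e.g. trade capture, payments gateway, SWIFT feed
    notes: str = ""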

Implementation

1. Build a small policy index with LlamaIndex

In production you would load policies from your document store. For this pattern, use Document, VectorStoreIndex, and a local embedding model so retrieval stays deterministic enough for testing.

from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Use a local embedding model to keep sensitive policy data on-prem
Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

docs = [
    Document(
        text=(
            "Escalate transactions over $5M involving high-risk jurisdictions "
            "or counterparties on sanctions watchlists. Review unusual round-trip "
            "trading patterns within 24 hours."
        ),
        metadata={"source": "aml_policy_v3", "version": "3.2"}
    ),
    Document(
        text=(
            "Any payment involving nested intermediaries or missing beneficiary "
            "information requires enhanced due diligence and compliance review."
        ),
        metadata={"source": "payments_control", "version": "1.8"}
    ),
]

index = VectorStoreIndex.from_documents(docs)
retriever = index.as_retriever(similarity_top_k=2)
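
Before wiring the retriever into the agent, a quick sanity check confirms the right control surfaces for an obvious query; the metadata on each retrieved node is also what you will later persist for audit.

# Sanity-check retrieval; each node carries the policy metadata you will audit
nodes = retriever.retrieve("large payment to a high-risk jurisdiction")
for node in nodes:
    print(node.score, node.node.metadata)  # similarity score, source, version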

2. Define a transaction scoring function that grounds itself in retrieved controls

The agent should not “freewheel” on a prompt alone. Retrieve policy context first, then pass both the transaction payload and the supporting policy evidence to the LLM in a single grounded prompt.

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)

def score_transaction(txn: dict) -> str:
    query = (
        f"Transaction details:\n"
        f"- amount: {txn['amount']}\n"
        f"- currency: {txn['currency']}\n"
        f"- counterparty: {txn['counterparty']}\n"
        f"- jurisdiction: {txn['jurisdiction']}\n"
        f"- product: {txn['product']}\n"
        f"- notes: {txn.get('notes', '')}\n\n"
        f"Assess whether this should be cleared, reviewed, or escalated."
    )

    nodes = retriever.retrieve(query)
    context = "\n\n".join([node.node.text for node in nodes])

    prompt = (
        "You are a transaction monitoring analyst for an investment bank.\n"
        "Use only the provided policy context.\n\n"
        f"Policy context:\n{context}\n\n"
        f"{query}\n\n"
        "Return:\n"
        "- decision: clear | review | escalate\n"
        "- rationale: short explanation\n"
        "- evidence: bullet list of policy references used"
    )

    response = Settings.llm.complete(prompt)
    return str(response)

txn = {
    "amount": "$7.4M",
    "currency": "USD",
    "counterparty": "ABC Holdings Ltd",
    "jurisdiction": "UAE",
    "product": "equity swap",
    "notes": "Back-to-back booking across two desks."
}

print(score_transaction(txn))

3. Add structured output for downstream case management

In banking workflows you want machine-readable results. Use PydanticOutputParser (or a structured-prediction program) so the alert can flow into your case system without brittle regex handling.

from pydantic import BaseModel
from typing import Literal
from llama_index.core.output_parsers import PydanticOutputParser

class MonitoringResult(BaseModel):
    decision: Literal["clear", "review", "escalate"]
    rationale: str
    evidence: list[str]

parser = PydanticOutputParser(output_cls=MonitoringResult)

def score_transaction_structured(txn: dict) -> MonitoringResult:
    nodes = retriever.retrieve(str(txn))
    context = "\n".join(node.node.text for node in nodes)

    # parser.format appends the JSON schema instructions to the query text
    prompt = parser.format(
        f"Policy context:\n{context}\n\n"
        f"Transaction:\n{txn}\n\n"
        "Assess whether this transaction should be cleared, reviewed, or escalated."
    )

    raw = Settings.llm.complete(prompt)
    return parser.parse(str(raw))

result = score_transaction_structured(txn)
print(result.model_dump())
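
With a typed result, routing into case management reduces to a plain conditional. The create_case helper below is hypothetical, standing in for your workflow system's API; note the agent only opens cases, it never closes them.

def route_alert(txn: dict, result: MonitoringResult) -> None:
    """Open a case for anything the agent did not clear; analysts decide."""
    if result.decision == "clear":
        return  # still audit-logged, but no case is opened
    # create_case is a hypothetical wrapper around your case management API
    create_case(
        payload=txn,
        decision=result.decision,
        rationale=result.rationale,
        evidence=result.evidence,
        priority="high" if result.decision == "escalate" else "normal",
    )

route_alert(txn, result)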

4. Persist audit records before you ship alerts

Every decision needs traceability: input payload hash, retrieved sources, model version, timestamp, and analyst override if one exists. Store that outside the model path so your audit log survives model changes.
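
A minimal sketch of such a record, assuming an append-only JSONL file as the sink; write_audit_record and AUDIT_LOG_PATH are illustrative names, and in practice you would write to whatever immutable store your bank approves.

import hashlib
import json
from datetime import datetime, timezone

# Append-only JSONL file as the audit sink; an assumption standing in for
# the immutable store your bank approves.
AUDIT_LOG_PATH = "audit_log.jsonl"

def write_audit_record(
    txn: dict,
    result: MonitoringResult,
    retrieved_sources: list[dict],
    prompt_version: str,
    model_version: str,
) -> None:
    record = {
        "payload_sha256": hashlib.sha256(
            json.dumps(txn, sort_keys=True).encode()
        ).hexdigest(),
        "retrieved_sources": retrieved_sources,  # e.g. [{"source": ..., "version": ...}]
        "decision": result.decision,
        "rationale": result.rationale,
        "prompt_version": prompt_version,
        "model_version": model_version,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "analyst_override": None,  # updated later if compliance overrides
    }
    with open(AUDIT_LOG_PATH, "a") as f:
        f.write(json.dumps(record) + "\n")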

Production Considerations

  • Data residency

    • Keep embeddings, vector stores, and logs inside approved regions.
    • If your bank requires it, run local models or private endpoints; do not send raw trade data to public SaaS by default.
  • Auditability

    • Log the exact prompt template version, retrieved document IDs, confidence score or decision label, and human disposition.
    • Regulators care about reproducibility more than clever prompts.
  • Guardrails

    • Hard-code escalation rules for sanctions hits, politically exposed persons, threshold breaches, and restricted products (a sketch follows this list).
    • The agent should recommend; it should not auto-close suspicious activity without explicit policy support.
  • Monitoring

    • Track false positive rate by desk, region, product type, and counterparty segment.
    • Watch retrieval quality too; if the wrong control docs are being surfaced, your alert quality will degrade fast.
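
To make the first guardrail concrete, here is a minimal sketch of a deterministic pre-screen that runs before the model is ever consulted; the watchlist contents and threshold are placeholders for your firm's real control data.

# Deterministic pre-screen; runs before the LLM and cannot be overridden by it.
# Watchlist and threshold values are placeholders, not real control data.
SANCTIONS_WATCHLIST = {"XYZ Trading FZE"}
ESCALATION_THRESHOLD_USD = 5_000_000

def hard_guardrails(txn: dict, amount_usd: float) -> str | None:
    if txn["counterparty"] in SANCTIONS_WATCHLIST:
        return "escalate"  # sanctions hits never reach the model for debate
    if amount_usd > ESCALATION_THRESHOLD_USD:
        return "escalate"
    return None  # no hard rule fired; the agent may triage

# Fall through to the agent only when no hard rule fires
decision = hard_guardrails(txn, amount_usd=7_400_000) or "agent_triage"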

Common Pitfalls

  • Using the LLM as the primary detector

    • Don’t ask the model to infer risk from scratch.
    • Start with deterministic rules and use LlamaIndex to ground explanations and triage decisions against approved policies.
  • Skipping version control on policies

    • If policies change weekly but your index does not track versions, you cannot defend an alert decision later.
    • Store document metadata like source, version, effective_date, and reindex on change.
  • Leaking sensitive data into prompts

    • Transaction monitoring often includes client names, account numbers, and restricted deal information.
    • Redact where possible before prompting; keep raw identifiers in internal systems tied to audit logs rather than in long-lived model traces (one lightweight approach is sketched below).
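
A minimal redaction sketch, assuming account numbers are the main identifier to mask; the regex pattern is illustrative and will need tuning for your actual ID formats.

import re

# Illustrative pattern; tune for your actual account/ID formats
ACCOUNT_RE = re.compile(r"\b\d{8,16}\b")

def redact(text: str, vault: dict[str, str]) -> str:
    """Replace account numbers with tokens; raw values stay in the vault."""
    def _mask(match: re.Match) -> str:
        token = f"<ACCT_{len(vault) + 1}>"
        vault[token] = match.group(0)  # raw identifier kept server-side only
        return token
    return ACCOUNT_RE.sub(_mask, text)

vault: dict[str, str] = {}
safe_notes = redact("Wire from account 4532987612 flagged by desk.", vault)
# safe_notes -> "Wire from account <ACCT_1> flagged by desk."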


By Cyprian Aarons, AI Consultant at Topiax.
