How to Build a Transaction Monitoring Agent Using LlamaIndex in Python for Wealth Management
A transaction monitoring agent for wealth management watches client activity, flags suspicious or policy-breaking patterns, and routes only the right cases to humans. It matters because wealth firms need to catch AML issues, unusual transfers, concentration risk, and account behavior changes without drowning compliance teams in false positives.
Architecture

- Transaction ingestion layer
  - Pulls trades, cash movements, transfers, and account events from core systems or a warehouse.
  - Normalizes records into a consistent schema before they hit the agent.
- Policy and rules store
  - Holds firm-specific thresholds like velocity limits, high-risk jurisdictions, sanctioned counterparties, and unusual asset movement rules.
  - Keeps deterministic checks separate from LLM reasoning.
- LlamaIndex retrieval layer
  - Uses `VectorStoreIndex` over policies, procedures, escalation playbooks, and historical case notes.
  - Gives the agent context on what “suspicious” means in your firm.
- Reasoning and orchestration layer
  - Uses an LLM-backed `QueryEngine` or `AgentRunner` to classify alerts and explain why a transaction was flagged.
  - Produces structured outputs for downstream case management.
- Audit and case management layer
  - Stores every input, retrieved policy snippet, model response, and final decision.
  - Required for compliance review, model governance, and examiner requests.
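To make the ingestion layer concrete, here is a minimal sketch of a normalized transaction schema. The field names and the raw payload keys are illustrative assumptions, not a standard; your core systems will dictate the real mapping.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NormalizedTransaction:
    """Canonical record every source system is mapped into before monitoring."""
    tx_id: str
    client_id: str
    amount: float
    currency: str
    tx_type: str          # e.g. "outbound_wire", "trade", "internal_transfer"
    jurisdiction: str
    new_beneficiary: bool
    occurred_at: str      # ISO 8601 timestamp

def normalize_core_banking_event(raw: dict) -> NormalizedTransaction:
    """Map one hypothetical core-banking payload into the canonical schema."""
    return NormalizedTransaction(
        tx_id=raw["id"],
        client_id=raw["clientRef"],
        amount=float(raw["amt"]),
        currency=raw.get("ccy", "USD"),
        tx_type=raw["kind"],
        jurisdiction=raw.get("benefCountry", "UNKNOWN"),
        new_beneficiary=raw.get("benefIsNew", False),
        occurred_at=raw["ts"],
    )
```

Writing one such adapter per source system keeps every downstream check, prompt, and audit record working against a single shape.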
Implementation
1) Build a small knowledge base for policies and escalation notes
Start with firm documents that define acceptable behavior. In production you would load PDFs, SharePoint exports, or internal docs; here we use plain text documents so the pattern is clear.
```python
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.settings import Settings
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-4o-mini")

docs = [
    Document(text="""
Wealth management monitoring policy:
- Flag outbound wires above $250k if beneficiary is new or high risk.
- Flag rapid movement of funds in and out within 5 business days.
- Escalate transactions involving sanctioned jurisdictions or PEP-linked entities.
- Review sudden concentration shifts greater than 30% in a single sector.
"""),
    Document(text="""
Escalation guidance:
- Low confidence alerts go to analyst review.
- High-risk jurisdiction + large transfer requires same-day compliance review.
- If client profile conflicts with activity pattern, attach KYC summary.
"""),
]

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine(similarity_top_k=2)
```
This gives the agent retrieval over policy text instead of hardcoding every rule into prompts. That matters because wealth firms change thresholds often.
2) Create a transaction record and ask the engine for a risk assessment
You want the model to explain its reasoning against your internal policy. Keep the transaction payload structured so you can log it cleanly later.
```python
transaction = """
Client: ACME Family Trust
Amount: $420000
Type: Outbound wire
Beneficiary status: New beneficiary
Jurisdiction: Cayman Islands
Timing: Two inbound deposits this week followed by immediate wire out
Notes: Client profile is long-term conservative income investor
"""

response = query_engine.query(
    f"""
Assess this transaction against the firm's monitoring policy.
Return:
1. Risk level: low/medium/high
2. Why it was flagged
3. What evidence from policy supports the decision
4. Recommended next action

Transaction:
{transaction}
"""
)
print(response)
```
In a real workflow, this output becomes an analyst assist artifact, not an automatic approval. For wealth management you usually want human-in-the-loop review for anything that touches AML or suitability concerns.
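Downstream case management should not consume free text. Here is a minimal parsing sketch that extracts the risk level and recommended action from the numbered answer; it assumes the model follows the requested format, and routes anything unparseable to an analyst rather than guessing.

```python
import re

def parse_assessment(answer: str) -> dict:
    """Extract risk level and next action from the model's numbered answer.
    Fields the model did not state in the expected form come back as None."""
    risk = re.search(r"risk level\s*[:\-]\s*(low|medium|high)", answer, re.IGNORECASE)
    action = re.search(r"next action\s*[:\-]\s*(.+)", answer, re.IGNORECASE)
    return {
        "risk_level": risk.group(1).lower() if risk else None,
        "next_action": action.group(1).strip() if action else None,
        "needs_human_review": risk is None,  # unparseable answers go to an analyst
    }
```

The `needs_human_review` flag is the important part: a format failure is itself a signal, not something to silently default.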
3) Wrap the retrieval result into an auditable case object
The key production pattern is not just classification; it is traceability. You need to persist what was asked, what sources were retrieved, and what answer was returned.
```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class MonitoringCase:
    case_id: str
    created_at: str
    transaction_text: str
    model_answer: str

case = MonitoringCase(
    case_id="TM-2026-00017",
    # Timezone-aware timestamp; datetime.utcnow() is deprecated.
    created_at=datetime.now(timezone.utc).isoformat(),
    transaction_text=transaction,
    model_answer=str(response),
)

with open("audit_case_tm_2026_00017.json", "w") as f:
    json.dump(asdict(case), f, indent=2)
```
That file should eventually land in immutable storage with retention controls aligned to your regulatory obligations. If your firm operates across regions, keep residency constraints in mind before shipping data to external model endpoints.
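One common pattern for making those audit files tamper-evident before they reach immutable storage is to chain a content hash into each record. A minimal sketch using only the standard library (the field names follow the case object above; the chaining scheme itself is an illustrative choice, not a compliance requirement):

```python
import hashlib
import json

def seal_case(case_dict: dict, prev_hash: str = "0" * 64) -> dict:
    """Attach a SHA-256 hash over the case contents plus the previous
    case's hash, so editing any historical record breaks the chain."""
    payload = json.dumps(case_dict, sort_keys=True) + prev_hash
    sealed = dict(case_dict)
    sealed["prev_hash"] = prev_hash
    sealed["record_hash"] = hashlib.sha256(payload.encode()).hexdigest()
    return sealed
```

A verifier can replay the chain from the first case and detect any retroactive edit, which is a useful supplement to (not a substitute for) write-once storage.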
4) Add deterministic pre-checks before LlamaIndex reasoning
Do not send every event directly to the LLM. Use simple rules first so you reduce cost and keep obvious violations out of ambiguous reasoning paths.
```python
def precheck(tx):
    red_flags = []
    if tx["amount"] >= 250000:
        red_flags.append("large_wire")
    if tx["jurisdiction"] in {"Cayman Islands", "North Korea", "Iran"}:
        red_flags.append("high_risk_jurisdiction")
    if tx["new_beneficiary"]:
        red_flags.append("new_beneficiary")
    return red_flags

tx = {
    "amount": 420000,
    "jurisdiction": "Cayman Islands",
    "new_beneficiary": True,
}

flags = precheck(tx)
if flags:
    print(f"Escalate immediately: {flags}")
else:
    print(query_engine.query("Assess transaction risk..."))
```
This split between rules and LLM reasoning is important in wealth management. Compliance teams expect deterministic controls for known risks and explanatory analysis for gray areas.
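The rules-then-LLM split can be captured in a single routing function. In this sketch, `assess_with_llm` is a hypothetical stand-in for the query-engine call above, injected so the routing logic can be tested without a model:

```python
def route_transaction(tx: dict, assess_with_llm) -> dict:
    """Deterministic rules decide first; only gray-area cases reach the LLM."""
    flags = []
    if tx["amount"] >= 250000:
        flags.append("large_wire")
    if tx["jurisdiction"] in {"Cayman Islands", "North Korea", "Iran"}:
        flags.append("high_risk_jurisdiction")
    if tx.get("new_beneficiary"):
        flags.append("new_beneficiary")

    if flags:
        # Known risk: skip the model, go straight to the compliance queue.
        return {"path": "deterministic_escalation", "flags": flags}
    # Ambiguous: ask the LLM for an explained assessment, still analyst-reviewed.
    return {"path": "llm_review", "assessment": assess_with_llm(tx)}
```

Injecting the assessment function also makes it trivial to swap models or run the router in shadow mode against historical transactions.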
Production Considerations

- Deployment
  - Run the monitoring service as an internal API behind SSO and network controls.
  - Separate ingestion workers from analyst-facing query services so spikes in transaction volume do not block reviews.
- Monitoring
  - Track alert volume, false positive rate, average time-to-triage, retrieval hit rate, and analyst override rate.
  - Log every prompt version and index version so you can reproduce decisions during audits.
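Those alert-quality metrics can be computed directly from the resolved case log. A minimal sketch, assuming each case record carries an `analyst_verdict` and an `analyst_overrode_model` field (both illustrative names):

```python
def triage_metrics(cases: list[dict]) -> dict:
    """Compute alert-quality metrics from resolved monitoring cases.
    Assumes each case has 'analyst_verdict' ('true_positive' or
    'false_positive') and 'analyst_overrode_model' (bool)."""
    total = len(cases)
    if total == 0:
        return {"alerts": 0, "false_positive_rate": 0.0, "override_rate": 0.0}
    fp = sum(c["analyst_verdict"] == "false_positive" for c in cases)
    overrides = sum(c["analyst_overrode_model"] for c in cases)
    return {
        "alerts": total,
        "false_positive_rate": fp / total,
        "override_rate": overrides / total,
    }
```

A rising override rate is often the earliest sign that a prompt or index change degraded the agent, which is why those versions belong in every case record.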
- Guardrails
  - Force structured outputs for risk level, rationale, evidence cited, and recommended action.
  - Block direct release decisions from the model; it should recommend escalation or review only.
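Forcing structured output only helps if you reject anything outside the schema. A sketch of a validator that enforces the allowed risk levels and blocks release-style actions (the blocked-word list is an illustrative starting point, not exhaustive):

```python
ALLOWED_RISK = {"low", "medium", "high"}
BLOCKED_ACTIONS = {"approve", "release", "clear"}  # decisions the model may never make

def validate_model_output(output: dict) -> dict:
    """Reject malformed outputs and flag decisions the model must not make."""
    errors = []
    if output.get("risk_level") not in ALLOWED_RISK:
        errors.append("invalid_risk_level")
    if not output.get("rationale"):
        errors.append("missing_rationale")
    action = (output.get("recommended_action") or "").lower()
    if any(word in action for word in BLOCKED_ACTIONS):
        errors.append("model_attempted_release_decision")
    return {"valid": not errors, "errors": errors}
```

Anything that fails validation should fall back to analyst review rather than being retried silently.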
- Wealth management controls
  - Enforce data residency for client records where required by jurisdiction.
  - Mask sensitive identifiers like account numbers unless they are needed for case handling.
  - Keep a full audit trail for KYC/AML examiners and internal model risk teams.
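Identifier masking should happen before any text leaves approved infrastructure, including in prompts sent to external model endpoints. A sketch that masks account-number-like digit runs while keeping the last four digits for case handling (the pattern is illustrative; real account formats vary by custodian):

```python
import re

# Illustrative: treat standalone 8-16 digit runs as account numbers.
ACCOUNT_PATTERN = re.compile(r"\b\d{8,16}\b")

def mask_identifiers(text: str) -> str:
    """Replace account-number-like digit runs, keeping the last 4 digits."""
    return ACCOUNT_PATTERN.sub(lambda m: "****" + m.group()[-4:], text)
```

Run this over the transaction payload before it enters the prompt, and store the unmasked original only in the access-controlled case record.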
Common Pitfalls

- Using the LLM as the first line of defense
  - Don’t send raw transactions straight into prompting without prechecks.
  - Fix it by applying deterministic rules first for threshold breaches and sanctions hits.
- No source grounding
  - If the agent cannot cite policy text or prior escalation guidance, analysts will not trust it.
  - Fix it by indexing internal policies with `VectorStoreIndex` and retrieving top-k supporting snippets every time.
- Weak auditability
  - Storing only the final answer is not enough for regulated workflows.
  - Fix it by persisting input payloads, retrieved context, model output, timestamps, prompt versioning, and reviewer actions.
- Ignoring regional data constraints
  - Wealth firms often have strict residency rules for client data and cross-border processing limits.
  - Fix it by deploying regionally scoped indexes and keeping sensitive records inside approved infrastructure.
Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.