How to Build a Fraud Detection Agent Using LlamaIndex in Python for Banking
A fraud detection agent in banking does two things well: it pulls together transaction data, customer context, and policy rules, then turns that into an explainable decision or escalation path. That matters because fraud teams need speed without losing auditability, and compliance teams need to see why a transaction was flagged, not just that a model said so.
Architecture
- Data ingestion layer
  - Pulls transactions, account metadata, device signals, and case history from approved bank systems.
  - In practice this is usually a mix of SQL, REST APIs, and event streams.
- Retrieval layer
  - Uses a LlamaIndex VectorStoreIndex over historical fraud cases, internal playbooks, SAR guidance, and policy docs.
  - Keeps the agent grounded in bank-specific knowledge instead of generic LLM guesses.
- Risk scoring layer
  - Combines deterministic rules with LLM-based reasoning.
  - Example: velocity checks, geo mismatch, new payee risk, and prior case similarity.
- Decision layer
  - Produces one of three outputs: approve, step-up verify, or escalate to analyst.
  - This should be explicit and machine-readable (see the sketch after this list).
- Audit layer
  - Stores the prompt context, retrieved documents, model output, and final action.
  - Required for compliance review and incident investigation.
- Guardrail layer
  - Prevents the agent from making unsupported claims or exposing sensitive customer data.
  - Also enforces data residency and least-privilege access.
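A machine-readable decision object could look like the following; the exact field names here are illustrative assumptions, not a fixed schema:

```python
# Minimal sketch of what the decision layer might emit; field names are assumptions.
decision = {
    "decision": "step_up_verify",  # approve | step_up_verify | escalate
    "rationale": "New payee plus device change on a high-value transfer.",
    "indicators": ["new_payee", "device_change", "amount_above_threshold"],
    "retrieved_doc_ids": ["playbook-017", "policy-velocity-03"],  # feeds the audit layer
}
```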
Implementation
1) Install dependencies and load bank-approved documents
For this pattern you want a small retrieval corpus: fraud playbooks, AML escalation rules, typologies, and analyst notes that are safe to use. Keep customer PII out of the index unless your legal and security teams have explicitly approved it.
```bash
pip install llama-index openai python-dotenv
```
```python
from dotenv import load_dotenv
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

# Expects OPENAI_API_KEY in a local .env file
load_dotenv()

# Load only bank-approved, PII-free documents into the retrieval corpus
docs = SimpleDirectoryReader("./bank_fraud_docs").load_data()

# Deterministic output (temperature=0) for repeatable assessments
llm = OpenAI(model="gpt-4o-mini", temperature=0)

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine(llm=llm, similarity_top_k=3)
```
This gives you a retrieval-backed query engine over internal fraud documentation. In banking, that is the difference between “the model thinks” and “the model cites the policy we actually use.”
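To make that citation trail concrete, you can inspect the source nodes LlamaIndex returns with each response; the query string here is just an example:

```python
response = query_engine.query(
    "When should a high-value transfer to a new payee be escalated?"
)
print(response.response)

# Each source node links the answer back to the policy text it was grounded in
for node in response.source_nodes:
    print(node.metadata.get("file_name"), node.score)
```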
2) Build a structured transaction risk assessment prompt
Do not ask the model for a free-form opinion. Force a structured assessment with clear decision categories so downstream systems can act on it.
```python
from typing import Literal

from pydantic import BaseModel

class FraudAssessment(BaseModel):
    # Schema the downstream parser and router will enforce
    decision: Literal["approve", "step_up_verify", "escalate"]
    rationale: str
    indicators: list[str]

def assess_transaction(txn: dict) -> str:
    prompt = f"""
You are a fraud analyst for a retail bank.
Assess this transaction using only the provided context.

Transaction:
{txn}

Return:
- decision: approve | step_up_verify | escalate
- rationale: short explanation
- indicators: list of concrete fraud signals
"""
    response = query_engine.query(prompt)
    return response.response

txn = {
    "amount": 9800,
    "currency": "USD",
    "country": "NG",
    "customer_country": "US",
    "new_payee": True,
    "device_change": True,
}

print(assess_transaction(txn))
```
This is still text output at this stage. In production you would parse it into a schema like FraudAssessment before routing to an analyst queue or case management system.
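One way to do that parsing, sketched here on the assumption that you re-ask the same LLM for strict JSON (LlamaIndex also ships structured-output helpers you could use instead), failing closed to escalation whenever the output does not validate:

```python
import json

from pydantic import ValidationError

def parse_assessment(raw_text: str) -> FraudAssessment:
    # Reshape the free-form answer into strict JSON; this reshaping prompt
    # is an illustrative assumption, not a LlamaIndex API.
    json_text = llm.complete(
        "Convert this fraud assessment into JSON with keys decision, "
        "rationale, indicators. decision must be one of approve, "
        "step_up_verify, escalate. Reply with JSON only.\n\n" + raw_text
    ).text
    try:
        return FraudAssessment.model_validate_json(json_text)
    except (ValidationError, json.JSONDecodeError):
        # Fail closed: anything unparseable goes to a human analyst
        return FraudAssessment(
            decision="escalate",
            rationale="Unparseable model output; manual review required.",
            indicators=["parser_failure"],
        )
```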
3) Add tool-based enrichment for live banking signals
LlamaIndex agents work best when they can call tools. For fraud detection that usually means account status checks, recent login history, card controls, and sanctions screening results.
```python
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core.agent import ReActAgent

def get_recent_activity(account_id: str) -> str:
    # Replace with a real internal API call
    return (
        f"Account {account_id}: 4 failed logins in last hour; "
        "new device seen; card present flag off."
    )

activity_tool = FunctionTool.from_defaults(
    fn=get_recent_activity,
    name="get_recent_activity",
    description="Fetch recent account activity for fraud analysis.",
)

# Expose the policy index from step 1 as a tool, so the agent can actually
# consult bank policy docs when the prompt asks for them
policy_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="bank_policy_docs",
    description="Search bank fraud playbooks and policy documents.",
)

agent = ReActAgent.from_tools(
    [activity_tool, policy_tool],
    llm=llm,
    verbose=True,
)

result = agent.chat(
    "Analyze account A123 for fraud risk. Use recent activity and bank policy docs."
)
print(result)
```
The key pattern here is separation of concerns:
- tools fetch factual signals,
- retrieval provides policy context,
- the LLM synthesizes both into an explainable recommendation.
4) Route decisions into a case workflow
A fraud agent should not directly block customers unless your controls allow it. A safer design is to emit a decision object that your orchestration layer consumes.
```python
def route_decision(decision_text: str):
    # Keyword routing is a stopgap; prefer routing on the parsed
    # FraudAssessment.decision field (see the sketch below)
    if "escalate" in decision_text.lower():
        return {"action": "create_case", "priority": "high"}
    if "step_up_verify" in decision_text.lower():
        return {"action": "trigger_otp", "priority": "medium"}
    return {"action": "allow", "priority": "low"}

decision_text = result.response if hasattr(result, "response") else str(result)
print(route_decision(decision_text))
```
That keeps the agent inside a controlled banking workflow. The agent recommends; your policy engine executes.
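String matching on free text is brittle. Once you parse into FraudAssessment (using something like the parse_assessment helper sketched in step 2), routing can key off the validated decision field instead:

```python
def route_assessment(assessment: FraudAssessment) -> dict:
    # Deterministic mapping from validated decision values to workflow actions
    routes = {
        "escalate": {"action": "create_case", "priority": "high"},
        "step_up_verify": {"action": "trigger_otp", "priority": "medium"},
        "approve": {"action": "allow", "priority": "low"},
    }
    return routes[assessment.decision]

print(route_assessment(parse_assessment(decision_text)))
```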
Production Considerations
- Deploy in-region
  - Keep model inference, vector stores, logs, and backups inside approved regions.
  - Banking data residency requirements are not optional; they shape your whole architecture.
- Log everything needed for audit
  - Store retrieved document IDs, prompt version, tool calls, model output, final action, and human override (a minimal record is sketched after this list).
  - If compliance asks why an account was escalated six months later, you need reproducible evidence.
- Add hard guardrails
  - Block PII leakage in prompts and outputs.
  - Redact account numbers before sending text to the LLM.
  - Use allowlisted tools only; never let the agent call arbitrary internal endpoints.
- Monitor drift and false positives
  - Track precision by segment: geography, channel, merchant category, customer tenure.
  - Fraud patterns change fast; if your alert rate spikes after a rollout, roll back immediately.
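To make the logging and redaction points concrete, here is a minimal sketch; the regex, field names, and prompt-version scheme are assumptions you would replace with your bank's own masking rules and audit schema:

```python
import json
import re
from datetime import datetime, timezone

# Crude illustrative pattern: mask any run of 8+ digits before text reaches the LLM.
# Real masking rules should come from your security team.
ACCOUNT_RE = re.compile(r"\b\d{8,}\b")

def redact(text: str) -> str:
    return ACCOUNT_RE.sub("[REDACTED_ACCOUNT]", text)

def audit_record(prompt: str, doc_ids: list[str], tool_calls: list[str],
                 output: str, action: dict) -> str:
    # One reproducible record per decision; ship to append-only, in-region storage
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "prompt": redact(prompt),
        "prompt_version": "v1",  # assumed versioning scheme
        "retrieved_doc_ids": doc_ids,
        "tool_calls": tool_calls,
        "model_output": redact(output),
        "final_action": action,
        "human_override": None,  # filled in if an analyst overturns the decision
    })
```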
Common Pitfalls
- Using raw customer data in retrieval
  - Mistake: indexing full statements or KYC files without controls.
  - Fix: index only approved summaries or masked fields unless security has signed off on full-data handling.
- Letting the LLM make final blocking decisions
  - Mistake: “model says fraudulent” becomes an automatic decline.
  - Fix: route outputs through deterministic policy rules or analyst review for high-impact actions.
- Skipping explainability
  - Mistake: storing only the final label with no evidence trail.
  - Fix: persist retrieved chunks, tool outputs, prompt template version, and the exact reason codes used for escalation.
- Ignoring latency budgets
  - Mistake: calling too many tools or retrieving too much context during the authorization flow.
  - Fix: keep real-time scoring lean; push deeper investigation to asynchronous case review after step-up verification or hold placement (see the sketch below).
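One way to hold that latency budget is to split the flow: a rules-only synchronous pre-screen, with the LlamaIndex agent running off the critical path. A rough sketch, using an in-process queue as a stand-in for whatever message broker you actually run, and an intentionally simplistic rule:

```python
import queue
import threading

# Stand-in for a real broker (Kafka, SQS, etc.); illustrative only
investigation_queue: queue.Queue = queue.Queue()

def fast_prescreen(txn: dict) -> dict:
    # Synchronous path: deterministic rules only, no LLM or retrieval calls
    if txn.get("new_payee") and txn.get("device_change"):
        # Defer the expensive agent run; step up verification in the meantime
        investigation_queue.put(txn)
        return {"action": "trigger_otp", "priority": "medium"}
    return {"action": "allow", "priority": "low"}

def investigation_worker():
    # Asynchronous path: the full retrieval-backed assessment from step 2
    while True:
        pending_txn = investigation_queue.get()
        print(assess_transaction(pending_txn))
        investigation_queue.task_done()

threading.Thread(target=investigation_worker, daemon=True).start()
print(fast_prescreen(txn))  # reuses the example transaction from step 2
```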
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit