How to Build a Fraud Detection Agent Using LlamaIndex in Python for Retail Banking
A fraud detection agent in retail banking watches transaction streams, customer profiles, and case history, then flags suspicious activity and explains why a transaction looks risky. It matters because you need fast decisions, traceable reasoning, and human review for anything that could impact a customer’s account or trigger a regulatory issue.
Architecture
- Transaction ingestion layer: pulls card payments, ACH transfers, wire activity, login events, and device metadata from your operational systems.
- Feature retrieval layer: uses LlamaIndex to query recent customer behavior, historical fraud cases, policy docs, and playbooks.
- Risk scoring agent: wraps an LLM with tools so it can classify patterns, summarize evidence, and recommend escalation.
- Case explanation layer: produces an audit-friendly rationale: what changed, what matched known fraud patterns, and what evidence was used.
- Human review workflow: routes high-risk cases to analysts before any blocking action is taken.
- Audit and compliance store: persists prompts, retrieved context, model outputs, analyst decisions, and timestamps for review.
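The audit and compliance store can be sketched as a simple record type. This is an illustrative schema, not a standard; the field names are assumptions chosen to match the layers above:

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class AuditRecord:
    """One row per agent decision (illustrative schema, not a standard)."""
    transaction_id: str
    prompt: str                     # exact prompt sent to the LLM
    retrieved_context: List[str]    # node texts the retriever returned
    model_output: str               # raw model response
    analyst_decision: Optional[str] = None  # filled in after human review
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

record = AuditRecord(
    transaction_id="txn-482193-001",
    prompt="Assess this transaction...",
    retrieved_context=["Escalate if card-not-present transaction..."],
    model_output="risk_level: high",
)
row = asdict(record)  # ready to persist as a JSON document
```

Writing the record before any downstream action runs keeps the audit trail complete even when a later step fails.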
Implementation
1. Build a retriever over fraud policies and historical cases
Use VectorStoreIndex for policy docs and prior case notes. In retail banking, this gives the agent grounded context instead of letting it guess from the raw transaction alone.
```python
from llama_index.core import VectorStoreIndex, Document, Settings
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)

# Seed the index with policy snippets and prior case notes
fraud_docs = [
    Document(text="Escalate if card-not-present transaction is followed by password reset within 10 minutes."),
    Document(text="High risk when beneficiary is new and transfer amount exceeds customer's 95th percentile."),
    Document(text="Prior case: account takeover involved device change + failed login burst + new payee."),
]

index = VectorStoreIndex.from_documents(fraud_docs)
retriever = index.as_retriever(similarity_top_k=2)
```
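The second policy snippet references a customer's 95th percentile. That threshold is cheap to compute deterministically outside the LLM, so the agent receives it as a fact instead of estimating it. A minimal sketch using the standard library; the 20-transaction minimum is an arbitrary assumption:

```python
import statistics

def exceeds_p95(amount: float, history: list[float]) -> bool:
    """True if amount is above the customer's historical 95th percentile."""
    if len(history) < 20:  # too little history for a stable percentile (assumed cutoff)
        return False
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile
    p95 = statistics.quantiles(history, n=100)[94]
    return amount > p95

history = [100.0] * 95 + [5000.0] * 5
exceeds_p95(4800.0, history)  # True for this synthetic history
```

The result can then be appended to the transaction text as a signal, e.g. "amount exceeds 95th percentile: true".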
2. Define a structured risk analysis tool
The agent should not return vague prose. Make it emit a structured result so downstream systems can route the case consistently.
```python
from pydantic import BaseModel, Field
from typing import List

class FraudAssessment(BaseModel):
    risk_level: str = Field(description="low, medium, or high")
    reasons: List[str]
    recommended_action: str

def assess_transaction(transaction_text: str) -> FraudAssessment:
    nodes = retriever.retrieve(transaction_text)
    context = "\n".join(n.node.get_content() for n in nodes)
    prompt = f"""
You are a retail banking fraud analyst.
Use only the evidence below.

Transaction:
{transaction_text}

Retrieved evidence:
{context}
"""
    # Bind the output schema so the model must return a FraudAssessment
    structured_llm = Settings.llm.as_structured_llm(FraudAssessment)
    response = structured_llm.complete(prompt)
    return response.raw  # parsed FraudAssessment instance
```
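With a structured assessment in hand, routing becomes a dictionary lookup rather than prose parsing. A minimal sketch; the queue names are hypothetical, not part of any banking product:

```python
# Hypothetical routing table: queue names are illustrative assumptions
ROUTES = {
    "low": "auto_clear",
    "medium": "analyst_review",
    "high": "priority_review",
}

def route_case(assessment: dict) -> str:
    """Map a structured assessment to a work queue; unknown levels escalate."""
    level = assessment.get("risk_level", "").lower()
    return ROUTES.get(level, "priority_review")  # fail closed on bad input

route_case({"risk_level": "high", "reasons": ["new payee", "device change"]})
```

Failing closed on an unrecognized risk level means a malformed model output gets a human look instead of being silently cleared.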
3. Wrap retrieval + reasoning in a QueryEngineTool
This is the pattern that turns your knowledge base into an agent tool. The agent can call it when it needs policy grounding or similar-case lookup.
```python
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.agent import ReActAgent

query_engine = index.as_query_engine(similarity_top_k=2)

fraud_tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="fraud_policy_lookup",
        description="Lookup fraud policies and prior case patterns for retail banking transactions.",
    ),
)

agent = ReActAgent.from_tools(
    tools=[fraud_tool],
    llm=Settings.llm,
    verbose=True,
)
```
4. Run an investigation on a suspicious transaction
Keep the input narrow and factual. Feed the agent transaction details plus any operational signals you already have from your fraud pipeline.
```python
transaction = """
Customer ID: 482193
Channel: mobile app
Amount: $4,800
Type: ACH transfer
Beneficiary: new payee added 3 minutes ago
Signals: failed login burst (5), device fingerprint changed, password reset completed today
"""

result = agent.chat(
    f"Assess this transaction for fraud risk and explain the decision:\n{transaction}"
)
print(result)
```
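Signals like the failed login burst above are best computed upstream in your fraud pipeline and passed to the agent as facts. A sliding-window sketch; the 5-minute window and threshold of 5 are illustrative defaults, not a recommendation:

```python
from datetime import datetime, timedelta

def is_login_burst(failed_logins: list[datetime],
                   window: timedelta = timedelta(minutes=5),
                   threshold: int = 5) -> bool:
    """True if any window-length span contains >= threshold failed logins."""
    events = sorted(failed_logins)
    left = 0
    for right in range(len(events)):
        # Shrink the window from the left until it spans <= `window`
        while events[right] - events[left] > window:
            left += 1
        if right - left + 1 >= threshold:
            return True
    return False

base = datetime(2024, 1, 1, 12, 0)
burst = [base + timedelta(seconds=30 * i) for i in range(5)]  # 5 fails in 2 min
is_login_burst(burst)  # True
```

Keeping this logic deterministic and outside the LLM makes the signal reproducible for audit review.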
That pattern gives you three things banks care about:
- A grounded answer from retrieved policy/case context
- A repeatable interface for analysts or orchestration code
- A transcript you can store for audit review
Production Considerations
- Data residency: keep documents, embeddings, and logs in-region if your banking policy requires it. If you operate across countries, do not centralize sensitive customer data in a single foreign vector store.
- Auditability: persist the full chain of input transaction payload, retrieved nodes, model prompt, model output, analyst override, and final disposition. Regulators will care about why the system flagged or cleared a customer event.
- Guardrails: never let the agent auto-block accounts on its own. Use it as decision support unless your policy explicitly allows automated action for low-value or clearly defined rules-based cases.
- Monitoring: track false positives by segment (card-present vs card-not-present, new vs established customers, domestic vs cross-border transfers). Fraud models drift fast when attacker behavior changes.
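Per-segment false positive tracking can be as simple as aggregating analyst outcomes. A sketch, assuming each case record carries a segment tag, a flagged bit, and the analyst's final outcome (the field names are assumptions):

```python
from collections import defaultdict

def false_positive_rates(cases: list[dict]) -> dict[str, float]:
    """FP rate per segment: share of flagged cases an analyst later cleared."""
    flagged = defaultdict(int)
    cleared = defaultdict(int)
    for case in cases:
        if case["flagged"]:
            flagged[case["segment"]] += 1
            if case["analyst_outcome"] == "cleared":
                cleared[case["segment"]] += 1
    return {seg: cleared[seg] / flagged[seg] for seg in flagged}

cases = [
    {"segment": "card_not_present", "flagged": True, "analyst_outcome": "cleared"},
    {"segment": "card_not_present", "flagged": True, "analyst_outcome": "confirmed"},
    {"segment": "cross_border", "flagged": True, "analyst_outcome": "cleared"},
]
false_positive_rates(cases)  # {'card_not_present': 0.5, 'cross_border': 1.0}
```

A rate that jumps in one segment while others hold steady is a drift signal worth investigating before retuning the whole system.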
Common Pitfalls
- Using ungrounded prompts: if you ask the model to "detect fraud" without retrieving policy or prior cases first, you get inconsistent answers. Always attach retrieval from approved banking content.
- Letting free-form text drive actions: a natural language response is not enough for production routing. Convert outputs into structured fields like `risk_level`, `reason_codes`, and `recommended_action`.
- Ignoring analyst feedback loops: fraud teams refine labels after manual review. Feed those outcomes back into your document store or downstream classifier so the agent learns current attack patterns instead of stale ones.
- Mixing jurisdictions without controls: retail banking data often has residency constraints tied to country or product line. Separate indexes by region or enforce hard filters before retrieval so the agent never sees disallowed records.
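The hard-filter idea can be enforced in plain code before anything reaches a retriever. A minimal sketch, assuming each record carries a region tag; in production, LlamaIndex metadata filters on the retriever itself are a better enforcement point:

```python
def filter_by_region(records: list[dict], allowed_regions: set[str]) -> list[dict]:
    """Hard residency filter: drop records outside the caller's allowed regions.
    Records without a region tag are excluded by default (fail closed)."""
    return [r for r in records if r.get("region") in allowed_regions]

records = [
    {"text": "UK fraud case", "region": "eu-west"},
    {"text": "US fraud case", "region": "us-east"},
    {"text": "untagged case"},  # no region metadata: excluded
]
filter_by_region(records, {"eu-west"})  # keeps only the UK record
```

Excluding untagged records by default means a missing label becomes a visible data-quality problem rather than a silent residency violation.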
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.