How to Build a Fraud Detection Agent Using LlamaIndex in Python for Fintech

By Cyprian Aarons · Updated 2026-04-21
fraud-detection · llamaindex · python · fintech

A fraud detection agent in fintech takes transaction data, customer context, and policy rules, then turns them into a risk decision with an explanation you can audit. It matters because fraud ops teams need more than a binary score: they need traceable reasoning, consistent escalation, and controls that satisfy compliance.

Architecture

  • Transaction ingestion layer
    • Pulls card payments, ACH transfers, account changes, login events, and device fingerprints from your event stream or API.
  • Risk context store
    • Holds customer profile data, historical behavior, merchant metadata, chargeback history, sanctions hits, and known fraud patterns.
  • LlamaIndex retrieval layer
    • Uses VectorStoreIndex and QueryEngine to fetch relevant policy docs, prior cases, and analyst playbooks.
  • Decision engine
    • Combines deterministic rules with LLM-backed reasoning for “review / block / allow” outcomes.
  • Audit logging layer
    • Stores inputs, retrieved evidence, model output, and final decision for compliance review.
  • Human escalation workflow
    • Routes ambiguous or high-value cases to analysts with the full evidence trail.
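
The sketch below shows one way these layers fit together. Every function is a stub with an illustrative name, not a prescribed API; the retrieval and decision pieces are built out in the Implementation section.

# Illustrative wiring of the layers above; each helper is a stub standing in
# for a component built later in this guide.

def normalize_event(raw_event: dict) -> dict:
    return raw_event  # ingestion layer: map raw events to a stable schema (step 3)

def violates_hard_rule(case: dict) -> bool:
    return False  # deterministic guardrails: sanctions lists, compromised cards

def assess_with_agent(case: dict) -> dict:
    return {"decision": "review", "rationale": "stub"}  # retrieval + LLM (steps 2-4)

def write_audit_record(case: dict, decision: dict) -> None:
    pass  # audit logging layer: persist inputs, evidence, output, final action

def route_to_analyst(case: dict, decision: dict) -> None:
    pass  # human escalation workflow

def handle_event(raw_event: dict) -> dict:
    case = normalize_event(raw_event)
    if violates_hard_rule(case):
        decision = {"decision": "block", "rationale": "mandatory rule"}
    else:
        decision = assess_with_agent(case)
    write_audit_record(case, decision)
    if decision["decision"] == "review":
        route_to_analyst(case, decision)
    return decision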

Implementation

1) Install the right packages

Use LlamaIndex plus a vector store backend. The example below uses LlamaIndex's default in-memory vector store; chromadb is included as a persistent backend you can swap in later. For production fintech systems, keep the document store separate from your transaction database.

pip install llama-index llama-index-llms-openai llama-index-embeddings-openai chromadb

Set your model credentials through environment variables. Keep secrets out of code and out of notebooks.
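
A minimal startup check, assuming the OpenAI-backed models configured below, might look like this:

import os

# Fail fast at startup if the key is missing, rather than mid-request.
if "OPENAI_API_KEY" not in os.environ:
    raise RuntimeError("OPENAI_API_KEY is not set; export it before starting the service.")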

2) Build the knowledge base for policies and case history

The agent should not guess what counts as fraud. Give it explicit policy docs: escalation thresholds, prohibited patterns, KYC rules, chargeback handling steps, and regional compliance constraints.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Configure global models
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Load internal documents: fraud policy PDFs converted to text files
documents = SimpleDirectoryReader("./fraud_policy_docs").load_data()

# Build the index used by the agent for retrieval
index = VectorStoreIndex.from_documents(documents)

# Query engine for evidence lookup
query_engine = index.as_query_engine(similarity_top_k=3)

This pattern gives you grounded answers from internal material instead of free-form model output. In fintech, that matters because your decision logic must map back to documented policy.
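
Before wiring the query engine into the agent, spot-check retrieval by hand. A quick sketch (the question is illustrative):

# The answer should quote your policy docs; source_nodes shows which chunks
# were retrieved and how they scored.
response = query_engine.query("When should a high-value card transaction be escalated?")
print(response)
for node_with_score in response.source_nodes:
    print(node_with_score.node.metadata.get("file_name"), node_with_score.score)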

3) Define a fraud decision function with structured input

Keep the agent’s input small and explicit. You want a stable schema that can be logged and replayed during audits.

from dataclasses import dataclass
from typing import List

@dataclass
class TransactionCase:
    transaction_id: str
    customer_id: str
    amount: float
    currency: str
    country: str
    merchant_category: str
    velocity_5m: int
    device_risk_score: float
    past_chargebacks_90d: int

def build_fraud_prompt(case: TransactionCase) -> str:
    return f"""
You are a fraud analyst assistant for a fintech company.

Decision options:
- allow
- review
- block

Rules:
- Prefer conservative decisions when policy evidence is ambiguous.
- Always cite the policy snippets used.
- Never recommend actions that violate compliance or data residency rules.

Case:
transaction_id={case.transaction_id}
customer_id={case.customer_id}
amount={case.amount} {case.currency}
country={case.country}
merchant_category={case.merchant_category}
velocity_5m={case.velocity_5m}
device_risk_score={case.device_risk_score}
past_chargebacks_90d={case.past_chargebacks_90d}

Return JSON with keys:
decision, confidence, rationale, evidence_quotes
""".strip()

This is where most teams go wrong: they send raw logs into the model. Don’t do that. Normalize the case first so the prompt stays predictable.
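
As a sketch, assuming a hypothetical raw event payload (every field name read from raw below is illustrative), that normalization might look like:

def normalize_event(raw: dict) -> TransactionCase:
    # Map whatever your event stream emits onto the stable, auditable schema.
    return TransactionCase(
        transaction_id=raw["id"],
        customer_id=raw["customer_id"],
        amount=float(raw["amount"]),
        currency=raw["currency"],
        country=raw.get("geo_country", "unknown"),
        merchant_category=raw.get("merchant_category", "unknown"),
        velocity_5m=int(raw.get("velocity_5m", 0)),
        device_risk_score=float(raw.get("device_risk_score", 0.0)),
        past_chargebacks_90d=int(raw.get("chargebacks_90d", 0)),
    )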

4) Retrieve policy evidence and generate a decision

Use LlamaIndex retrieval to ground the response in actual policy text. Then log both the retrieved context and the final output.

import json

def assess_case(case: TransactionCase):
    # Retrieve relevant policy context
    evidence_response = query_engine.query(
        "Find policy guidance for amount thresholds, velocity spikes, device risk, "
        "chargeback patterns, and when to escalate or block."
    )

    prompt = build_fraud_prompt(case) + "\n\nPolicy evidence:\n" + str(evidence_response)

    # LLM call through LlamaIndex Settings.llm
    response = Settings.llm.complete(prompt)

    result_text = response.text if hasattr(response, "text") else str(response)

    audit_record = {
        "transaction_id": case.transaction_id,
        "customer_id": case.customer_id,
        "retrieved_evidence": str(evidence_response),
        "model_output": result_text,
    }

    return audit_record

case = TransactionCase(
    transaction_id="txn_10091",
    customer_id="cus_7781",
    amount=4200.00,
    currency="USD",
    country="NG",
    merchant_category="digital_goods",
    velocity_5m=7,
    device_risk_score=0.92,
    past_chargebacks_90d=3,
)

record = assess_case(case)
print(json.dumps(record, indent=2))

In production you would parse the JSON response strictly before acting on it. If parsing fails or confidence is low, route to human review instead of auto-blocking.
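
A minimal validation sketch (the 0.7 confidence threshold is illustrative, and production code may first need to strip markdown fences from the model output):

ALLOWED_DECISIONS = {"allow", "review", "block"}
REQUIRED_KEYS = {"decision", "confidence", "rationale", "evidence_quotes"}

def parse_decision(model_output: str) -> dict:
    # Anything that fails validation falls back to human review; the parser
    # never turns a malformed response into an allow or a block.
    fallback = {"decision": "review", "confidence": 0.0,
                "rationale": "model output failed validation",
                "evidence_quotes": []}
    try:
        parsed = json.loads(model_output)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(parsed, dict) or not REQUIRED_KEYS.issubset(parsed):
        return fallback
    if parsed["decision"] not in ALLOWED_DECISIONS:
        return fallback
    if not isinstance(parsed["confidence"], (int, float)):
        return fallback
    if parsed["confidence"] < 0.7:
        parsed["decision"] = "review"
    return parsed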

Production Considerations

  • Keep data residency explicit
    • If your fintech operates across regions, make sure transaction data stays in approved jurisdictions. Store indexes per region if your regulatory boundary requires it.
  • Log everything needed for audit
    • Persist the exact prompt inputs, retrieved snippets, model version, timestamp, and final action. Fraud decisions without traceability are useless during disputes.
  • Add deterministic guardrails
    • Use hard rules for sanctioned countries, known compromised cards, impossible travel signals, or repeated failed authentications. The agent should explain decisions; it should not override mandatory blocks. A minimal pre-check sketch follows this list.
  • Monitor drift and false positives
    • Track approval rate shifts by merchant category, geography, device type, and customer segment. Fraud patterns change quickly; your retrieval corpus and thresholds need regular refreshes.
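
For the guardrail point above, a minimal pre-check sketch; the country list and thresholds are placeholders, so source them from your compliance team:

SANCTIONED_COUNTRIES = {"IR", "KP", "SY"}  # placeholder list, not authoritative

def mandatory_block(case: TransactionCase) -> bool:
    # Deterministic rules evaluated before any model call; the agent never
    # sees a case that a hard rule has already decided.
    if case.country in SANCTIONED_COUNTRIES:
        return True
    if case.device_risk_score > 0.95 and case.past_chargebacks_90d >= 5:
        return True
    return False

# Rules first, agent second.
if mandatory_block(case):
    record = {"transaction_id": case.transaction_id,
              "decision": "block", "rationale": "mandatory rule"}
else:
    record = assess_case(case)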

Common Pitfalls

  1. Using only an LLM without hard rules

    • Fraud systems need deterministic controls for regulatory and operational constraints. Fix this by running rule checks before the agent gets a vote.
  2. Retrieving stale policy documents

    • If analysts update escalation thresholds but your index still contains old docs, you will get bad decisions fast. Rebuild or incrementally refresh the index whenever policies change; see the refresh sketch at the end of this list.
  3. Skipping structured outputs

    • Free-form text is hard to validate in downstream systems. Force JSON output with fixed keys like decision, confidence, rationale, and evidence_quotes, then reject malformed responses before execution.
  4. Mixing sensitive data into prompts unnecessarily

    • Don’t dump full PANs, bank account numbers, or raw PII into the model context unless you have a documented reason and controls in place. Mask what you can before retrieval and generation.
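
To make that last point concrete, a minimal masking sketch; a real deployment would use your tokenization or vault service rather than a regex:

import re

def mask_pan(text: str) -> str:
    # Replace anything that looks like a 13-19 digit card number before the
    # text reaches retrieval or generation. Crude by design: over-masking in
    # prompts beats leaking PANs.
    return re.sub(r"\b\d{13,19}\b", "[PAN_REDACTED]", text)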

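And for pitfall 2, LlamaIndex can refresh an index incrementally when documents carry stable IDs. A sketch, assuming the index was built from documents loaded with filename_as_id=True so files can be matched to existing entries:

# Reload the policy docs with stable IDs so unchanged files are skipped and
# edited files are re-embedded in place.
documents = SimpleDirectoryReader("./fraud_policy_docs", filename_as_id=True).load_data()
refreshed = index.refresh_ref_docs(documents)
print(f"{sum(refreshed)} documents added or updated")
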
Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
