How to Build a Fraud Detection Agent Using LlamaIndex in Python for Payments

By Cyprian Aarons. Updated 2026-04-21.
Tags: fraud-detection, llamaindex, python, payments

A fraud detection agent for payments is not a rules engine with a chat UI bolted on. It is an orchestration layer that pulls in transaction context, compares it against policy and historical cases, and returns a decision with evidence your ops team can audit. For payments, that matters because every false positive blocks revenue, and every false negative becomes chargebacks, disputes, and compliance pain.

Architecture

  • Transaction intake

    • Receives payment events from your API gateway, Kafka topic, or webhook handler.
    • Normalizes fields like amount, merchant_id, country, device_id, card_bin, and ip_address.
  • Risk context store

    • Holds prior fraud cases, chargeback notes, merchant profiles, velocity rules, and internal policy docs.
    • Backed by a vector index for semantic retrieval and optionally a SQL store for structured lookups.
  • LlamaIndex retrieval layer

    • Uses VectorStoreIndex, RetrieverQueryEngine, and tools to fetch relevant policy snippets and historical incidents.
    • Grounds the agent in your internal controls instead of letting it guess.
  • Decision engine

    • Produces one of three actions: approve, review, or block.
    • Combines retrieved evidence with deterministic payment rules like amount thresholds or country blocks.
  • Audit trail

    • Persists the input transaction, retrieved sources, model response, and final action.
    • Required for disputes, compliance reviews, and model governance.
  • Monitoring and feedback loop

    • Tracks precision, recall, manual review rate, and chargeback lift.
    • Feeds confirmed fraud labels back into the knowledge base.
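
The intake and normalization stage above can be sketched as a small mapping function. This is a minimal illustration assuming a hypothetical raw gateway payload (`amt`, `ccy`, `merchant`, `device`, `bin`, `ip` are invented field names, not a real gateway format):

```python
from typing import TypedDict

class Transaction(TypedDict):
    amount: float
    currency: str
    merchant_id: str
    country: str
    device_id: str
    card_bin: str
    ip_address: str

def normalize_event(raw: dict) -> Transaction:
    # Map gateway-specific keys onto the canonical schema so every
    # downstream stage sees the same field names and types.
    return Transaction(
        amount=float(raw["amt"]),
        currency=str(raw.get("ccy", "USD")).upper(),
        merchant_id=raw["merchant"],
        country=str(raw.get("country", "")).upper(),
        device_id=str(raw.get("device", "")),
        card_bin=str(raw.get("bin", ""))[:6],  # keep the BIN only, never a full PAN
        ip_address=str(raw.get("ip", "")),
    )
```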

Implementation

1) Build the fraud knowledge base

Start with policy docs and historical fraud cases. In production, these come from approved internal sources only; do not dump raw cardholder data into embeddings.

from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

docs = [
    Document(text="""
    Fraud policy:
    - Block transactions over $5,000 from high-risk geographies unless merchant is whitelisted.
    - Review transactions with device mismatch plus first-time merchant usage.
    - Escalate repeated failed authorization attempts within 10 minutes.
    """, metadata={"source": "fraud_policy_v1"}),

    Document(text="""
    Case #1842:
    A card-not-present purchase was blocked after IP country differed from BIN country,
    amount was $2,300, and the device fingerprint had never been seen before.
    """, metadata={"source": "case_1842"}),

    Document(text="""
    Case #1901:
    Chargeback cluster at merchant 'M-8821' showed velocity spikes from multiple cards
    using the same shipping address and disposable email domains.
    """, metadata={"source": "case_1901"}),
]

splitter = SentenceSplitter(chunk_size=256, chunk_overlap=40)
nodes = splitter.get_nodes_from_documents(docs)

index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine(similarity_top_k=3)

2) Query the agent with transaction context

Use the transaction as a structured prompt. The key pattern is to keep the agent grounded in retrieved evidence and force a compact decision schema.

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)

def assess_transaction(txn: dict) -> str:
    prompt = f"""
You are a payments fraud analyst.

Transaction:
- amount: {txn["amount"]}
- currency: {txn["currency"]}
- merchant_id: {txn["merchant_id"]}
- card_bin_country: {txn["card_bin_country"]}
- ip_country: {txn["ip_country"]}
- device_seen_before: {txn["device_seen_before"]}
- failed_auth_attempts_10m: {txn["failed_auth_attempts_10m"]}

Return JSON with keys:
decision: approve|review|block
reason: short explanation
evidence: list of policy/case references
"""
    response = query_engine.query(prompt)
    return str(response)

txn = {
    "amount": 2300,
    "currency": "USD",
    "merchant_id": "M-8821",
    "card_bin_country": "US",
    "ip_country": "NG",
    "device_seen_before": False,
    "failed_auth_attempts_10m": 4,
}

print(assess_transaction(txn))

This works because VectorStoreIndex.as_query_engine() gives you retrieval-backed generation without building a full agent loop yet. For most payment workflows, that is enough if you pair it with deterministic rules outside the model.
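
Because the prompt asks for JSON, validate the model's text before routing money on it. A minimal parsing sketch (`parse_decision` is a hypothetical helper, not a LlamaIndex API; anything malformed routes to human review, never to auto-approve):

```python
import json
import re

ALLOWED_DECISIONS = {"approve", "review", "block"}

def parse_decision(raw: str) -> dict:
    # Pull the first JSON object out of the model text, if any.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            parsed = json.loads(match.group(0))
            if parsed.get("decision") in ALLOWED_DECISIONS:
                return parsed
        except json.JSONDecodeError:
            pass
    # Off-schema or unparseable output falls back to manual review by design.
    return {"decision": "review", "reason": "unparseable model output", "evidence": []}
```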

3) Add deterministic payment guardrails before model judgment

Do not ask the model to rediscover obvious policy violations. Handle hard blocks in code first, then send borderline cases to LlamaIndex for contextual review.

def hard_rules(txn: dict) -> str | None:
    """Deterministic checks that run before any model call.

    Returns "block" or "review" when a hard rule fires, or None to
    defer borderline cases to retrieval-backed review.
    """
    high_risk_countries = {"NG", "PK", "UA"}
    if txn["amount"] > 5000 and txn["ip_country"] in high_risk_countries:
        return "block"
    if txn["failed_auth_attempts_10m"] >= 5:
        return "review"
    # Geo/device mismatch alone is borderline: deliberately defer it to
    # the LLM review path instead of auto-blocking.
    return None

def decide(txn: dict) -> dict:
    rule_decision = hard_rules(txn)
    if rule_decision == "block":
        return {"decision": "block", "reason": "Hard rule triggered", "evidence": ["policy_threshold"]}
    if rule_decision == "review":
        return {"decision": "review", "reason": "Hard rule triggered", "evidence": ["policy_threshold"]}

    # Everything else gets retrieval-backed analysis; the routed action stays
    # "review" so the model's free-form text never directly moves money.
    llm_result = assess_transaction(txn)
    return {"decision": "review", "reason": llm_result[:400], "evidence": ["retrieval_based_review"]}

print(decide(txn))

That split matters in payments. You want deterministic controls for compliance-sensitive thresholds and an LLM only where judgment adds value.

4) Persist audit records for compliance

Every decision needs traceability. Store inputs, retrieved sources, model output, timestamp, and reviewer outcome in your own system of record.

import json
from datetime import datetime, timezone

def write_audit_record(txn: dict, result: dict):
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "transaction": txn,
        "decision": result["decision"],
        "reason": result["reason"],
        "evidence": result.get("evidence", []),
        "model": "gpt-4o-mini",
        "system": "llamaindex_fraud_agent",
    }

    with open("fraud_audit_log.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")

result = decide(txn)
write_audit_record(txn, result)
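
The monitoring loop described in the architecture needs numbers to alert on. A minimal sketch that derives the decision mix from this JSONL audit log (`review_rate` is a hypothetical helper; a real dashboard would also join confirmed fraud labels to get precision and recall):

```python
import json

def review_rate(path: str = "fraud_audit_log.jsonl") -> dict:
    # Count decisions in the append-only audit log and return each as a rate.
    counts = {"approve": 0, "review": 0, "block": 0}
    total = 0
    with open(path) as f:
        for line in f:
            decision = json.loads(line).get("decision")
            if decision in counts:
                counts[decision] += 1
                total += 1
    return {k: (v / total if total else 0.0) for k, v in counts.items()}
```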

Production Considerations

  • Data residency

    • Keep transaction data and embeddings in-region when regulations require it.
    • If you process EU payments, make sure your vector store and LLM endpoint satisfy residency constraints.
  • Compliance logging

    • Log decisions with immutable timestamps and source references.
    • Retain enough evidence to explain why a payment was blocked or sent to review under PCI DSS and internal audit requirements.
  • Monitoring

    • Track approval rate by merchant segment, manual review load, false positives, false negatives, and chargeback rate.
    • Alert on sudden shifts in IP-country mismatches or spikes in one merchant’s risk score.
  • Guardrails

    • Redact PANs, CVVs, full names where possible before sending text to an LLM.
    • Enforce allowlisted outputs like approve, review, or block; never accept free-form decisions in production routing.
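
The redaction guardrail can start as simply as masking PAN-like digit runs before any text reaches the LLM. A minimal sketch (these regexes are illustrative heuristics, not a complete PCI scrubber; keeping the first six digits preserves the BIN for fraud analysis):

```python
import re

PAN_RE = re.compile(r"\b\d{13,19}\b")
CVV_RE = re.compile(r"\bcvv\s*[:=]?\s*\d{3,4}\b", re.IGNORECASE)

def redact(text: str) -> str:
    # Strip CVV fields entirely, then mask card numbers down to the BIN.
    text = CVV_RE.sub("[REDACTED_CVV]", text)
    return PAN_RE.sub(lambda m: m.group(0)[:6] + "*" * (len(m.group(0)) - 6), text)
```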

Common Pitfalls

  1. Using the LLM as the primary rules engine

    • This creates inconsistent decisions on obvious cases.
    • Fix it by handling hard policy thresholds in code first and using LlamaIndex for contextual analysis only.
  2. Embedding sensitive payment data without controls

    • Raw PANs or full customer identifiers should not be indexed casually.
    • Fix it by redacting sensitive fields before ingestion and storing only what is necessary for fraud analysis.
  3. No audit trail for disputes

    • If you cannot explain why a payment was blocked, operations will override the system fast.
    • Fix it by persisting retrieved evidence, final decision text, model version, and reviewer outcome for every transaction.

A fraud detection agent built this way is practical: deterministic where it must be, retrieval-grounded where it should be nuanced. That is the pattern I would ship for payments before adding more complexity like multi-agent escalation or live case enrichment.


By Cyprian Aarons, AI Consultant at Topiax.