How to Build a Fraud Detection Agent Using LangChain in Python for Retail Banking
A fraud detection agent for retail banking triages suspicious activity, enriches it with transaction context, and decides whether to escalate, block, or request human review. It matters because fraud teams need fast, explainable decisions that reduce losses without killing legitimate customer experience.
Architecture
- Transaction intake layer
  - Receives card payments, ACH transfers, login events, device fingerprints, and beneficiary changes.
  - Normalizes events into a single schema before the agent sees them (a minimal event sketch follows this list).
- Risk feature builder
  - Pulls recent velocity signals, customer profile data, merchant history, geo mismatch, and account tenure.
  - Produces compact context for the LLM and downstream rules.
- LangChain decision agent
  - Uses `ChatPromptTemplate`, `ChatOpenAI`, and tool calling to reason over the event.
  - Returns a structured decision like `approve`, `hold`, `escalate`, or `block`.
- Policy and rules layer
  - Enforces hard constraints outside the model: sanctions hits, KYC status, transaction limits, country restrictions.
  - Keeps the LLM from making decisions it should never own.
- Case management sink
  - Writes decisions, rationale, and evidence to an audit store.
  - Feeds SIEM, case management, and analyst review queues.
- Human review workflow
  - Handles ambiguous or high-value cases.
  - Lets analysts override model decisions and create labeled feedback for retraining.
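As a minimal sketch, a normalized event from the intake layer could look like this. The field names mirror the prompt variables used later in this article; the values are illustrative.

```python
# Hypothetical normalized event; keys match the prompt variables below.
normalized_event = {
    "transaction_id": "txn-0001",
    "customer_id": "cust-123",
    "amount": 1899.00,
    "currency": "EUR",
    "merchant": "Example Electronics",  # illustrative merchant name
    "country": "DE",
    "device_change": True,
    "failed_logins": 1,
    "velocity_count": 4,
    "account_age_days": 210,
    "prior_chargebacks": 0,
}
```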
Implementation
1) Define a structured decision contract
Do not let the model return free-form text. Use a Pydantic schema so every decision is parseable and auditable.
```python
from typing import Literal, List

from pydantic import BaseModel, Field

class FraudDecision(BaseModel):
    action: Literal["approve", "hold", "escalate", "block"] = Field(...)
    risk_score: int = Field(ge=0, le=100)  # 0 = benign, 100 = certain fraud
    reasons: List[str]  # human-readable rationale, persisted for auditors
    requires_human_review: bool
```
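Pydantic rejects malformed output at parse time. For example, an out-of-range score (illustrative values) fails validation immediately instead of flowing downstream:

```python
from pydantic import ValidationError

try:
    FraudDecision(
        action="hold",
        risk_score=140,  # violates the le=100 bound declared above
        reasons=["Velocity spike"],
        requires_human_review=True,
    )
except ValidationError as exc:
    print(exc)  # reports that risk_score must be <= 100
```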
2) Build the LangChain agent chain
This example uses ChatOpenAI, ChatPromptTemplate, and structured output. The prompt includes banking-specific policy constraints and asks for concise rationale.
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# temperature=0 keeps triage decisions as reproducible as possible
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a fraud triage assistant for retail banking. "
     "Follow policy strictly. "
     "Never approve transactions that violate sanctions, KYC restrictions, or country blocks. "
     "Return only structured output."),
    ("human",
     """Transaction:
Customer ID: {customer_id}
Amount: {amount}
Currency: {currency}
Merchant: {merchant}
Country: {country}
Device change in last 24h: {device_change}
Failed logins in last hour: {failed_logins}
Velocity count in last 10 min: {velocity_count}
Account age days: {account_age_days}
Prior chargebacks: {prior_chargebacks}
Decide whether this should be approved, held, escalated, or blocked."""),
])

# with_structured_output() binds the FraudDecision schema to the model call
fraud_chain = prompt | llm.with_structured_output(FraudDecision)
```
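Invoking the chain fills each prompt variable from the matching dict key and returns a validated `FraudDecision` (the values here are illustrative):

```python
decision = fraud_chain.invoke({
    "customer_id": "cust-123",
    "amount": 1899.00,
    "currency": "EUR",
    "merchant": "Example Electronics",
    "country": "DE",
    "device_change": True,
    "failed_logins": 1,
    "velocity_count": 4,
    "account_age_days": 210,
    "prior_chargebacks": 0,
})
print(decision.action, decision.risk_score, decision.reasons)
```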
3) Add hard rules before the LLM
The model should never be the first line of defense. Put deterministic controls in front of it for compliance-heavy checks.
```python
from typing import Optional

SANCTIONED_COUNTRIES = {"IR", "KP", "SY"}
MAX_AUTO_APPROVE_AMOUNT = 5000

def apply_hard_rules(txn: dict) -> Optional[FraudDecision]:
    """Deterministic policy checks that run before the LLM ever sees the event."""
    if txn["country"] in SANCTIONED_COUNTRIES:
        return FraudDecision(
            action="block",
            risk_score=100,
            reasons=["Sanctions or restricted jurisdiction"],
            requires_human_review=True,
        )
    if txn["amount"] > MAX_AUTO_APPROVE_AMOUNT and txn["failed_logins"] > 3:
        return FraudDecision(
            action="escalate",
            risk_score=85,
            reasons=["High amount with suspicious authentication pattern"],
            requires_human_review=True,
        )
    return None  # no hard rule fired; defer to the LLM chain
```
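A quick sanity check (illustrative values) confirms the deterministic path short-circuits before any model call:

```python
blocked = apply_hard_rules({"country": "IR", "amount": 50, "failed_logins": 0})
assert blocked is not None and blocked.action == "block"

clean = apply_hard_rules({"country": "DE", "amount": 120, "failed_logins": 0})
assert clean is None  # falls through to the LLM chain
```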
4) Run the agent and persist an audit trail
Store inputs, outputs, model version, prompt version, and timestamps. That is non-negotiable in retail banking.
```python
import json
from datetime import datetime, timezone

def evaluate_transaction(txn: dict) -> dict:
    hard_rule_decision = apply_hard_rules(txn)
    decision = hard_rule_decision or fraud_chain.invoke(txn)

    record = {
        "transaction_id": txn["transaction_id"],
        "decision": decision.model_dump(),
        "decided_by": "hard_rule" if hard_rule_decision else "llm",
        "model": "gpt-4o-mini",
        "prompt_version": "fraud-triage-v1",
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
        "input_snapshot": txn,
    }
    # Append-only JSONL gives a replayable audit trail for every decision,
    # including the ones made by hard rules.
    with open("fraud_audit_log.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return record["decision"]
```
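Running the full path over the `normalized_event` sketch from the Architecture section produces the decision and appends the audit record in one call:

```python
result = evaluate_transaction(normalized_event)
print(result["action"], result["reasons"])
```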
A typical production flow is:
- ingest event from Kafka or a webhook
- enrich with customer/account features from your internal services
- run deterministic policy checks
- call the LangChain decision chain only when needed
- persist the full trace to your audit store
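A minimal sketch of that loop, assuming the kafka-python client and a `fraud-events` topic (both are illustrative choices, not requirements of this design):

```python
import json

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "fraud-events",  # hypothetical topic carrying normalized events
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    txn = message.value
    # enrichment with velocity/profile features would happen here
    decision = evaluate_transaction(txn)  # hard rules first, LLM only if needed
    # hand `decision` to the case management sink / analyst queues
```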
Production Considerations
- Data residency
  - Keep customer PII inside approved regions.
  - If you use hosted LLMs, verify region pinning and contractual controls.
  - Redact account numbers, card PANs, and national IDs before sending prompts (a redaction sketch follows this list).
- Auditability
  - Log every input feature used by the agent.
  - Version prompts like application code.
  - Store model name, temperature, tool calls, and final action for regulator review.
- Guardrails
  - Enforce sanctions screening and regulatory blocks outside the LLM.
  - Cap auto-block actions unless confidence thresholds are met.
  - Route borderline cases to analysts instead of letting the model decide alone.
- Monitoring
  - Track false positives by segment: geography, merchant category code, channel type.
  - Watch drift in approval rates after model or prompt changes.
  - Alert on latency spikes because fraud triage often sits on critical payment paths.
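One way to implement the redaction step. The patterns here are illustrative and far from exhaustive; a production deployment should rely on a vetted PII detection library:

```python
import re

# Illustrative patterns only; not a complete PII inventory
PAN_RE = re.compile(r"\b\d{13,19}\b")  # card numbers (PANs)
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")  # IBAN-style account numbers

def redact(text: str) -> str:
    text = PAN_RE.sub("[PAN_REDACTED]", text)
    text = IBAN_RE.sub("[ACCOUNT_REDACTED]", text)
    return text

print(redact("Card 4111111111111111 paid to DE89370400440532013000"))
# -> Card [PAN_REDACTED] paid to [ACCOUNT_REDACTED]
```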
Common Pitfalls
- Letting the LLM make unrestricted decisions
  - Bad pattern: asking the model to “detect fraud” with no policy layer.
  - Fix it by putting hard rules first and constraining outputs with `with_structured_output()`.
- Sending raw sensitive data into prompts
  - Bad pattern: including full PANs, CVVs, or national IDs in prompt text.
  - Fix it by tokenizing or masking PII before calling LangChain tools or models.
- Skipping human review for high-impact actions
  - Bad pattern: auto-blocking large transfers based only on model confidence.
  - Fix it by routing high-value or ambiguous cases into an analyst queue with full evidence attached (a routing sketch follows this list).
- Ignoring explainability requirements
  - Bad pattern: storing only the final action without reasons.
  - Fix it by persisting structured reasons plus input features so compliance can reconstruct why a case was escalated.
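A sketch of that routing logic. The thresholds are placeholders to tune against your own loss and review-capacity data:

```python
REVIEW_BAND = (40, 80)  # risk scores in this band go to analysts, not auto-actions

def route_decision(decision: FraudDecision, txn: dict) -> str:
    # Never auto-block a large transfer on model output alone
    if decision.action == "block" and txn["amount"] > MAX_AUTO_APPROVE_AMOUNT:
        return "analyst_queue"
    if REVIEW_BAND[0] <= decision.risk_score <= REVIEW_BAND[1]:
        return "analyst_queue"
    if decision.requires_human_review:
        return "analyst_queue"
    return "auto_action"
```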
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist plus starter code
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.