How to Build a Fraud Detection Agent Using LangChain in Python for Fintech

By Cyprian Aarons · Updated 2026-04-21
fraud-detection · langchain · python · fintech

A fraud detection agent in fintech takes transaction data, customer context, and policy rules, then decides whether to approve, block, escalate, or request more verification. It matters because fraud losses compound fast, but so do false positives: every bad decision hits revenue, customer trust, and compliance.

Architecture

  • Transaction intake layer

    • Receives payment events, card-not-present attempts, account logins, or payout requests.
    • Normalizes fields like amount, merchant category, device fingerprint, IP geolocation, and velocity signals (see the event sketch after this list).
  • Risk context retriever

    • Pulls customer history from internal systems: prior disputes, account age, KYC status, recent failed attempts, and device reputation.
    • In LangChain terms, this is usually a tool-backed lookup function wrapped with @tool.
  • Policy and compliance rules engine

    • Encodes hard stops such as sanctioned countries, blocked BIN ranges, or high-risk jurisdictions.
    • Keeps deterministic controls separate from model judgment for auditability.
  • LLM reasoning layer

    • Converts structured inputs into a risk assessment and recommended action.
    • Uses ChatOpenAI plus a strict output schema via PydanticOutputParser so the result is machine-readable.
  • Decision orchestrator

    • Combines rule results and LLM output into one final action: approve, step_up, hold, or decline.
    • This should be explicit code, not “let the model decide everything.”
  • Audit logger

    • Stores prompt inputs, retrieved evidence, model output, final decision, and rule hits.
    • Required for incident review, model governance, and regulator questions.
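
As a reference for the intake layer, here is a minimal sketch of a normalized transaction event. The field names are illustrative assumptions, not a standard schema; align them with whatever your event pipeline actually emits.

from typing import TypedDict

class TransactionEvent(TypedDict):
    """Normalized intake event; field names are illustrative."""
    amount: float             # transaction amount in account currency
    country: str              # ISO 3166-1 alpha-2 country code
    mcc: str                  # merchant category code
    device_fingerprint: str   # stable device identifier
    ip_geolocation: str       # coarse location derived from the IP
    txn_count_1h: int         # velocity: attempts in the past hour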

Implementation

1) Define the risk schema and the prompt contract

Use a structured output model so the agent cannot return vague prose. In fraud workflows you want predictable JSON-like output every time.

from typing import Literal
from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import PydanticOutputParser

class FraudDecision(BaseModel):
    action: Literal["approve", "step_up", "hold", "decline"] = Field(
        description="Final recommended action"
    )
    risk_score: int = Field(ge=0, le=100)
    reasons: list[str]
    evidence_used: list[str]

parser = PydanticOutputParser(pydantic_object=FraudDecision)

prompt = ChatPromptTemplate.from_messages([
    ("system", 
     "You are a fraud analyst for a fintech platform. "
     "Use only the provided transaction data and evidence. "
     "Be conservative with high-risk signals."),
    ("human",
     "Transaction:\n{transaction}\n\nCustomer context:\n{context}\n\n"
     "Policy flags:\n{policy_flags}\n\n{format_instructions}")
]).partial(format_instructions=parser.get_format_instructions())

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

2) Add deterministic policy checks before the model runs

This is where you enforce hard compliance boundaries. If a transaction violates policy, do not ask the model to “reason it out.”

from langchain_core.tools import tool

BLOCKED_COUNTRIES = {"IR", "KP", "SY"}
HIGH_RISK_MCC = {"4829", "6012"}  # example only

@tool
def policy_check(transaction: dict) -> dict:
    flags = []
    if transaction.get("country") in BLOCKED_COUNTRIES:
        flags.append("blocked_country")
    if transaction.get("mcc") in HIGH_RISK_MCC:
        flags.append("high_risk_mcc")
    if transaction.get("amount", 0) > 5000:
        flags.append("high_value")
    return {"flags": flags}

3) Build the LangChain pipeline with retrieval + structured output

Here we combine internal context with the LLM. The retrieval step can be any internal API call; keep it as a tool so it is traceable.

from langchain_core.runnables import RunnableLambda

def get_customer_context(transaction: dict) -> str:
    # Replace with real DB/API lookup
    return (
        f"account_age_days={transaction.get('account_age_days', 0)}, "
        f"chargebacks_90d={transaction.get('chargebacks_90d', 0)}, "
        f"device_trust={transaction.get('device_trust', 'unknown')}, "
        f"recent_failed_logins={transaction.get('recent_failed_logins', 0)}"
    )

def build_inputs(transaction: dict) -> dict:
    # Tools created with @tool take a dict keyed by argument name
    policy = policy_check.invoke({"transaction": transaction})
    context = get_customer_context(transaction)
    return {
        "transaction": transaction,
        "context": context,
        "policy_flags": policy["flags"],
    }

fraud_chain = (
    RunnableLambda(build_inputs)
    | prompt
    | llm
    | parser
)

sample_txn = {
    "amount": 7800,
    "country": "US",
    "mcc": "6012",
    "account_age_days": 12,
    "chargebacks_90d": 2,
    "device_trust": "low",
    "recent_failed_logins": 5,
}

result = fraud_chain.invoke(sample_txn)
print(result.model_dump())
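
Structured parsing can still fail if the model emits malformed output. One hedge is the retry helper built into LangChain runnables; the attempt count here is an illustrative choice, not a recommendation:

# Retries the whole chain on any exception, including parser
# failures caused by malformed model output.
robust_chain = fraud_chain.with_retry(stop_after_attempt=2)

result = robust_chain.invoke(sample_txn)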

4) Wrap the model output in an explicit decision layer

Never deploy raw LLM output directly. Use your own thresholding logic so you can tune behavior without changing prompts.

def finalize_decision(decision: FraudDecision) -> str:
    if decision.action == "decline":
        return "decline"
    if decision.risk_score >= 85:
        return "decline"
    if decision.risk_score >= 60:
        return "step_up"
    return decision.action

final_action = finalize_decision(result)
print(final_action)
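
To tie the pieces together, here is a sketch of the decision orchestrator from the architecture section: deterministic hard stops first, model reasoning second, thresholds last, with an audit record emitted on every path. The record fields are assumptions; match them to your governance requirements.

import json
from datetime import datetime, timezone

def decide(transaction: dict) -> str:
    inputs = build_inputs(transaction)

    # Hard stops are deterministic: never ask the model about them.
    if violates_hard_policy(inputs["policy_flags"]):
        decision = None
        final = "decline"
    else:
        decision = (prompt | llm | parser).invoke(inputs)
        final = finalize_decision(decision)

    # Audit record: persist to durable storage in production.
    audit = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "transaction": transaction,
        "policy_flags": inputs["policy_flags"],
        "model_output": decision.model_dump() if decision else None,
        "final_action": final,
    }
    print(json.dumps(audit))  # replace with your audit sink
    return final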

Production Considerations

  • Keep sensitive data residency under control

    • Route PII through approved regions only.
    • If you use hosted LLMs, verify where prompts and logs are stored.
    • Mask PANs, bank account numbers, email addresses, and government IDs before sending anything to the model (a minimal masking sketch follows this list).
  • Log for auditability

    • Persist input features, policy flags, retrieved evidence, model version, prompt version, and final action.
    • This is non-negotiable for disputes, SAR/AML reviews where applicable, and internal model governance.
  • Use guardrails around actions

    • Let the agent recommend; let your application execute.
    • Require step-up verification for medium risk instead of auto-decline where possible to reduce false positives.
  • Monitor drift and alert rates

    • Track approval rate by segment, chargeback rate after approval, manual review override rate, and latency.
    • Fraud patterns move fast; revisit rules and re-evaluate prompts on a fixed cadence.
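
For the masking point above, a minimal redaction sketch run before prompt construction. The regex patterns are illustrative assumptions and are not exhaustive; production redaction needs a vetted PII library and a Luhn check for card numbers.

import re

# Illustrative patterns only; real detection needs broader coverage.
PAN_RE = re.compile(r"\b\d{13,19}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def mask_pii(text: str) -> str:
    """Replace card numbers and email addresses with placeholders."""
    text = PAN_RE.sub("[PAN]", text)
    return EMAIL_RE.sub("[EMAIL]", text)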

Common Pitfalls

  • Letting the LLM make final compliance decisions

    • Bad pattern: “model says decline” with no deterministic checks.
    • Fix: run sanctions checks, jurisdiction rules, and velocity limits first; use LangChain for reasoning only after hard rules pass.
  • Sending raw regulated data into prompts

    • Bad pattern: dumping full customer profiles into chat messages.
    • Fix: redact sensitive fields and pass only the minimum features needed for scoring.
  • Using free-form text outputs

    • Bad pattern: parsing natural language like “this looks suspicious.”
    • Fix: enforce PydanticOutputParser or another structured schema so downstream systems can act safely.

A fraud detection agent works best when it behaves like a controlled analyst: deterministic on policy violations, structured on outputs, and fully auditable. That’s the difference between an experiment and something you can put in front of payments risk teams at a regulated fintech.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
