How to Build a Fraud Detection Agent Using LangChain in Python for Banking

By Cyprian Aarons · Updated 2026-04-21
fraud-detection, langchain, python, banking

A fraud detection agent in banking ingests transaction data, enriches it with customer and device context, scores risk, and decides whether to allow, step-up verify, or escalate to an analyst. The point is not just catching fraud; it’s reducing false positives so you do not block legitimate payments, trigger avoidable support calls, or create compliance headaches.

Architecture

  • Transaction intake layer

    • Receives payment events from your core banking system, card processor, or event bus.
    • Normalizes fields like amount, merchant category, IP address, device fingerprint, geo-location, and account age.
  • Context enrichment layer

    • Pulls customer profile data, historical velocity metrics, prior chargebacks, KYC tier, and recent login behavior.
    • This is where you add the signals that make fraud detection useful instead of noisy.
  • LangChain decision agent

    • Uses ChatPromptTemplate, RunnableLambda, and a chat model to reason over the transaction context.
    • Produces a structured decision: approve, step_up, or block, plus a short rationale for audit.
  • Policy and guardrail layer

    • Enforces hard rules outside the LLM: sanctions hits, impossible travel, blacklisted merchants, amount thresholds.
    • Keeps the model from making unsupported decisions on regulated actions.
  • Audit and case management layer

    • Stores inputs, outputs, model version, prompt version, and final action.
    • Required for internal review, dispute handling, and regulator-facing traceability.
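
At a glance, the layers compose into a single decision path. The sketch below is only a shape: normalize and agent_decide are hypothetical placeholders, and concrete versions of the other pieces are built step by step in the Implementation section.

def handle_payment_event(raw_event: dict) -> dict:
    # normalize and agent_decide are hypothetical placeholders for the pieces built below.
    txn = normalize(raw_event)                 # transaction intake layer
    decision = policy_gate(txn)                # policy and guardrail layer runs before any model call
    if decision["action"] == "review":
        context = enrich_context(txn)          # context enrichment layer
        decision = agent_decide(txn, context)  # LangChain decision agent
    persist_audit_record(decision)             # audit and case management layer
    return decision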

Implementation

1. Install dependencies and define the transaction schema

Use LangChain’s current split packages. Keep the schema explicit so the agent does not rely on loose JSON blobs.

pip install langchain langchain-openai pydantic

from typing import Literal
from pydantic import BaseModel, Field

class Transaction(BaseModel):
    transaction_id: str
    account_id: str
    amount: float
    currency: str = "USD"
    merchant_category: str
    country: str
    ip_risk_score: int = Field(ge=0, le=100)
    device_trust_score: int = Field(ge=0, le=100)
    velocity_1h: int = Field(ge=0)
    chargeback_count_90d: int = Field(ge=0)

class FraudDecision(BaseModel):
    action: Literal["approve", "step_up", "block"]
    risk_score: int = Field(ge=0, le=100)
    rationale: str
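
A quick check with made-up values shows how the schema stops malformed signals before they reach the rules or the model:

from pydantic import ValidationError

try:
    Transaction(
        transaction_id="tx_001",
        account_id="acc_001",
        amount=250.00,
        merchant_category="grocery",
        country="US",
        ip_risk_score=140,       # out of range: must be 0-100
        device_trust_score=70,
        velocity_1h=2,
        chargeback_count_90d=0,
    )
except ValidationError as exc:
    print(exc)  # reports that ip_risk_score violates the le=100 constraint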

2. Build hard rules before the model runs

In banking, deterministic controls come first. If a transaction trips a policy rule, do not ask an LLM to “think harder.”

from langchain_core.runnables import RunnableLambda

def policy_gate(txn: Transaction) -> dict:
    if txn.ip_risk_score >= 95:
        return {"action": "block", "risk_score": 100,
                "rationale": "IP risk score exceeds hard block threshold."}
    if txn.chargeback_count_90d >= 5:
        return {"action": "step_up", "risk_score": 85,
                "rationale": "High recent chargeback count requires step-up verification."}
    return {"action": "review"}

policy_chain = RunnableLambda(policy_gate)
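
With the gate in place, a transaction that trips a hard rule never reaches the model. For example, with illustrative values:

hot_txn = Transaction(
    transaction_id="tx_999",
    account_id="acc_456",
    amount=50.00,
    merchant_category="electronics",
    country="US",
    ip_risk_score=97,          # at or above the hard block threshold
    device_trust_score=80,
    velocity_1h=1,
    chargeback_count_90d=0,
)

print(policy_chain.invoke(hot_txn))
# {'action': 'block', 'risk_score': 100, 'rationale': 'IP risk score exceeds hard block threshold.'}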

3. Add a LangChain reasoning chain for cases that need judgment

This pattern uses ChatPromptTemplate plus structured output. The model only handles borderline cases after policy checks pass.

import os
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    api_key=os.environ["OPENAI_API_KEY"],
)

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a banking fraud analyst. "
     "Return only a concise fraud decision based on the provided transaction facts. "
     "Consider false positives carefully. "
     "Never invent missing data."),
    ("human",
     "Transaction:\n{transaction}\n\n"
     "Customer context:\n{context}\n\n"
     "Decide action using approve, step_up, or block.")
])

fraud_chain = prompt | llm.with_structured_output(FraudDecision)

def enrich_context(txn: Transaction) -> dict:
    # Replace this with real feature retrieval from your warehouse or feature store.
    return {
        "avg_amount_30d": 120.50,
        "login_country_mismatch": True,
        "recent_failed_logins": 3,
        "device_seen_before": False,
        "kyc_tier": "standard",
    }

def run_fraud_agent(txn_dict: dict) -> dict:
    txn = Transaction(**txn_dict)
    gate_result = policy_chain.invoke(txn)

    if gate_result["action"] != "review":
        return gate_result

    context = enrich_context(txn)
    decision = fraud_chain.invoke({
        "transaction": txn.model_dump(),
        "context": context,
    })
    return decision.model_dump()

sample = {
    "transaction_id": "tx_123",
    "account_id": "acc_456",
    "amount": 9800.00,
    "currency": "USD",
    "merchant_category": "electronics",
    "country": "US",
    "ip_risk_score": 42,
    "device_trust_score": 61,
    "velocity_1h": 7,
    "chargeback_count_90d": 1,
}

print(run_fraud_agent(sample))
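
Model calls fail in production: timeouts, rate limits, occasional malformed output. One conservative option, sketched below, is to fall back to step-up verification instead of silently approving; whether to fail open or closed is a business decision, so treat the fallback action here as an assumption, not a recommendation.

def run_fraud_agent_safe(txn_dict: dict) -> dict:
    # Wrap the agent with a fallback so an LLM outage never becomes an automatic approval.
    try:
        return run_fraud_agent(txn_dict)
    except Exception:
        return {
            "action": "step_up",
            "risk_score": 50,
            "rationale": "Model unavailable; routed to step-up verification by fallback policy.",
        }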

4. Wrap it for observability and case review

You want every decision stored with enough detail to reconstruct why it happened. In practice that means logging the input features, policy outcome, model output, and final action.

def persist_audit_record(result: dict) -> None:
    # Write to your audit store / SIEM / case management system.
    print({"audit_record": result})

result = run_fraud_agent(sample)
persist_audit_record(result)
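
A fuller record, closer to what the audit layer described earlier needs, might look like the sketch below; the field names, PROMPT_VERSION constant, and build_audit_record helper are illustrative, not a fixed schema.

import datetime

PROMPT_VERSION = "fraud-decision-v1"   # bump whenever the prompt text changes
MODEL_NAME = "gpt-4o-mini"

def build_audit_record(txn: Transaction, context: dict, decision: dict) -> dict:
    return {
        "transaction_id": txn.transaction_id,
        "feature_snapshot": txn.model_dump(),
        "context_snapshot": context,
        "decision": decision,
        "model_name": MODEL_NAME,
        "prompt_version": PROMPT_VERSION,
        "decided_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }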

Production Considerations

  • Keep regulated decisions deterministic where possible

    • Sanctions screening, blacklist checks, and threshold-based blocks should happen outside the LLM.
    • Use the model for ranking ambiguity and explaining borderline cases.
  • Log everything needed for audit

    • Store prompt version, model name, feature snapshot, output schema version, and final action.
    • Three months from now, banking teams will ask why a payment was blocked, and the record has to answer.
  • Control data residency

    • If customer data must stay in-region, deploy models in your approved cloud region or use an internal hosted model endpoint.
    • Do not send raw PII to third-party APIs unless your legal and security teams have signed off.
  • Monitor drift and false positive rates

    • Track approval rate by segment, manual review override rate, chargeback outcomes, and latency; a minimal tracking sketch follows this list.
    • Fraud patterns change quickly; static prompts do not fix stale features.
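
The tracking described above can start small. A rough sketch, assuming decisions and analyst overrides are reported into in-process counters before being shipped to your metrics stack:

from collections import Counter

decision_counts = Counter()

def record_outcome(decision: dict, analyst_override: bool) -> None:
    # In production, emit these as metrics instead of keeping them in memory.
    decision_counts[decision["action"]] += 1
    if analyst_override:
        decision_counts["override"] += 1

def override_rate() -> float:
    reviewed = decision_counts["step_up"] + decision_counts["block"]
    return decision_counts["override"] / reviewed if reviewed else 0.0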

Common Pitfalls

  1. Letting the LLM make final compliance decisions

    • Avoid this by placing hard policy gates before any model call.
    • The agent should recommend; policy engines should enforce.
  2. Feeding raw PII into prompts

    • Strip unnecessary identifiers like full card numbers or national IDs.
    • Pass only the minimum features required for scoring and explanation; a small allow-list sketch follows this list.
  3. Skipping structured outputs

    • Free-form text is bad for downstream automation and audit.
    • Use with_structured_output(FraudDecision) so your system gets predictable fields every time.
  4. Ignoring latency budgets

    • Fraud checks sit on the critical path of authorization.
    • Keep retrieval fast, cache stable features locally where allowed by policy, and fail closed only when your business rules require it.
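
For pitfall 2, an allow-list applied before the model call keeps identifiers out of prompts entirely; the field list below is an illustration and should match whatever your own data-minimization policy permits.

ALLOWED_PROMPT_FIELDS = {
    "amount", "currency", "merchant_category", "country",
    "ip_risk_score", "device_trust_score", "velocity_1h", "chargeback_count_90d",
}

def prompt_safe_view(txn: Transaction) -> dict:
    # Identifiers such as transaction_id and account_id stay in the audit store,
    # never in the prompt sent to a third-party model.
    return {k: v for k, v in txn.model_dump().items() if k in ALLOWED_PROMPT_FIELDS}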

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
