How to Build a Fraud Detection Agent for Payments Using LangChain in Python
A fraud detection agent for payments scores incoming transactions, explains why a payment looks risky, and decides whether to approve, step-up verify, or route to manual review. For payments teams, that matters because false positives kill conversion, while false negatives create chargebacks, loss exposure, and compliance headaches.
Architecture
- Transaction intake layer
  - Accepts payment events from your API, queue, or webhook.
  - Normalizes fields like amount, currency, merchant category, device fingerprint, IP country, and customer history (a normalizer sketch follows this list).
- Feature enrichment layer
  - Pulls risk signals from internal systems.
  - Examples: account age, prior disputes, velocity counts, BIN country mismatch, shipping mismatch.
- LLM reasoning layer
  - Uses LangChain to turn structured signals into a consistent fraud assessment.
  - Produces a risk score, rationale, and recommended action in JSON.
- Policy and guardrail layer
  - Enforces hard rules before the LLM makes a recommendation.
  - Examples: blocked geographies, sanctions hits, amount thresholds, high-risk MCCs.
- Decision orchestration layer
  - Converts the model output into `approve`, `step_up`, or `manual_review`.
  - Logs every decision with inputs for auditability.
- Audit and observability layer
  - Stores prompts, outputs, model version, policy version, and final decision.
  - Needed for dispute handling, internal controls, and regulator review.
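Before the implementation steps, here is a minimal sketch of what the intake layer's normalization might look like. The raw payload shape (`amount_minor`, a nested `merchant` object, and so on) is hypothetical; map it to whatever your PSP or event bus actually emits.

```python
# Hypothetical webhook payload -> normalized dict matching the
# transaction schema defined in the Implementation section below.
# All raw field names here are illustrative, not a real PSP format.
def normalize_event(raw: dict) -> dict:
    return {
        "transaction_id": raw["id"],
        "amount": raw["amount_minor"] / 100,        # minor units -> major
        "currency": raw["currency"].upper(),
        "merchant_id": raw["merchant"]["id"],
        "merchant_category": raw["merchant"]["mcc"],
        "customer_id": raw["customer_id"],
        "ip_country": raw.get("ip_country", "ZZ"),  # ZZ = unknown region
        "billing_country": raw["billing_address"]["country"],
        "shipping_country": (raw.get("shipping_address") or {}).get("country"),
        "device_risk_score": float(raw.get("device_risk_score", 0.0)),
        "velocity_10m": int(raw.get("velocity_10m", 0)),
        "prior_chargebacks_90d": int(raw.get("prior_chargebacks_90d", 0)),
    }
```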
Implementation
1) Define the transaction schema and the prompt contract
Keep the model input structured. For payments work, free-form text is a bad idea because you need stable outputs for downstream routing and audits.
```python
from typing import Literal

from pydantic import BaseModel, Field


class PaymentTransaction(BaseModel):
    transaction_id: str
    amount: float
    currency: str
    merchant_id: str
    merchant_category: str
    customer_id: str
    ip_country: str
    billing_country: str
    shipping_country: str | None = None
    device_risk_score: float = Field(ge=0.0, le=1.0)
    velocity_10m: int = Field(ge=0)
    prior_chargebacks_90d: int = Field(ge=0)


class FraudDecision(BaseModel):
    risk_score: float = Field(ge=0.0, le=1.0)
    decision: Literal["approve", "step_up", "manual_review", "decline"]
    reasons: list[str]
```
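One benefit of putting constraints in the schema: malformed events fail loudly before any tokens are spent. A quick sanity check with deliberately bad values:

```python
from pydantic import ValidationError

# Out-of-range values are rejected before any LLM call is made.
try:
    PaymentTransaction(
        transaction_id="txn_bad",
        amount=10.0,
        currency="USD",
        merchant_id="m_1",
        merchant_category="5311",
        customer_id="c_1",
        ip_country="US",
        billing_country="US",
        device_risk_score=1.7,   # violates le=1.0
        velocity_10m=-2,         # violates ge=0
        prior_chargebacks_90d=0,
    )
except ValidationError as exc:
    print(exc)  # names each failing field and the violated constraint
```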
2) Build the LangChain chain with structured output
Use `ChatOpenAI` plus `with_structured_output()` so the model returns a validated object instead of loose text. That is the right pattern when your output drives money movement.
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a payment fraud analyst. "
     "Assess transaction risk using the provided signals. "
     "Be strict on anomalies involving geography mismatch, velocity spikes,"
     " chargeback history, and device risk. "
     "Return only the requested structured output."),
    ("human",
     "Transaction:\n{transaction}\n\n"
     "Rules:\n"
     "- Risk score must be between 0 and 1.\n"
     "- Use 'decline' only for clearly high-risk cases.\n"
     "- Use 'step_up' when additional verification can reduce risk.\n")
])

fraud_chain = prompt | llm.with_structured_output(FraudDecision)
```
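As a quick smoke test, invoking the chain directly returns a validated `FraudDecision` object rather than text to parse. The transaction values below are illustrative; the full routing logic comes in step 3.

```python
# Illustrative low-risk transaction; the chain returns a FraudDecision
# instance, so downstream code reads attributes, never parses text.
txn = PaymentTransaction(
    transaction_id="txn_001",
    amount=42.00,
    currency="USD",
    merchant_id="m_001",
    merchant_category="5311",
    customer_id="c_001",
    ip_country="US",
    billing_country="US",
    device_risk_score=0.05,
    velocity_10m=1,
    prior_chargebacks_90d=0,
)
decision = fraud_chain.invoke({"transaction": txn.model_dump_json()})
print(decision.decision, decision.risk_score, decision.reasons)
```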
3) Add deterministic policy checks before the model
This is where payments teams avoid expensive mistakes. Hard rules should short-circuit obvious abuse before you spend tokens or let an LLM override policy.
```python
HIGH_RISK_COUNTRIES = {"NG", "RU", "KP"}
BLOCKED_MCCS = {"4829", "5967"}  # example high-risk categories


def apply_policy(txn: PaymentTransaction) -> tuple[bool, list[str]]:
    reasons = []
    if txn.ip_country in HIGH_RISK_COUNTRIES:
        reasons.append(f"High-risk IP country: {txn.ip_country}")
    if txn.velocity_10m >= 5:
        reasons.append(f"Velocity spike detected: {txn.velocity_10m} txns/10m")
    if txn.prior_chargebacks_90d >= 3:
        reasons.append(f"Chargeback history too high: {txn.prior_chargebacks_90d}")
    if txn.merchant_category in BLOCKED_MCCS:
        reasons.append(f"Blocked MCC: {txn.merchant_category}")
    return (len(reasons) == 0), reasons


def assess_transaction(txn_data: dict) -> dict:
    txn = PaymentTransaction(**txn_data)
    allowed_by_policy, policy_reasons = apply_policy(txn)
    if not allowed_by_policy:
        return {
            "transaction_id": txn.transaction_id,
            "decision": "decline",
            "risk_score": 1.0,
            "reasons": policy_reasons,
            "source": "policy",
        }
    result = fraud_chain.invoke({"transaction": txn.model_dump_json()})
    return {
        "transaction_id": txn.transaction_id,
        "decision": result.decision,
        "risk_score": result.risk_score,
        "reasons": result.reasons,
        "source": "llm",
    }


sample_txn = {
    "transaction_id": "txn_123",
    "amount": 249.99,
    "currency": "USD",
    "merchant_id": "m_456",
    "merchant_category": "5311",
    "customer_id": "c_789",
    "ip_country": "US",
    "billing_country": "US",
    "shipping_country": None,
    "device_risk_score": 0.82,
    "velocity_10m": 6,
    "prior_chargebacks_90d": 1,
}

print(assess_transaction(sample_txn))
```
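To connect this to the decision orchestration layer from the architecture above, the decision string can be dispatched to concrete actions. A minimal sketch; the handlers are placeholders for your real capture, 3DS challenge, review-queue, and refusal flows.

```python
# Placeholder handlers standing in for capture, 3DS challenge,
# review-queue, and refusal flows.
def route_decision(assessment: dict) -> None:
    handlers = {
        "approve": lambda a: print("capture payment", a["transaction_id"]),
        "step_up": lambda a: print("trigger 3DS/OTP", a["transaction_id"]),
        "manual_review": lambda a: print("queue for analyst", a["transaction_id"]),
        "decline": lambda a: print("refuse and notify", a["transaction_id"]),
    }
    handlers[assessment["decision"]](assessment)


route_decision(assess_transaction(sample_txn))
```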
4) Add traceability for audit and incident review
For payments, you need to answer three questions after the fact: what did the model see, why did it decide that way, and which version made the call? LangChain callbacks are useful here because they let you capture execution metadata without tangling it into your business logic.
```python
from langchain_core.callbacks.base import BaseCallbackHandler


class AuditLogger(BaseCallbackHandler):
    """Captures chain execution metadata for the audit trail."""

    def on_chain_start(self, serialized, inputs, **kwargs):
        # `serialized` can be None for some runnables, so guard the lookup.
        print("CHAIN_START", (serialized or {}).get("name"), inputs)

    def on_chain_end(self, outputs, **kwargs):
        print("CHAIN_END", outputs)


audited_chain = prompt | llm.with_structured_output(FraudDecision)

# Attach the handler per call via `config`; in production, write to
# durable storage instead of stdout.
result = audited_chain.invoke(
    {"transaction": PaymentTransaction(**sample_txn).model_dump_json()},
    config={"callbacks": [AuditLogger()]},
)
print(result)
```
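The print statements are a stand-in for real persistence. Here is a minimal sketch of the audit record itself, with illustrative version tags, covering the fields the audit layer needs:

```python
import hashlib
import json
from datetime import datetime, timezone

# Illustrative version tags; in practice these come from your config
# or deployment metadata.
PROMPT_VERSION = "fraud-prompt-v3"
POLICY_VERSION = "policy-2024-06"
MODEL_NAME = "gpt-4o-mini"


def build_audit_record(txn: PaymentTransaction, output: dict) -> dict:
    snapshot = txn.model_dump_json()
    return {
        "transaction_id": txn.transaction_id,
        # Store a hash rather than the raw snapshot if the payload
        # contains data you cannot retain in this system.
        "input_sha256": hashlib.sha256(snapshot.encode()).hexdigest(),
        "prompt_version": PROMPT_VERSION,
        "policy_version": POLICY_VERSION,
        "model": MODEL_NAME,
        "output": output,
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }


record = build_audit_record(PaymentTransaction(**sample_txn),
                            assess_transaction(sample_txn))
print(json.dumps(record, indent=2))
```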
Production Considerations
- Deploy policy first, model second
  - Keep sanctions screening, blocked geographies, amount caps, and velocity rules outside the LLM.
  - The model should explain borderline cases; it should not be your first line of defense.
- Log everything needed for audit
  - Store transaction snapshot hashes, prompt version, model version, output JSON, and the final action.
  - This helps with card network disputes, internal audit requests, and regulator reviews.
- Respect data residency and PII boundaries
  - Do not send raw PANs or sensitive personal data to the model.
  - Tokenize or redact fields before invoking LangChain.
  - If you operate across regions like EU/UK/US/APAC, keep inference in-region where required.
- Monitor drift by segment
  - Track approval rate, step-up rate, and false positives by merchant category, country pair, device type, and amount band.
  - Fraud patterns shift fast; segment-level monitoring catches problems earlier than global averages (a minimal sketch follows this list).
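Segment-level monitoring can start small. A minimal sketch that tallies decision rates per merchant category, assuming each log entry stores the merchant category alongside the decision:

```python
from collections import Counter, defaultdict

# Each log entry is assumed to carry the merchant category alongside
# the decision, e.g. {"merchant_category": "5311", "decision": "approve"}.
def segment_rates(decision_log: list[dict]) -> dict[str, dict[str, float]]:
    counts: dict[str, Counter] = defaultdict(Counter)
    for entry in decision_log:
        counts[entry["merchant_category"]][entry["decision"]] += 1
    return {
        segment: {d: n / sum(c.values()) for d, n in c.items()}
        for segment, c in counts.items()
    }


log = [
    {"merchant_category": "5311", "decision": "approve"},
    {"merchant_category": "5311", "decision": "step_up"},
    {"merchant_category": "4829", "decision": "decline"},
]
print(segment_rates(log))
# {'5311': {'approve': 0.5, 'step_up': 0.5}, '4829': {'decline': 1.0}}
```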
Common Pitfalls
- Letting the LLM make hard compliance decisions
  - Bad pattern: asking the model to decide whether a transaction violates sanctions or KYC rules.
  - Fix it by encoding compliance as deterministic policy checks before any LLM call.
- Passing raw payment data into prompts
  - Bad pattern: including PANs, CVVs, full addresses, or unrestricted PII in prompt text.
  - Fix it with redaction/tokenization and strict field selection before `invoke()` (see the redaction sketch after this list).
- Using unstructured text outputs
  - Bad pattern: parsing free-form “high risk because…” strings with regex.
  - Fix it by using `with_structured_output()` with a Pydantic schema so your downstream code stays stable.
- Ignoring versioning
  - Bad pattern: changing prompts without tracking performance shifts.
  - Fix it by versioning prompts, policies, schemas, and models together so every decision is reproducible.
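For the PII pitfall above, an allow-list is the simplest defensible pattern: only the schema fields the model actually needs ever reach the prompt. A minimal sketch over the schema from step 1; note that direct identifiers like `customer_id` and `merchant_id` are deliberately excluded.

```python
import json

# Allow-list of schema fields the model may see; everything else is
# dropped before invoke(). The selection here is illustrative.
ALLOWED_FIELDS = {
    "transaction_id", "amount", "currency", "merchant_category",
    "ip_country", "billing_country", "shipping_country",
    "device_risk_score", "velocity_10m", "prior_chargebacks_90d",
}


def redact_for_prompt(txn: PaymentTransaction) -> dict:
    return {k: v for k, v in txn.model_dump().items() if k in ALLOWED_FIELDS}


safe_payload = json.dumps(redact_for_prompt(PaymentTransaction(**sample_txn)))
result = fraud_chain.invoke({"transaction": safe_payload})
```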
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.