How to Build a Fraud Detection Agent Using LangGraph in Python for Lending

By Cyprian Aarons · Updated 2026-04-21
Tags: fraud-detection, langgraph, python, lending

A fraud detection agent for lending screens an application, scores risk signals, and decides whether to approve, reject, or route the case to manual review. In lending, that matters because fraud is not just a loss problem; it affects compliance, underwriting accuracy, charge-offs, and your ability to explain decisions to auditors and regulators.

Architecture

  • Input normalizer

    • Takes raw application data from LOS/CRM/KYC systems.
    • Standardizes fields like name, address, employer, income, device ID, and IP metadata.
  • Risk signal extractor

    • Pulls structured checks from rules or external services.
    • Examples: identity mismatch, velocity spikes, synthetic identity indicators, bureau inconsistencies.
  • LLM reasoning node

    • Summarizes evidence into a short fraud rationale.
    • Should never be the only decision-maker; use it to explain and triage.
  • Decision router

    • Converts signals into one of three actions:
      • approve
      • reject
      • manual_review
  • Audit logger

    • Persists every input signal, intermediate decision, and final outcome.
    • Needed for model governance and lender audit trails.
  • Policy guardrail layer

    • Enforces lending-specific constraints.
    • Example: if residency rules require EU-only processing, block non-compliant routes before any external API call.
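The guardrail layer above can run before any node that calls an external service. A minimal sketch, assuming a hypothetical `processing_region` field and an illustrative allow-list (neither comes from a real LOS schema):

```python
# Hypothetical residency guardrail: block non-compliant routes before
# any external API call. Field names and regions are illustrative.
ALLOWED_REGIONS = {"EU", "UK"}

def enforce_residency(application: dict) -> dict:
    region = application.get("processing_region", "")
    if region not in ALLOWED_REGIONS:
        # Caller should route to an in-region fallback, not an external API.
        return {"allowed": False, "reason": f"region {region!r} not permitted"}
    return {"allowed": True, "reason": ""}
```

In a real deployment this check would be a graph node whose routing edge sends blocked cases to a compliant path instead of returning a dict.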

Implementation

1) Define the state and decision schema

Use a typed state so each node in the graph has a clear contract. For lending workflows, keep both the raw applicant data and derived fraud signals in state so you can audit the full path later.

from typing import TypedDict, Literal
from langgraph.graph import StateGraph, START, END

Decision = Literal["approve", "reject", "manual_review"]

class FraudState(TypedDict):
    applicant_id: str
    applicant_data: dict
    risk_signals: dict
    fraud_rationale: str
    decision: Decision
    audit_log: list[str]

2) Build deterministic checks first

Do not start with the LLM. In lending fraud detection, deterministic rules catch most obvious cases and are easier to defend in an audit.

def normalize_applicant(state: FraudState) -> FraudState:
    data = state["applicant_data"]
    normalized = {
        "full_name": data["full_name"].strip().lower(),
        "email_domain": data["email"].split("@")[-1].lower(),
        "income": float(data["income"]),
        "country": data["country"].upper(),
        "ip_country": data.get("ip_country", "").upper(),
    }
    return {
        **state,
        "applicant_data": normalized,
        "audit_log": state.get("audit_log", []) + ["normalized applicant fields"],
    }

def extract_risk_signals(state: FraudState) -> FraudState:
    data = state["applicant_data"]
    signals = {
        "country_mismatch": data["country"] != data["ip_country"],
        "free_email": data["email_domain"] in {"gmail.com", "yahoo.com", "outlook.com"},
        "high_income_low_signal": data["income"] > 200000,
    }
    return {
        **state,
        "risk_signals": signals,
        "audit_log": state.get("audit_log", []) + [f"signals={signals}"],
    }
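Because these nodes are plain functions, the signal logic can be unit-tested without compiling a graph. A minimal sketch that mirrors `extract_risk_signals` inline so it is self-contained (the applicant values are made up for illustration):

```python
# Standalone check of the deterministic signal logic from extract_risk_signals.
def signals_for(data: dict) -> dict:
    return {
        "country_mismatch": data["country"] != data["ip_country"],
        "free_email": data["email_domain"] in {"gmail.com", "yahoo.com", "outlook.com"},
        "high_income_low_signal": data["income"] > 200_000,
    }

clean = {"country": "GB", "ip_country": "GB", "email_domain": "acmebank.example", "income": 90_000.0}
risky = {"country": "GB", "ip_country": "NG", "email_domain": "gmail.com", "income": 250_000.0}

print(signals_for(clean))  # all signals False
print(signals_for(risky))  # all signals True
```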

3) Add an LLM-based explanation node and route by policy

Use the LLM to summarize evidence and produce a concise rationale. Then route using explicit policy logic so the graph remains deterministic where it matters.

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def explain_fraud_risk(state: FraudState) -> FraudState:
    prompt = f"""
You are assisting with lending fraud triage.
Applicant ID: {state['applicant_id']}
Risk signals: {state['risk_signals']}
Return a short rationale focused on fraud indicators only.
"""
    response = llm.invoke(prompt)
    return {
        **state,
        "fraud_rationale": response.content.strip(),
        "audit_log": state.get("audit_log", []) + ["generated rationale"],
    }

def decide(state: FraudState) -> FraudState:
    s = state["risk_signals"]
    # Any strong signal routes to a human reviewer. Reserve "reject" for
    # hard-fail deterministic rules added upstream (e.g., a confirmed
    # block-list hit), so the LLM never drives an adverse action on its own.
    if s["country_mismatch"] or s["high_income_low_signal"]:
        decision = "manual_review"
    else:
        decision = "approve"

    return {
        **state,
        "decision": decision,
        "audit_log": state.get("audit_log", []) + [f"decision={decision}"],
    }

4) Compile the LangGraph workflow

This is the actual LangGraph pattern: define nodes with add_node, connect them with add_edge, set conditional routing when needed, then compile.

def build_graph():
    graph = StateGraph(FraudState)

    graph.add_node("normalize_applicant", normalize_applicant)
    graph.add_node("extract_risk_signals", extract_risk_signals)
    graph.add_node("explain_fraud_risk", explain_fraud_risk)
    graph.add_node("decide", decide)

    graph.add_edge(START, "normalize_applicant")
    graph.add_edge("normalize_applicant", "extract_risk_signals")
    
    def route_after_signals(state: FraudState):
        # In production you can short-circuit hard fails here.
        return "explain_fraud_risk"

    graph.add_conditional_edges(
        "extract_risk_signals",
        route_after_signals,
        {"explain_fraud_risk": "explain_fraud_risk"}
    )

    graph.add_edge("explain_fraud_risk", "decide")
    graph.add_edge("decide", END)

    return graph.compile()

app = build_graph()

result = app.invoke({
    "applicant_id": "LN-100045",
    "applicant_data": {
        "full_name": "Jane Doe",
        "email": "[email protected]",
        "income": 180000,
        "country": "GB",
        "ip_country": "GB",
    },
    "risk_signals": {},
    "fraud_rationale": "",
"decision": "",
"audit_log": [],
})

print(result["decision"])
print(result["fraud_rationale"])

Production Considerations

  • Keep hard decisions deterministic

    • Use rules for obvious fraud patterns and reserve the LLM for explanation or triage.
    • This is easier to validate under model risk management.
  • Log everything needed for audit

    • Persist input payload hashes, extracted signals, node outputs, final decision, model version, and prompt version.
    • Lending teams will need this for adverse action reviews and internal audits.
  • Respect data residency

    • If applications include PII or credit-related data across regions, keep inference in-region.
    • Do not send sensitive lending data to third-party APIs without legal approval and DPA coverage.
  • Monitor drift by segment

    • Track false positives by product type: personal loans vs SME lending behave differently.
    • Watch for proxy bias in features like ZIP code, device geography, or email domain.
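The audit bullet above can be sketched as a structured record keyed by a payload hash, so the exact input can be verified later without storing PII in the audit store. Field names and version strings here are illustrative assumptions, not a fixed schema:

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(payload: dict, signals: dict, decision: str) -> dict:
    # Canonicalize before hashing so equivalent payloads hash identically.
    canonical = json.dumps(payload, sort_keys=True).encode()
    return {
        "payload_sha256": hashlib.sha256(canonical).hexdigest(),
        "signals": signals,
        "decision": decision,
        "model_version": "gpt-4o-mini-2024-07-18",  # illustrative pin
        "prompt_version": "fraud-triage-v3",        # illustrative pin
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

Writing one such record per node output gives you the reconstruction trail that adverse action reviews require.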

Common Pitfalls

  • Letting the LLM make the final call

    • Bad pattern: “model says fraud.”
    • Fix: use explicit rule-based routing for approve/reject/manual review and let the model justify or summarize.
  • Skipping auditability

    • If you cannot reconstruct why an application was routed to review, you will have problems with compliance.
    • Fix: store node-level outputs in a durable log with timestamps and versioned prompts.
  • Using weak normalization

    • Small inconsistencies in names, country codes, or income formats create noisy downstream results.
    • Fix: normalize before scoring and validate inputs at the edge of the system.
  • Ignoring jurisdiction-specific constraints

    • A lender operating across regions may have different retention rules or prohibited attributes.
    • Fix: add policy checks early in the graph so restricted cases never reach external calls or unsupported models.
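The edge validation mentioned under "Using weak normalization" can be a small stdlib-only schema check run before the graph ever executes. The required-field list is an illustrative assumption, not the full application schema:

```python
# Hypothetical edge validation: reject malformed payloads before normalization.
REQUIRED_FIELDS = {"full_name": str, "email": str, "income": (int, float, str), "country": str}

def validate_application(data: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the payload is usable."""
    errors = []
    for field, types in REQUIRED_FIELDS.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], types):
            errors.append(f"bad type for {field}: {type(data[field]).__name__}")
    if "email" in data and "@" not in str(data["email"]):
        errors.append("email missing '@'")
    if "country" in data and len(str(data["country"])) != 2:
        errors.append("country must be ISO-3166 alpha-2")
    return errors
```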

By Cyprian Aarons, AI Consultant at Topiax.
