How to Build a Fraud Detection Agent Using LangChain in Python for Insurance

By Cyprian Aarons · Updated 2026-04-21
Tags: fraud-detection, langchain, python, insurance

A fraud detection agent for insurance takes a claim, policy, and supporting documents, then decides whether the case needs straight-through processing, manual review, or escalation to the Special Investigation Unit (SIU). It matters because fraud leaks margin fast, but so does over-blocking legitimate claims; the agent has to be useful, auditable, and conservative enough for regulated operations.

Architecture

  • Claim intake layer

    • Accepts structured claim data: claimant details, policy metadata, loss description, payout amount, prior claims.
    • Normalizes inputs before they hit the LLM.
  • Evidence retrieval layer

    • Pulls policy wording, claims history, adjuster notes, and fraud indicators from a vector store or document system.
    • Keeps the model grounded in internal facts instead of guessing (a minimal retrieval sketch follows this list).
  • Fraud reasoning chain

    • Uses a LangChain ChatPromptTemplate plus an LLM to score risk and explain why.
    • Produces a structured output with fraud_risk, flags, and recommended_action.
  • Decision policy

    • Applies deterministic rules after the model output.
    • Example: auto-escalate if risk is high and there are multiple red flags.
  • Audit logging

    • Stores prompt version, retrieved evidence IDs, model output, and final decision.
    • Required for compliance review and dispute handling.
  • Human review handoff

    • Routes suspicious cases to SIU or an adjuster queue.
    • Keeps the agent advisory, not autonomous, for high-impact decisions.
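
The evidence retrieval layer can start small. Here is a minimal sketch, assuming FAISS and OpenAI's text-embedding-3-small model purely for illustration; substitute whatever vector store and embedding model you actually run:

from langchain_community.vectorstores import FAISS  # requires the faiss-cpu package
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# In production these texts come from policy wording, claims history,
# and adjuster notes; they are inlined here for illustration only.
seed_docs = [
    "Policy POL-221: comprehensive auto cover, garaging address on file.",
    "Claimant history: three collision claims filed in the last 12 months.",
]
store = FAISS.from_texts(seed_docs, embeddings)

def retrieve_evidence(loss_description: str, k: int = 4) -> list[str]:
    # Return the k most relevant evidence snippets for a claim.
    return [d.page_content for d in store.similarity_search(loss_description, k=k)]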

Implementation

1) Define the schema and prompt

Use structured outputs. In insurance workflows, free-form text is a liability because it is hard to audit and hard to validate.

from typing import List, Literal
from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate

class FraudAssessment(BaseModel):
    fraud_risk: Literal["low", "medium", "high"] = Field(description="Fraud risk level")
    flags: List[str] = Field(description="Specific fraud indicators found")
    recommended_action: Literal["approve", "review", "escalate"] = Field(description="Next step")
    rationale: str = Field(description="Short explanation grounded in evidence")

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are an insurance fraud triage assistant. "
     "Use only provided claim data and retrieved evidence. "
     "Do not invent facts. Return a structured assessment."),
    ("human",
     "Claim JSON:\n{claim_json}\n\n"
     "Retrieved evidence:\n{evidence}\n\n"
     "Assess fraud risk.")
])

2) Build the LangChain chain with structured output

This pattern uses ChatOpenAI, ChatPromptTemplate, and .with_structured_output() so the chain returns a validated instance of your Pydantic schema instead of free-form text.

import os
import json
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    api_key=os.environ["OPENAI_API_KEY"],
)

fraud_chain = prompt | llm.with_structured_output(FraudAssessment)

claim = {
    "claim_id": "CLM-10482",
    "policy_type": "auto",
    "loss_type": "collision",
    "claim_amount": 18750,
    "days_since_policy_start": 12,
    "prior_claims_12m": 3,
    "garage_location_matches_policy": False,
}

evidence = [
    "Policy inception date is 12 days before loss date.",
    "Three prior claims were filed in the last 12 months.",
    "Vehicle location at inspection differs from declared garaging address."
]

result = fraud_chain.invoke({
    "claim_json": json.dumps(claim),
    "evidence": "\n".join(evidence),
})

print(result.model_dump())

3) Add deterministic policy controls

The model should not make the final call alone. In insurance you want explicit thresholds that are easy to explain to compliance teams.

def decide_next_step(assessment: FraudAssessment) -> str:
    if assessment.fraud_risk == "high" and len(assessment.flags) >= 2:
        return "escalate"
    if assessment.fraud_risk == "medium":
        return "review"
    return assessment.recommended_action

final_action = decide_next_step(result)
print({"assessment": result.model_dump(), "final_action": final_action})

4) Wrap it as an auditable service call

Log what went in, what evidence was used, and what came out. If you need traceability across chains later, LangChain’s callback system can be attached without changing core logic; a sketch follows the code below.

from datetime import datetime, timezone

def process_claim(claim_payload: dict, evidence_docs: list[str]) -> dict:
    assessment = fraud_chain.invoke({
        "claim_json": json.dumps(claim_payload),
        "evidence": "\n".join(evidence_docs),
    })

    action = decide_next_step(assessment)

    audit_record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "claim_id": claim_payload["claim_id"],
        "model": llm.model_name,
        "assessment": assessment.model_dump(),
        "final_action": action,
        # store prompt version + evidence IDs in your DB here
    }
    return audit_record
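
For the callback point above, here is a minimal tracing sketch, reusing the claim and evidence from step 2. The print calls stand in for whatever telemetry sink you use:

from langchain_core.callbacks import BaseCallbackHandler

class AuditTraceHandler(BaseCallbackHandler):
    # Replace print with your own telemetry sink.
    def on_chat_model_start(self, serialized, messages, **kwargs):
        print({"event": "llm_start", "run_id": str(kwargs.get("run_id"))})

    def on_llm_end(self, response, **kwargs):
        print({"event": "llm_end", "run_id": str(kwargs.get("run_id"))})

# Attached per call; the chain definition itself stays unchanged.
traced = fraud_chain.invoke(
    {"claim_json": json.dumps(claim), "evidence": "\n".join(evidence)},
    config={"callbacks": [AuditTraceHandler()]},
)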

Production Considerations

  • Compliance and explainability

    • Keep every decision tied to retrieved evidence and stored prompt versions.
    • For adverse actions or escalations, retain rationale text that legal/compliance can review.
  • Data residency

    • Claims data often includes PII and sensitive financial information.
    • Route requests through region-bound infrastructure and verify your LLM provider’s residency guarantees before sending production traffic.
  • Monitoring

    • Track false positives by line of business, geography, adjuster team, and claim type.
    • Monitor drift in fraud-risk distribution; a sudden spike usually means upstream data changed or prompts regressed.
  • Guardrails

    • Redact SSNs, bank details, medical notes, and other sensitive fields before prompting (see the redaction sketch after this list).
    • Enforce max-risk thresholds where the agent can only recommend review or escalation, never auto-deny.
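
A minimal redaction sketch for the guardrails point above. The regexes are illustrative assumptions; a production system should rely on a dedicated PII detection service rather than hand-rolled patterns:

import re

# Illustrative patterns only; do not treat these as complete PII coverage.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),    # US SSN format
    (re.compile(r"\b\d{9,18}\b"), "[ACCOUNT_NUMBER]"),  # bank-account-like digit runs
]

def redact(text: str) -> str:
    # Replace sensitive patterns before text reaches a prompt.
    for pattern, token in REDACTIONS:
        text = pattern.sub(token, text)
    return text

safe_evidence = [redact(doc) for doc in evidence]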

Common Pitfalls

  • Letting the LLM make final determinations

    • Don’t do this for insurance claims.
    • Use the model for triage and explanation; keep approval/denial logic in deterministic code or human workflow.
  • Feeding raw documents without retrieval discipline

    • Dumping entire claim files into context increases cost and noise.
    • Retrieve only relevant policy clauses, prior claim summaries, inspection notes, and known fraud signals.
  • Skipping audit artifacts

    • If you can’t reconstruct why a claim was flagged six months later, you have a governance problem.
    • Persist input hashes, evidence IDs, prompt versioning, model name/version, and final routing decision (an input-hashing sketch follows this list).
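
For the audit-artifact point above, a small input-hashing sketch. The PROMPT_VERSION constant is a placeholder for whatever versioning scheme your prompt registry uses:

import hashlib
import json

PROMPT_VERSION = "fraud-triage-v1"  # placeholder; track versions in your own registry

def input_hash(claim_payload: dict, evidence_docs: list[str]) -> str:
    # Stable hash of exactly what the model saw, for later reconstruction.
    canonical = json.dumps(
        {"claim": claim_payload, "evidence": evidence_docs, "prompt": PROMPT_VERSION},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()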

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
