How to Build a Fraud Detection Agent Using LangChain in Python for Healthcare

By Cyprian Aarons · Updated 2026-04-21
fraud-detection · langchain · python · healthcare

A fraud detection agent for healthcare inspects claims, prior authorizations, referrals, and billing notes to flag patterns that look abnormal, inconsistent, or outright abusive. It matters because healthcare fraud directly increases cost, delays care, and creates compliance exposure for providers and payers.

Architecture

  • Ingestion layer
    • Pulls claim records, provider metadata, member history, and policy rules from internal systems.
  • Normalization layer
    • Converts messy claim payloads into a consistent schema the agent can reason over.
  • Risk scoring agent
    • Uses LangChain to classify each case as low, medium, or high risk with a structured explanation.
  • Evidence retriever
    • Retrieves relevant policy documents, billing guidelines, and historical case notes for grounded decisions.
  • Audit logger
    • Stores every input, model output, retrieved document ID, and final decision for compliance review (a minimal record sketch follows this list).
  • Human review handoff
    • Routes high-risk cases to investigators instead of auto-denying anything sensitive.
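
To make the audit logger concrete, here is a minimal sketch of the record each decision could emit. The field names and the write_audit_record helper are illustrative assumptions, not a prescribed schema; in production this would land in an append-only, access-controlled store.

import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    # Hypothetical audit schema: one record per agent decision.
    claim_id: str
    model_name: str
    risk_level: str
    recommended_action: str
    retrieved_doc_ids: list
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def write_audit_record(record: AuditRecord, path: str = "audit_log.jsonl") -> None:
    # Append-only JSONL keeps records immutable and easy to replay in a review.
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")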

Implementation

1) Define the case schema and load your policy context

Use PydanticOutputParser so the model returns structured output instead of free-form text. In healthcare, that matters because downstream systems need deterministic fields for audit trails and investigator queues.

from typing import List
from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import PydanticOutputParser

class FraudAssessment(BaseModel):
    risk_level: str = Field(description="low, medium, or high")
    reasons: List[str] = Field(description="Short evidence-based reasons")
    recommended_action: str = Field(description="review, escalate, or approve")

parser = PydanticOutputParser(pydantic_object=FraudAssessment)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a healthcare fraud detection analyst. Use only the provided case data and policy context."),
    ("user", """
Case:
{case_json}

Policy context:
{policy_context}

Return your answer in this format:
{format_instructions}
""")
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

2) Build a retrieval step for billing rules and prior guidance

Fraud decisions should be grounded in payer policy and coding guidance. FAISS with a plain retriever is fine as a first production path if your documents are clean and versioned.

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document

docs = [
    Document(page_content="Modifier 25 should only be used when a significant separately identifiable E/M service is provided.", metadata={"doc_id": "policy_001"}),
    Document(page_content="Unbundling occurs when components of a procedure are billed separately without clinical justification.", metadata={"doc_id": "policy_002"}),
]

vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
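
Before wiring the retriever into the chain, a quick sanity check helps, assuming the two policy documents above:

# Confirm the retriever surfaces the expected policy for a duplicate-billing query.
for doc in retriever.invoke("two identical E/M visits billed on the same day"):
    print(doc.metadata["doc_id"], "->", doc.page_content[:60])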

3) Chain retrieval into a structured fraud assessment

Use LCEL composition with RunnableLambda and the parser. This keeps the flow explicit and easier to audit than hiding logic inside one large chain.

import json
from langchain_core.runnables import RunnableLambda

def format_docs(retrieved_docs):
    # Prefix each snippet with its doc_id so the model can cite sources.
    return "\n".join(
        f"[{d.metadata['doc_id']}] {d.page_content}" for d in retrieved_docs
    )

def build_case_payload(inputs):
    # The chain is invoked with {"case": {...}}; serialize just the case.
    return json.dumps(inputs["case"], indent=2)

def case_to_query(inputs):
    # Retrievers expect a query string, not a dict: use the free-text notes
    # plus procedure codes as the retrieval query.
    case = inputs["case"]
    return f"{case.get('notes', '')} {' '.join(case.get('procedure_codes', []))}"

chain = (
    {
        "case_json": RunnableLambda(build_case_payload),
        "policy_context": RunnableLambda(case_to_query) | retriever | RunnableLambda(format_docs),
        "format_instructions": RunnableLambda(lambda _: parser.get_format_instructions()),
    }
    | prompt
    | llm
    | parser
)

case = {
    "claim_id": "CLM-20491",
    "provider_npi": "1234567890",
    "member_id": "M-88321",
    "procedure_codes": ["99214", "99214"],
    "diagnosis_codes": ["E11.9"],
    "amount_billed": 840,
    "notes": "Two identical E/M visits billed same day with limited documentation.",
}

result = chain.invoke({"case": case})
print(result.model_dump())
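
PydanticOutputParser raises OutputParserException when the model's reply does not match the schema. In a compliance setting it is safer to fail closed than to retry blindly; a minimal sketch, assuming the chain above:

from langchain_core.exceptions import OutputParserException

def assess_case(case: dict) -> FraudAssessment:
    # Fail closed: a malformed model response becomes a manual-review task,
    # never an automated approval.
    try:
        return chain.invoke({"case": case})
    except OutputParserException:
        return FraudAssessment(
            risk_level="high",
            reasons=["Model output failed schema validation"],
            recommended_action="review",
        )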

4) Add a deterministic pre-check before the LLM

Do not send every claim straight to the model. Cheap rule checks catch obvious anomalies and reduce cost while improving control.

def rule_based_flags(case: dict) -> list[str]:
    flags = []
    if len(case.get("procedure_codes", [])) != len(set(case.get("procedure_codes", []))):
        flags.append("duplicate_procedure_code")
    if case.get("amount_billed", 0) > 500 and "99214" in case.get("procedure_codes", []):
        flags.append("high_value_e_and_m_pattern")
    return flags

flags = rule_based_flags(case)
if flags:
    print({"flags": flags})
else:
    print(chain.invoke({"case": case}).model_dump())
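
To connect this pre-check to the human review handoff, here is one illustrative triage function. The queue names are assumptions for this sketch, not part of any LangChain API:

def triage(case: dict) -> dict:
    # Deterministic rules first: obviously anomalous claims skip the LLM call.
    flags = rule_based_flags(case)
    if flags:
        return {"queue": "investigator", "flags": flags, "assessment": None}

    # Everything else gets the grounded LLM assessment.
    assessment = chain.invoke({"case": case})
    queue = {"high": "investigator", "medium": "reviewer"}.get(
        assessment.risk_level, "auto_clear"
    )
    return {"queue": queue, "flags": [], "assessment": assessment.model_dump()}

print(triage(case))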

Production Considerations

  • Compliance controls
    • Treat claim payloads as regulated data. Enforce access control, encryption at rest/in transit, retention policies, and full audit logs with request IDs.
  • Data residency
    • Keep PHI in approved regions only. If your LLM provider cannot guarantee residency or BAA coverage, route only de-identified fields or use an approved private deployment.
  • Monitoring
    • Track false positives by provider specialty, CPT/HCPCS code family, and payer line of business. Fraud models drift fast when coding patterns change (a counter sketch follows this list).
  • Human-in-the-loop review
    • Never auto-deny based on an LLM alone. High-risk outputs should create an investigator task with supporting evidence and source document IDs.
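
One way to make the monitoring bullet concrete: tally investigator outcomes per specialty and code family so false-positive drift becomes visible. The provider_specialty field and the three-character code-family grouping are assumptions for illustration:

from collections import Counter

false_positives: Counter = Counter()

def record_outcome(case: dict, predicted_risk: str, confirmed_fraud: bool) -> None:
    # Count cases the agent escalated that investigators later cleared.
    if predicted_risk in ("medium", "high") and not confirmed_fraud:
        specialty = case.get("provider_specialty", "unknown")
        for code in set(case.get("procedure_codes", [])):
            false_positives[(specialty, code[:3])] += 1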

Common Pitfalls

  • Using raw PHI in prompts without minimization
    • Only send fields needed for detection. Strip names, addresses, MRNs, and free-text notes unless they materially affect the analysis (see the minimization sketch after this list).
  • Letting the model invent policy
    • If retrieval returns nothing relevant, fail closed to manual review. Do not accept unsupported reasoning from the model.
  • Skipping versioning on rules and documents
    • Fraud decisions need reproducibility. Version your prompt template, retrieval corpus, rule engine logic, and model name so you can explain why a case was flagged six months later.
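
A minimal sketch of the minimization idea from the first pitfall, assuming an allowlist approach; the field names mirror the case payload used earlier:

# Allowlist only the fields the detection logic actually needs.
DETECTION_FIELDS = {
    "claim_id", "provider_npi", "procedure_codes",
    "diagnosis_codes", "amount_billed",
}

def minimize_case(case: dict, include_notes: bool = False) -> dict:
    minimized = {k: v for k, v in case.items() if k in DETECTION_FIELDS}
    if include_notes:
        # Opt in to free text only when it materially affects the analysis.
        minimized["notes"] = case.get("notes", "")
    return minimized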

By Cyprian Aarons, AI Consultant at Topiax.