How to Build a Fraud Detection Agent Using LlamaIndex in Python for Healthcare

By Cyprian Aarons · Updated 2026-04-21
Tags: fraud-detection, llamaindex, python, healthcare

A fraud detection agent for healthcare reads claims, prior authorizations, encounter notes, billing events, and policy rules, then flags patterns that look inconsistent with medical necessity, duplicate billing, upcoding, or identity misuse. It matters because healthcare fraud drains reimbursement budgets, slows legitimate claims, and creates compliance risk when bad decisions are made without traceable evidence.

Architecture

  • Data ingestion layer

    • Pulls structured sources like claims CSVs, EHR exports, and payer policy documents.
    • Uses SimpleDirectoryReader for documents and custom loaders for internal systems.
  • Indexing layer

    • Builds a VectorStoreIndex over policy docs, coding guidelines, and historical fraud cases.
    • Stores embeddings in a controlled backend that supports your residency requirements.
  • Retrieval layer

    • Uses QueryEngine or a retriever to fetch the most relevant evidence for each claim.
    • Returns source nodes so every alert is explainable.
  • Fraud reasoning layer

    • A ReActAgent or tool-based agent compares claim facts against retrieved policy context.
    • Produces a structured risk assessment instead of a free-form opinion.
  • Audit and observability layer

    • Logs prompts, retrieved sources, outputs, and final decisions.
    • Keeps an audit trail for compliance reviews and post-incident analysis.
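The audit layer benefits from a concrete record contract from day one. Here is a minimal sketch using only the standard library; the `AuditRecord` fields and the JSONL file path are illustrative assumptions, not a LlamaIndex API:

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    """One immutable entry per agent decision, for compliance review."""
    claim_id: str
    prompt: str
    retrieved_sources: list
    model_output: str
    model_version: str
    decision: str = "pending_review"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def log_audit_record(record: AuditRecord, path: str = "audit_log.jsonl") -> dict:
    """Append the record to an append-only JSONL audit log."""
    entry = asdict(record)
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

Storing model version and timestamp alongside each decision is what later lets you reconstruct why an alert fired.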

Implementation

1) Install dependencies and load healthcare policy documents

Use LlamaIndex to index payer policies, coding guidance, and internal investigation playbooks. Keep PHI out of the first pass unless you have approved controls in place.

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load policy and guideline documents from a controlled folder
documents = SimpleDirectoryReader(
    input_dir="./healthcare_docs",
    recursive=True
).load_data()

# Build an in-memory index for development
index = VectorStoreIndex.from_documents(documents)

# Create a query engine for evidence retrieval
query_engine = index.as_query_engine(similarity_top_k=3)

This gives you a searchable knowledge base for questions like:

  • “What does this payer consider duplicate billing?”
  • “Which modifiers are required for telehealth claims?”
  • “What documentation is required for high-cost procedures?”
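Because every alert must be explainable, it helps to render answers together with their supporting sources. A small helper sketch follows; `format_evidence` is a hypothetical name, and it assumes a response object that exposes `.source_nodes` with `.text` and `.metadata` attributes, as LlamaIndex query responses generally do:

```python
def format_evidence(response) -> str:
    """Render an answer plus its supporting sources for an investigator.

    Works with any object exposing str() output and a `.source_nodes`
    list whose items carry `.text` and `.metadata`.
    """
    lines = [str(response)]
    for i, node in enumerate(getattr(response, "source_nodes", []), start=1):
        meta = node.metadata or {}
        # Truncate source text so the evidence summary stays readable
        lines.append(
            f"[{i}] {meta.get('file_name', 'unknown source')}: "
            f"{node.text[:120]}..."
        )
    return "\n".join(lines)
```

Investigators can then see, per alert, exactly which policy passages drove the answer.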

2) Define a fraud analysis tool that returns grounded evidence

The agent should not guess. It should retrieve supporting text first, then analyze whether a claim looks suspicious based on those sources.

from llama_index.core.tools import QueryEngineTool, ToolMetadata

fraud_policy_tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="fraud_policy_lookup",
        description="Look up healthcare billing policies, coding rules, and fraud indicators."
    )
)

Now the agent can use the tool to answer specific questions about a claim. This is the difference between a useful compliance assistant and an unsafe chatbot.

3) Create a ReAct agent that reasons over claim facts

Here’s the actual pattern: pass structured claim context into the prompt, let the agent retrieve policy evidence through the tool, then return a concise risk assessment with citations.

from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini", temperature=0)

agent = ReActAgent.from_tools(
    tools=[fraud_policy_tool],
    llm=llm,
    verbose=True
)

claim_context = """
Claim ID: CLM-10492
Provider Type: Outpatient clinic
Service Date: 2025-02-11
CPT Codes: 99215, 93000
Modifiers: None
Diagnosis: E11.9
Notes: Follow-up visit billed as high complexity; same-day ECG also billed.
Concern: Possible upcoding or unbundling.
"""

response = agent.chat(
    f"""
You are reviewing a healthcare claim for possible fraud or abuse.
Use only retrieved policy evidence plus the claim context below.

Claim context:
{claim_context}

Return:
1. Risk level: low/medium/high
2. Reasoning tied to policy evidence
3. Specific next investigation step
4. Source citations from retrieved context
"""
)

print(response)

A few things matter here:

  • temperature=0 makes output as deterministic as the provider allows, which supports auditability.
  • The prompt forces evidence-based reasoning.
  • The response should be reviewed by humans before any adverse action on payment or member access.

4) Add structured triage output for downstream systems

In production you usually want JSON-like output that your case management system can consume. Wrap the agent call so it returns consistent fields that investigators can route.

def triage_claim(agent, claim_context: str):
    result = agent.chat(
        f"""
        Analyze this healthcare claim for potential fraud or abuse.
        Return exactly these fields:
        risk_level
        rationale
        recommended_action

        Claim:
        {claim_context}
        """
    )
    return str(result)

triage_result = triage_claim(agent, claim_context)
print(triage_result)

If you need stronger structure later, move this into an output parser or schema-backed workflow. Start simple, but keep the contract explicit from day one.
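One lightweight way to make that contract explicit before moving to a schema-backed workflow is a parser that fails loudly on malformed output. This sketch assumes the `field: value` reply shape the prompt above requests; `parse_triage_output` and the field list are illustrative, not a library API:

```python
import re

EXPECTED_FIELDS = ("risk_level", "rationale", "recommended_action")

def parse_triage_output(raw: str) -> dict:
    """Parse 'field: value' lines from the agent's reply into a dict.

    Raises ValueError when a required field is missing, so malformed
    output stops here instead of flowing into case management.
    """
    result = {}
    for line in raw.splitlines():
        match = re.match(r"\s*(\w+)\s*[:=]\s*(.+)", line)
        if match and match.group(1).lower() in EXPECTED_FIELDS:
            result[match.group(1).lower()] = match.group(2).strip()
    missing = [f for f in EXPECTED_FIELDS if f not in result]
    if missing:
        raise ValueError(f"Missing fields: {missing}")
    return result
```

Failing fast on a missing field is a deliberate choice: a silently incomplete triage record is worse for investigators than a retriable error.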

Production Considerations

  • Compliance controls

    • Treat PHI as regulated data under HIPAA and your local privacy rules.
    • Minimize what gets sent to the model; redact identifiers unless they are necessary for adjudication.
  • Auditability

    • Persist every claim input, retrieved node text, model response, and final investigator decision.
    • Store timestamps and model versions so you can reconstruct why an alert was raised.
  • Data residency

    • Keep embeddings and vector stores in-region if your contracts require it.
    • Do not route claims data to external services without confirming where prompts and logs are stored.
  • Human-in-the-loop review

    • Use the agent to prioritize cases, not to auto-deny claims.
    • High-risk outputs should trigger manual review by billing specialists or SIU staff.
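Redaction can start as simple pattern substitution applied before any text leaves your boundary. A rough sketch follows; the patterns are illustrative placeholders, and a production system should use a vetted PHI de-identification tool rather than ad-hoc regexes:

```python
import re

# Illustrative patterns only; real PHI redaction needs a vetted library.
REDACTION_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),        # US SSN format
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{10}\b"), "[MRN]"),                   # 10-digit record number
]

def redact_phi(text: str) -> str:
    """Replace obvious identifiers before text reaches the model or logs."""
    for pattern, token in REDACTION_PATTERNS:
        text = pattern.sub(token, text)
    return text
```

Applying this both to prompts and to audit log entries keeps redaction consistent across the pipeline.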

Common Pitfalls

  1. Letting the agent infer fraud without evidence

    • Fix it by forcing retrieval before reasoning.
    • Require citations from QueryEngine results in every output.
  2. Indexing raw PHI without governance

    • Fix it by redacting unnecessary identifiers before ingestion.
    • Apply access controls to both source documents and vector storage.
  3. Using one generic prompt for all claim types

    • Fix it by separating workflows for inpatient claims, outpatient visits, DME, pharmacy benefits, and prior auths.
    • Fraud patterns differ by line of business; your prompts should reflect that reality.
  4. Skipping monitoring after deployment

    • Fix it by tracking false positives, investigator overrides, retrieval quality, and drift in coding rules.
    • Healthcare billing policies change often; stale indexes create bad alerts fast.
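Investigator overrides are one of the most concrete monitoring signals listed above. A minimal tracking sketch, where `AlertMonitor` is a hypothetical helper rather than part of LlamaIndex:

```python
from collections import Counter

class AlertMonitor:
    """Track investigator outcomes so alert quality is measurable over time."""

    def __init__(self):
        self.outcomes = Counter()

    def record(self, investigator_confirmed: bool) -> None:
        """Log whether the investigator confirmed or overrode the alert."""
        key = "confirmed" if investigator_confirmed else "override"
        self.outcomes[key] += 1

    def override_rate(self) -> float:
        """Share of alerts investigators rejected; a rising value signals drift."""
        total = sum(self.outcomes.values())
        return self.outcomes["override"] / total if total else 0.0
```

A rising override rate after a policy update is often the first sign that the index is stale and needs re-ingestion.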

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

