How to Build a Fraud Detection Agent for Healthcare Using LlamaIndex in Python
A fraud detection agent for healthcare reads claims, prior authorizations, encounter notes, billing events, and policy rules, then flags patterns that look inconsistent with medical necessity, duplicate billing, upcoding, or identity misuse. It matters because healthcare fraud drains reimbursement budgets, slows legitimate claims, and creates compliance risk when bad decisions are made without traceable evidence.
Architecture
- Data ingestion layer
  - Pulls structured sources like claims CSVs, EHR exports, and payer policy documents.
  - Uses `SimpleDirectoryReader` for documents and custom loaders for internal systems.
- Indexing layer
  - Builds a `VectorStoreIndex` over policy docs, coding guidelines, and historical fraud cases.
  - Stores embeddings in a controlled backend that supports your residency requirements.
- Retrieval layer
  - Uses a query engine or retriever to fetch the most relevant evidence for each claim.
  - Returns source nodes so every alert is explainable.
- Fraud reasoning layer
  - A `ReActAgent` or tool-based agent compares claim facts against retrieved policy context.
  - Produces a structured risk assessment instead of a free-form opinion.
- Audit and observability layer
  - Logs prompts, retrieved sources, outputs, and final decisions.
  - Keeps an audit trail for compliance reviews and post-incident analysis.
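The contracts between these layers can be sketched as plain Python records before any LlamaIndex code is written. The field names below are illustrative assumptions, not a LlamaIndex API; the point is that each layer passes a typed record to the next instead of loose strings:

```python
from dataclasses import dataclass, field

@dataclass
class ClaimFacts:
    """What the ingestion layer hands to the reasoning layer."""
    claim_id: str
    cpt_codes: list[str]
    diagnosis: str
    notes: str

@dataclass
class Evidence:
    """What the retrieval layer attaches to every alert."""
    source: str    # e.g. policy document file name
    excerpt: str   # retrieved text supporting the assessment
    score: float   # retrieval similarity score

@dataclass
class RiskAssessment:
    """What the reasoning layer emits and the audit layer persists."""
    claim_id: str
    risk_level: str  # "low" | "medium" | "high"
    rationale: str
    citations: list[Evidence] = field(default_factory=list)

# Example flow: claim facts in, structured assessment out
claim = ClaimFacts("CLM-10492", ["99215", "93000"], "E11.9", "Follow-up visit")
assessment = RiskAssessment(
    claim_id=claim.claim_id,
    risk_level="medium",
    rationale="High-complexity E/M with same-day ECG; policy check needed",
)
print(assessment.risk_level)
```

Keeping these shapes explicit from the start makes the audit layer trivial: every record that crosses a boundary is already serializable.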
Implementation
1) Install dependencies and load healthcare policy documents
Use LlamaIndex to index payer policies, coding guidance, and internal investigation playbooks. Keep PHI out of the first pass unless you have approved controls in place.
```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load policy and guideline documents from a controlled folder
documents = SimpleDirectoryReader(
    input_dir="./healthcare_docs",
    recursive=True,
).load_data()

# Build an in-memory index for development
index = VectorStoreIndex.from_documents(documents)

# Create a query engine for evidence retrieval
query_engine = index.as_query_engine(similarity_top_k=3)
```
This gives you a searchable knowledge base for questions like:
- “What does this payer consider duplicate billing?”
- “Which modifiers are required for telehealth claims?”
- “What documentation is required for high-cost procedures?”
2) Define a fraud analysis tool that returns grounded evidence
The agent should not guess. It should retrieve supporting text first, then analyze whether a claim looks suspicious based on those sources.
```python
from llama_index.core.tools import QueryEngineTool, ToolMetadata

fraud_policy_tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="fraud_policy_lookup",
        description="Look up healthcare billing policies, coding rules, and fraud indicators.",
    ),
)
```
Now the agent can use the tool to answer specific questions about a claim. This is the difference between a useful compliance assistant and an unsafe chatbot.
3) Create a ReAct agent that reasons over claim facts
Here’s the actual pattern: pass structured claim context into the prompt, let the agent retrieve policy evidence through the tool, then return a concise risk assessment with citations.
```python
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini", temperature=0)

agent = ReActAgent.from_tools(
    tools=[fraud_policy_tool],
    llm=llm,
    verbose=True,
)

claim_context = """
Claim ID: CLM-10492
Provider Type: Outpatient clinic
Service Date: 2025-02-11
CPT Codes: 99215, 93000
Modifiers: None
Diagnosis: E11.9
Notes: Follow-up visit billed as high complexity; same-day ECG also billed.
Concern: Possible upcoding or unbundling.
"""

response = agent.chat(
    f"""
You are reviewing a healthcare claim for possible fraud or abuse.
Use only retrieved policy evidence plus the claim context below.

Claim context:
{claim_context}

Return:
1. Risk level: low/medium/high
2. Reasoning tied to policy evidence
3. Specific next investigation step
4. Source citations from retrieved context
"""
)

print(response)
```
A few things matter here:
- `temperature=0` keeps output stable for auditability.
- The prompt forces evidence-based reasoning.
- The response should be reviewed by humans before any adverse action on payment or member access.
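Auditability also depends on the prompt itself being reproducible. One way to guarantee that is to build it from a single function so every claim is judged against byte-identical instructions; a minimal sketch (the function name is an illustrative assumption, and the template mirrors the one used above):

```python
def build_review_prompt(claim_context: str) -> str:
    """Assemble the fixed review prompt. Centralizing the template means
    any wording change shows up as a reviewable code diff, and two runs
    on the same claim use identical instructions."""
    return f"""
You are reviewing a healthcare claim for possible fraud or abuse.
Use only retrieved policy evidence plus the claim context below.

Claim context:
{claim_context}

Return:
1. Risk level: low/medium/high
2. Reasoning tied to policy evidence
3. Specific next investigation step
4. Source citations from retrieved context
"""

# Only the claim context varies between runs
prompt = build_review_prompt("Claim ID: CLM-10492")
print(prompt)
```

The agent call then becomes `agent.chat(build_review_prompt(claim_context))`, and the template can be version-stamped alongside your audit logs.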
4) Add structured triage output for downstream systems
In production you usually want JSON-like output that your case management system can consume. LlamaIndex can be wrapped to produce consistent fields that investigators can route.
```python
def triage_claim(agent, claim_context: str) -> str:
    result = agent.chat(
        f"""
Analyze this healthcare claim for potential fraud or abuse.

Return exactly these fields:
risk_level
rationale
recommended_action

Claim:
{claim_context}
"""
    )
    return str(result)

triage_result = triage_claim(agent, claim_context)
print(triage_result)
```
If you need stronger structure later, move this into an output parser or schema-backed workflow. Start simple, but keep the contract explicit from day one.
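As an interim step before a schema-backed workflow, you can ask the model to return JSON and validate it defensively before anything reaches case management. A sketch, assuming the prompt requests the three fields above as a JSON object; the fallback behavior (route malformed output to manual review) is an illustrative policy choice:

```python
import json

ALLOWED_RISK_LEVELS = {"low", "medium", "high"}
REQUIRED_FIELDS = {"risk_level", "rationale", "recommended_action"}

def parse_triage_output(raw: str) -> dict:
    """Validate agent output before routing. Malformed output is never
    silently dropped; it becomes a manual-review case instead."""
    fallback = {
        "risk_level": "manual_review",
        "rationale": "Model output failed validation",
        "recommended_action": "Route to investigator",
        "raw_output": raw,
    }
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    # Reject missing fields or out-of-contract risk levels
    if REQUIRED_FIELDS - data.keys() or data["risk_level"] not in ALLOWED_RISK_LEVELS:
        return fallback
    return data

good = parse_triage_output(
    '{"risk_level": "high", "rationale": "Upcoding pattern", '
    '"recommended_action": "SIU referral"}'
)
bad = parse_triage_output("The claim looks risky.")
print(good["risk_level"], bad["risk_level"])
```

The contract stays explicit: downstream systems only ever see the three agreed fields or a manual-review sentinel.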
Production Considerations
- Compliance controls
  - Treat PHI as regulated data under HIPAA and your local privacy rules.
  - Minimize what gets sent to the model; redact identifiers unless they are necessary for adjudication.
- Auditability
  - Persist every claim input, retrieved node text, model response, and final investigator decision.
  - Store timestamps and model versions so you can reconstruct why an alert was raised.
- Data residency
  - Keep embeddings and vector stores in-region if your contracts require it.
  - Do not route claims data to external services without confirming where prompts and logs are stored.
- Human-in-the-loop review
  - Use the agent to prioritize cases, not to auto-deny claims.
  - High-risk outputs should trigger manual review by billing specialists or SIU staff.
Common Pitfalls
- Letting the agent infer fraud without evidence
  - Fix it by forcing retrieval before reasoning.
  - Require citations from query engine results in every output.
- Indexing raw PHI without governance
  - Fix it by redacting unnecessary identifiers before ingestion.
  - Apply access controls to both source documents and vector storage.
- Using one generic prompt for all claim types
  - Fix it by separating workflows for inpatient claims, outpatient visits, DME, pharmacy benefits, and prior auths.
  - Fraud patterns differ by line of business; your prompts should reflect that reality.
- Skipping monitoring after deployment
  - Fix it by tracking false positives, investigator overrides, retrieval quality, and drift in coding rules.
  - Healthcare billing policies change often; stale indexes create bad alerts fast.
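The redaction fix above can start as pattern-based masking applied before documents are ingested. A minimal sketch; the patterns are illustrative only and are not a substitute for a vetted de-identification pipeline (HIPAA Safe Harbor or Expert Determination):

```python
import re

# Illustrative patterns only; real PHI de-identification needs far
# broader coverage (names, dates, addresses, etc.)
PHI_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\bMRN[:#]?\s*\d+\b", re.IGNORECASE), "[MRN]"),
]

def redact_phi(text: str) -> str:
    """Mask obvious identifiers before text reaches the index or the model."""
    for pattern, token in PHI_PATTERNS:
        text = pattern.sub(token, text)
    return text

note = "Patient MRN: 884721, reachable at 555-867-5309 or jdoe@example.com. SSN 123-45-6789."
print(redact_phi(note))
```

Run this at ingestion time, before `SimpleDirectoryReader` output is indexed, so unredacted identifiers never enter the vector store.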
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.