How to Build a Claims Processing Agent Using LlamaIndex in Python for Healthcare

By Cyprian Aarons · Updated 2026-04-21
Tags: claims-processing, llamaindex, python, healthcare

A claims processing agent for healthcare takes an incoming claim, extracts the relevant clinical and billing facts, checks them against policy rules, and prepares a decision package for a human reviewer or downstream adjudication system. It matters because healthcare claims are expensive to process manually, slow to resolve, and full of edge cases where compliance, auditability, and data handling rules are non-negotiable.

Architecture

  • Claim intake layer

    • Receives PDFs, EOBs, CMS-1500/UB-04 forms, faxes, or structured JSON from upstream systems.
    • Normalizes input into text plus metadata like member ID, provider ID, service date, and claim type.
  • Document indexing layer

    • Uses LlamaIndex to store claim documents, policy manuals, payer rules, and prior adjudication notes.
    • Enables retrieval over both the current claim and supporting policy context.
  • Extraction and reasoning layer

    • Pulls out entities such as diagnosis codes, procedure codes, modifiers, dates of service, and denial reasons.
    • Produces a structured decision object instead of free-form text.
  • Policy retrieval layer

    • Retrieves the exact coverage policy or billing rule relevant to the claim.
    • Keeps the agent grounded in payer-specific logic rather than model memory.
  • Decision orchestration layer

    • Combines extraction results, retrieved policy snippets, and business rules.
    • Routes low-risk claims to auto-adjudication and ambiguous claims to manual review.
  • Audit and observability layer

    • Stores prompts, retrieved nodes, outputs, confidence scores, and final decisions.
    • Supports compliance review, dispute resolution, and post-incident analysis.
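Before wiring in LlamaIndex, the handoffs between these layers can be sketched with plain data shapes. Everything below is illustrative (the class names, the sample policy text, and the `process` function are not LlamaIndex APIs); it only shows what each layer passes to the next.

```python
from dataclasses import dataclass, field

@dataclass
class ClaimIntake:
    """Output of the intake layer: normalized text plus identifying metadata."""
    claim_id: str
    member_id: str
    text: str

@dataclass
class DecisionPackage:
    """Output of the orchestration layer, handed to review or adjudication."""
    claim_id: str
    status: str                           # approved | denied | needs_review
    evidence: list[str] = field(default_factory=list)

def process(claim: ClaimIntake) -> DecisionPackage:
    # intake -> index/retrieve -> extract -> policy check -> orchestrate
    # Hardcoded evidence stands in for the retrieval layers described above.
    evidence = ["Illustrative policy excerpt: CPT 99213 covered for E/M office visits"]
    status = "needs_review" if not evidence else "approved"
    return DecisionPackage(claim.claim_id, status, evidence)

pkg = process(ClaimIntake("CLM-10021", "M-001", "Office visit follow-up"))
```

The key design point survives even in this toy form: no layer passes free-form text forward; each produces a typed object the next layer can validate.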

Implementation

1. Ingest claim documents into a LlamaIndex index

For healthcare workflows, keep claim artifacts separate from policy documents. You want retrieval over the right corpus with metadata that supports audit trails and data residency controls.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.schema import Document

claim_docs = [
    Document(
        text="""
        Claim ID: CLM-10021
        Member: John Doe
        DOS: 2024-10-12
        CPT: 99213
        ICD-10: I10
        Provider: North Clinic
        Notes: Office visit for hypertension follow-up.
        """,
        metadata={
            "claim_id": "CLM-10021",
            "source": "claims_intake",
            "region": "us-east-1",
            "phi": True,
        },
    )
]

policy_docs = SimpleDirectoryReader("./payer_policies").load_data()

claims_index = VectorStoreIndex.from_documents(claim_docs)
policy_index = VectorStoreIndex.from_documents(policy_docs)
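Because the audit-trail and residency controls later in the pipeline depend on document metadata, it is worth rejecting claim documents that arrive without the required fields. This is a plain-Python gate (not a LlamaIndex feature) you would run before constructing each Document:

```python
REQUIRED_KEYS = {"claim_id", "source", "region", "phi"}

def validate_metadata(metadata: dict) -> dict:
    """Reject claim documents missing the fields audit and residency controls rely on."""
    missing = REQUIRED_KEYS - metadata.keys()
    if missing:
        raise ValueError(f"claim document missing metadata: {sorted(missing)}")
    return metadata

meta = validate_metadata({
    "claim_id": "CLM-10021",
    "source": "claims_intake",
    "region": "us-east-1",
    "phi": True,
})
```

Failing fast here is cheaper than discovering during a dispute that a claim was indexed without a `claim_id` to trace it by.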

2. Build a query engine for policy-grounded answers

Use retrieval to answer narrow questions like “is CPT 99213 covered for this diagnosis under this plan?” The point is not open-ended chat; it is evidence-backed adjudication support.

from llama_index.core import Settings

Settings.chunk_size = 512

policy_query_engine = policy_index.as_query_engine(
    similarity_top_k=3,
    response_mode="compact",
)

response = policy_query_engine.query(
    "For CPT 99213 with ICD-10 I10 on an outpatient office visit, "
    "what coverage or documentation requirements apply?"
)

print(response)
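To keep queries narrow and repeatable across claims, the question itself can be templated from claim fields rather than hand-written each time. A small helper (illustrative, not part of LlamaIndex) using the same wording as the query above:

```python
def policy_question(cpt: str, icd10: str, setting: str) -> str:
    """Build the narrow, evidence-seeking question the policy query engine answers."""
    return (
        f"For CPT {cpt} with ICD-10 {icd10} on an {setting}, "
        "what coverage or documentation requirements apply?"
    )

q = policy_question("99213", "I10", "outpatient office visit")
```

Templated questions also make the audit trail easier to read: every claim of the same type produces the same query shape, so differences in retrieved evidence reflect the claim, not the phrasing.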

3. Extract a structured claim decision with an LLM-backed workflow

The agent should return structured output that downstream systems can validate. In LlamaIndex you can use as_query_engine() for grounded retrieval and then transform the result into your own schema.

from pydantic import BaseModel, Field
from typing import List
from llama_index.llms.openai import OpenAI

class ClaimDecision(BaseModel):
    claim_id: str = Field(...)
    status: str = Field(..., description="approved | denied | needs_review")
    reason_codes: List[str] = Field(default_factory=list)
    evidence: List[str] = Field(default_factory=list)

llm = OpenAI(model="gpt-4o-mini", temperature=0)

prompt = f"""
You are assisting with healthcare claims triage.
Use only the provided policy evidence.
Return JSON with keys: claim_id,status,reason_codes,evidence.

Claim:
Claim ID CLM-10021
CPT 99213
ICD-10 I10
DOS 2024-10-12

Policy evidence:
{response}
"""

result = llm.complete(prompt)
print(result.text)
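The JSON the model returns should be validated before anything downstream acts on it. Below is a stdlib-only sketch that mirrors the ClaimDecision schema; with pydantic v2 you could call ClaimDecision.model_validate_json instead.

```python
import json

REQUIRED = {"claim_id", "status", "reason_codes", "evidence"}
ALLOWED_STATUS = {"approved", "denied", "needs_review"}

def validate_decision(raw: str) -> dict:
    """Parse and sanity-check the model's JSON before routing acts on it."""
    data = json.loads(raw)
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"decision missing keys: {sorted(missing)}")
    if data["status"] not in ALLOWED_STATUS:
        raise ValueError(f"unexpected status: {data['status']!r}")
    return data

# Illustrative model output, matching the prompt's required keys.
decision = validate_decision(
    '{"claim_id": "CLM-10021", "status": "approved", '
    '"reason_codes": [], "evidence": ["policy excerpt"]}'
)
```

Anything that fails validation should be routed to human review rather than retried blindly; malformed output is itself a signal the claim is ambiguous.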

4. Add a human-review gate for risky cases

In healthcare you do not want the model making final determinations on unclear medical necessity or missing documentation. Route anything below threshold confidence or with conflicting evidence to review.

def route_claim(decision_text: str) -> str:
    lowered = decision_text.lower()
    if "needs_review" in lowered or "missing" in lowered or "conflict" in lowered:
        return "human_review"
    return "auto_adjudicate"

route = route_claim(result.text)
print(route)

A practical pattern is:

| Step | Input | Output |
| --- | --- | --- |
| Retrieve | Claim + policy corpus | Evidence snippets |
| Extract | Evidence + claim fields | Structured decision |
| Validate | Decision JSON + business rules | Route |
| Persist | Prompt + evidence + output | Audit record |
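The Persist step can be as simple as one append-only record per decision. The shape below is illustrative; hashing the prompt instead of storing it verbatim is one way to keep PHI out of general-purpose logs while still proving which prompt ran.

```python
import hashlib
from datetime import datetime, timezone

def audit_record(claim_id: str, prompt: str, evidence: list[str],
                 output: str, route: str) -> dict:
    """One immutable row per decision: enough to replay and defend it later."""
    return {
        "claim_id": claim_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Hash rather than store the raw prompt; keep the full prompt only
        # in secure, access-controlled storage keyed by this digest.
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "evidence": evidence,
        "output": output,
        "route": route,
    }

rec = audit_record(
    "CLM-10021",
    "prompt v3",
    ["policy excerpt"],
    '{"status": "approved"}',
    "auto_adjudicate",
)
```

Writing these records to append-only storage (object lock, WORM, or a ledger table) gives you the immutable trail the compliance bullets below call for.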

Production Considerations

  • Compliance controls

    • Treat PHI as sensitive by default.
    • Encrypt data at rest and in transit.
    • Restrict model access to minimum necessary fields under HIPAA policies.
  • Auditability

    • Store retrieved chunks, prompt versions, model version, timestamp, and final disposition.
    • Keep immutable logs so every denial or approval can be traced back to source evidence.
  • Data residency

    • Pin storage and inference to approved regions such as us-east-1 if your contracts require it.
    • Do not send PHI to third-party endpoints outside approved jurisdictions.
  • Guardrails

    • Use deterministic temperature settings for adjudication support.
    • Block free-form medical advice generation.
    • Require structured outputs that can be validated against schema before any downstream action.

Common Pitfalls

  1. Using one index for everything

    • Mixing claims data with payer policies makes retrieval noisy and increases hallucination risk.
    • Keep separate indexes for intake documents, policies, appeals history, and provider notes.
  2. Letting the model decide without evidence

    • If the agent returns “denied” without citing specific policy text or claim fields, you cannot defend the outcome.
    • Always require retrieved evidence in the output payload.
  3. Ignoring PHI handling in logs

    • Developers often log prompts and responses verbatim during debugging.
    • Redact member identifiers, diagnoses where required by policy, and any free-text clinical notes before writing logs outside secure storage.
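A minimal redaction pass before logs leave secure storage might look like the sketch below. The patterns are illustrative only; a real deployment needs a vetted PHI scrubber and a review of which identifiers your policy requires redacted.

```python
import re

# Illustrative patterns only, matching the sample claim format in this article.
PATTERNS = [
    (re.compile(r"\bCLM-\d+\b"), "[CLAIM_ID]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"Member:\s*.+"), "Member: [REDACTED]"),
]

def redact(text: str) -> str:
    """Strip member identifiers and claim IDs before text reaches general logs."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

safe = redact("Member: John Doe\nClaim CLM-10021 approved")
```

Run this at the logging boundary (a logging filter or log shipper hook), not inside business logic, so no code path can forget it.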

A claims agent built this way does not replace adjudication staff. It removes repetitive reading and routing work while preserving the controls healthcare teams need: compliance boundaries, traceable decisions, and region-aware data handling.



By Cyprian Aarons, AI Consultant at Topiax.
