How to Build a Transaction Monitoring Agent Using LlamaIndex in Python for Healthcare

By Cyprian Aarons · Updated 2026-04-21

Tags: transaction-monitoring, llamaindex, python, healthcare

A transaction monitoring agent for healthcare watches claims, payments, refunds, eligibility changes, and patient account activity for suspicious patterns, policy violations, and operational anomalies. It matters because healthcare data is regulated, billing is high-volume, and bad transactions can mean fraud, compliance exposure, denied claims, or patient trust issues.

Architecture

  • Data ingestion layer

    • Pulls transaction events from EHR billing exports, payment systems, claim feeds, or a message queue.
    • Normalizes records into a consistent schema: transaction_id, patient_id, provider_id, amount, timestamp, type, payer, location (a sketch of this schema follows the list).
  • Document store / vector index

    • Stores policy docs, billing rules, payer guidelines, and historical incident notes.
    • Uses llama_index.core.VectorStoreIndex to retrieve relevant context for each transaction.
  • Monitoring agent

    • Uses an LLM-backed query engine to classify each transaction as normal, suspicious, or needs-review.
    • Uses llama_index.core.agent.ReActAgent or a tool-based workflow when you need multi-step reasoning.
  • Rules + retrieval layer

    • Applies deterministic checks first: amount thresholds, duplicate claims, out-of-network patterns, unusual refund frequency.
    • Retrieves supporting policy snippets with LlamaIndex so the model explains decisions against actual rules.
  • Audit and case output

    • Writes every decision with the input features, retrieved context, model output, and confidence.
    • Healthcare teams need this for HIPAA audits, internal compliance reviews, and appeal workflows.
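
A minimal sketch of that normalized schema as a dataclass. The types are assumptions; adapt them to whatever your billing exports and claim feeds actually emit.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Transaction:
    transaction_id: str
    patient_id: str          # tokenized identifier, not raw PHI
    provider_id: str
    amount: float
    timestamp: datetime
    type: str                # e.g. "claim", "payment", "refund", "claim_adjustment"
    payer: str
    location: str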

Implementation

1) Install dependencies and load healthcare policy documents

Use LlamaIndex to index your internal policies and payer guidance. Keep these documents separate from raw PHI; the index should hold policy text, not patient records.

pip install llama-index llama-index-llms-openai llama-index-embeddings-openai

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Policy docs only: billing rules, fraud SOPs, payer guidelines
docs = SimpleDirectoryReader("./healthcare_policies").load_data()

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine(similarity_top_k=3)

response = query_engine.query(
    "What are the red flags for duplicate outpatient claims?"
)

print(response)

This gives the agent retrieval grounding before it makes a decision. In healthcare monitoring, that grounding matters because you want decisions tied to documented policy instead of free-form model guesses.

2) Build a transaction risk classifier with structured output

For production monitoring you want structured outputs. LlamaIndex’s PydanticOutputParser works well when you need JSON-like results that downstream systems can store in an audit log or case management system.

from enum import Enum
from pydantic import BaseModel, Field
from typing import List

from llama_index.core.output_parsers import PydanticOutputParser
from llama_index.llms.openai import OpenAI

class RiskLevel(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"

class MonitoringResult(BaseModel):
    risk_level: RiskLevel = Field(description="Overall risk classification")
    reasons: List[str] = Field(description="Specific red flags or policy concerns")
    recommended_action: str = Field(description="Next step, e.g. log, review, escalate")
    policy_refs: List[str] = Field(description="Policy sections supporting the decision")

parser = PydanticOutputParser(output_cls=MonitoringResult)
llm = OpenAI(model="gpt-4o-mini", temperature=0)

def assess_transaction(txn: dict) -> MonitoringResult:
    # Retrieve policy snippets so the classification is grounded, not guessed
    context = query_engine.query(
        f"Billing and fraud policies relevant to a {txn['type']} transaction"
    )
    prompt = f"""
You are a healthcare transaction monitoring agent.
Classify this transaction using the policy context below.

Policy context:
{context}

Transaction:
{txn}

Return only valid structured output.
{parser.format_string}
"""
    raw = llm.complete(prompt)
    return parser.parse(str(raw))

txn = {
    "transaction_id": "TXN-10091",
    "patient_id": "P-8821",
    "provider_id": "PRV-44",
    "amount": 1840.00,
    "type": "claim_adjustment",
    "payer": "Commercial",
    "location": "out_of_network"
}

result = assess_transaction(txn)
print(result.model_dump())

This pattern is useful because it forces consistent outputs for alerting pipelines. If you are sending cases into ServiceNow or Jira later on, structured data saves cleanup work.
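
For instance, a flagged result can be serialized straight into a case payload. The field names below are illustrative, not an actual ServiceNow or Jira schema; map them to your ticketing system.

import json

# Illustrative case payload built from the structured result above
case = {
    "summary": f"[{result.risk_level.value.upper()}] Review {txn['transaction_id']}",
    "description": "\n".join(result.reasons),
    "recommended_action": result.recommended_action,
    "policy_refs": result.policy_refs,
}
print(json.dumps(case, indent=2))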

3) Add retrieval-backed explanations for flagged transactions

The agent should explain why a transaction was flagged using retrieved policy text. That keeps analysts from treating the LLM output like magic.

def explain_transaction(txn: dict) -> str:
    prompt = f"""
Review this healthcare transaction:
{txn}

Use the retrieved policy context to explain whether this should be flagged.
Focus on compliance concerns like duplicate billing,
out-of-network exceptions, unusual refund patterns,
and documentation requirements.
"""
    return str(query_engine.query(prompt))

explanation = explain_transaction(txn)
print(explanation)

In practice you would combine this with rule checks first. For example:

  • Flag if amount exceeds a provider-specific threshold.
  • Flag if the same patient/provider pair has repeated reversals in a short window.
  • Flag if location conflicts with payer network rules.

Then use LlamaIndex to generate the explanation only after a rule fires or the score crosses a threshold.
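
A sketch of that gating logic. PROVIDER_THRESHOLDS and the recent_reversals argument are placeholders; in production the thresholds would come from provider baselines and the reversal count from your transaction store.

# Illustrative thresholds, not real provider baselines
PROVIDER_THRESHOLDS = {"PRV-44": 1500.00}
DEFAULT_THRESHOLD = 5000.00

def rule_checks(txn: dict, recent_reversals: int = 0) -> list:
    flags = []
    threshold = PROVIDER_THRESHOLDS.get(txn["provider_id"], DEFAULT_THRESHOLD)
    if txn["amount"] > threshold:
        flags.append(f"amount {txn['amount']:.2f} exceeds threshold {threshold:.2f}")
    if recent_reversals >= 3:
        flags.append(f"{recent_reversals} reversals in the lookback window")
    if txn.get("location") == "out_of_network":
        flags.append("location conflicts with payer network rules")
    return flags

flags = rule_checks(txn, recent_reversals=4)
if flags:
    print("Rule flags:", flags)
    print(explain_transaction(txn))  # LLM explanation only after a rule fires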

4) Wrap it as an agent with tools

If you want the model to choose between querying policies and inspecting transaction metadata dynamically, use ReActAgent. This is better than stuffing everything into one prompt.

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, ToolMetadata

policy_tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="healthcare_policy_search",
        description="Searches billing and compliance policies for relevant guidance."
    ),
)

agent = ReActAgent.from_tools(
    tools=[policy_tool],
    llm=llm,
    verbose=True,
)

response = agent.chat(
    f"""
Analyze this transaction for monitoring:
{txn}

Decide if it should be escalated and cite applicable policy language.
"""
)

print(response)

This gives you a real agent pattern: one tool for retrieval now, more tools later for provider history lookups or claims velocity checks. Keep PHI access outside the LLM where possible; pass only the minimum fields needed for analysis.
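
For example, a provider-history tool could be added later as a FunctionTool. The lookup below is a stub with made-up aggregate stats; a real version would query your claims warehouse.

from llama_index.core.tools import FunctionTool

def provider_history(provider_id: str) -> str:
    """Return recent claim activity for a provider. Stub data for illustration."""
    stats = {"PRV-44": "12 adjustments and 3 refunds in the last 30 days"}
    return stats.get(provider_id, "no recent activity on record")

history_tool = FunctionTool.from_defaults(
    fn=provider_history,
    name="provider_history_lookup",
    description="Looks up recent claim activity for a provider ID.",
)

agent = ReActAgent.from_tools(
    tools=[policy_tool, history_tool],
    llm=llm,
    verbose=True,
)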

Production Considerations

  • Deployment

    • Keep PHI in your private network boundary.
    • Use a VPC-hosted vector store if your policies or embeddings contain regulated data.
    • Enforce regional deployment for data residency requirements if your healthcare org operates across jurisdictions.
  • Monitoring

    • Log every alert with input features, retrieved policy chunks, model version, and final disposition (see the audit record sketch after this section).
    • Track false positives by provider group and transaction type.
    • Add drift monitoring on alert volume; spikes often mean upstream billing changes or payer rule changes.
  • Guardrails

    • Redact patient identifiers before sending text to the model when possible.
    • Use deterministic thresholds before LLM judgment for high-risk categories like refunds or write-offs.
    • Require human review for high-severity alerts; do not auto-deny claims based on model output alone.
  • Compliance

    • Treat prompts and outputs as audit artifacts.
    • Encrypt at rest and in transit.
    • Maintain retention policies aligned with HIPAA and local medical record regulations.
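
To make the logging and compliance points concrete, one audit record per decision might look like this. The field names are illustrative, not a standard schema; align them with your compliance team's requirements.

import hashlib
import json
from datetime import datetime, timezone

def build_audit_record(txn: dict, result: MonitoringResult,
                       retrieved_chunks: list, model_version: str,
                       policy_version: str) -> dict:
    # One artifact per decision: inputs, retrieved context, output, versions
    return {
        "transaction_id": txn["transaction_id"],
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "input_hash": hashlib.sha256(
            json.dumps(txn, sort_keys=True).encode()
        ).hexdigest(),
        "risk_level": result.risk_level.value,
        "reasons": result.reasons,
        "retrieved_chunks": retrieved_chunks,
        "model_version": model_version,
        "policy_version": policy_version,
        "final_disposition": None,  # set later by the human reviewer
    }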

Common Pitfalls

  1. Putting raw PHI into prompts

    • Avoid sending names, DOBs, full addresses, or diagnosis details unless absolutely necessary.
    • Use tokenized identifiers and masked fields instead (see the masking sketch after this list).
  2. Using retrieval without versioned policies

    • If your policy docs change weekly and you do not version them, audits become messy fast.
    • Store document version IDs alongside every alert so analysts know what rule set was active.
  3. Letting the LLM make final financial decisions

    • The model should recommend escalation or review status.
    • Final disposition should come from rules plus human review when money movement or claim denial is involved.
  4. Skipping analyst feedback loops

    • If reviewers override alerts but that feedback never reaches the system, false positives stay high.
    • Feed dispositions back into your thresholds and prompt templates so the agent improves over time.
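
For the PHI pitfall, a minimal masking pass might look like this. The field list and token scheme are illustrative; use your organization's approved de-identification method in production.

import hashlib

SENSITIVE_FIELDS = {"patient_name", "dob", "address", "diagnosis"}  # illustrative

def mask_txn(txn: dict, salt: str) -> dict:
    # Replace sensitive values with stable tokens before prompting the LLM
    masked = {}
    for key, value in txn.items():
        if key in SENSITIVE_FIELDS:
            digest = hashlib.sha256(f"{salt}:{value}".encode()).hexdigest()[:12]
            masked[key] = f"TOK-{digest}"
        else:
            masked[key] = value
    return masked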

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
