How to Build a Policy Q&A Agent Using LlamaIndex in Python for Healthcare

By Cyprian Aarons · Updated 2026-04-21
Tags: policy-qa, llamaindex, python, healthcare

A policy Q&A agent for healthcare answers questions like “Does this plan cover prior authorization for MRI scans?” or “What’s the appeal window for denied claims?” by retrieving the right policy documents, citing the source, and keeping responses constrained to approved content. That matters because healthcare teams need fast answers without guessing, and every answer has to survive compliance review, audit, and data residency constraints.

Architecture

  • Policy document ingestion

    • Load PDFs, Word docs, HTML pages, and internal policy manuals into a normalized text format.
    • Keep document metadata like policy_id, effective_date, jurisdiction, and source_system.
  • Chunking and indexing

    • Split policies into retrieval-friendly chunks with SentenceSplitter or a similar node parser.
    • Store embeddings in a vector index for semantic lookup across large policy libraries.
  • Retriever layer

    • Use VectorStoreIndex.as_retriever() to fetch the most relevant policy nodes.
    • Add metadata filters for line of business, state, plan type, or effective date.
  • Response synthesis

    • Use an LLM-backed query engine to produce concise answers grounded in retrieved policy text.
    • Force citations so compliance reviewers can trace every statement back to source material.
  • Guardrails and audit logging

    • Log the question, retrieved nodes, response, timestamps, and user identity.
    • Block unsupported medical advice and route clinical questions to approved workflows.
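The routing guardrail above can be sketched with a simple keyword heuristic. The pattern list and the `route_question` helper below are illustrative assumptions, not a vetted filter; a production system would use a reviewed classifier or an approved clinical triage service instead.

```python
import re

# Hypothetical patterns for questions that look like requests for medical
# advice. A real deployment needs a reviewed, maintained classification layer.
CLINICAL_PATTERNS = [
    r"\bdiagnos(e|is)\b",
    r"\btreat(ment)?\b",
    r"\bdosage\b",
    r"\bsymptom",
]

def route_question(question: str) -> str:
    """Return 'clinical' for questions that resemble medical advice requests,
    otherwise 'policy' so they proceed to retrieval."""
    lowered = question.lower()
    if any(re.search(pattern, lowered) for pattern in CLINICAL_PATTERNS):
        return "clinical"
    return "policy"
```

Questions routed to "clinical" never reach the retriever; they go to licensed staff, and the refusal itself gets logged like any other answer.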

Implementation

  1. Load policy documents with metadata

You want metadata on every document from day one. In healthcare, that metadata is not optional; it drives auditability, jurisdiction filtering, and retention policies.

from pathlib import Path
from llama_index.core import Document
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline

def load_policy_docs(folder: str):
    docs = []
    for file_path in Path(folder).glob("*.txt"):
        text = file_path.read_text(encoding="utf-8")
        docs.append(
            Document(
                text=text,
                metadata={
                    "source_file": file_path.name,
                    "policy_id": file_path.stem,
                    "domain": "healthcare",
                    "document_type": "policy",
                    "jurisdiction": "US",
                },
            )
        )
    return docs

documents = load_policy_docs("./policies")

  2. Chunk and index the policies

Use a pipeline so ingestion is repeatable. That makes re-indexing after policy updates predictable and easier to test.

from llama_index.embeddings.openai import OpenAIEmbedding

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=64),
        OpenAIEmbedding(model="text-embedding-3-small"),
    ]
)

nodes = pipeline.run(documents=documents)
index = VectorStoreIndex(nodes)

  3. Build a query engine with citations

This is the core pattern: retrieve relevant chunks first, then synthesize an answer only from those chunks. In healthcare policy workflows, you want answers that cite the source text instead of free-form generation.

from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini", temperature=0)

query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=4,
    response_mode="compact",
)

question = "What is the prior authorization requirement for outpatient MRI?"
response = query_engine.query(question)

print(response.response)
for source in response.source_nodes:
    print("SOURCE:", source.node.metadata["source_file"])
    print("TEXT:", source.node.text[:300])
    print("---")

  4. Add metadata filtering for healthcare scope

If your policies vary by state or plan type, filter at retrieval time. This avoids mixing Medicare guidance with commercial plan rules or pulling in outdated state-specific language.

from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter

retriever = index.as_retriever(
    similarity_top_k=5,
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(key="domain", value="healthcare"),
            ExactMatchFilter(key="jurisdiction", value="US"),
        ]
    ),
)

nodes = retriever.retrieve("appeal deadline for denied claims")
for node in nodes:
    print(node.node.metadata)
    print(node.node.text[:250])

Production Considerations

  • Deploy inside your compliance boundary

    • Keep embeddings, indexes, logs, and LLM traffic inside approved infrastructure.
    • For PHI-adjacent workflows, verify vendor terms, BAA coverage, encryption at rest/in transit, and regional hosting requirements.
  • Log every decision path

    • Store the user question, retrieved node IDs, document versions, model version, and final answer.
    • This gives you an audit trail when legal or compliance asks why the agent answered a certain way.
  • Add hard guardrails

    • Refuse diagnosis requests, treatment recommendations, or anything that looks like clinical advice.
    • Route those questions to licensed staff or an approved clinical decision support system.
  • Version policies aggressively

    • Policies change constantly. Index documents with effective_date and version, then invalidate stale chunks on update.
    • If you cannot prove which version answered the question, you do not have a production-grade healthcare system.
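A minimal audit record covering the fields above might look like the following. The field names and JSON shape are assumptions, not a compliance standard, so map them onto whatever schema your audit store actually requires.

```python
import hashlib
import json
from datetime import datetime, timezone

def build_audit_record(user_id, question, node_ids, doc_versions,
                       model_version, answer):
    """Assemble one JSON-serializable audit entry per question/answer pair.

    Field names are illustrative; align them with your own audit schema."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "question": question,
        "retrieved_node_ids": node_ids,
        "document_versions": doc_versions,
        "model_version": model_version,
        # A digest lets reviewers detect tampering with the stored answer text.
        "answer_sha256": hashlib.sha256(answer.encode("utf-8")).hexdigest(),
        "answer": answer,
    })
```

Write one record per interaction, including refusals, so the trail covers every decision path and not just successful answers.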

Common Pitfalls

  • Mixing outdated policies with current ones

    • Problem: The retriever returns old plan rules because they still exist in the index.
    • Fix: Filter by effective_date, archive old versions separately, and rebuild indexes on policy release events.
  • Letting the model answer without citations

    • Problem: The agent produces plausible but untraceable answers.
    • Fix: Always expose response.source_nodes or equivalent citation output in the UI and require grounded responses only.
  • Ignoring healthcare-specific data controls

    • Problem: Teams accidentally send PHI-like content to external services without proper review.
    • Fix: Classify inputs before retrieval/LLM calls, redact sensitive fields where possible, and keep residency/compliance requirements explicit in your architecture.
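The effective_date fix can be sketched as a pre-index filter. The helper below assumes each policy record carries a policy_id and an ISO-formatted effective_date; in practice the same rule belongs in your metadata filters or index rebuild job rather than in application code.

```python
from datetime import date

def current_policy_versions(policies: list[dict], as_of: date) -> list[dict]:
    """Keep only the latest version of each policy in force on `as_of`.

    Assumes each record carries 'policy_id' and an ISO 'effective_date'."""
    latest: dict[str, dict] = {}
    for policy in policies:
        effective = date.fromisoformat(policy["effective_date"])
        if effective > as_of:
            continue  # not yet in force on the query date
        prior = latest.get(policy["policy_id"])
        if prior is None or effective > date.fromisoformat(prior["effective_date"]):
            latest[policy["policy_id"]] = policy
    return list(latest.values())
```

Running this before ingestion means stale versions never enter the index at all, which is simpler to audit than filtering them out at query time.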

If you build this pattern correctly—metadata-first ingestion, filtered retrieval, cited synthesis—you get a policy assistant that is actually usable by operations teams without turning every answer into a compliance risk.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
