How to Build a Policy Q&A Agent for Healthcare Using LlamaIndex in Python
A policy Q&A agent for healthcare answers questions like “Does this plan cover prior authorization for MRI scans?” or “What’s the appeal window for denied claims?” by retrieving the right policy documents, citing the source, and keeping responses constrained to approved content. That matters because healthcare teams need fast answers without guessing, and every answer has to survive compliance review, audit, and data residency constraints.
Architecture
- Policy document ingestion
  - Load PDFs, Word docs, HTML pages, and internal policy manuals into a normalized text format.
  - Keep document metadata like `policy_id`, `effective_date`, `jurisdiction`, and `source_system`.
- Chunking and indexing
  - Split policies into retrieval-friendly chunks with `SentenceSplitter` or a similar node parser.
  - Store embeddings in a vector index for semantic lookup across large policy libraries.
- Retriever layer
  - Use `VectorStoreIndex.as_retriever()` to fetch the most relevant policy nodes.
  - Add metadata filters for line of business, state, plan type, or effective date.
- Response synthesis
  - Use an LLM-backed query engine to produce concise answers grounded in retrieved policy text.
  - Force citations so compliance reviewers can trace every statement back to source material.
- Guardrails and audit logging
  - Log the question, retrieved nodes, response, timestamps, and user identity.
  - Block unsupported medical advice and route clinical questions to approved workflows.
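The guardrail layer can start as a simple pre-retrieval triage that routes clinical-sounding questions away from the policy Q&A path. A minimal sketch in plain Python; the keyword list and route names are illustrative assumptions, not part of LlamaIndex:

```python
# Illustrative pre-retrieval triage: divert clinical questions before any
# retrieval or LLM call happens. The term list is a placeholder, not a
# real clinical classifier.
CLINICAL_TERMS = {"diagnose", "prescribe", "dosage", "treatment plan", "symptoms"}


def route_question(question: str) -> str:
    q = question.lower()
    if any(term in q for term in CLINICAL_TERMS):
        return "clinical_escalation"  # hand off to licensed staff or approved CDS
    return "policy_qa"  # safe to answer from policy documents
```

In production you would replace the keyword set with a reviewed classifier, but even this crude gate gives compliance a documented refusal path from day one.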
Implementation
- Load policy documents with metadata
You want metadata on every document from day one. In healthcare, that metadata is not optional; it drives auditability, jurisdiction filtering, and retention policies.
```python
from pathlib import Path

from llama_index.core import Document, VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter


def load_policy_docs(folder: str) -> list[Document]:
    docs = []
    for file_path in Path(folder).glob("*.txt"):
        text = file_path.read_text(encoding="utf-8")
        docs.append(
            Document(
                text=text,
                metadata={
                    "source_file": file_path.name,
                    "policy_id": file_path.stem,
                    "domain": "healthcare",
                    "document_type": "policy",
                    "jurisdiction": "US",
                },
            )
        )
    return docs


documents = load_policy_docs("./policies")
```
- Chunk and index the policies
Use a pipeline so ingestion is repeatable. That makes re-indexing after policy updates predictable and easier to test.
```python
from llama_index.embeddings.openai import OpenAIEmbedding

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=64),
        OpenAIEmbedding(model="text-embedding-3-small"),
    ]
)

nodes = pipeline.run(documents=documents)
index = VectorStoreIndex(nodes)
```
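To make the `chunk_size`/`chunk_overlap` parameters concrete, here is a toy token-window chunker in plain Python. It is a conceptual sketch of overlapping windows, not how `SentenceSplitter` is actually implemented (which respects sentence boundaries):

```python
def chunk_tokens(tokens, chunk_size=512, chunk_overlap=64):
    # Slide a window of chunk_size tokens, stepping by chunk_size - chunk_overlap,
    # so each chunk repeats the last chunk_overlap tokens of its predecessor.
    step = chunk_size - chunk_overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]


chunks = chunk_tokens(list(range(1000)))
# Consecutive chunks share 64 tokens of context, so a policy clause split
# across a boundary still appears whole in at least one chunk.
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable; tune both numbers against your actual policy language.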
- Build a query engine with citations
This is the core pattern: retrieve relevant chunks first, then synthesize an answer only from those chunks. In healthcare policy workflows, you want answers that cite the source text instead of free-form generation.
```python
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini", temperature=0)

query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=4,
    response_mode="compact",
)

question = "What is the prior authorization requirement for outpatient MRI?"
response = query_engine.query(question)

print(response.response)
for source in response.source_nodes:
    print("SOURCE:", source.node.metadata["source_file"])
    print("TEXT:", source.node.text[:300])
    print("---")
```
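Before surfacing an answer to users, you can enforce the citation requirement in code: refuse any response whose retrieved sources fall below a similarity threshold. A hedged sketch in plain Python; the field names and the 0.7 threshold are assumptions for illustration, not LlamaIndex API:

```python
def grounded_answer(answer: str, sources: list[dict], min_score: float = 0.7) -> str:
    # Keep only sources that cleared the similarity threshold.
    cited = [s for s in sources if s["score"] >= min_score]
    if not cited:
        # No strong supporting text: refuse rather than guess.
        return "No supporting policy text found. Escalating to a human reviewer."
    refs = ", ".join(s["source_file"] for s in cited)
    return f"{answer}\n\nSources: {refs}"
```

Wiring this between `query_engine.query()` and your UI means an uncited answer can never reach an operations user.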
- Add metadata filtering for healthcare scope
If your policies vary by state or plan type, filter at retrieval time. This avoids mixing Medicare guidance with commercial plan rules or pulling in outdated state-specific language.
```python
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

retriever = index.as_retriever(
    similarity_top_k=5,
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(key="domain", value="healthcare"),
            ExactMatchFilter(key="jurisdiction", value="US"),
        ]
    ),
)

nodes = retriever.retrieve("appeal deadline for denied claims")
for node in nodes:
    print(node.node.metadata)
    print(node.node.text[:250])
```
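Conceptually, stacked `ExactMatchFilter` entries combine with AND semantics: a node survives only if every key matches. A plain-Python sketch of that behavior (not LlamaIndex internals; the sample nodes are hypothetical):

```python
def passes_filters(metadata: dict, filters: dict) -> bool:
    # AND semantics: every filter key must match the node's metadata exactly.
    return all(metadata.get(key) == value for key, value in filters.items())


sample_nodes = [
    {"policy_id": "p1", "domain": "healthcare", "jurisdiction": "US"},
    {"policy_id": "p2", "domain": "healthcare", "jurisdiction": "EU"},
]
filters = {"domain": "healthcare", "jurisdiction": "US"}
matching = [n for n in sample_nodes if passes_filters(n, filters)]
```

Because filters are conjunctive, adding a `plan_type` or `effective_date` key only ever narrows scope, which is the safe default for healthcare retrieval.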
Production Considerations
- Deploy inside your compliance boundary
  - Keep embeddings, indexes, logs, and LLM traffic inside approved infrastructure.
  - For PHI-adjacent workflows, verify vendor terms, BAA coverage, encryption at rest/in transit, and regional hosting requirements.
- Log every decision path
  - Store the user question, retrieved node IDs, document versions, model version, and final answer.
  - This gives you an audit trail when legal or compliance asks why the agent answered a certain way.
- Add hard guardrails
  - Refuse diagnosis requests, treatment recommendations, or anything that looks like clinical advice.
  - Route those questions to licensed staff or an approved clinical decision support system.
- Version policies aggressively
  - Policies change constantly. Index documents with `effective_date` and `version`, then invalidate stale chunks on update.
  - If you cannot prove which version answered the question, you do not have a production-grade healthcare system.
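The audit-logging requirement above can be captured as one structured record per interaction. A minimal sketch; the field set mirrors the bullets above, and hashing the answer rather than storing raw text is one illustrative design choice for strict retention policies:

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(user_id, question, node_ids, doc_versions, model_version, answer):
    # One append-only JSON line per answered question.
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "question": question,
        "retrieved_node_ids": node_ids,
        "document_versions": doc_versions,
        "model_version": model_version,
        # Hash instead of raw answer text if your retention policy requires it.
        "answer_sha256": hashlib.sha256(answer.encode("utf-8")).hexdigest(),
    }
    return json.dumps(record, sort_keys=True)
```

Writing these lines to append-only storage inside your compliance boundary gives reviewers the full decision path for any answer.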
Common Pitfalls
- Mixing outdated policies with current ones
  - Problem: The retriever returns old plan rules because they still exist in the index.
  - Fix: Filter by `effective_date`, archive old versions separately, and rebuild indexes on policy release events.
- Letting the model answer without citations
  - Problem: The agent produces plausible but untraceable answers.
  - Fix: Always expose `response.source_nodes` or equivalent citation output in the UI and require grounded responses only.
- Ignoring healthcare-specific data controls
  - Problem: Teams accidentally send PHI-like content to external services without proper review.
  - Fix: Classify inputs before retrieval/LLM calls, redact sensitive fields where possible, and keep residency/compliance requirements explicit in your architecture.
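The effective-date pitfall can also be closed at query time: resolve to the single version in force as of a given date, never "whatever the retriever found". A plain-Python sketch with hypothetical version records:

```python
from datetime import date


def version_in_force(versions: list[dict], as_of: date):
    # Keep only versions already effective, then take the newest of those.
    live = [v for v in versions if v["effective_date"] <= as_of]
    return max(live, key=lambda v: v["effective_date"]) if live else None


policy_versions = [
    {"version": "v1", "effective_date": date(2023, 1, 1)},
    {"version": "v2", "effective_date": date(2024, 6, 1)},
    {"version": "v3", "effective_date": date(2026, 1, 1)},  # not yet in force
]
current = version_in_force(policy_versions, as_of=date(2025, 3, 15))
```

The same `as_of` value belongs in the audit log, so you can later prove which version answered a historical question.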
If you build this pattern correctly—metadata-first ingestion, filtered retrieval, cited synthesis—you get a policy assistant that is actually usable by operations teams without turning every answer into a compliance risk.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.