How to Build a Compliance-Checking Agent for Healthcare Using LlamaIndex in Python
A healthcare compliance checking agent reviews clinical, operational, or patient-facing text and flags potential violations against policy, regulation, and internal SOPs. In practice, that means catching PHI exposure, missing consent language, unsafe claims, retention issues, and policy drift before the content ships or a workflow continues.
Architecture
- **Policy corpus**
  - Source documents for HIPAA policies, internal security standards, retention rules, consent templates, and approved wording.
  - Store them as versioned files so every answer can be traced to a specific policy revision.
- **Document ingestion layer**
  - Load PDFs, DOCX, and text into LlamaIndex using `SimpleDirectoryReader`.
  - Chunk with `SentenceSplitter` so retrieval returns policy clauses instead of giant blobs.
- **Index and retriever**
  - Build a `VectorStoreIndex` over the policy corpus.
  - Use a retriever to pull the most relevant clauses for each compliance check.
- **Compliance reasoning engine**
  - Use an LLM-backed query engine with a strict prompt that asks for violations, risk level, and cited policy snippets.
  - Keep output structured so downstream systems can route high-risk findings.
- **Audit and logging layer**
  - Persist the input text, retrieved policy chunks, model response, timestamps, and policy version.
  - Healthcare teams need this for incident review and regulatory audits.
- **Guardrail layer**
  - Add PHI redaction before logging.
  - Block unsupported advice and route ambiguous cases to human review.
Implementation
1) Install dependencies and load your policy documents
Start with a local Python environment. For healthcare workloads, keep policy docs in a controlled directory and avoid mixing them with patient data.
```bash
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai
```
```python
from pathlib import Path

from llama_index.core import SimpleDirectoryReader

policy_dir = Path("./healthcare_policies")
documents = SimpleDirectoryReader(
    input_dir=str(policy_dir),
    required_exts=[".txt", ".md", ".pdf"],
).load_data()
print(f"Loaded {len(documents)} policy documents")
```
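The architecture section calls for versioned policy files. As a hedged sketch (a stdlib helper, not part of LlamaIndex), you can fingerprint the corpus at load time so audit records can pin the exact revision that was checked, alongside any human-readable version label:

```python
import hashlib
from pathlib import Path


def policy_corpus_hash(policy_dir: Path) -> str:
    """Deterministic fingerprint of all policy files (paths + contents).

    Any edit to any policy file changes the hash, so a logged hash
    uniquely identifies the corpus revision a check ran against.
    """
    digest = hashlib.sha256()
    for path in sorted(policy_dir.rglob("*")):
        if path.is_file():
            digest.update(path.relative_to(policy_dir).as_posix().encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()[:16]
```

Record the returned hash in every audit entry; if auditors later ask which wording was in force, the hash resolves the question unambiguously.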
2) Build a retrieval index over compliance policies
Use chunking that preserves clause-level context. For compliance checks, smaller chunks usually work better than large semantic sections because you want exact citations.
```python
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=80)

index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k=4)
```
3) Create a compliance checker query engine
The key is to force the model into a narrow task: identify violations only against retrieved policy context. Ask for structured output that includes severity and citations.
```python
from llama_index.core.prompts import PromptTemplate
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)

compliance_prompt = PromptTemplate(
    """You are a healthcare compliance reviewer.
Review the user content against the retrieved policy context only.

Return:
1. verdict: compliant | non_compliant | needs_review
2. severity: low | medium | high
3. findings: bullet list of issues
4. citations: exact policy snippets used

Policy context:
{context_str}

User content:
{query_str}
"""
)

query_engine = index.as_query_engine(
    llm=Settings.llm,
    text_qa_template=compliance_prompt,
    similarity_top_k=4,
)
```
Now run checks on content from a clinical workflow, patient portal draft, or support message.
```python
sample_text = """
Please email me the patient's full chart and diagnosis history.
We can skip consent since this is urgent.
"""

response = query_engine.query(sample_text)
print(response)
```
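The prompt asks for labeled fields, but `query_engine.query` returns free text. A minimal fail-closed parser can pull the verdict and severity out so downstream systems can route on them; this is a sketch that assumes the model follows the numbered template above, and anything it cannot recognize is routed to human review:

```python
import re

VALID_VERDICTS = {"compliant", "non_compliant", "needs_review"}


def parse_verdict(response_text: str) -> dict:
    """Extract verdict and severity from the templated response.

    Fails closed: missing or unrecognized verdicts become needs_review,
    and a missing severity defaults to high so nothing slips through.
    """
    verdict_match = re.search(r"verdict:\s*(\w+)", response_text, re.IGNORECASE)
    severity_match = re.search(r"severity:\s*(\w+)", response_text, re.IGNORECASE)

    verdict = verdict_match.group(1).lower() if verdict_match else "needs_review"
    if verdict not in VALID_VERDICTS:
        verdict = "needs_review"
    severity = severity_match.group(1).lower() if severity_match else "high"
    return {"verdict": verdict, "severity": severity}
```

For example, `parse_verdict(str(response))` on a well-formed answer yields something like `{"verdict": "non_compliant", "severity": "high"}`, which is what the audit and routing layers consume.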
4) Add audit logging and PHI-safe handling
Do not log raw patient data into application logs. Redact identifiers first, then store the check result with document version metadata.
```python
import json
import re
from datetime import datetime, timezone


def redact_phi(text: str) -> str:
    """Minimal redaction pass; extend with patterns for your own data."""
    text = re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[REDACTED_SSN]", text)
    text = re.sub(r"\b\d{10}\b", "[REDACTED_PHONE]", text)
    return text


def log_compliance_check(input_text: str, result_text: str) -> None:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input_text_redacted": redact_phi(input_text),
        "result": str(result_text),
        "policy_version": "2026-04",
        "system": "healthcare-compliance-agent",
    }
    with open("compliance_audit_log.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")


log_compliance_check(sample_text, response)
```
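The two regexes above are a floor, not a ceiling. HIPAA's Safe Harbor method lists eighteen identifier categories, so production redaction needs a much longer rule set, and ideally a dedicated de-identification tool. As an illustrative extension only (these patterns are assumptions to tune against your own data, not a complete list):

```python
import re

# Illustrative additional patterns; order matters (SSN before 10-digit phone).
PHI_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
    (re.compile(r"\b\d{10}\b"), "[REDACTED_PHONE]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "[REDACTED_DOB]"),
]


def redact_phi_extended(text: str) -> str:
    """Apply every redaction pattern in order and return the scrubbed text."""
    for pattern, token in PHI_PATTERNS:
        text = pattern.sub(token, text)
    return text
```

Keeping the patterns in a list also lets compliance staff review and extend the rule set without touching the logging code.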
Production Considerations
- **Deploy inside your healthcare boundary**
  - Keep embeddings, vector stores, logs, and LLM calls in approved regions.
  - If you use managed APIs, confirm data residency terms match your regulatory requirements.
- **Make every decision auditable**
  - Store retrieved node IDs, source filenames, timestamps, prompt version, model name, and final verdict.
  - Auditors will ask why the agent flagged something; raw “the model said so” is not enough.
- **Add guardrails for PHI**
  - Redact logs before persistence.
  - Reject requests that try to extract full records unless the caller has verified authorization.
- **Route uncertain cases to humans**
  - If retrieval confidence is low or the model returns `needs_review`, send it to compliance staff.
  - In healthcare, false negatives are worse than noisy escalations.
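The escalation rule in the last bullet can be made explicit in code. A minimal sketch; the `0.75` similarity cutoff is an assumption to tune against your own corpus, not a LlamaIndex default, and the rule deliberately fails toward human review:

```python
def route_finding(verdict: str, top_similarity: float,
                  threshold: float = 0.75) -> str:
    """Decide where a compliance finding goes next.

    Anything that is not a confident, well-grounded "compliant" verdict
    is escalated, matching the "false negatives are worse" principle.
    """
    if verdict == "non_compliant":
        return "block_and_escalate"
    if verdict != "compliant" or top_similarity < threshold:
        return "human_review"
    return "auto_approve"
```

In LlamaIndex, the similarity of the best retrieved node is available as the `score` on the response's source nodes, which is what you would pass in as `top_similarity`.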
Common Pitfalls
- **Using general-purpose prompts without grounding in policy**
  - If you ask the model “is this compliant?” with no retrieved policy context, it will improvise.
  - Fix it by forcing retrieval-first workflows through `VectorStoreIndex` and a strict template that references only retrieved clauses.
- **Logging raw clinical or patient content**
  - Teams often save inputs for debugging and accidentally create an extra PHI system of record.
  - Fix it by redacting before logging and keeping audit records minimal but sufficient for traceability.
- **Treating compliance as binary when it’s actually risk-based**
  - Some issues are clear violations; others depend on authorization scope or workflow context.
  - Fix it by returning `compliant`, `non_compliant`, or `needs_review`, plus severity and citations.
A good healthcare compliance agent is not just a classifier. It is a retrieval-backed control point that produces explainable decisions, keeps audit trails clean, respects data residency constraints, and knows when to stop short of making an autonomous call.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.