How to Build a Policy Q&A Agent Using LangChain in Python for Healthcare

By Cyprian Aarons · Updated 2026-04-21

A policy Q&A agent for healthcare answers questions like “Can this claim be auto-approved?” or “What does our prior authorization policy say for MRI referrals?” It matters because policy lookup is one of the highest-volume, highest-friction tasks in payer and provider operations, and getting it wrong creates compliance risk, delays care, and increases manual review load.

Architecture

  • Policy document store

    • Source PDFs, HTML pages, SOPs, clinical policy bulletins, and internal memos.
    • Keep document versioning so the agent can cite the exact policy revision.
  • Ingestion and chunking pipeline

    • Use PyPDFLoader, WebBaseLoader, or custom loaders.
    • Split with RecursiveCharacterTextSplitter so retrieval works on policy sections instead of whole documents.
  • Vector index

    • Store embeddings in a vector database such as FAISS for local deployments, or in a managed store if your data residency rules allow it.
    • The retriever should return only the top relevant chunks, with metadata such as policy ID, effective date, and jurisdiction (see the metadata sketch after this list).
  • LLM answer chain

    • Use LangChain’s create_retrieval_chain with a document-combining chain such as create_stuff_documents_chain.
    • Force grounded answers: the model should answer only from retrieved policy text.
  • Guardrails and audit logging

    • Add a refusal path when evidence is weak.
    • Log query, retrieved docs, answer, timestamps, user role, and policy version for auditability.
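
For concreteness, here is a minimal sketch of the per-chunk metadata this architecture assumes. The field names below (policy_id, policy_version, effective_date, jurisdiction) are illustrative rather than anything LangChain requires; attach whatever fields your governance model defines.

from langchain_core.documents import Document

chunk = Document(
    page_content="Prior authorization is required for outpatient MRI when...",
    metadata={
        "policy_id": "PA-2024-017",      # illustrative internal identifier
        "policy_version": "rev3",        # exact revision the answer should cite
        "effective_date": "2024-01-01",
        "jurisdiction": "CA",
        "source_file": "policies/prior_authorization.pdf",
    },
)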

Implementation

1) Install the core packages

You want a minimal stack that is easy to deploy inside a controlled environment. For healthcare workloads, prefer libraries you can pin and scan.

pip install langchain langchain-community langchain-openai faiss-cpu pypdf

If you run on Azure OpenAI or another approved provider, swap the model package accordingly. The pattern below stays the same.
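
For example, an Azure swap might look like the sketch below. The deployment names, API version, and the AZURE_OPENAI_ENDPOINT / AZURE_OPENAI_API_KEY environment variables are placeholders for whatever your tenant actually exposes.

from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings

# Assumes AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY are set in the
# environment; deployment names and API version are placeholders.
llm = AzureChatOpenAI(
    azure_deployment="gpt-4o-mini",
    api_version="2024-06-01",
    temperature=0,
)
embeddings = AzureOpenAIEmbeddings(
    azure_deployment="text-embedding-3-small",
    api_version="2024-06-01",
)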

2) Load policies and build a retriever

This example loads PDFs from disk, chunks them, embeds them, and builds a FAISS retriever. In production, replace the file path with your governed document source and attach metadata during ingestion.

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

pdf_paths = [
    "policies/prior_authorization.pdf",
    "policies/medical_necessity.pdf",
]

docs = []
for path in pdf_paths:
    loader = PyPDFLoader(path)
    loaded_docs = loader.load()
    for d in loaded_docs:
        d.metadata["source_file"] = path
    docs.extend(loaded_docs)

splitter = RecursiveCharacterTextSplitter(
    chunk_size=900,
    chunk_overlap=150,
)
chunks = splitter.split_documents(docs)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)

retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

3) Build the grounded Q&A chain

Use ChatOpenAI, ChatPromptTemplate, create_stuff_documents_chain, and create_retrieval_chain. The prompt explicitly tells the model to stay inside retrieved policy text and to say when evidence is missing.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    (
        "system",
        """You are a healthcare policy assistant.
Answer only using the provided context.
If the context does not contain enough information, say:
"I could not find sufficient policy evidence to answer this."
Cite the source file and relevant section if available.
Do not provide medical advice."""
    ),
    ("human", "{input}\n\nContext:\n{context}")
])

combine_docs_chain = create_stuff_documents_chain(llm, prompt)
qa_chain = create_retrieval_chain(retriever, combine_docs_chain)

question = "Does this plan require prior authorization for outpatient MRI?"
result = qa_chain.invoke({"input": question})

print(result["answer"])

That pattern is the core of a production policy assistant. The retriever supplies evidence, and the LLM turns that evidence into a concise response.

4) Add lightweight audit logging

Healthcare teams need traceability. Log what was asked, which documents were retrieved, and what answer was returned. Keep PHI out of logs unless your controls explicitly allow it.

import json
from datetime import datetime, timezone

def ask_policy(question: str):
    result = qa_chain.invoke({"input": question})
    payload = {
        "timestamp": datetime.utcnow().isoformat(),
        "question": question,
        "answer": result["answer"],
        "sources": [
            {
                "source_file": doc.metadata.get("source_file"),
                "page": doc.metadata.get("page"),
            }
            for doc in result.get("context", [])
        ],
    }
    print(json.dumps(payload, indent=2))
    return result["answer"]

ask_policy("What is the appeal window for denied claims?")

Production Considerations

  • Compliance controls

    • Treat prompts and retrieved text as regulated operational data.
    • Apply access control by user role so staff only see policies they are authorized to use.
    • If answers may include PHI-adjacent data, run under your HIPAA security review process.
  • Auditability

    • Persist query logs with document IDs, chunk hashes, model version, and retrieval scores.
    • This makes it possible to reconstruct why an answer was produced during an internal review or external audit.
  • Data residency

    • Keep embeddings, vector stores, logs, and model calls inside approved regions.
    • For strict residency requirements, use self-hosted models or approved cloud endpoints only.
  • Guardrails

    • Refuse to answer when retrieval confidence is low or conflicting policies are returned; a score-gate sketch follows this list.
    • Require citations in every response.
    • Route ambiguous questions to human review instead of guessing.
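
One way to implement the low-confidence refusal is a score gate in front of the chain. FAISS returns L2 distances (lower is closer); the 0.8 cutoff and the ask_with_gate name below are assumptions to tune and rename for your own corpus.

def ask_with_gate(question: str, max_distance: float = 0.8):
    # similarity_search_with_score returns (Document, distance) pairs;
    # FAISS scores are L2 distances, so lower means more similar.
    scored = vectorstore.similarity_search_with_score(question, k=4)
    if not scored or scored[0][1] > max_distance:
        return "I could not find sufficient policy evidence to answer this."
    return qa_chain.invoke({"input": question})["answer"]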

Common Pitfalls

  1. Using raw PDFs without chunking

    • Whole-policy retrieval gives noisy results and long contexts.
    • Fix it with RecursiveCharacterTextSplitter and metadata-rich chunks.
  2. Letting the model answer without grounding

    • A generic chat prompt will hallucinate policy details.
    • Fix it by using create_retrieval_chain plus an instruction to answer only from context.
  3. Ignoring versioning and jurisdiction

    • Healthcare policies vary by line of business, state, plan year, and effective date.
    • Fix it by storing metadata on every chunk and filtering retrieval by those fields before generation, as sketched below.
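
A minimal sketch of that filter against the FAISS store built earlier, assuming jurisdiction and plan_year were attached to each chunk during ingestion (plan_year is a hypothetical field):

filtered_retriever = vectorstore.as_retriever(
    search_kwargs={
        "k": 4,
        # FAISS applies a dict filter as an exact match on chunk metadata.
        "filter": {"jurisdiction": "CA", "plan_year": "2026"},
    }
)
qa_chain_ca = create_retrieval_chain(filtered_retriever, combine_docs_chain)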

If you build this right, the agent becomes a controlled lookup layer over your policy corpus rather than a free-form chatbot. That is the difference between something useful in healthcare operations and something that creates compliance work.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
