How to Integrate Haystack for healthcare with Elasticsearch for RAG
Healthcare RAG lives or dies on retrieval quality and traceability. Haystack for healthcare gives you the pipeline primitives for clinical document processing, while Elasticsearch gives you fast, scalable retrieval over structured and unstructured medical content. Put them together and you can build an AI agent that answers policy, triage, coding, or care-path questions from internal medical knowledge without hand-waving over source grounding.
Prerequisites
- Python 3.10+
- Access to a running Elasticsearch cluster
- An index in Elasticsearch with fields for:
  - content (or a similar text field)
  - metadata such as patient_id, doc_type, source, created_at
- Haystack installed with the Elasticsearch integration
- A healthcare-focused Haystack setup, including your domain document loaders/processors
- API credentials if your Elasticsearch cluster is managed or secured
Install the packages:
pip install haystack-ai elasticsearch-haystack
If you are using a healthcare-specific Haystack package or extension in your environment, install that too according to your internal distribution.
Integration Steps
- Connect to Elasticsearch
Start by creating a client and verifying the cluster is reachable. In production, use TLS and authenticated access.
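import os

from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://localhost:9200",
    # Read the password from the environment rather than hard-coding it
    basic_auth=("elastic", os.environ["ELASTIC_PASSWORD"]),
    # For a local cluster with a self-signed certificate, also pass ca_certs="path/to/http_ca.crt"
    verify_certs=True,
)
print(es.info())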
For RAG, keep the index mapping explicit. Map free text as text (analyzed for full-text search) and identifiers such as patient_id as keyword (exact match), so metadata filters stay precise and never depend on tokenization.
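index_name = "healthcare-rag"

if not es.indices.exists(index=index_name):
    es.indices.create(
        index=index_name,
        mappings={
            "properties": {
                # Free-text clinical content for lexical (BM25) search
                "content": {"type": "text"},
                # Dense vectors for semantic search; 384 is an example and must match your embedding model
                "embedding": {"type": "dense_vector", "dims": 384, "index": True, "similarity": "cosine"},
                # Exact-match metadata used for filtering
                "patient_id": {"type": "keyword"},
                "doc_type": {"type": "keyword"},
                "source": {"type": "keyword"},
                "created_at": {"type": "date"},
            }
        },
    )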
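To illustrate why the split matters, the sketch below builds a raw Elasticsearch Query DSL body against this mapping: the clinical text goes in the scored query context, while keyword metadata goes in the filter context, which only includes or excludes documents and does not affect relevance scores. The function name build_clinical_query is hypothetical, and the exact query a Haystack retriever sends may differ.

```python
from typing import Any, Optional


def build_clinical_query(
    text: str, patient_id: str, doc_types: Optional[list[str]] = None
) -> dict[str, Any]:
    """Return a bool query: scored full-text match on content plus unscored keyword filters."""
    filters: list[dict[str, Any]] = [{"term": {"patient_id": patient_id}}]
    if doc_types:
        # "terms" keeps documents whose keyword value equals any listed value
        filters.append({"terms": {"doc_type": doc_types}})
    return {
        "bool": {
            # Query context: the match clause contributes to the BM25 relevance score
            "must": [{"match": {"content": text}}],
            # Filter context: include/exclude only, no effect on scoring
            "filter": filters,
        }
    }


query_body = build_clinical_query("CHF follow-up", "p-1001", ["discharge_summary"])
print(query_body)
```

With the client from the first step, you could run it as es.search(index=index_name, query=query_body, size=3).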
- Prepare healthcare documents in Haystack
Haystack’s Document object is the unit you will pass into indexing and retrieval. For healthcare use cases, keep PHI handling outside this step unless your governance model allows it.
from haystack import Document

documents = [
    Document(
        content="Patient discharged with CHF follow-up in 7 days. Continue furosemide 40mg daily.",
        meta={"patient_id": "p-1001", "doc_type": "discharge_summary", "source": "ehr"},
    ),
    Document(
        content="Clinical note: elevated HbA1c indicates poor glycemic control. Recommend diet review.",
        meta={"patient_id": "p-1002", "doc_type": "progress_note", "source": "ehr"},
    ),
]
If you already have a healthcare ingestion pipeline, this is where parsed notes, PDFs, or claims text should enter before indexing.
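Long notes and PDFs usually need to be split into passages before indexing so each retrieved chunk stays focused. Haystack's DocumentSplitter (split_by="word", split_length, split_overlap) handles this inside a pipeline; the plain-Python sketch below only shows the idea of overlapping word windows. chunk_words and its default sizes are illustrative assumptions, not tuned values for clinical text.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of chunk_size words; consecutive windows share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


note = " ".join(f"w{i}" for i in range(12))  # stand-in for a parsed clinical note
for i, chunk in enumerate(chunk_words(note, chunk_size=5, overlap=2)):
    print(i, chunk)
```

Each chunk can then become its own Document that inherits the parent note's metadata, for example Document(content=chunk, meta={**note_meta, "chunk_index": i}) with a hypothetical note_meta dict, so patient and document-type filters still apply after splitting.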
- Index documents into Elasticsearch using Haystack
The elasticsearch-haystack integration provides ElasticsearchDocumentStore, an Elasticsearch-backed document store. Use it as the bridge between your healthcare documents and the retrieval layer.
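import os

from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

# Extra keyword arguments (basic_auth here) are forwarded to the underlying Elasticsearch client.
# There is no embedding_dim argument; vector dimensions come from the index mapping.
document_store = ElasticsearchDocumentStore(
    hosts=["https://localhost:9200"],
    basic_auth=("elastic", os.environ["ELASTIC_PASSWORD"]),
    index=index_name,
)

# OVERWRITE keeps re-runs idempotent instead of failing on duplicate document IDs.
document_store.write_documents(documents, policy=DuplicatePolicy.OVERWRITE)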
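If your setup uses embeddings for semantic RAG, compute document embeddings before writing (for example with Haystack's SentenceTransformersDocumentEmbedder), then at query time embed the question with the matching text embedder and pass the resulting query_embedding to ElasticsearchEmbeddingRetriever. Use the same embedding model for indexing and querying so the vector dimensions stay consistent. For hybrid search in clinical systems, keep lexical (BM25) search available, because medication names and abbreviations often match better lexically than with dense vectors alone.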
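One common way to combine lexical and dense results is reciprocal rank fusion (RRF), where each document scores the sum of 1 / (k + rank) over the result lists it appears in. Haystack's DocumentJoiner supports join_mode="reciprocal_rank_fusion" for merging the outputs of a BM25 retriever and an embedding retriever; the sketch below is a plain-Python version of the standard formula for illustration only, with hypothetical document IDs, and Haystack's own scores may be normalized differently. k = 60 is the constant commonly used for RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each ID scores the sum of 1 / (k + rank) over lists, ranks starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_ids = ["note-7", "note-2", "note-9"]   # hypothetical lexical results
dense_ids = ["note-2", "note-4", "note-7"]  # hypothetical semantic results
fused = reciprocal_rank_fusion([bm25_ids, dense_ids])
print([doc_id for doc_id, _ in fused])
```

Because RRF uses only ranks, it sidesteps the fact that BM25 scores and cosine similarities live on different scales: a drug name that BM25 ranks first still gets a strong fused score even if the dense retriever misses it.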
- Build a retriever for RAG queries
Use Haystack’s retriever connected to the same document store. This lets your agent fetch relevant medical context before generating an answer.
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever

# BM25 (lexical) retrieval works on the documents as written above, without embeddings.
# ElasticsearchEmbeddingRetriever would instead need a query_embedding, not a query string.
retriever = ElasticsearchBM25Retriever(document_store=document_store)

query = "What is the recommended follow-up after CHF discharge?"
results = retriever.run(query=query, top_k=3)

for doc in results["documents"]:
    print(doc.content)
    print(doc.meta)
If you need strict filtering by patient or encounter, add metadata filters at query time so your agent only sees authorized context.
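Haystack 2.x expresses metadata filters as nested dictionaries with field, operator, and value keys, combined by a logical operator. The sketch below builds a filter that restricts retrieval to a single patient and, optionally, to specific document types. The helper name authorized_filters is hypothetical, and it assumes patient_id comes from the caller's authorization context rather than from free-text user input.

```python
from typing import Any, Optional


def authorized_filters(patient_id: str, doc_types: Optional[list[str]] = None) -> dict[str, Any]:
    """Build a Haystack-style filter limiting retrieval to one patient and optional doc types."""
    conditions: list[dict[str, Any]] = [
        # Metadata fields are addressed with the "meta." prefix
        {"field": "meta.patient_id", "operator": "==", "value": patient_id},
    ]
    if doc_types:
        conditions.append({"field": "meta.doc_type", "operator": "in", "value": doc_types})
    # All conditions must hold for a document to be returned
    return {"operator": "AND", "conditions": conditions}


filters = authorized_filters("p-1001", ["discharge_summary"])
print(filters)
```

You would then pass it as the retriever's filters argument at query time, for example retriever.run(query=query, filters=filters, top_k=3) with a BM25 retriever.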
- Wire retrieval into an agent workflow
In a production RAG system, retrieval should happen before generation. The pattern is: user query -> retrieve from Elasticsearch -> pass context to your LLM prompt -> return answer with citations.
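from haystack import Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder

# Number each passage and show its provenance so the model can cite sources.
template = """
Answer the question using only the provided clinical context.
Cite the numbered passages you rely on, e.g. [1]. If the context does not answer the question, say so.

Question: {{ question }}

Context:
{% for doc in documents %}
[{{ loop.index }}] ({{ doc.meta.doc_type }}, {{ doc.meta.source }}) {{ doc.content }}
{% endfor %}

Answer:
"""

prompt_builder = PromptBuilder(template=template)

pipe = Pipeline()
pipe.add_component("retriever", retriever)
pipe.add_component("prompt_builder", prompt_builder)
pipe.connect("retriever.documents", "prompt_builder.documents")

response = pipe.run({
    "retriever": {"query": query},
    "prompt_builder": {"question": query},
})

# The rendered prompt; in a full agent, connect prompt_builder.prompt to an LLM generator component.
print(response["prompt_builder"]["prompt"])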
At this point you have the core integration: clinical documents live in Elasticsearch, Haystack retrieves them, and your agent can generate grounded responses from those retrieved passages.
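For traceability, it helps to check the model's citations after generation. Assuming your prompt numbers the retrieved passages and asks the model to cite them as [1], [2], and so on, the hypothetical helper below maps each cited number back to that passage's metadata and flags numbers that match no passage. The answer string is a made-up model output for illustration.

```python
import re
from typing import Any

# Matches bracketed citation numbers such as [1] or [12]
CITATION = re.compile(r"\[(\d+)\]")


def resolve_citations(
    answer: str, sources: list[dict[str, Any]]
) -> tuple[list[dict[str, Any]], list[int]]:
    """Map 1-based [n] citations to source metadata; return (cited sources, numbers with no passage)."""
    cited: list[dict[str, Any]] = []
    invalid: list[int] = []
    seen: set[int] = set()
    for match in CITATION.finditer(answer):
        n = int(match.group(1))
        if n in seen:
            continue  # report each citation number once
        seen.add(n)
        if 1 <= n <= len(sources):
            cited.append(sources[n - 1])
        else:
            invalid.append(n)
    return cited, invalid


# Hypothetical metadata of two retrieved passages, in prompt order
sources = [
    {"patient_id": "p-1001", "doc_type": "discharge_summary", "source": "ehr"},
    {"patient_id": "p-1002", "doc_type": "progress_note", "source": "ehr"},
]
# Made-up model output for illustration
answer = "Schedule CHF follow-up within 7 days [1]. Continue furosemide [1][3]."
cited, invalid = resolve_citations(answer, sources)
print(cited)
print(invalid)
```

In the pipeline above, sources would be [doc.meta for doc in documents] from the retriever; passing include_outputs_from={"retriever"} to pipe.run should expose those documents in the response alongside the prompt. An invalid citation, or an answer with no citations at all, is a reasonable signal to flag the response before it reaches a clinician.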
Testing the Integration
Run a simple end-to-end check: with the sample documents from the earlier steps indexed, retrieve the best match for a short query and confirm the expected text and metadata come back.
test_query = "CHF follow-up"
test_results = retriever.run(query=test_query, top_k=1)
assert len(test_results["documents"]) > 0
print(test_results["documents"][0].content)
print(test_results["documents"][0].meta)
Expected output:
Patient discharged with CHF follow-up in 7 days. Continue furosemide 40mg daily.
{'patient_id': 'p-1001', 'doc_type': 'discharge_summary', 'source': 'ehr'}
If that passes, your indexing path and retrieval path are both working.
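A single lookup proves the plumbing works but says little about retrieval quality. A small labeled set of real questions, each paired with the IDs of the documents that should come back, lets you track recall@k as you change chunking, mappings, or retrievers. The sketch below computes mean recall@k in plain Python with hypothetical IDs; Haystack's evaluator components (such as DocumentRecallEvaluator) are an alternative inside a pipeline.

```python
def recall_at_k(retrieved: list[list[str]], relevant: list[set[str]], k: int) -> float:
    """Mean over queries of (relevant IDs found in the top k) / (number of relevant IDs)."""
    if not relevant or len(retrieved) != len(relevant):
        raise ValueError("provide one retrieved list and one relevant set per query")
    total = 0.0
    for ranked_ids, gold in zip(retrieved, relevant):
        if not gold:
            raise ValueError("every query needs at least one relevant document ID")
        # Fraction of this query's relevant documents found in its top-k results
        total += len(set(ranked_ids[:k]) & gold) / len(gold)
    return total / len(relevant)


# Hypothetical labeled set: two queries, their ranked results, and the IDs that should be found
retrieved = [["d1", "d3", "d2"], ["d4", "d5", "d6"]]
relevant = [{"d1", "d2"}, {"d9"}]
print(recall_at_k(retrieved, relevant, k=2))
print(recall_at_k(retrieved, relevant, k=3))
```

To feed it from the integration, collect retrieved = [[doc.id for doc in retriever.run(query=q, top_k=5)["documents"]] for q in questions] with a BM25 retriever and a hypothetical list of questions.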
Real-World Use Cases
- Clinical policy assistant: answer staff questions like “What’s our anticoagulation protocol for post-op patients?” using indexed hospital policies and care pathways.
- Discharge summary copilot: retrieve prior notes, medication lists, and discharge instructions to help clinicians draft grounded summaries faster.
- Medical coding support: search procedure notes and encounter documentation to suggest ICD/CPT-relevant context for coders without manually scanning charts.
By Cyprian Aarons, AI Consultant at Topiax.