How to Integrate Haystack for healthcare with Elasticsearch for RAG

By Cyprian Aarons | Updated 2026-04-21

Healthcare RAG lives or dies on retrieval quality and traceability. Haystack for healthcare gives you the pipeline primitives for clinical document processing, while Elasticsearch gives you fast, scalable retrieval over structured and unstructured medical content. Put them together and you can build an AI agent that answers policy, triage, coding, or care-path questions from internal medical knowledge without hand-waving over source grounding.

Prerequisites

  • Python 3.10+
  • Access to a running Elasticsearch cluster
  • An index in Elasticsearch with fields for:
    • content or similar text field
    • metadata fields like patient_id, doc_type, source, created_at
  • Haystack installed with the Elasticsearch integration
  • A healthcare-focused Haystack setup, including your domain document loaders/processors
  • API credentials if your Elasticsearch cluster is managed or secured

Install the packages:

pip install haystack-ai elasticsearch-haystack

If you are using a healthcare-specific Haystack package or extension in your environment, install that too according to your internal distribution.

Integration Steps

  1. Connect to Elasticsearch

Start by creating a client and verifying the cluster is reachable. In production, use TLS and authenticated access.

from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", "changeme"),
    verify_certs=True,
)

print(es.info())
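
To keep credentials out of source code, the client arguments can be assembled from the environment. The following is a minimal sketch, not a definitive setup: the variable names ES_URL, ES_USER, ES_PASSWORD, and ES_CA_CERTS and the helper es_client_kwargs are assumptions made for illustration.

```python
import os


def es_client_kwargs(env=os.environ):
    """Build keyword arguments for Elasticsearch(...) from environment variables.

    ES_URL, ES_USER, ES_PASSWORD and ES_CA_CERTS are hypothetical names chosen
    for this sketch; use whatever your deployment defines.
    """
    required = ("ES_URL", "ES_USER", "ES_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing Elasticsearch settings: {', '.join(missing)}")

    url = env["ES_URL"]
    if not url.startswith("https://"):
        # Refuse plain HTTP so clinical text is never sent unencrypted.
        raise ValueError("ES_URL must use https://")

    kwargs = {
        "hosts": url,
        "basic_auth": (env["ES_USER"], env["ES_PASSWORD"]),
        "verify_certs": True,
    }
    ca_certs = env.get("ES_CA_CERTS")
    if ca_certs:
        # Optional path to a CA bundle for clusters with a private CA.
        kwargs["ca_certs"] = ca_certs
    return kwargs


example = es_client_kwargs(
    {"ES_URL": "https://localhost:9200", "ES_USER": "elastic", "ES_PASSWORD": "secret"}
)
print(sorted(example))
```

The result can then be unpacked into the client constructor, for example Elasticsearch(**es_client_kwargs()).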

For RAG, keep the index mapping explicit. Clinical search works better when you separate text content from metadata filters.

index_name = "healthcare-rag"

if not es.indices.exists(index=index_name):
    es.indices.create(
        index=index_name,
        mappings={
            "properties": {
                "content": {"type": "text"},
                "patient_id": {"type": "keyword"},
                "doc_type": {"type": "keyword"},
                "source": {"type": "keyword"},
            }
        },
    )

  2. Prepare healthcare documents in Haystack

Haystack’s Document object is the unit you will pass into indexing and retrieval. For healthcare use cases, de-identify or mask PHI upstream of this step unless your governance model permits indexing identifiable data.

from haystack import Document

documents = [
    Document(
        content="Patient discharged with CHF follow-up in 7 days. Continue furosemide 40mg daily.",
        meta={"patient_id": "p-1001", "doc_type": "discharge_summary", "source": "ehr"}
    ),
    Document(
        content="Clinical note: elevated HbA1c indicates poor glycemic control. Recommend diet review.",
        meta={"patient_id": "p-1002", "doc_type": "progress_note", "source": "ehr"}
    ),
]

If you already have a healthcare ingestion pipeline, this is where parsed notes, PDFs, or claims text should enter before indexing.
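
Before parsed records reach the index, it helps to reject any that lack the metadata you will later filter on. Here is a small sketch of that check, assuming each parsed record is a dict with a "text" key plus the metadata fields used in this guide; the record shape and the helper to_document_fields are hypothetical.

```python
REQUIRED_META = ("patient_id", "doc_type", "source")


def to_document_fields(record):
    """Split a parsed record into `content` and `meta`, rejecting incomplete ones.

    The input shape (a dict with a "text" key) is an assumption for this sketch.
    """
    text = (record.get("text") or "").strip()
    if not text:
        raise ValueError("record has no text to index")

    missing = [key for key in REQUIRED_META if not record.get(key)]
    if missing:
        raise ValueError(f"record is missing metadata: {missing}")

    # Copy only the known metadata keys so stray fields never reach the index.
    meta = {key: record[key] for key in REQUIRED_META}
    return {"content": text, "meta": meta}


fields = to_document_fields(
    {
        "text": "  Clinical note: elevated HbA1c indicates poor glycemic control.  ",
        "patient_id": "p-1002",
        "doc_type": "progress_note",
        "source": "ehr",
        "internal_flag": "ignored",
    }
)
print(fields)
```

The returned dict matches the keyword arguments used above, so a Haystack document could be built with Document(**fields).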

  3. Index documents into Elasticsearch using Haystack

Haystack provides an Elasticsearch-backed document store. Use it as the bridge between your healthcare documents and retrieval layer.

from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

# Extra keyword arguments such as basic_auth are forwarded to the Elasticsearch client.
document_store = ElasticsearchDocumentStore(
    hosts=["https://localhost:9200"],
    basic_auth=("elastic", "changeme"),
    index=index_name,
)

document_store.write_documents(documents)

If your setup uses embeddings for semantic RAG, embed the documents before writing them and embed each query at search time; the documents above are written without embeddings, so only lexical search applies to them. For hybrid search in clinical systems, keep lexical search available because medication names and abbreviations often match better lexically than with dense vectors alone.
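
One common way to combine a lexical ranking with a dense ranking is reciprocal rank fusion (RRF). The sketch below is plain Python and not tied to any Haystack or Elasticsearch API; the document IDs and the constant k=60 are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in; the scores are
    summed across lists. Ties are broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a lexical retriever and a dense retriever.
bm25_ranking = ["note-furosemide", "note-chf", "note-diet"]
dense_ranking = ["note-chf", "note-hba1c", "note-furosemide"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
# → ['note-chf', 'note-furosemide', 'note-hba1c', 'note-diet']
```

Documents that both retrievers rank highly rise to the top, while an exact medication-name hit from the lexical side still survives even if the dense ranking misses it.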

  4. Build a retriever for RAG queries

Use a Haystack retriever connected to the same document store so your agent can fetch relevant medical context before generating an answer. The documents in this guide were written without embeddings, so the example uses the lexical BM25 retriever, which accepts a text query directly. ElasticsearchEmbeddingRetriever expects a precomputed query_embedding instead.

from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever

retriever = ElasticsearchBM25Retriever(document_store=document_store)

query = "What is the recommended follow-up after CHF discharge?"
results = retriever.run(query=query, top_k=3)

for doc in results["documents"]:
    print(doc.content)
    print(doc.meta)

If you need strict filtering by patient or encounter, add metadata filters at query time so your agent only sees authorized context.
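
A query-time filter of that kind can be built as a plain dictionary. This sketch assumes the Haystack 2.x filter syntax (nested "operator" and "conditions" keys, with metadata addressed as meta.<field>); the helper build_access_filter is hypothetical.

```python
def build_access_filter(patient_id, doc_types=None):
    """Return a filter dict restricting results to one patient.

    Assumes Haystack 2.x filter syntax; `doc_types` optionally narrows the
    match to specific document types.
    """
    conditions = [
        {"field": "meta.patient_id", "operator": "==", "value": patient_id},
    ]
    if doc_types:
        conditions.append(
            {"field": "meta.doc_type", "operator": "in", "value": list(doc_types)}
        )
    return {"operator": "AND", "conditions": conditions}


filters = build_access_filter("p-1001", doc_types=["discharge_summary"])
print(filters)
```

The resulting dictionary would be passed as the filters argument of the retriever's run method. Derive patient_id from the authenticated session, never from the user's question text.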

  5. Wire retrieval into an agent workflow

In a production RAG system, retrieval should happen before generation. The pattern is: user query -> retrieve from Elasticsearch -> pass context to your LLM prompt -> return answer with citations.

from haystack import Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder

template = """
Answer the question using only the provided clinical context.

Question: {{question}}

Context:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}

Answer:
"""

prompt_builder = PromptBuilder(template=template)

pipe = Pipeline()
pipe.add_component("retriever", retriever)
pipe.add_component("prompt_builder", prompt_builder)

pipe.connect("retriever.documents", "prompt_builder.documents")

response = pipe.run({
    "retriever": {"query": query},
    "prompt_builder": {"question": query},
})

print(response["prompt_builder"]["prompt"])

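At this point you have the core integration: clinical documents live in Elasticsearch, Haystack retrieves them, and the pipeline renders a prompt grounded in those retrieved passages. The pipeline above stops at the rendered prompt; pass that prompt to your LLM component to generate the final answer.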
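
The "answer with citations" part of the pattern needs each passage to carry a label the model can cite. Below is a minimal sketch of one way to do that, using plain dicts as stand-ins for Haystack documents (with real documents you would read doc.content and doc.meta); the helper format_context and the label style are assumptions.

```python
def format_context(passages):
    """Number each passage and return (context_text, citation_lookup).

    `passages` is a list of dicts with "content" and "meta" keys, used here as
    stand-ins for retrieved documents. patient_id is deliberately left out of
    the lookup so it is not echoed back in answers.
    """
    lines = []
    citations = {}
    for number, passage in enumerate(passages, start=1):
        label = f"[{number}]"
        meta = passage.get("meta", {})
        citations[label] = {
            "doc_type": meta.get("doc_type"),
            "source": meta.get("source"),
        }
        lines.append(f"{label} {passage['content']}")
    return "\n".join(lines), citations


context, citations = format_context(
    [
        {
            "content": "Patient discharged with CHF follow-up in 7 days.",
            "meta": {"patient_id": "p-1001", "doc_type": "discharge_summary", "source": "ehr"},
        }
    ]
)
print(context)
print(citations)
```

The numbered context can replace the bullet list in the prompt template, along with an instruction to cite labels such as [1], and the lookup lets the application map each cited label back to its source.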

Testing the Integration

Run a simple end-to-end check: write one document, retrieve it, and confirm the expected text comes back.

test_query = "CHF follow-up"
test_results = retriever.run(query=test_query, top_k=1)

assert len(test_results["documents"]) > 0
print(test_results["documents"][0].content)
print(test_results["documents"][0].meta)

Expected output:

Patient discharged with CHF follow-up in 7 days. Continue furosemide 40mg daily.
{'patient_id': 'p-1001', 'doc_type': 'discharge_summary', 'source': 'ehr'}

If that passes, your indexing path and retrieval path are both working.
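
Beyond a single assertion, a small set of query and expected-text pairs can act as a retrieval regression check. The sketch below computes a hit rate at k; retrieve_fn is a hypothetical callable that returns the text of the top k passages, and the in-memory fake_retrieve only stands in for a real retriever so the example runs without a cluster.

```python
def hit_rate_at_k(cases, retrieve_fn, k=3):
    """Fraction of cases whose expected text appears in the top-k results.

    `cases` is a list of (query, expected_substring) pairs; `retrieve_fn(query, k)`
    must return a list of passage strings.
    """
    if not cases:
        return 0.0
    hits = 0
    for query, expected_substring in cases:
        contents = retrieve_fn(query, k)
        if any(expected_substring in content for content in contents):
            hits += 1
    return hits / len(cases)


corpus = [
    "Patient discharged with CHF follow-up in 7 days. Continue furosemide 40mg daily.",
    "Clinical note: elevated HbA1c indicates poor glycemic control. Recommend diet review.",
]


def fake_retrieve(query, k):
    """Toy keyword-overlap retriever used only to make this sketch runnable."""
    terms = query.lower().split()
    ranked = sorted(corpus, key=lambda text: -sum(term in text.lower() for term in terms))
    return ranked[:k]


cases = [
    ("CHF follow-up", "furosemide"),
    ("HbA1c glycemic control", "diet review"),
]
score = hit_rate_at_k(cases, fake_retrieve, k=1)
print(score)
# → 1.0
```

Against the real index, retrieve_fn would wrap the retriever call and return the content of each returned document.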

Real-World Use Cases

  • Clinical policy assistant
    Answer staff questions like “What’s our anticoagulation protocol for post-op patients?” using indexed hospital policies and care pathways.

  • Discharge summary copilot
    Retrieve prior notes, medication lists, and discharge instructions to help clinicians draft grounded summaries faster.

  • Medical coding support
    Search procedure notes and encounter documentation to suggest ICD/CPT-relevant context for coders without manually scanning charts.


By Cyprian Aarons, AI Consultant at Topiax.
