How to Integrate Haystack for healthcare with Elasticsearch for startups

By Cyprian Aarons. Updated 2026-04-21

Combining Haystack for healthcare with Elasticsearch gives you a practical pattern for clinical search and retrieval inside an AI agent. You get Haystack’s document processing and pipeline orchestration on top of Elasticsearch’s fast indexed retrieval, which is useful for symptom triage, policy lookup, patient education search, and internal medical knowledge assistants.

Prerequisites

•Python 3.10+
•An Elasticsearch cluster running locally or in the cloud
•pip installed
•Access to the Haystack for healthcare package you plan to use
•API credentials if your Elasticsearch deployment requires auth
•A small set of healthcare documents in text or JSON form

Install the core packages:

pip install haystack-ai elasticsearch elasticsearch-haystack

If your Haystack healthcare setup also uses embedding models (for example, Haystack's Sentence Transformers embedders), install those dependencies too:

pip install sentence-transformers

Integration Steps

•Set up Elasticsearch and create an index

Start by connecting to Elasticsearch and creating an index for clinical content. Keep the mapping simple at first: store the raw text as a full-text field, plus metadata such as source, specialty, and document ID as keyword fields for exact-match filtering.

from elasticsearch import Elasticsearch

es = Elasticsearch(
    "http://localhost:9200",
    basic_auth=("elastic", "changeme")
)

index_name = "healthcare_docs"

if not es.indices.exists(index=index_name):
    es.indices.create(
        index=index_name,
        mappings={
            "properties": {
                "content": {"type": "text"},
                "source": {"type": "keyword"},
                "specialty": {"type": "keyword"},
                "doc_id": {"type": "keyword"}
            }
        }
    )

print(es.info())
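The connection above hard-codes a username and password, which is fine for a local test but risky once the code is shared. A minimal sketch of reading the same settings from the environment follows; the variable names ES_URL, ES_USER, and ES_PASSWORD and the helper es_client_kwargs are hypothetical, not something Elasticsearch or Haystack defines.

```python
import os


def es_client_kwargs(env=None):
    """Build keyword arguments for Elasticsearch(...) from environment variables.

    ES_URL, ES_USER and ES_PASSWORD are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    kwargs = {"hosts": env.get("ES_URL", "http://localhost:9200")}
    user = env.get("ES_USER")
    password = env.get("ES_PASSWORD")
    # Only send credentials when both parts are present.
    if user and password:
        kwargs["basic_auth"] = (user, password)
    return kwargs
```

With this in place, the client could be created as `Elasticsearch(**es_client_kwargs())`, so a local cluster without auth and a secured cloud cluster share the same code path.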

•Load healthcare documents into Haystack

Use Haystack’s Document objects to normalize your data before sending it into Elasticsearch. In a startup environment, this is where you turn PDFs, FAQs, discharge summaries, or internal policy docs into searchable records.

from haystack import Document

docs = [
    Document(
        content="Hypertension management includes lifestyle changes and antihypertensive medication.",
        meta={"source": "clinic_guide", "specialty": "cardiology", "doc_id": "h1"}
    ),
    Document(
        content="Asthma action plans should include trigger avoidance and rescue inhaler instructions.",
        meta={"source": "patient_education", "specialty": "pulmonology", "doc_id": "a1"}
    )
]

for doc in docs:
    print(doc.content, doc.meta)
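Real source data rarely arrives as clean as the two documents above. As a rough sketch, assuming the raw records come as a JSON array with a `text` field plus the three metadata fields (an assumed input shape, not one the article prescribes), a small normalizer can reject incomplete records before they reach the index. The helper names are hypothetical.

```python
import json

REQUIRED_META = ("source", "specialty", "doc_id")


def normalize_record(raw):
    """Return {"content": ..., "meta": {...}} or raise ValueError for bad input."""
    # Collapse runs of whitespace left over from PDF or HTML extraction.
    content = " ".join(str(raw.get("text", "")).split())
    if not content:
        raise ValueError("record has no text")
    meta = {}
    for key in REQUIRED_META:
        value = raw.get(key)
        if not value:
            raise ValueError(f"record is missing {key!r}")
        meta[key] = str(value).strip()
    # Lowercase the fields used for exact-match filtering so values stay consistent.
    meta["source"] = meta["source"].lower()
    meta["specialty"] = meta["specialty"].lower()
    return {"content": content, "meta": meta}


def load_records(json_text):
    """Parse a JSON array of raw records into normalized dicts."""
    return [normalize_record(raw) for raw in json.loads(json_text)]
```

Each resulting dict can then be turned into a Haystack document with `Document(content=rec["content"], meta=rec["meta"])`.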

•Write documents into Elasticsearch

Haystack’s Elasticsearch integration is typically done through a document store. In Haystack 2.x, the Elasticsearch-backed document store ships in the separate elasticsearch-haystack package (pip install elasticsearch-haystack). The pattern is the same across versions: write Document objects into the index so retrieval can happen later.

from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(
    hosts="http://localhost:9200",
    basic_auth=("elastic", "changeme"),
    index=index_name,
)

document_store.write_documents(docs)
print("Documents written:", document_store.count_documents())

If your version exposes a different constructor name or auth parameter shape, keep the same idea: initialize the document store against your Elasticsearch cluster and write Haystack Document objects into it.
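Ingestion scripts tend to be re-run, and writing the same documents twice can either fail or create duplicates depending on the store's duplicate policy. One possible approach, sketched here under the assumption that source plus doc_id uniquely identifies a document, is to derive a stable ID and pass it as `Document(id=...)`. The helper stable_document_id is hypothetical.

```python
import hashlib


def stable_document_id(source, doc_id):
    """Derive a deterministic ID from source and doc_id (assumed unique together)."""
    key = f"{source}:{doc_id}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()
```

Combined with an overwrite policy on write (in Haystack 2.x, `DuplicatePolicy.OVERWRITE` from `haystack.document_stores.types`), a re-run should update existing records instead of adding new ones. Check the policy names against the Haystack version you have installed.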

•Build a retrieval pipeline

Now connect query-time retrieval to the same index. For startups building AI agents, this is usually the most valuable part: a user asks a medical question, and the agent fetches grounded context before generating an answer.

from haystack import Pipeline
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever

# BM25 retriever that queries the Elasticsearch index behind document_store.
# (InMemoryBM25Retriever only accepts an InMemoryDocumentStore, so it cannot be used here.)
retriever = ElasticsearchBM25Retriever(document_store=document_store)

pipe = Pipeline()
pipe.add_component("retriever", retriever)

query = {"retriever": {"query": "What is included in asthma care?"}}
result = pipe.run(query)

for doc in result["retriever"]["documents"]:
    print(doc.content)

Use the retriever component that targets your Elasticsearch document store directly; in-memory retrievers only accept an in-memory store. The key pattern is unchanged: the query goes into Haystack, Haystack retrieves from Elasticsearch, and the retrieved context is passed to downstream agent logic.
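The last hop, passing retrieved context to downstream agent logic, can be sketched as a small prompt builder. This is an illustration only: RetrievedDoc is a stand-in exposing the same `content` and `meta` attributes read from a Haystack Document, and the prompt wording and build_grounded_prompt helper are assumptions rather than part of any library.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievedDoc:
    """Stand-in exposing the two attributes this sketch reads from a document."""
    content: str
    meta: dict = field(default_factory=dict)


def build_grounded_prompt(question, documents, max_docs=3):
    """Format the top retrieved documents as numbered context ahead of the question."""
    lines = [
        "Answer using only the context below. If the context is insufficient, say so.",
        "",
        "Context:",
    ]
    for number, doc in enumerate(documents[:max_docs], start=1):
        source = doc.meta.get("source", "unknown")
        lines.append(f"[{number}] ({source}) {doc.content}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

Numbering each snippet and including its source makes it easier for the agent, and for a reviewer, to trace an answer back to a specific indexed document.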

•Add filtering for clinical metadata

Healthcare search gets much better when you filter by specialty or source type. This keeps results clinically relevant and reduces hallucination risk in your agent.

results = document_store.filter_documents(
    filters={
        "operator": "AND",
        "conditions": [
            # Haystack 2.x filters address metadata fields with a "meta." prefix.
            {"field": "meta.specialty", "operator": "==", "value": "pulmonology"}
        ]
    }
)

for doc in results:
    print(doc.meta["doc_id"], doc.content)

For startup systems, this is how you separate patient-facing content from clinician-only guidance without building a second search stack.
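Filter dictionaries get verbose once more than one field is involved. A minimal sketch of a builder follows; build_meta_filter is a hypothetical helper, and the `meta.` field prefix and the `in` operator are assumptions based on Haystack 2.x filter syntax, so verify them against your installed version.

```python
def build_meta_filter(**fields):
    """Build an AND filter over metadata fields.

    A scalar value becomes an equality check; a list, tuple or set becomes an "in" check.
    """
    conditions = []
    for name, value in fields.items():
        if isinstance(value, (list, tuple, set)):
            operator = "in"
            value = sorted(value)  # stable, JSON-friendly list
        else:
            operator = "=="
        conditions.append({"field": f"meta.{name}", "operator": operator, "value": value})
    return {"operator": "AND", "conditions": conditions}
```

For example, `build_meta_filter(specialty="pulmonology", source=["patient_education"])` yields a filter that could restrict a patient-facing agent to approved education content within one specialty.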

Testing the Integration

Run a simple end-to-end check: write a document, query it back, and confirm the retrieved result matches expected clinical content.

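test_query = {"retriever": {"query": "How should asthma be managed?"}}
test_result = pipe.run(test_query)

# Use a separate name so the ingested docs list is not overwritten.
retrieved_docs = test_result["retriever"]["documents"]
assert len(retrieved_docs) > 0, "retriever returned no documents"

top_doc = retrieved_docs[0]
print("Top result:", top_doc.content)
print("Metadata:", top_doc.meta)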

Expected output:

Top result: Asthma action plans should include trigger avoidance and rescue inhaler instructions.
Metadata: {'source': 'patient_education', 'specialty': 'pulmonology', 'doc_id': 'a1'}

If that works, your ingestion path, index mapping, and retrieval path are connected correctly.
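A single query is a thin check. As a sketch, the same idea can be extended to a small list of query and expected-document pairs. The helper names are hypothetical, and fake_retrieve is a stand-in so the example runs without a cluster.

```python
from types import SimpleNamespace


def top_doc_ids(documents, top_k=3):
    """Return the doc_id metadata of the first top_k documents."""
    return [doc.meta.get("doc_id") for doc in documents[:top_k]]


def check_retrieval(retrieve, cases, top_k=3):
    """Return (query, expected_id, got_ids) for every case whose expected ID is missing."""
    failures = []
    for query, expected_id in cases:
        got_ids = top_doc_ids(retrieve(query), top_k)
        if expected_id not in got_ids:
            failures.append((query, expected_id, got_ids))
    return failures


def fake_retrieve(query):
    """Stand-in retriever: returns the asthma document only for asthma queries."""
    if "asthma" in query.lower():
        return [SimpleNamespace(content="Asthma action plans...", meta={"doc_id": "a1"})]
    return []
```

Against the real pipeline, `retrieve` could be `lambda q: pipe.run({"retriever": {"query": q}})["retriever"]["documents"]`; an empty list from `check_retrieval` means every case found its expected document.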

Real-World Use Cases

•Clinical support agent: Build an internal assistant that answers staff questions from hospital SOPs, treatment protocols, and patient education materials.
•Patient-facing triage assistant: Retrieve approved symptom guidance from indexed medical content before generating responses in a chatbot flow.
•Insurance claims knowledge search: Index claim policies, prior authorization rules, and coding guidance so agents can answer operational questions quickly.

The main production pattern here is simple: use Haystack to structure ingestion and orchestration, and use Elasticsearch for fast retrieval at scale. That combination gives startups a clean path from raw healthcare content to an AI agent that can answer with grounded context instead of guessing.

By Cyprian Aarons, AI Consultant at Topiax.