How to Integrate Haystack for healthcare with Elasticsearch for production AI
Connecting Haystack for healthcare with Elasticsearch gives you a practical retrieval layer for clinical and insurance workflows. You get semantic search over medical documents, fast filtering on structured fields like patient ID or encounter date, and a clean path to production-grade AI agents that need evidence-backed answers.
This combo is useful when your agent has to answer questions from discharge summaries, claims notes, prior authorizations, or care plans without hallucinating. Haystack handles the pipeline and retrieval logic; Elasticsearch gives you durable indexing, filtering, scoring, and scale.
Prerequisites
- •Python 3.10+
- •An Elasticsearch cluster running locally or in your environment
- •A valid Elasticsearch user/password or API key
- •haystack-ai installed
- •elasticsearch Python client installed
- •Access to your healthcare documents in a supported format:
  - •PDFs
  - •text files
  - •JSON records
  - •extracted clinical notes
- •Basic familiarity with:
  - •Haystack pipelines
  - •document stores
  - •embeddings / retrievers
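Install the packages. The ElasticsearchDocumentStore import used below ships in the separate elasticsearch-haystack integration package:
pip install haystack-ai elasticsearch-haystack elasticsearch sentence-transformers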
Integration Steps
- •Connect to Elasticsearch and define your index
Start by creating an Elasticsearch client. In production, use TLS and API keys; don’t hardcode credentials.
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://localhost:9200",
    api_key=("id_here", "api_key_here"),
    verify_certs=True,
)
print(es.info())

If you are using a managed cluster, keep the connection string and auth in environment variables.
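One way to follow the environment-variable advice is a small helper that assembles the client arguments at startup. This is a minimal sketch, not a required pattern: the variable names ES_URL, ES_API_KEY, ES_USERNAME, and ES_PASSWORD and the function name are assumptions made up for illustration.

```python
import os


def es_connection_kwargs(env=os.environ):
    """Build keyword arguments for Elasticsearch(...) from environment variables.

    ES_URL, ES_API_KEY, ES_USERNAME and ES_PASSWORD are hypothetical names
    chosen for this sketch; use whatever your deployment already defines.
    """
    url = env.get("ES_URL")
    if not url:
        raise RuntimeError("ES_URL is not set")

    kwargs = {"hosts": [url], "verify_certs": True}

    api_key = env.get("ES_API_KEY")
    if api_key:
        # Prefer an API key when one is provided.
        kwargs["api_key"] = api_key
    elif env.get("ES_USERNAME") and env.get("ES_PASSWORD"):
        # Fall back to basic auth.
        kwargs["basic_auth"] = (env["ES_USERNAME"], env["ES_PASSWORD"])
    else:
        raise RuntimeError("Set ES_API_KEY, or ES_USERNAME and ES_PASSWORD")
    return kwargs
```

With the variables set, the client can be created as `Elasticsearch(**es_connection_kwargs())`. Failing fast on missing settings keeps a misconfigured deployment from silently connecting without credentials.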
- •Create a Haystack document store backed by Elasticsearch
Haystack’s ElasticsearchDocumentStore is the bridge between your AI pipeline and the search engine. For healthcare data, keep metadata fields like patient_id, encounter_id, doc_type, and source_system so you can filter safely.

from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

# Extra keyword arguments such as basic_auth are forwarded to the Elasticsearch client.
document_store = ElasticsearchDocumentStore(
    hosts=["https://localhost:9200"],
    basic_auth=("elastic", "changeme"),  # placeholder; load real credentials from the environment
    index="healthcare_docs",
    embedding_similarity_function="cosine",
)

If your deployment uses vector search in Elasticsearch, make sure the embedding dimension stays consistent with the model you pick later (all-MiniLM-L6-v2 produces 384-dimensional vectors). Documents and queries must be embedded with the same model.
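The metadata advice above can be enforced before anything reaches the index. The following is a minimal sketch using plain dictionaries. The required-field list simply repeats the four fields named above, and the helper names are hypothetical; adjust both to your own schema.

```python
# Assumption: these four fields are mandatory for every record in your index.
REQUIRED_META_FIELDS = ("patient_id", "encounter_id", "doc_type", "source_system")


def missing_meta_fields(meta, required=REQUIRED_META_FIELDS):
    """Return the required metadata keys that are absent or empty in `meta`."""
    return [key for key in required if not str(meta.get(key) or "").strip()]


def split_by_meta(records):
    """Separate records into (valid, rejected) based on their `meta` dict.

    Each rejected entry is a (record, missing_fields) pair so the failure
    can be logged or sent back to the source system.
    """
    valid, rejected = [], []
    for record in records:
        missing = missing_meta_fields(record.get("meta", {}))
        if missing:
            rejected.append((record, missing))
        else:
            valid.append(record)
    return valid, rejected
```

Only the `valid` records would then be converted to Haystack Document objects and written to the store. Rejecting incomplete records at ingestion is a design choice: a document without a patient_id cannot be excluded by a patient filter later.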
- •Load healthcare documents into Haystack and write them to Elasticsearch
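Use Haystack Document objects so metadata stays attached to each chunk. That matters for compliance workflows where you need traceability back to source documents. Embed the documents before writing them, because embedding retrieval only matches documents that were stored with a vector.

from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

docs = [
    Document(
        content="Patient was discharged with amoxicillin 500mg twice daily for 7 days.",
        meta={
            "patient_id": "P1001",
            "doc_type": "discharge_summary",
            "source_system": "ehr",
        },
    ),
    Document(
        content="Prior authorization approved for MRI lumbar spine after conservative treatment.",
        meta={
            "patient_id": "P1002",
            "doc_type": "prior_auth",
            "source_system": "claims",
        },
    ),
]

# Embed with the same model used for queries, then index.
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()
docs_with_embeddings = doc_embedder.run(documents=docs)["documents"]
document_store.write_documents(docs_with_embeddings)

- •Add embeddings and retrieve relevant medical context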
For semantic retrieval, generate embeddings before querying. In Haystack, use an embedder that matches your model choice and then query through a retriever connected to Elasticsearch.
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchEmbeddingRetriever

text_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
text_embedder.warm_up()  # load the model before calling run() outside a pipeline

retriever = ElasticsearchEmbeddingRetriever(
    document_store=document_store,
    top_k=5,
)

query = "What medication was prescribed at discharge?"
query_embedding = text_embedder.run(text=query)["embedding"]
results = retriever.run(query_embedding=query_embedding)

for doc in results["documents"]:
    print(doc.content)
    print(doc.meta)

- •Build an end-to-end pipeline for agent use
In production AI, don’t call retrieval manually inside agent logic unless you have to. Put it behind a pipeline so the agent gets consistent behavior every time.
from haystack import Pipeline
from haystack.components.builders import PromptBuilder

template = """
Answer the question using only the provided documents.

Question: {{question}}

Documents:
{% for doc in documents %}
- {{ doc.content }}
  Source: {{ doc.meta }}
{% endfor %}
"""

prompt_builder = PromptBuilder(template=template)

pipeline = Pipeline()
pipeline.add_component("embedder", text_embedder)
pipeline.add_component("retriever", retriever)
pipeline.add_component("prompt_builder", prompt_builder)

pipeline.connect("embedder.embedding", "retriever.query_embedding")
pipeline.connect("retriever.documents", "prompt_builder.documents")

result = pipeline.run({
    "embedder": {"text": query},
    "prompt_builder": {"question": query},
})
print(result["prompt_builder"]["prompt"])
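To give an agent a single entry point, the pipeline call can be wrapped in a function. This is a sketch under assumptions: the function name is hypothetical, it relies on the component names used above ("embedder", "retriever", "prompt_builder"), and it assumes the retriever component accepts optional `filters` and `top_k` inputs at run time.

```python
def build_grounded_prompt(pipeline, question, filters=None, top_k=None):
    """Run the retrieval pipeline and return the rendered prompt.

    `pipeline` is any object with a run(dict) method that returns
    {"prompt_builder": {"prompt": ...}}, such as the pipeline built above.
    """
    # Only send retriever overrides that the caller actually provided.
    retriever_inputs = {}
    if filters is not None:
        retriever_inputs["filters"] = filters
    if top_k is not None:
        retriever_inputs["top_k"] = top_k

    inputs = {
        "embedder": {"text": question},
        "prompt_builder": {"question": question},
    }
    if retriever_inputs:
        inputs["retriever"] = retriever_inputs

    result = pipeline.run(inputs)
    return result["prompt_builder"]["prompt"]
```

The agent then calls `build_grounded_prompt(pipeline, "What medication was prescribed at discharge?")` and never touches the embedder or retriever directly, which keeps retrieval behavior identical across every tool call.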
Testing the Integration
Run a simple smoke test that retrieves one of the indexed documents by meaning and confirms the metadata comes back correctly.
test_query = "Which antibiotic was prescribed?"
query_embedding = text_embedder.run(text=test_query)["embedding"]
response = retriever.run(query_embedding=query_embedding)
assert len(response["documents"]) > 0
first_doc = response["documents"][0]
print(first_doc.content)
print(first_doc.meta)
Expected output:
Patient was discharged with amoxicillin 500mg twice daily for 7 days.
{'patient_id': 'P1001', 'doc_type': 'discharge_summary', 'source_system': 'ehr'}
If that works, your Haystack layer is successfully retrieving from Elasticsearch with metadata intact.
Real-World Use Cases
- •Clinical assistant for care teams
  - •Retrieve discharge instructions, medication history, and recent notes for an internal copilot.
  - •Filter by patient ID or encounter ID before answering.
- •Prior authorization automation
  - •Search policy docs, chart notes, and previous approvals to draft payer-facing responses.
  - •Keep every retrieved passage tied to source metadata for auditability.
- •Claims and utilization review
  - •Index claims narratives and supporting clinical evidence.
  - •Let agents answer “why was this denied?” with grounded context instead of freeform generation.
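The "filter by patient ID or encounter ID" step in these use cases can be expressed with Haystack 2.x metadata filters, which are nested dictionaries of field, operator, and value. The helper below is a minimal sketch: the function name is hypothetical, and it assumes the metadata keys patient_id, encounter_id, and doc_type used earlier in this guide.

```python
def patient_filter(patient_id, encounter_id=None, doc_types=None):
    """Build a metadata filter scoped to one patient.

    Optionally narrows the scope to one encounter and/or a set of document types.
    """
    if not patient_id:
        # Refuse to build an unscoped filter for patient-level questions.
        raise ValueError("patient_id is required")

    conditions = [
        {"field": "meta.patient_id", "operator": "==", "value": patient_id},
    ]
    if encounter_id:
        conditions.append(
            {"field": "meta.encounter_id", "operator": "==", "value": encounter_id}
        )
    if doc_types:
        conditions.append(
            {"field": "meta.doc_type", "operator": "in", "value": list(doc_types)}
        )
    return {"operator": "AND", "conditions": conditions}
```

The resulting dictionary would typically be passed as the `filters` argument when running the retriever, for example `retriever.run(query_embedding=query_embedding, filters=patient_filter("P1001"))`, assuming your retriever accepts that argument. Raising on a missing patient ID is deliberate, since an empty filter would search across all patients.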
By Cyprian Aarons, AI Consultant at Topiax.