How to Integrate Haystack for insurance with Elasticsearch for startups

By Cyprian Aarons, updated 2026-04-21


Combining Haystack for insurance with Elasticsearch gives you a practical retrieval layer for agent systems that need to answer policy, claims, and underwriting questions from messy internal documents. Haystack handles orchestration and retrieval pipelines; Elasticsearch gives you fast indexed search over text extracted from PDFs, emails, policy docs, and claim notes.

For a startup, this is a lean way to build an insurance assistant that grounds its answers in source documents instead of guessing.

Prerequisites

•Python 3.10+
•An Elasticsearch cluster running locally or in the cloud
•API credentials for your Elasticsearch deployment
•Haystack 2.x (the haystack-ai package) plus the elasticsearch-haystack integration
•Insurance documents already extracted into text chunks
•Basic familiarity with embeddings and retrieval pipelines

Install the core packages:

pip install haystack-ai elasticsearch-haystack elasticsearch sentence-transformers

If you’re using a managed Elasticsearch service, make sure you have:

•ELASTICSEARCH_URL
•ELASTICSEARCH_API_KEY or username/password
•TLS settings if required by your cluster
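If you'd rather not hard-code credentials, one option is to build the client's keyword arguments from environment variables. This is a minimal sketch: es_client_kwargs is a hypothetical helper, and ELASTICSEARCH_USERNAME, ELASTICSEARCH_PASSWORD and ELASTICSEARCH_CA_CERTS are assumed variable names alongside the ELASTICSEARCH_URL and ELASTICSEARCH_API_KEY listed above. The keys it returns (hosts, api_key, basic_auth, ca_certs) are standard elasticsearch-py 8 client arguments.

```python
import os
from typing import Any, Mapping


def es_client_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Build keyword arguments for elasticsearch.Elasticsearch from env vars.

    Hypothetical helper: ELASTICSEARCH_USERNAME, ELASTICSEARCH_PASSWORD and
    ELASTICSEARCH_CA_CERTS are assumed names, not a convention of either library.
    """
    kwargs: dict[str, Any] = {"hosts": env["ELASTICSEARCH_URL"]}

    # Prefer an API key; fall back to username/password.
    if env.get("ELASTICSEARCH_API_KEY"):
        kwargs["api_key"] = env["ELASTICSEARCH_API_KEY"]
    elif env.get("ELASTICSEARCH_USERNAME"):
        kwargs["basic_auth"] = (
            env["ELASTICSEARCH_USERNAME"],
            env.get("ELASTICSEARCH_PASSWORD", ""),
        )
    else:
        raise ValueError("Set ELASTICSEARCH_API_KEY or ELASTICSEARCH_USERNAME/PASSWORD")

    # Only needed for clusters with a self-signed or private CA certificate.
    if env.get("ELASTICSEARCH_CA_CERTS"):
        kwargs["ca_certs"] = env["ELASTICSEARCH_CA_CERTS"]
    return kwargs
```

Pass the result straight to the client: es = Elasticsearch(**es_client_kwargs()).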

Integration Steps

•Set up the Elasticsearch connection.

Use the official client first so you can validate connectivity before wiring Haystack into it.

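import os

from elasticsearch import Elasticsearch

# Read connection details from the environment rather than hard-coding secrets.
# For a self-signed cluster, also pass ca_certs="/path/to/http_ca.crt".
es = Elasticsearch(
    os.environ.get("ELASTICSEARCH_URL", "https://localhost:9200"),
    api_key=os.environ["ELASTICSEARCH_API_KEY"],
)

print(es.info())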

If this fails, fix networking, auth, or TLS before moving on. Don’t debug Haystack until Elasticsearch is healthy.

•Create an index for insurance content.

Store document chunks with fields that matter for retrieval: text, source, policy type, claim ID, and embedding vector.

index_name = "insurance-docs"

if not es.indices.exists(index=index_name):
    es.indices.create(
        index=index_name,
        mappings={
            "properties": {
                "content": {"type": "text"},
                "source": {"type": "keyword"},
                "policy_type": {"type": "keyword"},
                "claim_id": {"type": "keyword"},
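                # dims must match the embedding model's output size:
                # all-MiniLM-L6-v2 produces 384-dimensional vectors.
                "embedding": {
                    "type": "dense_vector",
                    "dims": 384,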
                    "index": True,
                    "similarity": "cosine"
                }
            }
        }
    )

For startups, keep the schema simple. Add metadata fields only when you need filtering by product line or claim workflow.
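When you do add a filter field such as policy_type, Elasticsearch can apply it inside the approximate kNN search itself, so you still get up to k hits from the requested product line rather than filtering a short list afterwards. Below is a sketch of the knn argument you might pass to es.search(); build_knn_query is a hypothetical helper, and k=3 / num_candidates=50 are illustrative defaults, not tuned values.

```python
from typing import Any, Optional


def build_knn_query(
    query_vector: list[float],
    k: int = 3,
    num_candidates: int = 50,
    policy_type: Optional[str] = None,
) -> dict[str, Any]:
    """Return the `knn` section for es.search(), optionally filtered by policy_type."""
    # Elasticsearch rejects kNN requests where num_candidates is smaller than k.
    if num_candidates < k:
        raise ValueError("num_candidates must be >= k")
    knn: dict[str, Any] = {
        "field": "embedding",          # the dense_vector field from the mapping
        "query_vector": query_vector,  # must have the same dims as the mapping
        "k": k,
        "num_candidates": num_candidates,
    }
    if policy_type is not None:
        # Applied during the kNN search, not as a post-filter on the top k.
        knn["filter"] = {"term": {"policy_type": policy_type}}
    return knn
```

A call might look like es.search(index=index_name, knn=build_knn_query(query_vector, policy_type="dental")), where query_vector is the question embedded with the same model as your chunks. Raising num_candidates usually improves recall at the cost of latency.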

•Ingest insurance documents into Elasticsearch.

Haystack’s Document object is what you’ll pass through the pipeline. You can generate embeddings separately and store them alongside each chunk.

from haystack.dataclasses import Document
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    Document(
        content="Dental coverage includes annual cleanings and one emergency visit per year.",
        meta={"source": "policy_guide.pdf", "policy_type": "dental", "claim_id": None},
    ),
    Document(
        content="Claims over $5,000 require manual review by the underwriting team.",
        meta={"source": "claims_playbook.pdf", "policy_type": "claims", "claim_id": None},
    ),
]

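for doc in docs:
    # Embed with the same model the query side will use.
    embedding = model.encode(doc.content).tolist()
    es.index(
        index=index_name,
        # Reusing Haystack's deterministic doc.id means re-running ingestion
        # overwrites existing chunks instead of duplicating them.
        id=doc.id,
        document={
            "id": doc.id,
            "content": doc.content,
            **doc.meta,
            "embedding": embedding,
        },
    )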

es.indices.refresh(index=index_name)

This is enough for a startup-grade retrieval store. You can later add chunking, OCR output, and versioned policy sources.
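When you get to chunking, a simple word-window splitter with overlap is a reasonable starting point. This is a sketch under stated assumptions: chunk_text is a hypothetical helper, and chunk_words=200 / overlap_words=40 are illustrative sizes, not recommendations. Haystack also ships a DocumentSplitter component (split_by, split_length, split_overlap) if you'd rather stay inside the framework.

```python
import hashlib
from typing import Any


def chunk_text(
    text: str,
    source: str,
    chunk_words: int = 200,
    overlap_words: int = 40,
) -> list[dict[str, Any]]:
    """Split text into overlapping word windows, each with a deterministic ID."""
    if not 0 <= overlap_words < chunk_words:
        raise ValueError("overlap_words must be in [0, chunk_words)")
    words = text.split()
    step = chunk_words - overlap_words
    chunks: list[dict[str, Any]] = []
    for start in range(0, max(len(words), 1), step):
        window = words[start : start + chunk_words]
        if not window:
            break  # empty input
        content = " ".join(window)
        # Same source + position + content gives the same ID on every run.
        chunk_id = hashlib.sha256(f"{source}:{start}:{content}".encode()).hexdigest()
        chunks.append(
            {"id": chunk_id, "content": content, "source": source, "start_word": start}
        )
        if start + chunk_words >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Index each chunk with es.index(index=index_name, id=chunk["id"], document=...) so re-ingesting a file overwrites its chunks instead of duplicating them. Chunks that disappear from a newer version of a file are not deleted automatically, so versioned policy sources need their own cleanup step.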

•Wire Elasticsearch into a Haystack retrieval pipeline.

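Use the ElasticsearchDocumentStore and ElasticsearchEmbeddingRetriever from the elasticsearch-haystack integration, plus a SentenceTransformersTextEmbedder that turns the query text into a vector. The retriever searches by vector, so the query must be embedded with the same model you used at ingestion.

import os

from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchEmbeddingRetriever
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

# Point the store at the index created above. Authentication differs by release:
# recent elasticsearch-haystack versions read ELASTIC_API_KEY from the environment,
# older ones forward extra keyword arguments (e.g. api_key=...) to the Elasticsearch client.
document_store = ElasticsearchDocumentStore(
    hosts=os.environ.get("ELASTICSEARCH_URL", "https://localhost:9200"),
    index=index_name,
)

# Same model as ingestion, so query and document vectors live in the same space.
text_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
retriever = ElasticsearchEmbeddingRetriever(document_store=document_store)

pipeline = Pipeline()
pipeline.add_component("text_embedder", text_embedder)
pipeline.add_component("retriever", retriever)
# Feed the query vector into the retriever.
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

Import paths have moved between Haystack releases (Haystack 1.x exposed a generic EmbeddingRetriever under haystack.nodes). If yours differ, keep the same pattern: embed the query, pass the vector to a retriever bound to the document store, and run both inside a pipeline.

•Run a query and return grounded insurance answers.

This is where the agent system starts becoming useful. The retriever pulls back relevant passages that your LLM can then summarize or cite.

query = "What does dental coverage include?"
result = pipeline.run({
    "text_embedder": {"text": query},
    "retriever": {"top_k": 3},
})

for doc in result["retriever"]["documents"]:
    print(doc.content)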
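For intuition about what the retriever returns: with "similarity": "cosine" in the mapping, Elasticsearch ranks chunks by cosine similarity between the query vector and each stored embedding, and reports a score of (1 + cosine) / 2. The brute-force version below is an illustration only; Elasticsearch uses an approximate HNSW index rather than scanning every vector, and rank_by_cosine is a hypothetical helper that works on plain lists.

```python
import math
from typing import Any


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_cosine(
    query_vec: list[float], chunks: list[dict[str, Any]], top_k: int = 3
) -> list[tuple[float, str]]:
    """Exact top-k by cosine similarity over every chunk (no index)."""
    scored = [
        # Rescale to Elasticsearch's cosine score, (1 + cosine) / 2, which lies in [0, 1].
        ((1 + cosine(query_vec, c["embedding"])) / 2, c["content"])
        for c in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Because of the rescaling, a score near 0.5 corresponds to a cosine near 0, meaning the chunk is roughly unrelated to the query.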

In production, feed those retrieved documents into your generation step with strict citation rules. Don’t let the model answer without evidence from the retrieved context.
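One way to enforce strict citation rules is to number the retrieved passages in the prompt and reject answers that cite passages that don't exist. The sketch below assumes you build passages from each retrieved Document as (doc.meta["source"], doc.content); the prompt wording, the refusal string, and both helper names are assumptions, and the call to your LLM is left out.

```python
import re
from typing import Sequence

REFUSAL = "I don't know."  # assumed refusal string; pick your own


def build_grounded_prompt(question: str, passages: Sequence[tuple[str, str]]) -> str:
    """Number each (source, content) passage so the model can cite it as [n]."""
    numbered = "\n".join(
        f"[{i}] ({source}) {content}"
        for i, (source, content) in enumerate(passages, start=1)
    )
    return (
        "Answer the question using ONLY the numbered passages below.\n"
        "Cite every claim with its passage number, e.g. [1].\n"
        f"If the passages do not contain the answer, reply exactly: {REFUSAL}\n\n"
        f"Passages:\n{numbered}\n\nQuestion: {question}\nAnswer:"
    )


def citations_are_valid(answer: str, num_passages: int) -> bool:
    """True for the refusal, or for an answer citing only passages that exist."""
    if answer.strip() == REFUSAL:
        return True
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return bool(cited) and all(1 <= n <= num_passages for n in cited)
```

If citations_are_valid returns False, retry or fall back to the refusal instead of showing the answer. This only checks that citations point at real passages, not that each passage actually supports the claim.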

Testing the Integration

Run a direct search first, then verify Haystack returns the same source material through retrieval.

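# Keyword (BM25) search directly against Elasticsearch, bypassing Haystack.
# Individual keyword arguments replace the deprecated body= parameter in elasticsearch-py 8.
search_result = es.search(
    index=index_name,
    query={"match": {"content": "manual review claims over 5000"}},
)
print(search_result["hits"]["hits"][0]["_source"]["content"])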

Expected output:

Claims over $5,000 require manual review by the underwriting team.

Then check Haystack retrieval:

result = pipeline.run({
    "text_embedder": {"text": "Which claims need manual review?"},
    "retriever": {"top_k": 1},
})

print(result["retriever"]["documents"][0].content)

Expected output:

Claims over $5,000 require manual review by the underwriting team.

If both outputs match on the same source passage, your indexing and retrieval path is working correctly.
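A single matching pair is a smoke test. As the corpus grows, a small labelled set of questions, each paired with the source file it should retrieve, gives you a repeatable check. hit_rate_at_k is a hypothetical helper; results maps each question to the sources Haystack returned, for example [d.meta["source"] for d in result["retriever"]["documents"]].

```python
def hit_rate_at_k(
    results: dict[str, list[str]], expected: dict[str, str], k: int
) -> float:
    """Fraction of labelled questions whose expected source appears in the top-k results."""
    if not expected:
        raise ValueError("expected must not be empty")
    hits = sum(
        1
        for question, source in expected.items()
        # A question with no recorded results counts as a miss.
        if source in results.get(question, [])[:k]
    )
    return hits / len(expected)
```

Track this number whenever you change chunking, models, or mappings; a drop shows retrieval regressed before users notice.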

Real-World Use Cases

•Policy Q&A assistant: Let agents ask questions like “Does this plan cover orthodontics?” and return grounded answers from policy documents.
•Claims triage helper: Retrieve relevant claim rules, thresholds, and exception-handling steps before routing a case to human review.
•Underwriting copilot: Search prior submissions, product guidelines, and risk notes to help underwriters make faster decisions with traceable context.


By Cyprian Aarons, AI Consultant at Topiax.
