How to Integrate Haystack for insurance with Elasticsearch for production AI

By Cyprian Aarons | Updated 2026-04-21

Combining Haystack with Elasticsearch gives insurance teams a practical retrieval layer for production AI agents. In practice, that means fast access to policy docs, claims notes, underwriting rules, and customer correspondence without stuffing everything into the model context.

The pattern is simple: Haystack handles orchestration and retrieval logic, while Elasticsearch gives you indexed, low-latency search over structured and unstructured insurance data. That combination is what you want when an agent needs to answer questions with traceable evidence instead of guessing.

Prerequisites

  • Python 3.10+
  • An Elasticsearch 8.x cluster running locally or in your environment
  • Haystack 2.x (the haystack-ai package) plus the elasticsearch-haystack integration package
  • Network access from your app to Elasticsearch
  • Insurance documents ready to index:
    • policy PDFs
    • claims summaries
    • underwriting guidelines
    • FAQ or knowledge base articles
  • Environment variables set for credentials if your cluster is secured

Install the dependencies:

pip install haystack-ai elasticsearch-haystack elasticsearch

The elasticsearch-haystack package provides ElasticsearchDocumentStore and the Elasticsearch retrievers used in step 4. The plain elasticsearch client is what the setup and indexing steps use directly.
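Before moving on, it can help to confirm which of these packages the current environment actually has. Below is a minimal sketch using only the standard library's importlib.metadata; installed_versions is a hypothetical helper name, and the distribution names are the ones from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: installed version, or None if it is missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in this environment.
            found[name] = None
    return found
```

Calling installed_versions(["haystack-ai", "elasticsearch-haystack", "elasticsearch"]) returns a dict in which any None value flags a package that still needs installing.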

Integration Steps

1) Start by connecting to Elasticsearch

Use the official Python client first. This verifies connectivity before you wire it into Haystack.

from elasticsearch import Elasticsearch

es = Elasticsearch(
    "http://localhost:9200",
    basic_auth=("elastic", "changeme")
)

print(es.info())

For production, point this at your managed cluster and use API keys instead of hardcoded passwords.
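One way to keep credentials out of code is to build the client arguments from environment variables and prefer an API key when one is present. This is a sketch under assumptions: ES_URL, ES_API_KEY, ES_USERNAME and ES_PASSWORD are hypothetical variable names, and es_client_kwargs is a hypothetical helper. The hosts, api_key and basic_auth keys are standard arguments of the elasticsearch-py 8.x Elasticsearch constructor.

```python
def es_client_kwargs(env):
    """Build keyword arguments for elasticsearch.Elasticsearch from an env mapping.

    Prefers an API key and falls back to basic auth. The variable names
    (ES_URL, ES_API_KEY, ES_USERNAME, ES_PASSWORD) are hypothetical.
    """
    url = env.get("ES_URL")
    if not url:
        raise ValueError("ES_URL is not set")
    kwargs = {"hosts": url}
    if env.get("ES_API_KEY"):
        # Encoded API key string, as returned when you create a key in Elasticsearch.
        kwargs["api_key"] = env["ES_API_KEY"]
    elif env.get("ES_USERNAME") and env.get("ES_PASSWORD"):
        kwargs["basic_auth"] = (env["ES_USERNAME"], env["ES_PASSWORD"])
    else:
        raise ValueError("Set ES_API_KEY, or ES_USERNAME and ES_PASSWORD")
    return kwargs
```

In the application you would then write es = Elasticsearch(**es_client_kwargs(os.environ)), so the same code runs against a local cluster and a managed one with only the environment changing.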

2) Create an index for insurance documents

Use a dedicated index so your retrieval layer stays isolated from operational data.

index_name = "insurance-docs"

if not es.indices.exists(index=index_name):
    es.indices.create(
        index=index_name,
        mappings={
            "properties": {
                "content": {"type": "text"},
                "title": {"type": "text"},
                "doc_type": {"type": "keyword"},
                "policy_id": {"type": "keyword"},
                "embedding": {"type": "dense_vector", "dims": 384}
            }
        }
    )

print(f"Index ready: {index_name}")

If you plan to use vector retrieval, make sure the embedding dimension matches the model you choose.
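A simple way to keep the mapping and the embedding model in sync is to generate the mapping from a single dimension constant and check each vector before indexing it. The sketch assumes a 384-dimension model (sentence-transformers/all-MiniLM-L6-v2 is one such model); insurance_mapping and check_embedding are hypothetical helper names.

```python
# Assumed output size of the embedding model; change it if you switch models.
EMBEDDING_DIMS = 384


def insurance_mapping(dims):
    """Return the step 2 index mapping with a configurable vector size."""
    return {
        "properties": {
            "content": {"type": "text"},
            "title": {"type": "text"},
            "doc_type": {"type": "keyword"},
            "policy_id": {"type": "keyword"},
            "embedding": {"type": "dense_vector", "dims": dims},
        }
    }


def check_embedding(mapping, embedding):
    """Raise ValueError if a vector's length differs from the mapped dims."""
    expected = mapping["properties"]["embedding"]["dims"]
    if len(embedding) != expected:
        raise ValueError(f"embedding has {len(embedding)} dims, index expects {expected}")
```

Pass insurance_mapping(EMBEDDING_DIMS) as the mappings argument to es.indices.create, and call check_embedding on each vector before writing it. A mismatch then fails loudly at ingestion time rather than as a rejected document later.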

3) Index insurance content through Haystack documents

Haystack works best when your content is represented as Document objects. In production, this is where you normalize claims notes, policy text, and underwriting guidance before indexing.

from haystack import Document

documents = [
    Document(
        content="Coverage applies when water damage results from sudden pipe burst.",
        meta={"title": "Home Policy Water Damage", "doc_type": "policy", "policy_id": "HP-1001"}
    ),
    Document(
        content="Claims above $10,000 require supervisor approval before settlement.",
        meta={"title": "Claims Approval Rule", "doc_type": "guideline", "policy_id": "CLM-OPS"}
    )
]

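for doc in documents:
    # Reuse the Haystack document id so re-running ingestion overwrites
    # existing entries instead of creating duplicates.
    es.index(
        index=index_name,
        id=doc.id,
        document={
            "content": doc.content,
            **doc.meta
        }
    )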

es.indices.refresh(index=index_name)
print("Documents indexed")

This keeps the source of truth in Elasticsearch while letting Haystack manage downstream retrieval and agent reasoning.
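Indexing one document per request is fine for a demo, but a bulk request cuts round trips when you load thousands of policy pages. This is a minimal sketch that assumes the elasticsearch client's helpers.bulk function; to_bulk_actions is a hypothetical helper that turns Haystack Documents into bulk actions keyed by Document.id.

```python
def to_bulk_actions(documents, index):
    """Yield bulk "index" actions for Haystack-style documents.

    Each document needs .id, .content and .meta (haystack.Document has all three).
    Using Document.id as the Elasticsearch _id keeps re-ingestion idempotent.
    """
    for doc in documents:
        yield {
            "_op_type": "index",
            "_index": index,
            "_id": doc.id,
            # Flatten meta into top-level fields to match the step 2 mapping.
            "_source": {"content": doc.content, **doc.meta},
        }
```

With the client from step 1, run from elasticsearch import helpers, then helpers.bulk(es, to_bulk_actions(documents, index_name)) followed by es.indices.refresh(index=index_name). If you would rather stay inside Haystack, ElasticsearchDocumentStore.write_documents(documents) is the integration's own write path.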

4) Build a Haystack retriever over Elasticsearch

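In a Haystack 2.x pipeline, queries reach your index through a retriever component. The retrievers in the elasticsearch-haystack integration do not take a raw client; they wrap an ElasticsearchDocumentStore, which holds the connection settings and the index name. Point the document store at the index you created in step 2.

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

# The document store owns the Elasticsearch connection; extra kwargs such as
# basic_auth are passed through to the underlying Elasticsearch client.
document_store = ElasticsearchDocumentStore(
    hosts="http://localhost:9200",
    index=index_name,
    basic_auth=("elastic", "changeme"),
)

# Keyword (BM25) retrieval over the insurance-docs index, returning at most 3 hits.
retriever = ElasticsearchBM25Retriever(document_store=document_store, top_k=3)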

template = """
Answer the question using only the retrieved documents.

Question: {{question}}

Documents:
{% for doc in documents %}
- {{ doc.content }} ({{ doc.meta.title }})
{% endfor %}

Answer:
"""

prompt_builder = PromptBuilder(template=template)

pipe = Pipeline()
pipe.add_component("retriever", retriever)
pipe.add_component("prompt_builder", prompt_builder)
pipe.connect("retriever.documents", "prompt_builder.documents")

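If you are using embeddings instead of BM25, swap in ElasticsearchEmbeddingRetriever and put a text embedder (for example SentenceTransformersTextEmbedder) in front of it, connecting the embedder's embedding output to the retriever's query_embedding input. The prompt builder stays the same, but the indexed documents must carry embeddings from a model whose output dimension matches the mapping.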
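At the Elasticsearch level, vector retrieval is an approximate kNN search over the dense_vector field. The sketch below builds the knn option you could pass to the raw client's es.search; it is not the Haystack retriever's internal code, and knn_search_body is a hypothetical helper. It assumes the embedding field is indexed for kNN, which is the default on recent 8.x releases; on older 8.x releases you may need to set "index": true and a similarity on the mapping.

```python
def knn_search_body(query_vector, k=3, num_candidates=50, doc_type=None):
    """Build the `knn` option for es.search() over the `embedding` field.

    num_candidates is how many nearest-neighbour candidates each shard
    considers before the top k are returned; it must be at least k.
    """
    if k > num_candidates:
        raise ValueError("num_candidates must be >= k")
    knn = {
        "field": "embedding",
        "query_vector": list(query_vector),
        "k": k,
        "num_candidates": num_candidates,
    }
    if doc_type is not None:
        # Restrict candidates using the keyword field from the mapping (e.g. policies only).
        knn["filter"] = {"term": {"doc_type": doc_type}}
    return knn
```

A call would look like es.search(index=index_name, knn=knn_search_body(vector, doc_type="policy")), where vector comes from the same embedding model used at indexing time. Filtering on doc_type keeps a coverage question from being answered with a claims-handling guideline.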

5) Run a query end-to-end

Now test the full path from user question to retrieved evidence.

result = pipe.run(
    {
        "retriever": {"query": "When does water damage coverage apply?"},
        "prompt_builder": {"question": "When does water damage coverage apply?"}
    }
)

print(result["prompt_builder"]["prompt"])

At this point your agent can feed the generated prompt into an LLM response step, or return citations directly to the user.
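If you return citations directly, a small formatter over the retrieved documents' metadata is usually enough. This is a sketch: format_citations is a hypothetical helper, and it assumes the title and policy_id meta fields used in this guide's index.

```python
def format_citations(documents):
    """Turn retrieved documents into short, de-duplicated citation strings.

    Expects objects with a .meta dict (haystack.Document qualifies); missing
    title or policy_id fields fall back to placeholders.
    """
    seen = set()
    citations = []
    for doc in documents:
        title = doc.meta.get("title", "Untitled")
        policy_id = doc.meta.get("policy_id", "n/a")
        label = f"{title} [{policy_id}]"
        if label not in seen:  # keep first occurrence, preserve retrieval order
            seen.add(label)
            citations.append(label)
    return citations
```

By default a Haystack pipeline returns only the outputs of its final components, so the retriever's documents are not in result. In recent Haystack 2.x releases you can request them with pipe.run(..., include_outputs_from={"retriever"}) and then call format_citations(result["retriever"]["documents"]).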

Testing the Integration

Use a simple smoke test that checks both indexing and retrieval.

query = {
    "retriever": {"query": "What approval is needed for claims above $10,000?"},
    "prompt_builder": {"question": "What approval is needed for claims above $10,000?"}
}

result = pipe.run(query)
output = result["prompt_builder"]["prompt"]

print(output)
assert "supervisor approval" in output.lower()

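Expected output (roughly; blank lines from the Jinja loop and the order of retrieved documents can vary):

Answer the question using only the retrieved documents.

Question: What approval is needed for claims above $10,000?

Documents:
- Claims above $10,000 require supervisor approval before settlement. (Claims Approval Rule)

Answer: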

If that assertion passes, your Haystack-to-Elasticsearch path is working and returning relevant insurance content.
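A single assertion only proves that one question works. A slightly stronger smoke test runs a handful of hand-labelled questions and measures how often the expected policy_id appears in the top results. The sketch is framework-agnostic: hit_rate is a hypothetical helper, and retrieve is whatever function wraps your retriever.

```python
def hit_rate(cases, retrieve, k=3):
    """Fraction of questions whose expected policy_id appears in the top-k results.

    cases:    list of (question, expected_policy_id) pairs labelled by hand.
    retrieve: callable(question) -> list of meta dicts, ranked best first.
    """
    if not cases:
        raise ValueError("need at least one labelled case")
    hits = 0
    for question, expected in cases:
        top = retrieve(question)[:k]
        if any(meta.get("policy_id") == expected for meta in top):
            hits += 1
    return hits / len(cases)
```

To point it at the step 4 retriever, pass something like lambda q: [d.meta for d in retriever.run(query=q)["documents"]] as retrieve. Tracking this number as you change analyzers, top_k, or switch to embeddings tells you whether retrieval actually improved.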

Real-World Use Cases

  • Claims triage assistant
    • Retrieve policy language, prior claims notes, and handling rules before drafting a response.
  • Underwriting copilot
    • Search historical submissions and guideline docs to flag missing information or rule conflicts.
  • Customer service agent
    • Answer coverage questions with citations from approved policy documents instead of free-form model output.

The production pattern here is stable: store canonical insurance content in Elasticsearch, retrieve it through Haystack, then let your agent reason over grounded context. That gives you speed, auditability, and enough control to ship something a compliance team will actually sign off on.


By Cyprian Aarons, AI Consultant at Topiax.
