How to Integrate Haystack for insurance with Elasticsearch for AI agents

By Cyprian Aarons, updated 2026-04-21
Tags: haystack-for-insurance, elasticsearch, ai-agents

Combining Haystack for insurance with Elasticsearch gives you a practical retrieval layer for AI agents that need policy-aware answers, claim lookup, and document search over large insurance corpora. Haystack handles the agent orchestration and retrieval pipeline, while Elasticsearch gives you fast full-text search, filtering, and scoring across policy docs, claims notes, endorsements, and underwriting files.

Prerequisites

  • Python 3.10+
  • An Elasticsearch cluster running locally or in the cloud
  • A Haystack for insurance project installed in your environment
  • API credentials or access tokens if your Haystack deployment requires them
  • Insurance documents ready to index:
    • policy wordings
    • claims summaries
    • underwriting guidelines
    • broker correspondence
  • Python packages:
    • haystack-ai (the PyPI distribution for Haystack 2.x; it is imported as haystack)
    • elasticsearch
    • any Haystack integration package your insurance stack uses for document ingestion and agents

Integration Steps

  1. Install the dependencies and verify both clients are available.
pip install haystack-ai elasticsearch
from elasticsearch import Elasticsearch

from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

If your insurance setup exposes a specific Haystack package or pipeline wrapper, install that as well and keep the same pattern: Elasticsearch stores the corpus, Haystack builds the agent workflow on top.
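
One way to verify the installation from Python is to look up the installed versions with the standard library. This is a minimal sketch that assumes the distribution names haystack-ai and elasticsearch; the helper name installed_versions is illustrative and not part of either library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to your environment.
    for name, ver in installed_versions(["haystack-ai", "elasticsearch"]).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

A missing package shows up as NOT INSTALLED instead of raising, which makes the check easy to run in CI before any cluster is available.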

  2. Connect to Elasticsearch and create an index for insurance content.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

index_name = "insurance-docs"

if not es.indices.exists(index=index_name):
    es.indices.create(
        index=index_name,
        mappings={
            "properties": {
                "content": {"type": "text"},
                "doc_type": {"type": "keyword"},
                "policy_id": {"type": "keyword"},
                "claim_id": {"type": "keyword"},
            }
        },
    )

print(es.info())

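Map insurance metadata to explicit keyword fields so you can filter on exact values. The mapping above covers doc_type, policy_id, and claim_id; in production you will likely add fields such as jurisdiction, product line, and effective date, and filter on those as well.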
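
To illustrate why keyword fields matter, here is a hedged sketch of a query builder that combines full-text matching on content with exact-value filters. The function name build_filtered_query is hypothetical; the bool, match, and term clauses are standard Elasticsearch query DSL, and the field names come from the mapping above.

```python
def build_filtered_query(text, **filters):
    """Return a bool query: full-text match on content plus exact-value filters."""
    filter_clauses = [
        {"term": {field: value}}
        for field, value in sorted(filters.items())
        if value is not None  # skip filters the caller left unset
    ]
    return {
        "bool": {
            "must": [{"match": {"content": text}}],
            # Filter clauses restrict the result set without affecting the score.
            "filter": filter_clauses,
        }
    }


query = build_filtered_query("flood exclusion", doc_type="policy", policy_id="POL-1001")
print(query)
# With a live cluster: es.search(index="insurance-docs", query=query)
```

Keeping the metadata constraints in filter context means a policy-scoped agent never sees passages from another policy, regardless of how well they match the text.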

  3. Write documents into Elasticsearch using Haystack documents as the source format.
from elasticsearch import Elasticsearch
from haystack import Document

es = Elasticsearch("http://localhost:9200")
index_name = "insurance-docs"

docs = [
    Document(
        content="This policy excludes flood damage unless flood cover is endorsed.",
        meta={"doc_type": "policy", "policy_id": "POL-1001"}
    ),
    Document(
        content="Claim CLM-7782 was denied due to exclusion clause 4.2.",
        meta={"doc_type": "claim", "claim_id": "CLM-7782"}
    ),
]

for doc in docs:
    es.index(
        index=index_name,
        document={
            "content": doc.content,
            **doc.meta,
        },
    )

es.indices.refresh(index=index_name)

At this point, Elasticsearch is your source of truth for retrieval. Haystack can sit on top of it to orchestrate retrieval plus generation for agent responses.
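
The loop above calls es.index() without an ID, so re-running it adds duplicate copies of each document. For larger corpora, one option is to prepare bulk actions with deterministic IDs. The sketch below uses plain dicts as stand-ins for Haystack documents (content plus meta), and the helper name to_bulk_actions is an assumption for illustration; the _index, _id, and _source keys follow the action format accepted by elasticsearch.helpers.bulk.

```python
import hashlib


def to_bulk_actions(records, index_name):
    """Yield one bulk action per record, with an ID derived from its content."""
    for record in records:
        content = record["content"]
        meta = record.get("meta", {})
        # Hashing the content gives a stable ID, so re-running the load
        # overwrites the same document instead of creating a duplicate.
        doc_id = hashlib.sha256(content.encode("utf-8")).hexdigest()[:32]
        yield {
            "_index": index_name,
            "_id": doc_id,
            "_source": {"content": content, **meta},
        }


records = [
    {"content": "This policy excludes flood damage unless flood cover is endorsed.",
     "meta": {"doc_type": "policy", "policy_id": "POL-1001"}},
    {"content": "Claim CLM-7782 was denied due to exclusion clause 4.2.",
     "meta": {"doc_type": "claim", "claim_id": "CLM-7782"}},
]
actions = list(to_bulk_actions(records, "insurance-docs"))
print(len(actions), actions[0]["_id"])
# With a live cluster: from elasticsearch.helpers import bulk; bulk(es, actions)
```

Content hashing is only one possible ID scheme. If your documents already carry a stable identifier, such as a policy ID plus a section number, that is usually a better key.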

  4. Build a retrieval function that queries Elasticsearch and returns the hits as Haystack documents.
from elasticsearch import Elasticsearch
from haystack import Document

es = Elasticsearch("http://localhost:9200")
index_name = "insurance-docs"

def search_elasticsearch(query: str, size: int = 5):
    response = es.search(
        index=index_name,
        query={
            "multi_match": {
                "query": query,
                "fields": ["content", "doc_type", "policy_id", "claim_id"]
            }
        },
        size=size,
    )
    return [
        Document(
            content=hit["_source"]["content"],
            meta={k: v for k, v in hit["_source"].items() if k != "content"},
            score=hit["_score"],
        )
        for hit in response["hits"]["hits"]
    ]

query = "flood exclusion"
results = search_elasticsearch(query)

for doc in results:
    print(doc.score, doc.meta, doc.content)

This is the core integration pattern. In a real agent system, wrap search_elasticsearch() inside a Haystack tool or custom component so the agent can retrieve relevant policy language before generating an answer.
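
As a rough, framework-agnostic illustration of that wrapping step, the sketch below puts a search function behind a small tool-shaped class. PolicySearchTool, its name and description attributes, and fake_search are all hypothetical and not Haystack APIs; in a real project you would adapt this shape to whichever tool or component interface your Haystack version provides, and pass search_elasticsearch in place of the stand-in.

```python
from typing import Callable, Dict


class PolicySearchTool:
    """Hypothetical tool wrapper: takes a query string, returns retrieved documents."""

    name = "policy_search"
    description = "Search indexed insurance documents for relevant passages."

    def __init__(self, search_fn: Callable[[str, int], list], top_k: int = 5):
        self.search_fn = search_fn
        self.top_k = top_k

    def run(self, query: str) -> Dict[str, list]:
        query = query.strip()
        if not query:
            # Skip the backend call when the agent sends an empty query.
            return {"documents": []}
        return {"documents": self.search_fn(query, self.top_k)}


def fake_search(query: str, size: int) -> list:
    """Stand-in for search_elasticsearch() so the sketch runs without a cluster."""
    corpus = [{"content": "This policy excludes flood damage unless flood cover is endorsed.",
               "meta": {"doc_type": "policy", "policy_id": "POL-1001"}}]
    words = query.lower().split()
    return [d for d in corpus if any(w in d["content"].lower() for w in words)][:size]


tool = PolicySearchTool(fake_search, top_k=3)
print(tool.run("flood exclusion"))
```

Injecting the search function keeps the wrapper testable without a running cluster and lets you swap in a filtered or hybrid search later without touching the agent code.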

  5. Attach retrieval to an AI agent flow and use the retrieved context to answer questions.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
index_name = "insurance-docs"

def retrieve_context(question: str) -> str:
    response = es.search(
        index=index_name,
        query={
            "multi_match": {
                "query": question,
                "fields": ["content", "doc_type", "policy_id", "claim_id"]
            }
        },
        size=3,
    )

    chunks = []
    for hit in response["hits"]["hits"]:
        src = hit["_source"]
        chunks.append(f"[{src.get('doc_type')}] {src.get('content')}")
    return "\n".join(chunks)

question = "Does this policy cover flood damage?"
context = retrieve_context(question)

prompt = f"""
Use the context below to answer the insurance question.

Context:
{context}

Question:
{question}
"""

print(prompt)

In production, this prompt would be passed to your LLM node inside a Haystack pipeline or agent executor. The important part is that retrieval happens through Elasticsearch with insurance-specific metadata preserved end-to-end.
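
To make "metadata preserved end-to-end" concrete, one possible refinement is to label every context chunk with its policy or claim identifier and cap the total context size. The sketch below is an assumption-laden example rather than part of the original flow: the function name format_context, the label format, and the 1500-character budget are all illustrative choices. It works on raw Elasticsearch hits, so it runs here against canned data.

```python
def format_context(hits, max_chars=1500):
    """Join hit contents into a context string, labelling each chunk with its source."""
    chunks = []
    used = 0
    for hit in hits:
        src = hit["_source"]
        # Prefer the most specific identifier available for the citation label.
        ref = src.get("policy_id") or src.get("claim_id") or "unknown"
        chunk = f"[{src.get('doc_type', 'doc')}:{ref}] {src.get('content', '')}"
        if used + len(chunk) > max_chars:
            break  # stop before exceeding the character budget
        chunks.append(chunk)
        used += len(chunk) + 1  # +1 for the joining newline
    return "\n".join(chunks)


hits = [
    {"_source": {"doc_type": "policy", "policy_id": "POL-1001",
                 "content": "This policy excludes flood damage unless flood cover is endorsed."}},
    {"_source": {"doc_type": "claim", "claim_id": "CLM-7782",
                 "content": "Claim CLM-7782 was denied due to exclusion clause 4.2."}},
]
print(format_context(hits))
```

With identifiers in the context, the prompt can instruct the model to cite the policy or claim it relied on, which makes answers easier to audit.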

Testing the Integration

Run a simple query against a known indexed policy clause and check that the right document comes back.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="insurance-docs",
    query={
        "match": {
            "content": {
                "query": "flood damage"
            }
        }
    },
    size=1,
)

hit = response["hits"]["hits"][0]["_source"]
print(hit["doc_type"])
print(hit["policy_id"])
print(hit["content"])

Expected output:

policy
POL-1001
This policy excludes flood damage unless flood cover is endorsed.

If you get that result back consistently, your indexing and retrieval path is working.
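
Note that the snippet above raises an IndexError when the search returns no hits. For an automated check, a small helper that reports mismatches instead of printing can be more convenient. The following is a sketch under the assumption that the response has the same hits.hits[]._source shape as es.search() output; check_top_hit is a hypothetical name, and the sample response is canned so the example runs without a cluster.

```python
def check_top_hit(response, expected):
    """Return a list of problems; an empty list means the top hit matches."""
    hits = response["hits"]["hits"]
    if not hits:
        return ["no hits returned"]
    source = hits[0]["_source"]
    return [
        f"{field}: expected {want!r}, got {source.get(field)!r}"
        for field, want in expected.items()
        if source.get(field) != want
    ]


# A canned response with the same shape that es.search() returns.
sample = {"hits": {"hits": [{"_source": {
    "doc_type": "policy",
    "policy_id": "POL-1001",
    "content": "This policy excludes flood damage unless flood cover is endorsed.",
}}]}}

problems = check_top_hit(sample, {"doc_type": "policy", "policy_id": "POL-1001"})
print(problems or "top hit OK")
```

In a test suite you would pass the live es.search() response in place of sample and assert that the returned list is empty.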

Real-World Use Cases

  • Claims triage agents
    • Retrieve prior claim notes, exclusions, and settlement history before drafting a recommendation.
  • Policy Q&A assistants
    • Answer broker or customer questions using exact policy wording instead of hallucinated summaries.
  • Underwriting copilots
    • Search underwriting guidelines and product rules while drafting risk assessments or referral notes.

By Cyprian Aarons, AI Consultant at Topiax.