How to Integrate Haystack for pension funds with Elasticsearch for startups

By Cyprian Aarons, updated 2026-04-21

Combining Haystack with Elasticsearch gives you a clean pattern for retrieval-heavy AI agents: Haystack handles pipeline orchestration, while Elasticsearch provides fast full-text search, filtering, and scalable document storage. For pension-fund workflows, that means your agent can answer questions over policy documents, statements, compliance notes, and member records without turning every query into a slow database scan.

For startups, the value is simple: one retrieval layer for both structured and unstructured data. You get better grounding for LLM responses, lower latency, and a path to production-grade search without building custom retrieval logic from scratch.

Prerequisites

Before wiring this up, make sure you have:

  • Python 3.10+
  • An Elasticsearch cluster running locally or in the cloud
  • Haystack 2.x installed (the haystack-ai package)
  • API credentials or basic auth for Elasticsearch if your cluster requires it
  • A set of pension-fund documents ready to index:
    • PDFs converted to text
    • policy notes
    • FAQ content
    • member support articles

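Install the packages. The elasticsearch-haystack integration provides ElasticsearchDocumentStore and pulls in the elasticsearch Python client:

pip install haystack-ai elasticsearch-haystack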

If you want document loading from files, also install:

pip install pypdf

Integration Steps

  1. Connect to Elasticsearch

Start by creating an Elasticsearch client to confirm the cluster is reachable. The Haystack document store in the next step opens its own connection with the same host and credentials.

from elasticsearch import Elasticsearch

es_client = Elasticsearch(
    "http://localhost:9200",
    basic_auth=("elastic", "changeme"),
)

print(es_client.info())

If this fails, fix connectivity first. Haystack sits on top of this store, so there is no point moving ahead with broken cluster access.
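A freshly started cluster can take a few seconds to accept connections, so a small retry loop may save some false alarms. The sketch below is one way to do that; `wait_until_ready` is a hypothetical helper, not part of Haystack or the Elasticsearch client, and it accepts any zero-argument callable that returns True when the cluster answers.

```python
import time
from typing import Callable


def wait_until_ready(ping: Callable[[], bool], attempts: int = 5, delay: float = 1.0) -> bool:
    """Call `ping` until it returns True or the attempts run out."""
    for attempt in range(attempts):
        try:
            if ping():
                return True
        except Exception:
            # Treat a raised connection error like a failed ping and retry.
            pass
        if attempt < attempts - 1:
            # Exponential backoff: delay, 2*delay, 4*delay, ...
            time.sleep(delay * (2 ** attempt))
    return False
```

With the client above, this could be called as `wait_until_ready(es_client.ping)`, since the client's `ping()` method returns a boolean.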

  2. Create a Haystack Elasticsearch document store

Use Haystack’s ElasticsearchDocumentStore to persist and retrieve pension-fund documents.

from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(
    hosts="http://localhost:9200",
    basic_auth=("elastic", "changeme"),
    index="pension_fund_docs",
)

This gives you a dedicated index for your domain data. In practice, keep separate indexes for policies, claims support content, and internal knowledge bases if access patterns differ.
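If you do split content across indexes, one option is to route each record by its doc_type before writing. The following is a minimal sketch under stated assumptions: the index names and the `group_by_index` helper are hypothetical, and records are plain dicts with `content` and `meta` keys rather than Haystack objects.

```python
from collections import defaultdict

# Hypothetical mapping from a record's doc_type to an index name.
INDEX_BY_DOC_TYPE = {
    "policy": "pension_fund_policies",
    "faq": "pension_fund_support",
}
DEFAULT_INDEX = "pension_fund_docs"


def group_by_index(records):
    """Group records (dicts with 'content' and 'meta') by target index name."""
    grouped = defaultdict(list)
    for record in records:
        doc_type = record.get("meta", {}).get("doc_type")
        # Unknown or missing doc_type values fall back to the default index.
        grouped[INDEX_BY_DOC_TYPE.get(doc_type, DEFAULT_INDEX)].append(record)
    return dict(grouped)
```

Each group could then be written through its own ElasticsearchDocumentStore created with the matching `index` value.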

  3. Load and write pension-fund documents

Haystack uses Document objects. Convert your source text into documents and push them into Elasticsearch.

from haystack import Document
from haystack.document_stores.types import DuplicatePolicy

docs = [
    Document(
        content="The pension fund allows lump-sum withdrawals only under approved conditions.",
        meta={"source": "policy_handbook", "doc_type": "policy"}
    ),
    Document(
        content="Members can request contribution history statements through the portal.",
        meta={"source": "member_faq", "doc_type": "faq"}
    ),
]

# Overwrite documents with matching IDs so the script can be re-run without duplicate errors.
document_store.write_documents(docs, policy=DuplicatePolicy.OVERWRITE)

For real systems, chunk long PDFs before writing them. Indexing whole documents works for demos, but retrieval quality drops fast once the content gets large.
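As a rough illustration of chunking, the sketch below splits text into overlapping word windows. `chunk_words` and its default sizes are assumptions made for this example, not values from Haystack; Haystack also ships a DocumentSplitter component that covers the same ground inside a pipeline.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of up to `chunk_size` words, sharing `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk can then be wrapped in its own Document, carrying the parent's metadata plus a chunk index so answers can still point back to the source file.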

  4. Build a retrieval pipeline in Haystack

Now connect a retriever to the store and query it from your agent flow.

from haystack import Pipeline
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever

# ElasticsearchBM25Retriever runs keyword (BM25) search directly against the index.
# If you later add hybrid search, keep BM25 as one branch.
retriever = ElasticsearchBM25Retriever(document_store=document_store)

pipeline = Pipeline()
pipeline.add_component("retriever", retriever)

query = "How can a member request contribution history?"
result = pipeline.run(
    {
        "retriever": {
            "query": query,
            "top_k": 3,
        }
    }
)

print(result)

The retriever must match the document store: InMemoryBM25Retriever only accepts an InMemoryDocumentStore, so an Elasticsearch index needs the retriever from the Elasticsearch integration. The key idea stays the same: Haystack orchestrates retrieval; Elasticsearch stores and serves the documents.

  5. Attach generation or agent logic

Once retrieval works, feed the top passages into your LLM step. That is where the pension-fund assistant becomes useful.

from haystack.components.builders import PromptBuilder

template = """
Answer the question using only the provided context.

Context:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}

Question: {{ question }}
"""

prompt_builder = PromptBuilder(template=template)

# PromptBuilder.run takes the template variables as keyword arguments.
prompt = prompt_builder.run(
    documents=result["retriever"]["documents"],
    question=query,
)

print(prompt["prompt"])

In production, pass this prompt into your generator component or agent tool call. Also add metadata filters such as doc_type, jurisdiction, or effective_date so your agent only retrieves compliant content.
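Metadata filters in Haystack 2.x are plain dictionaries of conditions, so they can be assembled with a small helper. The sketch below is illustrative only: `build_filters` is a hypothetical function, and it assumes your documents carry `jurisdiction` and `effective_date` in their meta, with dates stored as ISO-8601 strings.

```python
def build_filters(doc_type=None, jurisdiction=None, effective_on_or_before=None):
    """Build a Haystack 2.x filter dict from optional metadata constraints."""
    conditions = []
    if doc_type is not None:
        conditions.append({"field": "meta.doc_type", "operator": "==", "value": doc_type})
    if jurisdiction is not None:
        conditions.append({"field": "meta.jurisdiction", "operator": "==", "value": jurisdiction})
    if effective_on_or_before is not None:
        # Assumes effective_date is stored as an ISO-8601 date string.
        conditions.append(
            {"field": "meta.effective_date", "operator": "<=", "value": effective_on_or_before}
        )
    if not conditions:
        return None  # No constraints: let the retriever search everything.
    return {"operator": "AND", "conditions": conditions}
```

The result can be passed alongside `query` and `top_k` as the retriever's `filters` input, for example `"filters": build_filters(doc_type="policy")`.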

Testing the Integration

Run a simple end-to-end check: write documents, retrieve them, and inspect the results.

query = "What does the fund allow for withdrawals?"
result = pipeline.run(
    {
        "retriever": {
            "query": query,
            "top_k": 2,
        }
    }
)

for doc in result["retriever"]["documents"]:
    print(doc.content)
    print(doc.meta)

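Expected output, with the withdrawal policy ranked first (because top_k is 2, the FAQ document may be printed after it if it matches any query term):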

The pension fund allows lump-sum withdrawals only under approved conditions.
{'source': 'policy_handbook', 'doc_type': 'policy'}

If you get empty results:

  • confirm the index name matches
  • verify documents were written successfully
  • check whether analyzers/tokenization match your language and content type
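The first two checks can be scripted against the raw Elasticsearch client. This is a minimal sketch: `diagnose_index` is a hypothetical helper, and it relies only on the client's `indices.exists` and `count` calls, so any object exposing those two methods works.

```python
def diagnose_index(client, index: str) -> str:
    """Return a short hint about why a search against `index` might be empty."""
    if not client.indices.exists(index=index):
        return f"index '{index}' does not exist"
    count = client.count(index=index)["count"]
    if count == 0:
        return f"index '{index}' exists but holds no documents"
    # Documents are present, so the mismatch is likely in the query or analysis.
    return f"index '{index}' holds {count} documents; check analyzers and the query"
```

For the setup in this guide, that would be `print(diagnose_index(es_client, "pension_fund_docs"))`.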

Real-World Use Cases

  • Member support agents
    • Answer policy questions from indexed handbooks and FAQ pages.
    • Return grounded answers with source citations from Elasticsearch-backed retrieval.
  • Compliance assistants
    • Search internal policy docs by jurisdiction, effective date, or product line.
    • Keep responses constrained to approved pension-fund material.
  • Operations copilots
    • Retrieve contribution histories, statement templates, and workflow docs.
    • Help support teams resolve requests faster without manual document hunting.

The main pattern here is stable: use Elasticsearch as your searchable memory layer and Haystack as the orchestration layer around it. That gives startup teams a practical foundation for AI agents that need fast retrieval over regulated pension-fund knowledge.

By Cyprian Aarons, AI Consultant at Topiax.