How to Integrate Haystack with Elasticsearch: Pension-Fund Retrieval for Startups
Combining Haystack with Elasticsearch gives you a clean pattern for retrieval-heavy AI agents: Haystack handles the pipeline orchestration, while Elasticsearch gives you fast full-text search, filtering, and scalable document storage. For pension-fund workflows, that means your agent can answer questions over policy documents, statements, compliance notes, and member records without turning every query into a slow database scan.
For startups, the value is simple: one retrieval layer for both structured and unstructured data. You get better grounding for LLM responses, lower latency, and a path to production-grade search without building custom retrieval logic from scratch.
Prerequisites
Before wiring this up, make sure you have:
- Python 3.10+
- An Elasticsearch cluster running locally or in the cloud
- Haystack 2.x installed (the haystack-ai package)
- API credentials or basic auth for Elasticsearch if your cluster requires it
- A set of pension-fund documents ready to index:
  - PDFs converted to text
  - policy notes
  - FAQ content
  - member support articles
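Install the packages. The elasticsearch-haystack integration provides ElasticsearchDocumentStore and pulls in the elasticsearch Python client as a dependency:
pip install haystack-ai elasticsearch-haystack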
If you want document loading from files, also install:
pip install pypdf
Integration Steps
- Connect to Elasticsearch
Start by creating an Elasticsearch client and confirming that the cluster responds. Haystack's Elasticsearch document store opens its own connection with the same host and credentials, so this client serves as a connectivity check.
from elasticsearch import Elasticsearch
es_client = Elasticsearch(
    "http://localhost:9200",
    basic_auth=("elastic", "changeme"),
)
print(es_client.info())
If this fails, fix connectivity first. Haystack sits on top of this store, so there is no point moving ahead with broken cluster access.
- Create a Haystack Elasticsearch document store
Use Haystack’s ElasticsearchDocumentStore to persist and retrieve pension-fund documents.
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore
document_store = ElasticsearchDocumentStore(
    hosts="http://localhost:9200",
    basic_auth=("elastic", "changeme"),
    index="pension_fund_docs",
)
This gives you a dedicated index for your domain data. In practice, keep separate indexes for policies, claims support content, and internal knowledge bases if access patterns differ.
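If you do split content across indexes, a small routing table keeps the decision in one place. This is a minimal sketch under assumptions: the index names other than pension_fund_docs, the "internal" doc type, and the helpers index_for and group_by_index are all hypothetical, and plain dicts stand in for Haystack documents.

```python
# Hypothetical mapping from a document's doc_type to the index that should hold it.
INDEX_BY_DOC_TYPE = {
    "policy": "pension_fund_policies",
    "faq": "pension_fund_support",
    "internal": "pension_fund_internal_kb",
}
DEFAULT_INDEX = "pension_fund_docs"  # fallback for unknown or missing doc types


def index_for(doc_type: str) -> str:
    """Pick the index name for a doc_type, falling back to the default index."""
    return INDEX_BY_DOC_TYPE.get(doc_type, DEFAULT_INDEX)


def group_by_index(docs: list[dict]) -> dict[str, list[dict]]:
    """Group documents (dicts with a 'meta' key) by their target index."""
    grouped: dict[str, list[dict]] = {}
    for doc in docs:
        doc_type = doc.get("meta", {}).get("doc_type", "")
        grouped.setdefault(index_for(doc_type), []).append(doc)
    return grouped


sample_docs = [
    {"content": "Withdrawal rules", "meta": {"doc_type": "policy"}},
    {"content": "Portal help", "meta": {"doc_type": "faq"}},
    {"content": "Untyped note", "meta": {}},
]
grouped_docs = group_by_index(sample_docs)
print(sorted(grouped_docs))
```

Each resulting group could then be written through its own ElasticsearchDocumentStore instance created with the matching index name.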
- Load and write pension-fund documents
Haystack uses Document objects. Convert your source text into documents and push them into Elasticsearch.
from haystack import Document
docs = [
    Document(
        content="The pension fund allows lump-sum withdrawals only under approved conditions.",
        meta={"source": "policy_handbook", "doc_type": "policy"},
    ),
    Document(
        content="Members can request contribution history statements through the portal.",
        meta={"source": "member_faq", "doc_type": "faq"},
    ),
]
document_store.write_documents(docs)
For real systems, chunk long PDFs before writing them. Indexing whole documents works for demos, but retrieval quality drops fast once the content gets large.
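The chunking idea can be sketched with the standard library alone. The helper below, chunk_words, is a hypothetical name, and the chunk size and overlap values are illustrative assumptions rather than recommendations. It splits text into overlapping word windows so that a sentence cut at a boundary still appears whole in one of the neighbouring chunks.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of chunk_size words, each sharing `overlap` words with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample_text = " ".join(f"w{i}" for i in range(10))
sample_chunks = chunk_words(sample_text, chunk_size=4, overlap=1)
print(sample_chunks)  # ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Each chunk would become its own Document, ideally carrying the parent's metadata plus a chunk index. Haystack also ships a DocumentSplitter component in haystack.components.preprocessors that covers the same ground inside a pipeline.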
- Build a retrieval pipeline in Haystack
Now connect a retriever to the store and query it from your agent flow. InMemoryBM25Retriever only works with Haystack's in-memory store, so use the retriever that ships with the Elasticsearch integration.
from haystack import Pipeline
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever
# BM25 keyword retrieval served directly by the Elasticsearch index.
# If you're using hybrid search patterns, keep BM25 as one branch.
retriever = ElasticsearchBM25Retriever(document_store=document_store)
pipeline = Pipeline()
pipeline.add_component("retriever", retriever)
query = "How can a member request contribution history?"
result = pipeline.run(
    {
        "retriever": {
            "query": query,
            "top_k": 3,
        }
    }
)
print(result)
ElasticsearchBM25Retriever runs the query as a BM25 search against the index itself. For semantic or hybrid search, the same integration package provides ElasticsearchEmbeddingRetriever. The key idea stays the same: Haystack orchestrates retrieval; Elasticsearch stores and serves the documents.
- Attach generation or agent logic
Once retrieval works, feed the top passages into your LLM step. That is where the pension-fund assistant becomes useful.
from haystack.components.builders import PromptBuilder
template = """
Answer the question using only the provided context.
Context:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}
Question: {{ question }}
"""
prompt_builder = PromptBuilder(template=template)
# PromptBuilder.run takes the template variables as keyword arguments.
prompt = prompt_builder.run(
    documents=result["retriever"]["documents"],
    question=query,
)
print(prompt["prompt"])
In production, pass this prompt into your generator component or agent tool call. Also add metadata filters such as doc_type, jurisdiction, or effective_date so your agent only retrieves compliant content.
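A metadata filter in Haystack 2.x is a nested dictionary of conditions on fields such as meta.doc_type. The sketch below builds one from optional arguments. It assumes your documents actually carry jurisdiction and effective_date in their metadata (the sample documents above only set source and doc_type) and that dates are stored as ISO 8601 strings; build_filters is a hypothetical helper name.

```python
def build_filters(doc_type=None, jurisdiction=None, effective_on_or_before=None):
    """Build a Haystack 2.x style filter dict; returns None when no constraint is given."""
    conditions = []
    if doc_type is not None:
        conditions.append({"field": "meta.doc_type", "operator": "==", "value": doc_type})
    if jurisdiction is not None:
        # Assumed metadata field; not present in the sample documents.
        conditions.append({"field": "meta.jurisdiction", "operator": "==", "value": jurisdiction})
    if effective_on_or_before is not None:
        # Assumed metadata field holding an ISO 8601 date string such as "2024-01-01".
        conditions.append(
            {"field": "meta.effective_date", "operator": "<=", "value": effective_on_or_before}
        )
    if not conditions:
        return None
    return {"operator": "AND", "conditions": conditions}


policy_filters = build_filters(doc_type="policy", effective_on_or_before="2024-01-01")
print(policy_filters)
```

The resulting dictionary can be passed alongside the query, for example pipeline.run({"retriever": {"query": query, "filters": policy_filters, "top_k": 3}}).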
Testing the Integration
Run a simple end-to-end check: write documents, retrieve them, and inspect the results.
query = "What does the fund allow for withdrawals?"
result = pipeline.run(
    {
        "retriever": {
            "query": query,
            "top_k": 2,
        }
    }
)
for doc in result["retriever"]["documents"]:
    print(doc.content)
    print(doc.meta)
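Expected output (top hit shown; because top_k is 2, the FAQ document may also be printed after it with a lower score):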
The pension fund allows lump-sum withdrawals only under approved conditions.
{'source': 'policy_handbook', 'doc_type': 'policy'}
If you get empty results:
- confirm the index name matches
- verify documents were written successfully
- check whether analyzers/tokenization match your language and content type
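For the tokenization check, it helps to see which terms a query and a document actually share. The sketch below is only a rough approximation of a lowercase, split-on-punctuation analyzer and not Elasticsearch's real analysis chain; rough_tokens and shared_terms are hypothetical helper names.

```python
import re


def rough_tokens(text: str) -> set[str]:
    """Lowercase the text and split on anything that is not a letter or digit."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def shared_terms(query: str, content: str) -> set[str]:
    """Terms that appear in both the query and the document content."""
    return rough_tokens(query) & rough_tokens(content)


query_text = "What does the fund allow for withdrawals?"
doc_text = "The pension fund allows lump-sum withdrawals only under approved conditions."
overlap = shared_terms(query_text, doc_text)
print(sorted(overlap))  # ['fund', 'the', 'withdrawals']
```

Note that "allow" and "allows" do not match here. Without a stemming analyzer, word variants like these count as different terms, which is a common reason for weak or empty keyword results.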
Real-World Use Cases
- Member support agents
  - Answer policy questions from indexed handbooks and FAQ pages.
  - Return grounded answers with source citations from Elasticsearch-backed retrieval.
- Compliance assistants
  - Search internal policy docs by jurisdiction, effective date, or product line.
  - Keep responses constrained to approved pension-fund material.
- Operations copilots
  - Retrieve contribution histories, statement templates, and workflow docs.
  - Help support teams resolve requests faster without manual document hunting.
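Source citations for the member support case can be derived from the metadata of the retrieved documents. The following is a minimal sketch that uses plain dicts as stand-ins for Haystack Document objects (which expose the same information through .meta); format_sources is a hypothetical helper, and it assumes each document carries a source field as in the indexing step.

```python
def format_sources(docs: list[dict]) -> str:
    """Return a numbered, de-duplicated list of sources in retrieval order."""
    seen: list[str] = []
    for doc in docs:
        source = doc.get("meta", {}).get("source", "unknown")
        if source not in seen:
            seen.append(source)
    return "\n".join(f"[{i}] {source}" for i, source in enumerate(seen, start=1))


retrieved = [
    {"content": "Lump-sum withdrawal rules", "meta": {"source": "policy_handbook"}},
    {"content": "Requesting statements", "meta": {"source": "member_faq"}},
    {"content": "Withdrawal approval steps", "meta": {"source": "policy_handbook"}},
]
citations = format_sources(retrieved)
print(citations)
```

Appending this block to the generated answer lets a support agent trace each response back to the handbook or FAQ it came from.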
The main pattern here is stable: use Elasticsearch as your searchable memory layer and Haystack as the orchestration layer around it. That gives startup teams a practical foundation for AI agents that need fast retrieval over regulated pension-fund knowledge.
By Cyprian Aarons, AI Consultant at Topiax.