How to Integrate Haystack for pension funds with Elasticsearch for production AI
Combining Haystack for pension funds with Elasticsearch gives you a production-grade retrieval layer for pension-document workflows. You get Haystack’s pipeline abstraction for question answering, classification, and document processing, plus Elasticsearch’s indexed search over statements, policy docs, member letters, and regulatory content.
For pension fund AI agents, this matters because most queries are not generic chat. They are grounded in archived PDFs, contribution histories, benefit rules, trustee minutes, and compliance documents that need fast retrieval with traceability.
Prerequisites
- Python 3.10+
- An Elasticsearch cluster running locally or in production
- Access credentials for Elasticsearch
- haystack-ai installed
- elasticsearch Python client installed
- A corpus of pension fund documents to index
- Optional: an embedding model if you want semantic retrieval
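Install the packages. The haystack_integrations imports used below ship in the separate elasticsearch-haystack package:
pip install haystack-ai elasticsearch-haystack elasticsearch sentence-transformers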
Integration Steps
- Connect to Elasticsearch
Start by creating a client and verifying the cluster is reachable. In production, use TLS and authenticated access.
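from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", "changeme"),  # placeholder credentials; load real ones from a secret store
    verify_certs=False,  # local development only; keep certificate verification on in production
)
print(es.info())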
If this fails, stop here. Your Haystack pipeline will only be as stable as the search backend.
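Since production access should use TLS and real credentials, one way to keep them out of source code is to read them from environment variables. The sketch below only builds the keyword arguments for the client. The variable names (ES_URL, ES_USER, ES_PASSWORD, ES_VERIFY_CERTS, ES_CA_CERT) and the helper es_client_kwargs are assumptions made for illustration, not names defined by Elasticsearch or Haystack.

```python
import os


def es_client_kwargs(env=None):
    """Build keyword arguments for Elasticsearch(...) from environment variables.

    The variable names are hypothetical; adapt them to your deployment.
    """
    env = os.environ if env is None else env
    kwargs = {
        "hosts": env.get("ES_URL", "https://localhost:9200"),
        "basic_auth": (env["ES_USER"], env["ES_PASSWORD"]),
        # Keep certificate verification on unless it is explicitly disabled
        "verify_certs": env.get("ES_VERIFY_CERTS", "true").lower() != "false",
    }
    ca_cert = env.get("ES_CA_CERT")
    if ca_cert:
        # Path to a CA bundle, useful for clusters with a private certificate authority
        kwargs["ca_certs"] = ca_cert
    return kwargs


example = es_client_kwargs({"ES_USER": "elastic", "ES_PASSWORD": "changeme"})
print(example["hosts"], example["verify_certs"])
```

With the variables set, the client could then be created as Elasticsearch(**es_client_kwargs()).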
- Create an index for pension documents
Use a dedicated index with fields for content, title, source, and metadata. Keep the schema simple so Haystack can write and query cleanly.
index_name = "pension_docs"

# Create the index only if it does not already exist
if not es.indices.exists(index=index_name):
    es.indices.create(
        index=index_name,
        mappings={
            "properties": {
                "content": {"type": "text"},
                "title": {"type": "text"},
                "source": {"type": "keyword"},
                "doc_type": {"type": "keyword"},
                "created_at": {"type": "date"},
            }
        },
    )
For pension systems, doc_type is useful for separating trustee notes from member communications and policy files.
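To illustrate why doc_type is mapped as a keyword, here is a minimal sketch of a search body that combines a full-text match on content with an exact filter on doc_type. The helper name build_doc_type_query is hypothetical; the body itself uses standard Elasticsearch bool, match, and term clauses.

```python
def build_doc_type_query(text, doc_type, size=5):
    """Return a search body: full-text match on content, exact filter on doc_type."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text match against the analyzed content field
                "must": [{"match": {"content": text}}],
                # Unscored exact match against the keyword field
                "filter": [{"term": {"doc_type": doc_type}}],
            }
        },
    }


body = build_doc_type_query("early retirement age", "policy")
print(body["query"]["bool"]["filter"])
```

Assuming the client and index from the previous steps, this could be run as es.search(index=index_name, **body).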
- Wire Haystack to Elasticsearch as the document store
Haystack uses a DocumentStore abstraction. For Elasticsearch-backed retrieval, instantiate the store and write documents into it.
from haystack import Document
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

# Extra keyword arguments are forwarded to the Elasticsearch client,
# so use the same URL and credentials as the client created earlier.
document_store = ElasticsearchDocumentStore(
    hosts="https://localhost:9200",
    index=index_name,
    basic_auth=("elastic", "changeme"),
    verify_certs=False,  # local development only
)

docs = [
    Document(
        content="Members can retire early from age 55 subject to scheme rules.",
        meta={"title": "Early Retirement Policy", "source": "policy.pdf", "doc_type": "policy"},
    ),
    Document(
        content="Employer contributions are reviewed annually by the trustee board.",
        meta={"title": "Contribution Review", "source": "trustee_minutes.pdf", "doc_type": "minutes"},
    ),
]

# SKIP leaves an existing document untouched when the same ID is written again
document_store.write_documents(docs, policy=DuplicatePolicy.SKIP)
This is the core integration point. Haystack manages documents; Elasticsearch handles persistence and retrieval.
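DuplicatePolicy.SKIP decides what counts as a duplicate by document ID. If re-indexing the same file should always map to the same ID, one option is to derive the ID yourself. The sketch below is an assumption about how you might do that; stable_doc_id and the source-plus-chunk-index scheme are hypothetical, not part of Haystack.

```python
import hashlib


def stable_doc_id(source, chunk_index):
    """Derive a deterministic ID from the source file name and chunk position."""
    key = f"{source}::{chunk_index}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()


first = stable_doc_id("policy.pdf", 0)
again = stable_doc_id("policy.pdf", 0)
other = stable_doc_id("trustee_minutes.pdf", 0)
print(first == again, first == other)
```

The result can be passed as Document(id=stable_doc_id("policy.pdf", 0), content=..., meta=...). Note that with SKIP an updated version of the same chunk would be ignored, so DuplicatePolicy.OVERWRITE may be the better fit when documents are revised.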
- Build a retrieval pipeline in Haystack
Now connect retrieval to a query pipeline. For production AI agents, this is usually the first stage before answer generation or tool routing.
from haystack import Pipeline
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever

# BM25 keyword retrieval served directly by Elasticsearch
retriever = ElasticsearchBM25Retriever(document_store=document_store)

pipe = Pipeline()
pipe.add_component("retriever", retriever)

query = "What is the retirement age under the scheme?"
result = pipe.run({
    "retriever": {"query": query}
})

for doc in result["retriever"]["documents"]:
    print(doc.content)
If you need hybrid search later, keep this structure. You can add a reranker or generator without changing how documents enter the system.
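For a sense of what hybrid search involves, one common way to combine a keyword ranking with an embedding ranking is reciprocal rank fusion. The following is a plain-Python sketch of that idea, not a Haystack API; the document IDs are made up, and k=60 is simply the constant conventionally used with this method.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a BM25 retriever and an embedding retriever
bm25_ranking = ["policy-1", "minutes-4", "letter-9"]
embedding_ranking = ["minutes-4", "letter-9", "policy-1"]
fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
print(fused)
```

A document that ranks reasonably well in both lists ends up ahead of one that tops a single list but is last in the other.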
- Add an answer generation step
For agent workflows, retrieval alone is not enough. Feed retrieved passages into a generator so your system can respond with grounded answers.
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

prompt_template = """
You are assisting with pension fund operations.
Answer only using the provided documents.

Question: {{question}}

Documents:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}

Answer:
"""

prompt_builder = PromptBuilder(template=prompt_template)
generator = OpenAIGenerator(model="gpt-4o-mini")  # reads OPENAI_API_KEY from the environment

qa_pipe = Pipeline()
# A component instance can belong to only one pipeline, so create a fresh retriever here
qa_pipe.add_component("retriever", ElasticsearchBM25Retriever(document_store=document_store))
qa_pipe.add_component("prompt_builder", prompt_builder)
qa_pipe.add_component("generator", generator)

qa_pipe.connect("retriever.documents", "prompt_builder.documents")
qa_pipe.connect("prompt_builder.prompt", "generator.prompt")

response = qa_pipe.run({
    "retriever": {"query": query},
    "prompt_builder": {"question": query},
})
print(response["generator"]["replies"][0])
That gives you a full RAG flow: retrieve from Elasticsearch, ground the response in pension documents, then generate an answer.
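Because pension answers usually need to be traceable to a source, it can help to show the model where each passage came from. The sketch below illustrates the idea in plain Python with dictionaries standing in for retrieved documents; format_context is a hypothetical helper, not part of Haystack.

```python
def format_context(passages):
    """Render retrieved passages as numbered lines that name their source file."""
    lines = []
    for number, passage in enumerate(passages, start=1):
        lines.append(f"[{number}] ({passage['source']}) {passage['content']}")
    return "\n".join(lines)


passages = [
    {
        "source": "policy.pdf",
        "content": "Members can retire early from age 55 subject to scheme rules.",
    },
    {
        "source": "trustee_minutes.pdf",
        "content": "Employer contributions are reviewed annually by the trustee board.",
    },
]
context = format_context(passages)
print(context)
```

In the prompt template, a similar effect could likely be achieved by printing doc.meta["source"] next to doc.content inside the loop, since the source file is stored in each document's meta.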
Testing the Integration
Use a known query and confirm both retrieval and response generation work end to end.
test_query = "At what age can members retire early?"
output = qa_pipe.run({
    "retriever": {"query": test_query},
    "prompt_builder": {"question": test_query},
})
print("Answer:", output["generator"]["replies"][0])
Expected output (the exact wording depends on the model, but it should state the age from the indexed policy):
Answer: Members can retire early from age 55 subject to scheme rules.
If you get empty results, check:
- Documents were written into the correct index
- The retriever points at the same Elasticsearch cluster
- Your query terms actually appear in indexed content
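The last check can be approximated offline before blaming the cluster. The sketch below is a rough stand-in that lowercases and splits on non-alphanumeric characters; Elasticsearch applies its own analyzer, so treat the result as a hint only. The helper names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def query_term_overlap(query, content):
    """Return the query terms that also occur in the content, sorted alphabetically."""
    return sorted(tokenize(query) & tokenize(content))


content = "Members can retire early from age 55 subject to scheme rules."
overlap = query_term_overlap("At what age can members retire early?", content)
print(overlap)
```

An empty list for a query you expected to match suggests the wording in the indexed documents differs from the wording users type, which is a case where semantic retrieval may help.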
Real-World Use Cases
- Member support agent
  - Answer retirement age, contribution rules, transfer procedures, and benefit eligibility from indexed scheme docs.
- Trustee document assistant
  - Search meeting minutes, actuarial reports, funding updates, and governance policies with traceable citations.
- Compliance review workflow
  - Pull relevant sections from policy archives when reviewing disclosures, communications templates, or regulatory changes.
By Cyprian Aarons, AI Consultant at Topiax.