How to Integrate Haystack for pension funds with Elasticsearch for AI agents
Combining Haystack for pension funds with Elasticsearch gives you a practical retrieval layer for AI agents that need to answer questions over policy documents, member communications, investment reports, and compliance material. Haystack handles the pipeline logic; Elasticsearch gives you fast full-text search and filtering across large pension datasets.
This setup is useful when your agent needs grounded answers with traceable sources, not generic LLM output. It also works well for internal support bots, document Q&A, and compliance assistants where auditability matters.
Prerequisites
- •Python 3.10+
- •An Elasticsearch cluster running locally or in a managed environment
- •An index containing pension fund documents such as:
- •policy PDFs converted to text
- •contribution statements
- •retirement guides
- •trustee meeting notes
- •haystack-ai installed
- •elasticsearch Python client installed
- •API credentials or access URL for your Elasticsearch cluster
- •Basic familiarity with embeddings and retrieval pipelines
Install the packages:
pip install haystack-ai elasticsearch sentence-transformers
Integration Steps
- •Set up the Elasticsearch connection
Start by connecting your app to Elasticsearch. For production, use HTTPS and authenticated access.
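from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", "changeme"),
    # Local development only: this skips TLS certificate verification.
    # In production, remove this line and pass ca_certs instead.
    verify_certs=False,
)
print(es.info())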
If this fails, stop here. Your agent stack depends on a healthy search backend before Haystack can do anything useful.
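One way to keep credentials out of source code is to read the connection settings from environment variables. The sketch below is a minimal illustration: the variable names `ES_URL`, `ES_USER`, `ES_PASSWORD`, and `ES_CA_CERTS` are hypothetical names chosen for this example, not settings the Elasticsearch client reads on its own. The keyword arguments it produces (`hosts`, `basic_auth`, `ca_certs`, `verify_certs`) are ones the `Elasticsearch` constructor accepts.

```python
import os


def es_connection_kwargs(env=None):
    """Build keyword arguments for Elasticsearch(...) from environment variables.

    ES_URL, ES_USER, ES_PASSWORD and ES_CA_CERTS are hypothetical names
    chosen for this sketch.
    """
    env = os.environ if env is None else env
    kwargs = {"hosts": env.get("ES_URL", "https://localhost:9200")}
    if env.get("ES_USER") and env.get("ES_PASSWORD"):
        kwargs["basic_auth"] = (env["ES_USER"], env["ES_PASSWORD"])
    if env.get("ES_CA_CERTS"):
        # Verify the server certificate against a known CA bundle.
        kwargs["ca_certs"] = env["ES_CA_CERTS"]
        kwargs["verify_certs"] = True
    return kwargs


# Usage (requires a reachable cluster):
# from elasticsearch import Elasticsearch
# es = Elasticsearch(**es_connection_kwargs())
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to check without touching the real environment.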
- •Create an index for pension fund documents
Use a schema that supports text search plus vector retrieval metadata. Keep fields simple: document ID, content, source, and embedding vector if you plan to do semantic retrieval.
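index_name = "pension_fund_docs"
mapping = {
    "mappings": {
        "properties": {
            # The "english" analyzer stems words, so a query for "transfer"
            # also matches "transfers"; the default standard analyzer does not.
            "content": {"type": "text", "analyzer": "english"},
            "title": {"type": "text", "analyzer": "english"},
            "source": {"type": "keyword"},
            "doc_type": {"type": "keyword"},
        }
    }
}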
if not es.indices.exists(index=index_name):
    es.indices.create(index=index_name, mappings=mapping["mappings"])
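The mapping above covers keyword and full-text search only. If you plan to add semantic retrieval, one option is a `dense_vector` field. The sketch below rests on several assumptions: the field name `embedding`, the index name `pension_fund_docs_vectors`, and the 384 dimensions (the output size of the `all-MiniLM-L6-v2` sentence-transformers model) are illustrative choices, not part of the original setup.

```python
EMBEDDING_DIMS = 384  # assumption: output size of all-MiniLM-L6-v2

# Same fields as the text-only mapping, plus a vector field for kNN search.
vector_mapping = {
    "properties": {
        "content": {"type": "text"},
        "title": {"type": "text"},
        "source": {"type": "keyword"},
        "doc_type": {"type": "keyword"},
        "embedding": {
            "type": "dense_vector",
            "dims": EMBEDDING_DIMS,
            "index": True,
            "similarity": "cosine",
        },
    }
}

# Usage (requires a live client named `es`):
# es.indices.create(index="pension_fund_docs_vectors", mappings=vector_mapping)
```

The `dims` value has to match the embedding model you use at both indexing and query time, so it is worth keeping in a single constant.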
Now add a few documents.
docs = [
    {
        "_index": index_name,
        "_id": "1",
        "_source": {
            "title": "Retirement Contribution Rules",
            "content": "Members may increase contributions during annual enrollment.",
            "source": "member_handbook",
            "doc_type": "policy",
        },
    },
    {
        "_index": index_name,
        "_id": "2",
        "_source": {
            "title": "Trustee Meeting Summary",
            "content": "The board reviewed default fund performance and fee changes.",
            "source": "trustee_minutes",
            "doc_type": "governance",
        },
    },
]
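# es.index() takes index, id, and document as keyword arguments,
# so unpack the bulk-style keys explicitly.
for doc in docs:
    es.index(index=doc["_index"], id=doc["_id"], document=doc["_source"])
es.indices.refresh(index=index_name)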
- •Build the Haystack pipeline around Elasticsearch retrieval
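Haystack uses components and pipelines. For this pattern, you wire a retriever into a query pipeline so the agent can fetch relevant context before generation. The example below uses Haystack's in-memory BM25 retriever as a stand-in to show the pipeline shape without a running cluster.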
from haystack import Pipeline, Document
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

documents = [
    Document(content="Members may increase contributions during annual enrollment.", meta={"source": "member_handbook"}),
    Document(content="The board reviewed default fund performance and fee changes.", meta={"source": "trustee_minutes"}),
]
# The retriever reads from a document store, so write the documents there first.
document_store = InMemoryDocumentStore()
document_store.write_documents(documents)
retriever = InMemoryBM25Retriever(document_store=document_store)
pipe = Pipeline()
pipe.add_component("retriever", retriever)
result = pipe.run({
    "retriever": {
        "query": "What can members do during annual enrollment?"
    }
})
print(result)
If you want Elasticsearch-backed retrieval directly in Haystack, use the Elasticsearch document store and retriever classes available in your Haystack version. The exact class names vary by release, but the pattern stays the same: write documents into Elasticsearch, then retrieve top-k matches from it inside a pipeline.
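For Haystack 2.x, the Elasticsearch integration is distributed as a separate package (`pip install elasticsearch-haystack`). The sketch below shows one way the wiring could look with that package. It assumes an unauthenticated cluster at `http://localhost:9200` and the `pension_fund_docs` index; the function name `build_es_retrieval_pipeline` is hypothetical. Check the import paths and arguments against the version you have installed, and note that documents written through the document store's own `write_documents` method are the safest fit for this retriever.

```python
def build_es_retrieval_pipeline(hosts="http://localhost:9200", index="pension_fund_docs"):
    """Return a Haystack pipeline whose retriever queries Elasticsearch.

    Imports are kept inside the function so that this module can be loaded
    even when the optional integration package is not installed.
    """
    from haystack import Pipeline
    from haystack_integrations.components.retrievers.elasticsearch import (
        ElasticsearchBM25Retriever,
    )
    from haystack_integrations.document_stores.elasticsearch import (
        ElasticsearchDocumentStore,
    )

    document_store = ElasticsearchDocumentStore(hosts=hosts, index=index)
    pipeline = Pipeline()
    pipeline.add_component(
        "retriever", ElasticsearchBM25Retriever(document_store=document_store)
    )
    return pipeline


# Usage (requires a running cluster and the elasticsearch-haystack package):
# pipe = build_es_retrieval_pipeline()
# result = pipe.run({"retriever": {"query": "annual enrollment", "top_k": 3}})
# for doc in result["retriever"]["documents"]:
#     print(doc.content, doc.meta)
```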
- •Write documents from Haystack into Elasticsearch
In real systems, ingestion should happen through one path only. Use Haystack to normalize documents, then push them into Elasticsearch for indexing.
from haystack import Document

# Include a title in meta so later searches can print it alongside the content.
haystack_docs = [
    Document(
        content="Pension transfers must be reviewed within five business days.",
        meta={"title": "Operations Manual", "source": "operations_manual", "doc_type": "policy"},
    ),
    Document(
        content="Members can request benefit projections through the portal.",
        meta={"title": "Member Portal FAQ", "source": "member_portal", "doc_type": "faq"},
    ),
]
for doc in haystack_docs:
    es.index(
        index=index_name,
        document={
            "content": doc.content,
            **doc.meta,
        },
    )
es.indices.refresh(index=index_name)
This keeps your source-of-truth in Elasticsearch while letting Haystack orchestrate retrieval and downstream agent steps.
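Calling `es.index()` without an `id` creates a new document on every run, so re-running the ingestion script duplicates content. One possible remedy is to derive the ID from the document itself. The sketch below is an assumption about how you might do that, not part of the original pipeline: the helper name `to_index_request` and the choice of hashing `source` plus `content` with SHA-256 are illustrative.

```python
import hashlib


def to_index_request(index, content, meta):
    """Build keyword arguments for es.index() with a deterministic document ID."""
    # Hash the source and content so the same document always maps to one ID.
    key = f"{meta.get('source', '')}\n{content}".encode("utf-8")
    doc_id = hashlib.sha256(key).hexdigest()
    return {"index": index, "id": doc_id, "document": {"content": content, **meta}}


request = to_index_request(
    "pension_fund_docs",
    "Members can request benefit projections through the portal.",
    {"source": "member_portal", "doc_type": "faq"},
)
print(request["id"][:12])

# Usage (requires a live client named `es`):
# es.index(**request)
```

With a stable ID, a second ingestion run overwrites the existing document instead of adding a copy.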
- •Query Elasticsearch through Haystack-style agent logic
For an AI agent system, retrieve context first, then pass only grounded snippets to the model. That reduces hallucinations and keeps responses tied to pension documentation.
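query = {
    "multi_match": {
        "query": "How long do transfer reviews take?",
        "fields": ["content", "title"],
    }
}
# Pass the query with the query= keyword, matching the test search later on.
response = es.search(index=index_name, query=query)
for hit in response["hits"]["hits"]:
    source = hit["_source"]
    # Not every document is guaranteed to carry a title.
    print(source.get("title", "(untitled)"), "-", source["content"])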
A production agent would take those hits, format them into context, then send them to your LLM with instructions like: “Answer only from retrieved pension fund documents.”
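As a rough sketch of that formatting step, the helper below turns search hits into a numbered context block and prepends the grounding instruction. The function name `build_grounded_prompt`, the `max_snippets` parameter, and the prompt wording beyond the quoted instruction are assumptions made for illustration. The sample hits are hand-written in the shape `es.search()` returns; they are not real results.

```python
def build_grounded_prompt(question, hits, max_snippets=3):
    """Format Elasticsearch hits into a numbered context block plus instructions."""
    lines = []
    for number, hit in enumerate(hits[:max_snippets], start=1):
        source = hit["_source"]
        # Fall back to the source field when a document has no title.
        label = source.get("title") or source.get("source", "unknown")
        lines.append(f"[{number}] ({label}) {source['content']}")
    context = "\n".join(lines) if lines else "(no documents retrieved)"
    return (
        "Answer only from retrieved pension fund documents. "
        "Cite snippets by their bracketed number. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hand-written hits in the shape returned by es.search(); not real results.
sample_hits = [
    {
        "_source": {
            "content": "Pension transfers must be reviewed within five business days.",
            "source": "operations_manual",
        }
    }
]
print(build_grounded_prompt("How long do transfer reviews take?", sample_hits))
```

Numbering the snippets gives the model something concrete to cite, which helps when answers need to be traced back to a source document.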
Testing the Integration
Run a direct search against Elasticsearch and verify that Haystack-side ingestion produced searchable content.
response = es.search(
    index=index_name,
    query={
        "match_phrase": {
            "content": "benefit projections"
        }
    },
)
hits = response["hits"]["hits"]
print(f"Found {len(hits)} result(s)")
for hit in hits:
    print(hit["_source"]["title"])
Expected output:
Found 1 result(s)
Member Portal FAQ
If you get zero results, check these first:
- •Did you refresh the index after writing?
- •Are you searching the right field?
- •Is your content actually present in _source?
- •Are analyzer settings too strict for your language or tokenization needs? The default standard analyzer does not stem, so a query for "transfer" will not match "transfers".
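To check the analyzer question directly, you can ask Elasticsearch which tokens it produces for a query and for a document, then compare them. The sketch below uses the client's `indices.analyze` call; the helper names `analyzed_tokens` and `shared_tokens` are hypothetical. An empty overlap suggests the analyzer is the reason a query finds nothing.

```python
def analyzed_tokens(client, index, field, text):
    """Return the tokens Elasticsearch produces for `text` using the field's analyzer."""
    response = client.indices.analyze(index=index, field=field, text=text)
    return [item["token"] for item in response["tokens"]]


def shared_tokens(client, index, field, query_text, document_text):
    """Tokens that appear in both the analyzed query and the analyzed document."""
    query = set(analyzed_tokens(client, index, field, query_text))
    document = set(analyzed_tokens(client, index, field, document_text))
    return sorted(query & document)


# Usage (requires a live client named `es`):
# print(shared_tokens(
#     es, "pension_fund_docs", "content",
#     "How long do transfer reviews take?",
#     "Pension transfers must be reviewed within five business days.",
# ))
```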
Real-World Use Cases
- •Member support agent: Answer questions about contributions, transfers, retirement age rules, and benefit projections using indexed pension documentation.
- •Compliance assistant: Retrieve policy references for audit trails when staff ask about trustee decisions or operational procedures.
- •Operations copilot: Help internal teams locate forms, SLAs, meeting notes, and process docs without manually searching shared drives.
By Cyprian Aarons, AI Consultant at Topiax.