How to Integrate Haystack for pension funds with Elasticsearch for AI agents

By Cyprian Aarons · Updated 2026-04-21
Tags: haystack-for-pension-funds · elasticsearch · ai-agents

Combining Haystack for pension funds with Elasticsearch gives you a practical retrieval layer for AI agents that need to answer questions over policy documents, member communications, investment reports, and compliance material. Haystack handles the pipeline logic; Elasticsearch gives you fast full-text search and filtering across large pension datasets.

This setup is useful when your agent needs grounded answers with traceable sources, not generic LLM output. It also works well for internal support bots, document Q&A, and compliance assistants where auditability matters.

Prerequisites

  • Python 3.10+
  • An Elasticsearch cluster running locally or in a managed environment
  • An index containing pension fund documents such as:
    • policy PDFs converted to text
    • contribution statements
    • retirement guides
    • trustee meeting notes
  • haystack-ai installed
  • elasticsearch Python client installed
  • API credentials or access URL for your Elasticsearch cluster
  • Basic familiarity with embeddings and retrieval pipelines

Install the packages:

pip install haystack-ai elasticsearch sentence-transformers

Integration Steps

  1. Set up the Elasticsearch connection

Start by connecting your app to Elasticsearch. For production, use HTTPS, authenticated access, and certificate verification; the example below relaxes verification for local development only.

from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", "changeme"),
    verify_certs=False,  # local development only; verify certificates in production
)

print(es.info())

If this fails, stop here. Your agent stack depends on a healthy search backend before Haystack can do anything useful.
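As a defensive pattern, you might wrap that check in a small helper that fails loudly with a readable message. `check_backend` below is a hypothetical helper, not part of either library:

```python
def check_backend(client):
    """Raise a clear error if the search backend is unreachable.

    `client` is any object with an Elasticsearch-style `info()` method.
    """
    try:
        info = client.info()
    except Exception as exc:
        raise RuntimeError(
            "Elasticsearch is unreachable; fix the search backend "
            "before wiring up Haystack."
        ) from exc
    # Return the version string so callers can log it at startup.
    return info["version"]["number"]
```

Calling check_backend(es) at startup means a misconfigured cluster fails fast instead of surfacing later as empty retrieval results.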

  2. Create an index for pension fund documents

Use a schema that supports text search plus vector retrieval metadata. Keep fields simple: document ID, content, source, and an embedding vector if you plan to do semantic retrieval (the mapping below sticks to text fields; add a dense_vector field when you introduce embeddings).

index_name = "pension_fund_docs"

mapping = {
    "properties": {
        "content": {"type": "text"},
        "title": {"type": "text"},
        "source": {"type": "keyword"},
        "doc_type": {"type": "keyword"},
    }
}

if not es.indices.exists(index=index_name):
    es.indices.create(index=index_name, mappings=mapping)

Now add a few documents.

docs = [
    {
        "_index": index_name,
        "_id": "1",
        "_source": {
            "title": "Retirement Contribution Rules",
            "content": "Members may increase contributions during annual enrollment.",
            "source": "member_handbook",
            "doc_type": "policy",
        },
    },
    {
        "_index": index_name,
        "_id": "2",
        "_source": {
            "title": "Trustee Meeting Summary",
            "content": "The board reviewed default fund performance and fee changes.",
            "source": "trustee_minutes",
            "doc_type": "governance",
        },
    },
]

for doc in docs:
    es.index(index=doc["_index"], id=doc["_id"], document=doc["_source"])

es.indices.refresh(index=index_name)

  3. Build the Haystack pipeline around Elasticsearch retrieval

Haystack uses components and pipelines. For this pattern, you wire an Elasticsearch-backed retriever into a query pipeline so the agent can fetch relevant context before generation.

from haystack import Pipeline, Document
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

documents = [
    Document(content="Members may increase contributions during annual enrollment.", meta={"source": "member_handbook"}),
    Document(content="The board reviewed default fund performance and fee changes.", meta={"source": "trustee_minutes"}),
]

document_store = InMemoryDocumentStore()
document_store.write_documents(documents)

retriever = InMemoryBM25Retriever(document_store=document_store)

pipe = Pipeline()
pipe.add_component("retriever", retriever)

result = pipe.run({
    "retriever": {
        "query": "What can members do during annual enrollment?"
    }
})

print(result)

If you want Elasticsearch-backed retrieval directly in Haystack, use the Elasticsearch document store and retriever classes available for your Haystack version (in Haystack 2.x these ship in a separate integration package rather than the core library). The exact class names vary by release, but the pattern stays the same: write documents into Elasticsearch, then retrieve top-k matches from it inside a pipeline.

  4. Write documents from Haystack into Elasticsearch

In real systems, ingestion should happen through one path only. Use Haystack to normalize documents, then push them into Elasticsearch for indexing.

from haystack import Document

haystack_docs = [
    Document(
        content="Pension transfers must be reviewed within five business days.",
        meta={"title": "Transfer Review Policy", "source": "operations_manual", "doc_type": "policy"}
    ),
    Document(
        content="Members can request benefit projections through the portal.",
        meta={"title": "Member Portal FAQ", "source": "member_portal", "doc_type": "faq"}
    ),
]

for doc in haystack_docs:
    es.index(
        index=index_name,
        document={
            "content": doc.content,
            **doc.meta,
        },
    )

es.indices.refresh(index=index_name)

This keeps your source-of-truth in Elasticsearch while letting Haystack orchestrate retrieval and downstream agent steps.
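That single ingestion path is easier to enforce when normalization lives in one function. The sketch below (`normalize_for_index` is a hypothetical helper) flattens a Haystack-style document into the indexed shape and rejects records missing required metadata:

```python
def normalize_for_index(content, meta, required=("title", "source", "doc_type")):
    """Flatten content plus metadata into one Elasticsearch document dict.

    Raises ValueError when a required metadata field is missing, so bad
    records fail at ingestion time rather than at query time.
    """
    missing = [field for field in required if field not in meta]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    return {"content": content, **meta}
```

Routing every writer through a helper like this (e.g. es.index(index=index_name, document=normalize_for_index(doc.content, doc.meta))) keeps half-populated records out of the index.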

  5. Query Elasticsearch through Haystack-style agent logic

For an AI agent system, retrieve context first, then pass only grounded snippets to the model. That reduces hallucinations and keeps responses tied to pension documentation.

query = {
    "multi_match": {
        "query": "How long do transfer reviews take?",
        "fields": ["content", "title"]
    }
}

response = es.search(index=index_name, query=query)

for hit in response["hits"]["hits"]:
    print(hit["_source"]["title"], "-", hit["_source"]["content"])

A production agent would take those hits, format them into context, then send them to your LLM with instructions like: “Answer only from retrieved pension fund documents.”
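That hand-off can be sketched as a plain formatting step. `build_grounded_prompt` is a hypothetical helper, and the instruction wording is yours to tune:

```python
def build_grounded_prompt(question, hits, max_snippets=3):
    """Build an LLM prompt containing only retrieved pension snippets.

    `hits` is the hits list from an Elasticsearch response, i.e.
    response["hits"]["hits"].
    """
    snippets = []
    for hit in hits[:max_snippets]:
        src = hit["_source"]
        # Cite the source next to each snippet so answers stay auditable.
        snippets.append(f"[{src.get('source', 'unknown')}] {src['content']}")
    context = "\n".join(snippets)
    return (
        "Answer only from the retrieved pension fund documents below. "
        "If the answer is not present, say so.\n\n"
        f"Documents:\n{context}\n\nQuestion: {question}"
    )
```

Capping the snippet count keeps the context window small and forces the retriever, not the prompt, to do the relevance work.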

Testing the Integration

Run a direct search against Elasticsearch and verify that Haystack-side ingestion produced searchable content.

response = es.search(
    index=index_name,
    query={
        "match_phrase": {
            "content": "benefit projections"
        }
    }
)

hits = response["hits"]["hits"]
print(f"Found {len(hits)} result(s)")
for hit in hits:
    print(hit["_source"]["title"])

Expected output:

Found 1 result(s)
Member Portal FAQ

If you get zero results, check these first:

  • Did you refresh the index after writing?
  • Are you searching the right field?
  • Is your content actually present in _source?
  • Are analyzer settings too strict for your language or tokenization needs?

Real-World Use Cases

  • Member support agent
    • Answer questions about contributions, transfers, retirement age rules, and benefit projections using indexed pension documentation.
  • Compliance assistant
    • Retrieve policy references for audit trails when staff ask about trustee decisions or operational procedures.
  • Operations copilot
    • Help internal teams locate forms, SLAs, meeting notes, and process docs without manually searching shared drives.
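For the compliance-assistant case, for example, retrieval can be scoped with a keyword filter on doc_type. `scoped_query` below is a hypothetical helper that builds the Elasticsearch bool query:

```python
def scoped_query(text, doc_type):
    """Build a bool query: full-text match, filtered to one doc_type.

    The filter clause uses the keyword-mapped `doc_type` field, so it
    narrows results without affecting relevance scoring.
    """
    return {
        "bool": {
            "must": {
                "multi_match": {"query": text, "fields": ["content", "title"]}
            },
            "filter": {"term": {"doc_type": doc_type}},
        }
    }
```

Running es.search(index=index_name, query=scoped_query("fee changes", "governance")) would then return trustee material only.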

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.
