How to Integrate Haystack for investment banking with Elasticsearch for multi-agent systems

By Cyprian Aarons, updated 2026-04-21
Tags: haystack-for-investment-banking, elasticsearch, multi-agent-systems

Combining Haystack with Elasticsearch gives you a practical retrieval layer for investment banking multi-agent systems that need to answer questions from market data, research notes, filings, and internal deal documents. Haystack handles orchestration and retrieval logic for the agents, while Elasticsearch provides fast indexed search over large document sets, with metadata filtering, relevance scoring, and hybrid (keyword plus vector) retrieval.

Prerequisites

  • Python 3.10+
  • Access to a running Elasticsearch cluster
  • An index in Elasticsearch containing your investment banking documents
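  • Haystack 2.x (the haystack-ai package) plus the Elasticsearch integration (elasticsearch-haystack)
  • elasticsearch Python client (installed as a dependency of elasticsearch-haystack)
  • API credentials or cluster auth configured if your Elasticsearch instance requires it
  • An OpenAI API key in the OPENAI_API_KEY environment variable for the generation step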

Install the packages:

pip install haystack-ai elasticsearch-haystack

Integration Steps

  1. Create an Elasticsearch connection and verify the cluster is reachable.
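from elasticsearch import Elasticsearch

# Replace the host and credentials with your cluster's values
es = Elasticsearch(
    "http://localhost:9200",
    basic_auth=("elastic", "changeme")
)

# Prints cluster name and version metadata; raises a connection error if the cluster is unreachable
print(es.info())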

This should return cluster metadata. If it fails, fix connectivity before touching Haystack.

  2. Create an Elasticsearch-backed document store for Haystack.

Haystack’s retrieval components expect a document store abstraction. For Elasticsearch, use ElasticsearchDocumentStore from the elasticsearch-haystack integration package.

from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

# Extra keyword arguments such as basic_auth are passed through to the underlying Elasticsearch client
document_store = ElasticsearchDocumentStore(
    hosts="http://localhost:9200",
    basic_auth=("elastic", "changeme"),
    index="investment_banking_docs"
)

At this point, Haystack can write and read documents from the same index your agents will query.
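Both snippets above hardcode the host and credentials. A minimal sketch, assuming you keep them in environment variables (the names `ES_HOST`, `ES_USER`, and `ES_PASSWORD` and the helper `es_settings` are hypothetical), builds one keyword-argument dict that the raw client and the document store can share:

```python
import os
from typing import Any, Mapping


def es_settings(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Build connection kwargs shared by Elasticsearch(...) and ElasticsearchDocumentStore(...).

    ES_HOST, ES_USER and ES_PASSWORD are hypothetical variable names; rename them to match your setup.
    """
    settings: dict[str, Any] = {"hosts": env.get("ES_HOST", "http://localhost:9200")}
    user = env.get("ES_USER")
    password = env.get("ES_PASSWORD")
    # Only add basic auth when both parts are present, so an unsecured local cluster still works
    if user and password:
        settings["basic_auth"] = (user, password)
    return settings


# Usage (requires the imports from the snippets above):
# es = Elasticsearch(**es_settings())
# document_store = ElasticsearchDocumentStore(**es_settings(), index="investment_banking_docs")
```

Keeping one source of truth for connection details means the connectivity check and the document store can never silently point at different clusters.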

  3. Index investment banking documents into Elasticsearch through Haystack.

Use Haystack Document objects so your pipeline stays consistent across ingestion and retrieval.

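from haystack import Document
from haystack.document_stores.types import DuplicatePolicy

docs = [
    Document(content="Q2 earnings call transcript for ACME Bank shows net interest margin expansion."),
    Document(content="Pitch deck: acquisition target valuation based on EV/EBITDA multiples."),
    Document(content="Equity research note on regional banks and deposit beta sensitivity.")
]

# Document IDs are derived from content and metadata, so re-running this script writes the same IDs;
# OVERWRITE makes ingestion idempotent instead of failing on duplicates
written = document_store.write_documents(docs, policy=DuplicatePolicy.OVERWRITE)
print(f"Indexed {written} documents")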

For long filings, transcripts, and decks, split documents into chunks before calling write_documents() so retrieval returns smaller, more relevant passages. If you already have a preprocessing pipeline, run its chunking step first.
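As a dependency-free illustration of what word-based chunking with overlap does, here is a small sketch. In a Haystack pipeline you would more typically use the built-in `DocumentSplitter` component (for example with `split_by="word"`); the function names and the `source`/`chunk` metadata keys below are hypothetical.

```python
from typing import Any


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once this window reaches the end, so no trailing chunk is pure overlap
        if start + size >= len(words):
            break
    return chunks


def to_chunk_records(text: str, source: str, size: int = 200, overlap: int = 40) -> list[dict[str, Any]]:
    """Pair each chunk with metadata so it can become Document(content=..., meta=...)."""
    return [
        {"content": chunk, "meta": {"source": source, "chunk": i}}
        for i, chunk in enumerate(chunk_words(text, size, overlap))
    ]
```

Each record maps onto `Document(content=r["content"], meta=r["meta"])` before `write_documents()`. The overlap reduces the chance that a key sentence in a filing is cut in half at a chunk boundary, at the cost of some duplicated text in the index.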

  4. Build a retriever that uses Elasticsearch as the search backend.

For multi-agent systems, this is usually the component each agent calls when it needs grounded context.

from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever

# BM25 keyword retrieval against the same index the document store writes to
retriever = ElasticsearchBM25Retriever(document_store=document_store)

result = retriever.run(query="What do we know about deposit beta sensitivity?")
for doc in result["documents"]:
    print(doc.score, doc.content)

If you want structured filters for deal team workflows, add metadata during ingestion and filter by sector, region, or deal stage.
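A minimal sketch of such a filter, assuming your ingestion step stores hypothetical `sector`, `region`, and `deal_stage` keys in each `Document`'s `meta`. It produces Haystack's dictionary filter syntax: `field`/`operator`/`value` comparisons combined under a logical `AND`.

```python
from typing import Any, Optional


def build_deal_filter(
    sector: Optional[str] = None,
    region: Optional[str] = None,
    deal_stage: Optional[str] = None,
) -> Optional[dict[str, Any]]:
    """Return a Haystack filter dict matching every supplied metadata value, or None if none given."""
    wanted = {"meta.sector": sector, "meta.region": region, "meta.deal_stage": deal_stage}
    conditions = [
        {"field": field, "operator": "==", "value": value}
        for field, value in wanted.items()
        if value is not None
    ]
    if not conditions:
        return None  # no filter: search the whole index
    return {"operator": "AND", "conditions": conditions}


# Usage with the retriever above (metadata keys and values are illustrative):
# Document(content="...", meta={"sector": "banking", "region": "EMEA", "deal_stage": "diligence"})
# retriever.run(query="deposit beta", filters=build_deal_filter(sector="banking", region="EMEA"))
```

Returning `None` when nothing is selected lets the same call site serve both filtered and unfiltered queries.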

  5. Wire the retriever into an agent-facing pipeline.

This keeps retrieval isolated and reusable across multiple agents like research, compliance, and deal support.

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

prompt_template = """
You are an investment banking assistant.
Answer using only the retrieved context.

Context:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}

Question: {{ question }}
Answer:
"""

pipeline = Pipeline()
pipeline.add_component("retriever", retriever)
pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template))
# OpenAIGenerator reads its API key from the OPENAI_API_KEY environment variable
pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))

# Retrieved documents fill the template; the rendered prompt goes to the LLM
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.prompt")

question = "Summarize valuation signals for acquisition targets"
response = pipeline.run({
    "retriever": {"query": question},
    "prompt_builder": {"question": question}
})

print(response["llm"]["replies"][0])
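The run call above has to feed the question to both the retriever and the prompt builder. A small helper (hypothetical, not part of Haystack) keeps those inputs in sync and lets each agent pass its own metadata filters and result count:

```python
from typing import Any, Optional


def build_run_inputs(
    question: str,
    filters: Optional[dict[str, Any]] = None,
    top_k: int = 5,
) -> dict[str, dict[str, Any]]:
    """Build the pipeline.run() payload so retriever and prompt builder always see the same question."""
    retriever_inputs: dict[str, Any] = {"query": question, "top_k": top_k}
    if filters is not None:
        retriever_inputs["filters"] = filters  # per-agent scope, e.g. a sector or desk filter
    return {
        "retriever": retriever_inputs,
        "prompt_builder": {"question": question},
    }


# Usage:
# response = pipeline.run(build_run_inputs("Summarize valuation signals for acquisition targets"))
```

If each agent gets its own pipeline, give each one its own retriever instance as well: in Haystack 2.x a component instance can belong to only one `Pipeline`, and adding it to a second one raises an error. The document store itself can be shared.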

Testing the Integration

Run a direct retrieval test first. This confirms indexing, search relevance, and Haystack wiring are all working before you add more agents on top.

query = "What does the research note say about regional banks?"
result = retriever.run(query=query)

for doc in result["documents"][:3]:
    print(doc.content)

Expected output (exact ranking depends on your analyzer and the retriever's fuzziness setting):

Equity research note on regional banks and deposit beta sensitivity.
Q2 earnings call transcript for ACME Bank shows net interest margin expansion.
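To turn this eyeball check into a repeatable regression test, you can score retrieval against a few question and expected-snippet pairs. A minimal sketch; the function name and the case-insensitive substring-matching rule are assumptions, and `search` is any callable that returns ranked document contents.

```python
from typing import Callable, Sequence


def hit_rate_at_k(
    search: Callable[[str, int], list[str]],
    cases: Sequence[tuple[str, str]],
    k: int = 3,
) -> float:
    """Fraction of (query, expected_snippet) cases whose snippet appears in the top-k results."""
    if not cases:
        raise ValueError("need at least one test case")
    hits = 0
    for query, expected in cases:
        top = search(query, k)[:k]  # guard against search functions that ignore k
        if any(expected.lower() in content.lower() for content in top):
            hits += 1
    return hits / len(cases)


# Wiring it to the Elasticsearch retriever built earlier:
# search = lambda q, k: [d.content for d in retriever.run(query=q, top_k=k)["documents"]]
# print(hit_rate_at_k(search, [("What does the research note say about regional banks?", "regional banks")]))
```

Tracking this number as you change analyzers, chunk sizes, or filters shows whether a change helped retrieval before any agent sees the results.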

If you get empty results:

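  • confirm documents exist in the index (document_store.count_documents() should be greater than zero)
  • check that the index name matches exactly
  • check the analyzer on the content field if you expect stemmed or language-specific matching
  • inspect metadata filters if you added them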
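The first three checks can be scripted with the raw `es` client created at the start. A minimal sketch; `diagnose_index` is a hypothetical helper, and it assumes document text lives in a field named `content`, which is the field the default ElasticsearchDocumentStore mapping uses.

```python
from typing import Any


def diagnose_index(es: Any, index: str) -> dict[str, Any]:
    """Report whether `index` exists, how many documents it holds, and how its `content` field is analyzed."""
    report: dict[str, Any] = {"index": index, "exists": bool(es.indices.exists(index=index))}
    if not report["exists"]:
        return report  # wrong name, or the documents were written to a different index
    report["count"] = es.count(index=index)["count"]
    raw = es.indices.get_mapping(index=index)
    body = getattr(raw, "body", raw)  # elasticsearch-py wraps responses; plain dicts pass through
    report["content_field"] = {}
    for name, info in body.items():  # keys are concrete index names, even when `index` is an alias
        content = info.get("mappings", {}).get("properties", {}).get("content", {})
        # analyzer None means the index default (standard unless configured otherwise)
        report["content_field"][name] = {"type": content.get("type"), "analyzer": content.get("analyzer")}
    return report


# Usage: print(diagnose_index(es, "investment_banking_docs"))
```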

Real-World Use Cases

  • Deal team knowledge assistant
    Let one agent retrieve comps analysis, prior CIMs, and diligence notes from Elasticsearch while another agent drafts IC memos from that context.

  • Research Q&A system
    Build a multi-agent workflow where one agent searches filings and transcripts, another extracts key metrics, and a third generates a concise analyst summary.

  • Compliance-aware document lookup
    Use Elasticsearch filters with Haystack retrieval so agents only see approved content by desk, geography, or confidentiality classification.
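One way to sketch the compliance case is a fixed scope per agent role that is ANDed with whatever filter the agent asks for at query time, so an agent can narrow its search but never widen it. The role names, metadata keys (`desk`, `classification`), and values below are hypothetical and must match the metadata you write at ingestion.

```python
import copy
from typing import Any, Optional

# Hypothetical per-agent scopes in Haystack's filter syntax
AGENT_SCOPES: dict[str, dict[str, Any]] = {
    "research": {"field": "meta.classification", "operator": "in", "value": ["public", "internal"]},
    "deal_support": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.desk", "operator": "==", "value": "m_and_a"},
            {"field": "meta.classification", "operator": "in", "value": ["internal", "confidential"]},
        ],
    },
}


def scoped_filters(role: str, requested: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Combine the role's mandatory scope with an optional agent-requested filter using AND."""
    # KeyError for unknown roles: fail closed rather than search the whole index
    scope = copy.deepcopy(AGENT_SCOPES[role])  # copy so callers cannot mutate the shared table
    if requested is None:
        return scope
    return {"operator": "AND", "conditions": [scope, requested]}
```

The result can be passed as `filters=` to the retriever's `run()` call. Because the scope is always one branch of an `AND`, any extra condition an agent adds can only shrink the result set.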



By Cyprian Aarons, AI Consultant at Topiax.
