How to Integrate Haystack for investment banking with Elasticsearch for multi-agent systems
Combining Haystack for investment banking with Elasticsearch gives you a practical retrieval layer for multi-agent systems that need to answer questions from market data, research notes, filings, and internal deal documents. Haystack handles the agent-facing orchestration and retrieval logic, while Elasticsearch gives you fast indexed search across large document sets with filtering, scoring, and hybrid retrieval patterns.
Prerequisites
- Python 3.10+
- Access to a running Elasticsearch cluster
- An Elasticsearch index for your investment banking documents (ElasticsearchDocumentStore creates the index on first use if it does not exist yet)
- Haystack (haystack-ai) and its Elasticsearch integration (elasticsearch-haystack) installed
- The elasticsearch Python client installed
- API credentials or cluster auth configured if your Elasticsearch instance requires it
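Install Haystack and its Elasticsearch integration. The elasticsearch-haystack package also pulls in the elasticsearch client:
```bash
pip install haystack-ai elasticsearch-haystack
```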
Integration Steps
- Create an Elasticsearch connection and verify the cluster is reachable.
```python
from elasticsearch import Elasticsearch

# Replace the URL and credentials with your cluster's values
es = Elasticsearch(
    "http://localhost:9200",
    basic_auth=("elastic", "changeme"),
)
print(es.info())
```
This should return cluster metadata. If it fails, fix connectivity before touching Haystack.
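To make that check fail fast in a script or an agent startup hook, you can wrap it in a small function. The sketch below is not part of Haystack. check_cluster is a hypothetical helper that relies only on the client's ping() and info() methods, so a stub client can stand in for it in tests.

```python
from typing import Any


def check_cluster(client: Any) -> str:
    """Return the Elasticsearch version string, or raise if the cluster is unreachable.

    `client` is expected to behave like elasticsearch.Elasticsearch:
    ping() returns a bool and info() returns a mapping with a "version" key.
    """
    if not client.ping():
        raise ConnectionError("Elasticsearch did not respond to ping; check the URL and credentials")
    info = client.info()
    return info["version"]["number"]
```

With the client from the previous step, print(check_cluster(es)) either prints the server version or stops with a clear error.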
- Create an Elasticsearch-backed document store for Haystack.
Haystack's retrieval components work against a document store abstraction rather than a raw client. For Elasticsearch, use ElasticsearchDocumentStore from the elasticsearch-haystack integration package.
```python
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(
    hosts="http://localhost:9200",
    basic_auth=("elastic", "changeme"),  # extra keyword arguments go to the elasticsearch client
    index="investment_banking_docs",
)
```
At this point, Haystack can write and read documents from the same index your agents will query.
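Both snippets above hardcode the URL and credentials. One option is to build the shared connection arguments in one place from the environment. This is a sketch: es_connection_kwargs is a hypothetical helper, and the variable names ES_URL, ES_USER and ES_PASSWORD are assumptions you should rename to match your deployment.

```python
import os
from typing import Any, Mapping, Optional


def es_connection_kwargs(env: Optional[Mapping[str, str]] = None) -> dict[str, Any]:
    """Build keyword arguments usable by both Elasticsearch(...) and ElasticsearchDocumentStore(...).

    ES_URL, ES_USER and ES_PASSWORD are hypothetical variable names.
    """
    env = os.environ if env is None else env
    kwargs: dict[str, Any] = {"hosts": env.get("ES_URL", "http://localhost:9200")}
    user, password = env.get("ES_USER"), env.get("ES_PASSWORD")
    if user and password:
        # basic_auth is passed through to the elasticsearch client unchanged
        kwargs["basic_auth"] = (user, password)
    return kwargs
```

You would then pass the same result to both constructors, for example Elasticsearch(**es_connection_kwargs()) and ElasticsearchDocumentStore(index="investment_banking_docs", **es_connection_kwargs()).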
- Index investment banking documents into Elasticsearch through Haystack.
Use Haystack Document objects so your pipeline stays consistent across ingestion and retrieval.
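```python
from haystack import Document
from haystack.document_stores.types import DuplicatePolicy

docs = [
    Document(content="Q2 earnings call transcript for ACME Bank shows net interest margin expansion."),
    Document(content="Pitch deck: acquisition target valuation based on EV/EBITDA multiples."),
    Document(content="Equity research note on regional banks and deposit beta sensitivity."),
]

# Document IDs are derived from content, so a re-run would otherwise fail on
# documents that already exist; OVERWRITE makes ingestion safe to repeat.
written = document_store.write_documents(docs, policy=DuplicatePolicy.OVERWRITE)
print(f"Documents indexed: {written}")
```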
If you already have a preprocessing pipeline, run chunking before write_documents() so retrieval returns smaller, more relevant passages.
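Haystack provides a DocumentSplitter component for this step. The following plain-Python sketch shows what word-based chunking with overlap does. It is not Haystack's implementation: chunk_words is a hypothetical helper, and the 200-word window with a 40-word overlap is an arbitrary starting point.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a window reaches the end so the tail is not emitted twice
        if start + size >= len(words):
            break
    return chunks
```

Each chunk would then become its own Document, ideally carrying the parent document's metadata, before write_documents(). Inside a Haystack pipeline, DocumentSplitter(split_by="word", split_length=200, split_overlap=40) fills the same role.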
- Build a retriever that uses Elasticsearch as the search backend.
For multi-agent systems, this is usually the component each agent calls when it needs grounded context.
```python
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever

# BM25 keyword retrieval, executed inside Elasticsearch
retriever = ElasticsearchBM25Retriever(document_store=document_store)
result = retriever.run(query="What do we know about deposit beta sensitivity?")
print(result["documents"])
```
If you want structured filters for deal team workflows, add metadata during ingestion and filter by sector, region, or deal stage.
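Haystack 2.x expresses filters as nested dictionaries that combine comparison conditions with logical operators. The sketch below builds such a filter from optional deal attributes. deal_filters is a hypothetical helper, and it assumes your documents were written with the meta keys "sector", "region" and "deal_stage".

```python
from typing import Any, Optional


def deal_filters(
    sector: Optional[str] = None,
    region: Optional[str] = None,
    deal_stage: Optional[str] = None,
) -> Optional[dict[str, Any]]:
    """Build a Haystack 2.x filter dict from optional metadata values.

    Returns None when no value is given, so the retriever runs unfiltered.
    """
    values = {"sector": sector, "region": region, "deal_stage": deal_stage}
    conditions = [
        {"field": f"meta.{key}", "operator": "==", "value": value}
        for key, value in values.items()
        if value is not None
    ]
    if not conditions:
        return None
    return {"operator": "AND", "conditions": conditions}
```

The metadata has to exist at write time, for example Document(content=..., meta={"sector": "banks", "region": "US", "deal_stage": "diligence"}). The filter is then passed at query time as retriever.run(query=..., filters=deal_filters(sector="banks")).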
- Wire the retriever into an agent-facing pipeline.
This keeps retrieval isolated and reusable across multiple agents, such as research, compliance, and deal support agents.
```python
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

prompt_template = """
You are an investment banking assistant.
Answer using only the retrieved context.
Context:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}
Question: {{ question }}
Answer:
"""

pipeline = Pipeline()
pipeline.add_component("retriever", retriever)
pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template))
# OpenAIGenerator reads its API key from the OPENAI_API_KEY environment variable
pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.prompt")

# The same question drives both retrieval and the prompt
question = "Summarize valuation signals for acquisition targets"
response = pipeline.run({
    "retriever": {"query": question},
    "prompt_builder": {"question": question},
})
print(response["llm"]["replies"][0])
```
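Each agent can then call this pipeline through a thin function instead of building its own. The sketch below assumes the component names used above ("retriever", "prompt_builder", "llm"). ask is a hypothetical helper that sends one question to both the retriever and the prompt builder, and it forwards optional metadata filters to the retriever.

```python
from typing import Any, Optional


def ask(pipeline: Any, question: str, filters: Optional[dict] = None) -> str:
    """Run the retrieval + generation pipeline for one question and return the first reply.

    `pipeline` is expected to be the Haystack Pipeline built above; any object
    with a compatible run() method works, which keeps agents easy to test.
    """
    retriever_inputs: dict[str, Any] = {"query": question}
    if filters is not None:
        retriever_inputs["filters"] = filters
    result = pipeline.run({
        "retriever": retriever_inputs,
        "prompt_builder": {"question": question},
    })
    return result["llm"]["replies"][0]
```

A research agent and a deal support agent would both call ask(pipeline, ...). They would differ only in the questions and filters they pass, so retrieval behavior stays defined in one place.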
Testing the Integration
Run a direct retrieval test first. This confirms indexing, search relevance, and Haystack wiring are all working before you add more agents on top.
```python
query = "What does the research note say about regional banks?"
result = retriever.run(query=query, top_k=3)
for doc in result["documents"]:
    print(doc.content)
```
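Expected output (the exact ranking depends on your analyzer and scoring settings):
```text
Equity research note on regional banks and deposit beta sensitivity.
Q2 earnings call transcript for ACME Bank shows net interest margin expansion.
```
The ACME Bank line most likely appears because the retriever's default fuzzy matching lets "banks" match "Bank".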
If you get empty results:
- confirm documents exist in the index (document_store.count_documents() should return more than zero)
- check that your index name matches exactly
- verify analyzer settings if you expect tokenized matching
- inspect metadata filters if you added them
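The first two checks can be scripted with the elasticsearch client from step 1. This is a sketch: diagnose_index is a hypothetical helper that uses only indices.exists() and count(), so it covers missing or empty indices but not analyzer or filter problems.

```python
from typing import Any


def diagnose_index(client: Any, index: str) -> list[str]:
    """Return human-readable problems that commonly cause empty retrieval results.

    `client` is expected to behave like elasticsearch.Elasticsearch.
    An empty list means both checks passed.
    """
    if not client.indices.exists(index=index):
        return [f"index '{index}' does not exist (check the name passed to the document store)"]
    problems = []
    if client.count(index=index)["count"] == 0:
        problems.append(f"index '{index}' exists but holds no documents")
    return problems
```

For example, print(diagnose_index(es, "investment_banking_docs")) prints an empty list when the index exists and contains documents.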
Real-World Use Cases
- Deal team knowledge assistant: let one agent retrieve comps analysis, prior CIMs, and diligence notes from Elasticsearch while another agent drafts IC memos from that context.
- Research Q&A system: build a multi-agent workflow in which one agent searches filings and transcripts, another extracts key metrics, and a third generates a concise analyst summary.
- Compliance-aware document lookup: use Elasticsearch filters with Haystack retrieval so agents only see approved content by desk, geography, or confidentiality classification.
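For the compliance case, one approach is to give each agent a mandatory filter and combine it with any request-level filter under AND. An agent's own query can then narrow its results but never widen them. with_access_policy, COMPLIANCE_POLICY, and the meta.confidentiality and meta.desk fields are hypothetical; substitute the metadata your ingestion actually writes.

```python
from typing import Any, Optional

# Hypothetical policy: only public or internal documents from one desk
COMPLIANCE_POLICY: dict[str, Any] = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.confidentiality", "operator": "in", "value": ["public", "internal"]},
        {"field": "meta.desk", "operator": "==", "value": "FIG"},
    ],
}


def with_access_policy(policy: dict[str, Any], request: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """AND an agent's mandatory filter with an optional request filter.

    Both filters sit under one AND, so the request can only narrow the result set.
    """
    if request is None:
        return policy
    return {"operator": "AND", "conditions": [policy, request]}
```

This guarantee only holds if agents reach Elasticsearch exclusively through the shared retrieval layer that applies the policy. Enforcement belongs there (or in Elasticsearch permissions), not in the agent's prompt.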
By Cyprian Aarons, AI Consultant at Topiax.