How to Integrate Haystack for investment banking with Elasticsearch for production AI

By Cyprian Aarons · Updated 2026-04-21
Tags: haystack-for-investment-banking, elasticsearch, production-ai

Connecting Haystack for investment banking with Elasticsearch gives you a practical retrieval layer for agentic workflows: fast semantic search over filings, research notes, term sheets, and internal policy docs. In production AI systems, this is the difference between a chat demo and an assistant that can answer banker-grade questions with traceable evidence.

Prerequisites

  • Python 3.10+
  • An Elasticsearch cluster running locally or in your VPC
  • API credentials for your Haystack for investment banking environment
  • pip access to install Haystack and the Elasticsearch client
  • A document set to index:
    • SEC filings
    • deal memos
    • equity research PDFs
    • internal compliance policies
  • Network access between your app and Elasticsearch

Install the Python packages:

pip install haystack-ai elasticsearch sentence-transformers

Integration Steps

  1. Set up the Elasticsearch client.

Use the official Python client and point it at your cluster. For production, use TLS and auth, not anonymous local access.

from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", "your-password"),
    verify_certs=False,  # local testing only; use verify_certs=True with a real CA in production
)

print(es.info())
  2. Create an index for banking documents.

Store both text and embeddings if you want semantic retrieval. Keep metadata fields for ticker, deal ID, document type, and timestamp.

index_name = "investment-banking-docs"

if not es.indices.exists(index=index_name):
    es.indices.create(
        index=index_name,
        mappings={
            "properties": {
                "content": {"type": "text"},
                "embedding": {"type": "dense_vector", "dims": 384},
                "ticker": {"type": "keyword"},
                "doc_type": {"type": "keyword"},
                "deal_id": {"type": "keyword"},
                "created_at": {"type": "date"}
            }
        }
    )
  3. Build a Haystack pipeline that writes documents into Elasticsearch.

Haystack’s DocumentWriter works with a DocumentStore. For Elasticsearch-backed retrieval, use the Elasticsearch document store from Haystack and write normalized documents into it.

from haystack import Document, Pipeline
from haystack.components.writers import DocumentWriter
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(
    hosts=["https://localhost:9200"],
    basic_auth=("elastic", "your-password"),
    index="investment-banking-docs",
    verify_certs=False,  # local testing only; verify certificates in production
)

writer = DocumentWriter(document_store=document_store)

docs = [
    Document(
        content="Q3 revenue increased 18% year over year driven by advisory fees.",
        meta={"ticker": "JPM", "doc_type": "earnings_note", "deal_id": "none"}
    ),
    Document(
        content="The merger agreement includes a termination fee of $250M.",
        meta={"ticker": "MSFT", "doc_type": "deal_memo", "deal_id": "deal-1042"}
    ),
]

pipe = Pipeline()
pipe.add_component("writer", writer)

result = pipe.run({"writer": {"documents": docs}})
print(result)
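The example documents above are short, but filings and research PDFs are far longer than a single retrievable passage, so you will usually chunk them before writing. A minimal word-window chunker as a sketch (the 200-word size and 40-word overlap are illustrative defaults, not tuned recommendations):

```python
def chunk_document(text: str, meta: dict, size: int = 200, overlap: int = 40) -> list[dict]:
    """Split long text into overlapping word windows.

    Each chunk carries the parent metadata plus a chunk index,
    so retrieved evidence stays traceable to its source document.
    """
    words = text.split()
    step = size - overlap
    chunks = []
    start = 0
    i = 0
    while start < len(words):
        window = words[start:start + size]
        chunks.append({"content": " ".join(window), "meta": {**meta, "chunk": i}})
        if start + size >= len(words):
            break  # last window already covers the tail
        start += step
        i += 1
    return chunks

chunks = chunk_document("net interest income rose " * 100,
                        {"ticker": "BAC", "doc_type": "filing"})
```

Each resulting dict can be wrapped in a Haystack Document and handed to the writer pipeline above; the overlap keeps sentences that straddle a boundary retrievable from both sides.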
  4. Add retrieval for your AI agent.

For production AI, you usually want hybrid search: lexical matching for exact terms like CUSIP or covenant language, plus semantic search for analyst-style questions. Haystack can retrieve from Elasticsearch through the document store.

# If you're using Elasticsearch as your persistence layer,
# you can still wire retrievers in front of an LLM pipeline.
# For pure Elasticsearch-backed retrieval in Haystack,
# use the document store's query methods directly.
query = "What was the revenue growth driver in Q3?"
results = document_store.filter_documents(
    filters={"field": "meta.ticker", "operator": "==", "value": "JPM"}
)

for doc in results[:5]:
    print(doc.content, doc.meta)

If you want direct vector retrieval from Elasticsearch, use the index with embeddings generated upstream and query via Elasticsearch’s knn API:

query_embedding = [0.01] * 384

response = es.search(
    index=index_name,
    knn={
        "field": "embedding",
        "query_vector": query_embedding,
        "k": 5,
        "num_candidates": 50,
    },
    _source=["content", "ticker", "doc_type", "deal_id"]
)

for hit in response["hits"]["hits"]:
    print(hit["_source"]["content"])
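Recent Elasticsearch versions can fuse lexical and vector results server-side, but a client-side reciprocal rank fusion is easy to sketch and keeps you version-independent. Here `lexical_hits` and `knn_hits` are assumed to be the `hits` lists returned by a `match` search and the knn search above, where each hit dict carries an `_id`:

```python
def rrf_merge(lexical_hits: list[dict], knn_hits: list[dict], k: int = 60) -> list[dict]:
    """Reciprocal rank fusion: score(doc) = sum over result lists of 1 / (k + rank).

    Documents appearing high in both lists rise to the top; k=60 is the
    commonly used damping constant.
    """
    scores: dict[str, float] = {}
    by_id: dict[str, dict] = {}
    for hits in (lexical_hits, knn_hits):
        for rank, hit in enumerate(hits, start=1):
            doc_id = hit["_id"]
            by_id[doc_id] = hit
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [by_id[doc_id] for doc_id in ranked]
```

This gives you hybrid behavior, exact covenant language from the lexical list and paraphrase matches from the vector list, without depending on cluster-side rank features.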
  5. Connect retrieval to your agent prompt flow.

Your agent should fetch evidence first, then answer with citations. Keep the model out of raw document stores; let Haystack orchestrate retrieval and pass only relevant chunks to the generator.

def build_context(question: str):
    hits = es.search(
        index=index_name,
        query={
            "match": {
                "content": question
            }
        },
        size=3,
        _source=["content", "ticker", "doc_type"]
    )
    
    chunks = []
    for hit in hits["hits"]["hits"]:
        src = hit["_source"]
        chunks.append(f"[{src['ticker']} | {src['doc_type']}] {src['content']}")
    
    return "\n".join(chunks)

context = build_context("What drove JPM revenue growth?")
print(context)
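With evidence in hand, the last step is assembling a prompt that forces the model to cite. A minimal sketch (the prompt wording and the [n] citation scheme are illustrative, and the result goes to whatever generator you use):

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Number each evidence chunk so the model can cite [1], [2], ... in its answer."""
    evidence = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using ONLY the evidence below. "
        "Cite sources as [n] after each claim. "
        "If the evidence is insufficient, say so.\n\n"
        f"Evidence:\n{evidence}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What drove JPM revenue growth?",
    ["[JPM | earnings_note] Q3 revenue increased 18% driven by advisory fees."],
)
```

Because each chunk keeps its ticker and doc_type tags, the [n] markers in the answer map straight back to source documents, which is what makes the answers auditable.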

Testing the Integration

Run a round-trip test: write a document, query it back from Elasticsearch, and confirm Haystack can read what it stored.

test_doc = Document(
    content="Net interest income rose due to higher rates and balance sheet growth.",
    meta={"ticker": "BAC", "doc_type": "research_note"}
)

document_store.write_documents([test_doc])

res = es.search(
    index=index_name,
    query={"match": {"content": "net interest income rates"}},
    size=1,
)

print(res["hits"]["hits"][0]["_source"]["content"])

Expected output:

Net interest income rose due to higher rates and balance sheet growth.

If that returns the right chunk, your indexing path is working end to end.

Real-World Use Cases

  • Deal desk assistant
    • Search past term sheets, redlines, and approval memos while drafting new transactions.
  • Equity research copilot
    • Pull relevant earnings commentary, guidance changes, and prior analyst notes before generating a summary.
  • Compliance Q&A bot
    • Retrieve policy language on restricted lists, MNPI handling, KYC rules, or retention requirements with source-backed answers.

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

Get the Starter Kit
