How to Integrate Haystack for banking with Elasticsearch for RAG
Combining Haystack for banking with Elasticsearch gives you a practical RAG stack for regulated environments. Haystack handles the retrieval pipeline and agent orchestration, while Elasticsearch provides fast keyword (BM25), vector, and hybrid search over policies, product docs, customer-facing scripts, and internal knowledge bases.
This is the setup I’d use when building an AI agent that needs to answer banking questions with traceable sources, low latency, and strong control over what gets retrieved.
Prerequisites
- Python 3.10+
- An Elasticsearch 8.x cluster running locally or in your VPC
- A Haystack 2.x installation (the haystack-ai package), plus any banking components you use in your project
- An embedding model available through Haystack (this guide uses sentence-transformers/all-MiniLM-L6-v2, which produces 384-dimensional vectors)
- Banking documents prepared as plain text, PDFs, or HTML converted to text
- Permissions to create indices and write documents in Elasticsearch
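Install the core packages. The elasticsearch-haystack package provides the Elasticsearch document store and retrievers and installs the elasticsearch client. The sentence-transformers package backs the embedders used below:
```bash
pip install haystack-ai elasticsearch-haystack sentence-transformers
```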
If your setup uses a specific banking distribution or extension of Haystack, make sure its package is installed too.
Integration Steps
- Start by connecting to Elasticsearch
You need a client first. In production, use TLS and API keys. For local testing, basic auth is enough.
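```python
from elasticsearch import Elasticsearch

# Local testing only: plain HTTP with basic auth.
# Elasticsearch 8.x enables TLS by default, so use an "https://..." URL plus ca_certs
# unless you have disabled HTTP TLS on your local node.
es = Elasticsearch(
    "http://localhost:9200",
    basic_auth=("elastic", "changeme"),
)
print(es.info())
```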
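For production, the Elasticsearch Python client accepts api_key and ca_certs keyword arguments in place of basic_auth. The following is a minimal sketch. It assumes you keep the endpoint, API key, and CA bundle path in environment variables. The variable names ES_URL, ES_API_KEY, and ES_CA_CERTS and the es_client_kwargs helper are hypothetical:

```python
import os


def es_client_kwargs(env=None):
    """Build keyword arguments for elasticsearch.Elasticsearch from environment variables.

    ES_URL, ES_API_KEY and ES_CA_CERTS are hypothetical names; adapt them to your platform.
    """
    env = os.environ if env is None else env
    url = env["ES_URL"]
    if not url.startswith("https://"):
        # Refuse plain HTTP outside local testing.
        raise ValueError("production clients must use an https:// endpoint")
    kwargs = {"hosts": url, "api_key": env["ES_API_KEY"]}
    ca_certs = env.get("ES_CA_CERTS")
    if ca_certs:
        # Verify the cluster certificate against your internal CA bundle.
        kwargs["ca_certs"] = ca_certs
    return kwargs
```

You would then create the client with Elasticsearch(**es_client_kwargs()). Failing fast on an http:// URL keeps a local-testing setting from leaking into a production deployment.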
- Create a bank-specific index for RAG
Keep your retrieval data isolated by domain: for example, one index for retail banking policies and another for fraud operations.
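If you split indices by domain, a fixed registry keeps an agent from being pointed at an index you never approved. This is a minimal sketch; the domain names, the index names, and the index_for helper are all hypothetical:

```python
# Hypothetical registry: one Elasticsearch index per banking domain.
DOMAIN_INDICES = {
    "retail_policies": "banking-rag-retail-policies",
    "fraud_ops": "banking-rag-fraud-ops",
}


def index_for(domain: str) -> str:
    """Return the index for a domain, refusing anything not explicitly registered."""
    try:
        return DOMAIN_INDICES[domain]
    except KeyError:
        raise ValueError(f"no index registered for domain {domain!r}") from None
```

Each domain would then get its own document store, for example ElasticsearchDocumentStore(index=index_for("fraud_ops"), ...). A retriever built on that store searches only that domain's index.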
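```python
index_name = "banking-rag"

# Create the index up front so the mapping is explicit and reviewable.
if not es.indices.exists(index=index_name):
    es.indices.create(
        index=index_name,
        mappings={
            "properties": {
                "content": {"type": "text"},    # full text, searchable with BM25
                "title": {"type": "keyword"},   # Document.meta["title"]
                "source": {"type": "keyword"},  # Document.meta["source"]
                "embedding": {
                    "type": "dense_vector",
                    "dims": 384,            # must match the embedding model's output size
                    "index": True,          # enables approximate kNN search
                    "similarity": "cosine",
                },
            }
        },
    )
```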
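The dense_vector field is what enables vector (kNN) search from your Haystack pipeline. Its dims value must match the output size of your embedding model: 384 for sentence-transformers/all-MiniLM-L6-v2, the model used below. The field names also line up with Haystack's Elasticsearch document store. The store writes content and embedding to these fields and stores each Document's meta keys (here title and source) as top-level fields.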
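With "similarity": "cosine", Elasticsearch ranks kNN hits by cosine similarity. Per its dense_vector documentation, it reports the score as (1 + cosine) / 2, so scores stay between 0 and 1. The sketch below reproduces that arithmetic in plain Python to help you interpret retriever scores. It is an illustration, not Elasticsearch's implementation:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        # The same kind of mismatch you hit if "dims" differs from the model's output size.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Elasticsearch rejects zero-magnitude vectors for cosine similarity.
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def es_cosine_score(query, doc):
    """Map cosine similarity in [-1, 1] onto the [0, 1] score range."""
    return (1 + cosine(query, doc)) / 2
```

A score of 0.5 therefore means the vectors are orthogonal, not "half relevant". Keep that in mind when you choose score thresholds later.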
- Index documents using Haystack components
An indexing pipeline typically pairs an embedder with a DocumentWriter that writes into a document store. If your banking package exposes prebuilt helpers, wire them into the same flow. The pattern below shows the standard Haystack API shape.
```python
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

# Extra keyword arguments such as basic_auth are passed through to the Elasticsearch client.
# The store takes no embedding_dim argument; the vector size comes from the index mapping.
document_store = ElasticsearchDocumentStore(
    hosts=["http://localhost:9200"],
    basic_auth=("elastic", "changeme"),
    index=index_name,
)

docs = [
    Document(content="Retail customers can dispute card transactions within 60 days.",
             meta={"title": "Card Disputes", "source": "policy_001"}),
    Document(content="Mortgage prepayment penalties are waived after year five.",
             meta={"title": "Mortgage Terms", "source": "policy_002"}),
]

embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
# OVERWRITE makes re-runs idempotent instead of failing on duplicate document IDs.
writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.OVERWRITE)

pipe = Pipeline()
pipe.add_component("embedder", embedder)
pipe.add_component("writer", writer)
pipe.connect("embedder.documents", "writer.documents")
pipe.run({"embedder": {"documents": docs}})
```
If your banking layer wraps ingestion with approval checks or PII redaction, place those steps before the embedder. That way, unapproved or unredacted text is never embedded or written to the index.
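Here is a minimal sketch of a redaction step you could run on Document content before the embedder. The regular expressions are illustrative assumptions that catch only card-like digit runs and email addresses. They are not a complete PII policy, and the redact_pii helper is hypothetical. In a Haystack pipeline you would wrap it in a custom component (a class decorated with @component) placed ahead of the embedder:

```python
import re

# Illustrative patterns only: 13-19 digit card numbers (optionally separated by
# spaces or dashes) and email addresses. Extend these to match your own PII policy.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def redact_pii(text: str) -> str:
    """Replace card-like numbers and email addresses with placeholder tokens."""
    text = CARD_RE.sub("[CARD]", text)
    return EMAIL_RE.sub("[EMAIL]", text)
```

Short numbers such as "60 days" pass through untouched, so policy content stays searchable.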
- Build the retrieval pipeline for RAG
This is where Haystack queries Elasticsearch and returns top matches to your generator or agent toolchain.
```python
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchEmbeddingRetriever

# Use the same model at query time as at indexing time so the vectors are comparable.
query_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
retriever = ElasticsearchEmbeddingRetriever(document_store=document_store, top_k=5)

rag_pipe = Pipeline()
rag_pipe.add_component("query_embedder", query_embedder)
rag_pipe.add_component("retriever", retriever)
rag_pipe.connect("query_embedder.embedding", "retriever.query_embedding")

result = rag_pipe.run({
    "query_embedder": {"text": "How long do customers have to dispute a card transaction?"}
})
print(result["retriever"]["documents"][0].content)
```
In a real agent system, this retrieval step feeds the prompt context for your LLM response layer.
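The pipeline above is vector-only. To add keyword matching for hybrid search, you can also run BM25 retrieval (elasticsearch-haystack ships an ElasticsearchBM25Retriever for this) and merge the two ranked lists. Haystack's DocumentJoiner offers join_mode="reciprocal_rank_fusion" for that merge. The plain-Python sketch below only shows what reciprocal rank fusion computes. It uses the constant k = 60 from the original RRF formulation (an assumption you may tune), and document IDs stand in for full Document objects:

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher fused scores rank first.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

RRF uses only ranks, not raw scores. This matters because BM25 scores and cosine-based scores live on different scales and cannot be compared directly.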
- Wire it into an agent response path
For banking use cases, keep generation separate from retrieval so you can inspect sources before answering. That makes auditing easier.
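One way to make that inspection concrete is to filter retrieved documents before they reach the prompt. The filter keeps only hits above a score threshold that come from sources you have approved. This is a minimal sketch. The RetrievedDoc dataclass stands in for Haystack's Document, which exposes content, meta, and score. The threshold and allow-list are hypothetical values you would calibrate on your own data:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RetrievedDoc:
    """Stand-in for haystack.Document with only the fields this check reads."""
    content: str
    meta: dict = field(default_factory=dict)
    score: Optional[float] = None


def citable_docs(docs, approved_sources, min_score=0.7):
    """Keep documents from approved sources whose retrieval score meets min_score."""
    return [
        d for d in docs
        if d.meta.get("source") in approved_sources
        and d.score is not None
        and d.score >= min_score
    ]
```

If the filtered list comes back empty, you can route the question to a fallback, such as a human handoff, instead of calling the model.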
```python
from haystack.components.builders import PromptBuilder

template = """
Answer the question using only the provided documents.
Question: {{ question }}
Documents:
{% for doc in documents %}
- {{ doc.content }} (source: {{ doc.meta.source }})
{% endfor %}
"""
prompt_builder = PromptBuilder(template=template)

docs = result["retriever"]["documents"]
# Template variables are passed to run() as keyword arguments.
prompt_result = prompt_builder.run(
    question="How long do customers have to dispute a card transaction?",
    documents=docs,
)
print(prompt_result["prompt"])
```
At this point you can pass prompt_result["prompt"] into your model client and return a grounded answer with citations.
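Suppose you instruct the model to cite sources in the same "(source: ...)" format the template uses for documents. You can then check each answer before returning it: every cited source should be one that was actually retrieved. This is a minimal sketch, and it assumes the model reproduces that exact format, which you should verify for your model:

```python
import re

# Matches citations in the "(source: policy_001)" format used by the prompt template.
CITATION_RE = re.compile(r"\(source:\s*([\w.-]+)\)")


def check_citations(answer: str, retrieved_sources: set) -> tuple[set[str], set[str]]:
    """Return (cited, unknown): all cited source IDs and those not in the retrieved set."""
    cited = set(CITATION_RE.findall(answer))
    return cited, cited - retrieved_sources
```

Treat an empty cited set or a non-empty unknown set as a reason to block or flag the answer rather than send it to a customer.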
Testing the Integration
Use a simple query against known bank policy content and verify Elasticsearch returns the right document through Haystack.
```python
test_query = "What is the dispute window for card transactions?"
query_result = rag_pipe.run({
    "query_embedder": {"text": test_query}
})
top_docs = query_result["retriever"]["documents"]
assert top_docs, "retriever returned no documents; check the index and embeddings"
print("Top result:", top_docs[0].meta["title"])
print("Content:", top_docs[0].content)
```
Expected output:
```text
Top result: Card Disputes
Content: Retail customers can dispute card transactions within 60 days.
```
If that comes back correctly, your vector path is working end to end.
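A single query is a smoke test. To catch regressions as the corpus grows, you can keep a small set of questions, each paired with the source it should retrieve. Then measure how often that source appears in the top k results. This is a minimal sketch; the eval-set format and the hit_rate_at_k helper are hypothetical. The ranked source lists would come from running rag_pipe on each question, for example [d.meta["source"] for d in out["retriever"]["documents"]]:

```python
def hit_rate_at_k(results, expected, k=3):
    """Fraction of questions whose expected source appears in the top-k retrieved sources.

    results: question -> ranked list of source IDs returned by the retriever
    expected: question -> source ID that should be retrieved
    """
    if not expected:
        raise ValueError("expected must contain at least one question")
    hits = sum(1 for q, src in expected.items() if src in results.get(q, [])[:k])
    return hits / len(expected)
```

Running this after every re-index or model change gives you a number to track, rather than relying on a single spot check.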
Real-World Use Cases
- Policy Q&A assistant: let relationship managers ask questions about lending rules, fee waivers, KYC policy, or card disputes and get answers grounded in approved internal docs.
- Operations copilot: retrieve runbooks for fraud ops, account servicing, chargeback handling, and escalation procedures inside an internal AI agent.
- Customer support deflection: power a chatbot that answers common banking questions using indexed product FAQs and compliance-approved support articles instead of free-form generation alone.
By Cyprian Aarons, AI Consultant at Topiax.