How to Integrate Haystack for banking with Elasticsearch for AI agents
Combining Haystack for banking with Elasticsearch gives you a practical retrieval layer for agentic banking workflows. Haystack handles document ingestion, chunking, and retrieval orchestration, while Elasticsearch provides indexed search over policies, product docs, KYC rules, and customer-facing knowledge bases.
The result is a system where an AI agent can answer policy questions, retrieve compliance snippets, and ground responses in searchable bank content instead of guessing.
Prerequisites
- •Python 3.10+
- •An Elasticsearch cluster running locally or in your environment
- •Access to your Haystack for banking package and API credentials if your deployment requires them
- •pip installed
- •A corpus of banking documents to index, such as:
  - •product FAQs
  - •lending policy docs
  - •AML/KYC procedures
  - •support runbooks
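Install the core dependencies. The haystack_integrations imports used in step 4 come from the elasticsearch-haystack package:
pip install haystack-ai elasticsearch-haystack elasticsearch sentence-transformers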
If your Haystack for banking setup uses a bank-specific extension package, install that too according to your internal distribution.
Integration Steps
1) Connect to Elasticsearch
Start by creating an Elasticsearch client. Keep this separate from the rest of your agent logic so you can rotate credentials without touching retrieval code.
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "http://localhost:9200",
    basic_auth=("elastic", "changeme")
)
print(es.info())
If this fails, fix connectivity first. Your AI agent should never depend on a broken search backend.
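One way to keep credentials out of the retrieval code is to read them from the environment. The sketch below is illustrative only: the variable names ES_URL, ES_USERNAME, and ES_PASSWORD and the helper es_settings_from_env are hypothetical, not part of Haystack or the Elasticsearch client.

```python
import os
from typing import Mapping, Optional


def es_settings_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build Elasticsearch client keyword arguments from environment variables.

    ES_URL, ES_USERNAME and ES_PASSWORD are hypothetical names; use whatever
    your deployment or secrets manager actually provides.
    """
    env = os.environ if env is None else env
    settings = {"hosts": env.get("ES_URL", "http://localhost:9200")}
    username = env.get("ES_USERNAME")
    password = env.get("ES_PASSWORD")
    # Only attach credentials when both halves are present.
    if username and password:
        settings["basic_auth"] = (username, password)
    return settings
```

With this in place the client can be created as Elasticsearch(**es_settings_from_env()), and rotating a password becomes a configuration change rather than a code change.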
2) Define the index mapping for bank documents
For Haystack-based retrieval, you want fields for text content and metadata. Store source, document type, and compliance tags so agents can filter results later.
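index_name = "banking_docs"

# Text fields are analyzed for full-text search; keyword fields support exact-match filters.
mapping = {
    "mappings": {
        "properties": {
            "content": {"type": "text"},
            "title": {"type": "text"},
            "source": {"type": "keyword"},
            "doc_type": {"type": "keyword"},
            "compliance_tag": {"type": "keyword"}
        }
    }
}

# Create the index only if it is missing, passing the mapping through the mappings argument.
if not es.indices.exists(index=index_name):
    es.indices.create(index=index_name, mappings=mapping["mappings"])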
This is the structure you’ll query from Haystack retrievers or directly from an agent tool wrapper.
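To show why the keyword fields matter, here is a minimal sketch of a query body that combines a full-text match on content with exact filters on doc_type and compliance_tag. The helper build_filtered_query is a hypothetical name introduced for illustration; the dictionary it returns is ordinary Elasticsearch query DSL.

```python
from typing import Optional


def build_filtered_query(text: str,
                         doc_type: Optional[str] = None,
                         compliance_tag: Optional[str] = None) -> dict:
    """Return a bool query: full-text match on content, exact filters on keyword fields."""
    filters = []
    if doc_type:
        filters.append({"term": {"doc_type": doc_type}})
    if compliance_tag:
        filters.append({"term": {"compliance_tag": compliance_tag}})
    return {
        "bool": {
            "must": [{"match": {"content": text}}],
            "filter": filters,
        }
    }
```

An agent tool wrapper could pass the result straight to the client, for example es.search(index=index_name, query=build_filtered_query("dispute window", doc_type="policy")).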
3) Index documents through Haystack’s document model
Haystack’s Document object is the cleanest way to normalize content before sending it into Elasticsearch. For banking use cases, keep metadata explicit so downstream filters are deterministic.
from haystack import Document

docs = [
    Document(
        content="Customers can dispute card transactions within 60 days of posting.",
        meta={
            "title": "Card Dispute Policy",
            "source": "policy_center",
            "doc_type": "policy",
            "compliance_tag": "customer_rights"
        }
    ),
    Document(
        content="KYC verification requires government ID and proof of address.",
        meta={
            "title": "KYC Checklist",
            "source": "aml_manual",
            "doc_type": "procedure",
            "compliance_tag": "aml_kyc"
        }
    )
]
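The advice about explicit metadata can be enforced with a small check before anything is indexed. This is a sketch under the assumption that the four metadata keys used above are mandatory for every document; REQUIRED_META_KEYS and missing_meta_keys are hypothetical names.

```python
# Assumed to be mandatory for every banking document; adjust to your own schema.
REQUIRED_META_KEYS = ("title", "source", "doc_type", "compliance_tag")


def missing_meta_keys(meta: dict, required=REQUIRED_META_KEYS) -> list:
    """Return the required keys that are absent or empty in a metadata dict."""
    return [key for key in required if not meta.get(key)]
```

Calling missing_meta_keys(doc.meta) for each document and rejecting any that return a non-empty list keeps incomplete records out of the index, so later filters do not silently miss them.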
Now write them into Elasticsearch. If you’re using Haystack pipelines in production, this step usually sits behind a document ingestion job.
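from elasticsearch.helpers import bulk

# Build one bulk action per document, flattening metadata next to the content field.
actions = []
for i, doc in enumerate(docs):
    actions.append({
        "_index": index_name,
        "_id": str(i),
        "_source": {
            "content": doc.content,
            **doc.meta
        }
    })

bulk(es, actions)
# Refresh so the new documents are searchable immediately.
es.indices.refresh(index=index_name)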
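Positional IDs such as str(i) restart at 0 for every batch, so a second ingestion run with different documents would overwrite the first. A possible alternative, sketched here under the assumption that source plus title uniquely identifies a document, is to derive the ID from those two fields. The helper stable_doc_id is a hypothetical name.

```python
import hashlib


def stable_doc_id(source: str, title: str) -> str:
    """Derive a deterministic ID from source and title.

    Re-ingesting an updated version of the same document then overwrites the
    old copy instead of creating a duplicate or clobbering an unrelated one.
    """
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    key = f"{source}\x00{title}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()
```

In the bulk action, "_id": stable_doc_id(doc.meta["source"], doc.meta["title"]) would replace "_id": str(i).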
4) Wire Elasticsearch into a Haystack retriever
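Use Haystack’s Elasticsearch integration to query indexed banking content from your agent pipeline. In current Haystack setups this is typically done with ElasticsearchBM25Retriever or an embedding retriever depending on your architecture. Both are built on an ElasticsearchDocumentStore rather than a raw client, so create the store first and point it at the same index.
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever

# Extra keyword arguments such as basic_auth are forwarded to the Elasticsearch client.
document_store = ElasticsearchDocumentStore(
    hosts="http://localhost:9200",
    basic_auth=("elastic", "changeme"),
    index=index_name
)
retriever = ElasticsearchBM25Retriever(document_store=document_store, top_k=3)

results = retriever.run(query="How long do customers have to dispute card transactions?")
print(results["documents"])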
If you want semantic retrieval instead of lexical search, swap in embeddings and use a vector-enabled Elasticsearch index. The integration pattern stays the same: Haystack orchestrates retrieval; Elasticsearch stores and searches the corpus.
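As a rough illustration of what a vector-enabled index looks like, the sketch below extends the banking mapping with a dense_vector field. The field name embedding and the helper build_vector_mapping are assumptions made for this example, and 384 is the output size of the all-MiniLM-L6-v2 sentence-transformers model; use the dimension of whichever embedding model you deploy.

```python
def build_vector_mapping(dims: int, similarity: str = "cosine") -> dict:
    """Banking mapping with a dense_vector field next to the text and keyword fields."""
    return {
        "mappings": {
            "properties": {
                "content": {"type": "text"},
                "title": {"type": "text"},
                "source": {"type": "keyword"},
                "doc_type": {"type": "keyword"},
                "compliance_tag": {"type": "keyword"},
                # Assumed field name; it must match the field your retriever queries.
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": similarity,
                },
            }
        }
    }


# 384 dimensions matches all-MiniLM-L6-v2; change it for other models.
vector_mapping = build_vector_mapping(dims=384)
```

The dimension is fixed when the index is created, so switching embedding models generally means creating a new index and re-indexing the corpus.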
5) Attach retrieval to an AI agent workflow
Once retrieval works, expose it as a tool or pipeline step inside your agent system. The key is to return grounded context before generation.
def retrieve_bank_context(query: str):
    # Run retrieval and flatten each hit into a plain dict the agent can consume.
    result = retriever.run(query=query)
    docs = result["documents"]
    return [
        {
            "content": d.content,
            **d.meta
        }
        for d in docs
    ]

query = "What does our KYC process require?"
context = retrieve_bank_context(query)
for item in context:
    print(item["title"], "-", item["content"])
In an actual agent loop, pass context into your prompt template or response generator. That keeps answers tied to bank-approved source material.
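Passing context into a prompt can be as simple as numbering the retrieved snippets ahead of the question. The following is a minimal sketch, not a prescribed template: build_grounded_prompt is a hypothetical helper, the instruction wording is illustrative, and it assumes the list-of-dicts shape returned by retrieve_bank_context.

```python
def build_grounded_prompt(question: str, context: list) -> str:
    """Format retrieved snippets as numbered sources ahead of the question."""
    if not context:
        # With nothing retrieved, tell the model not to improvise an answer.
        return ("No approved source material was retrieved. "
                "Say that the answer is not available.\n\n"
                f"Question: {question}")
    lines = ["Answer using only the sources below and cite them by number.", ""]
    for number, item in enumerate(context, start=1):
        title = item.get("title", "Untitled")
        lines.append(f"[{number}] {title}: {item['content']}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

Numbering the sources lets the generated answer cite the specific policy it relied on, which makes reviews by compliance staff easier.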
Testing the Integration
Run a direct retrieval test against known content. You want to verify indexing, search relevance, and metadata preservation in one shot.
query = "How long do customers have to dispute card transactions?"
result = retriever.run(query=query)
for doc in result["documents"]:
    print(doc.meta["title"])
    print(doc.content)
Expected output:
Card Dispute Policy
Customers can dispute card transactions within 60 days of posting.
If you get empty results:
- •confirm the index exists
- •confirm documents were refreshed after bulk indexing
- •check whether your query terms match the stored wording
- •verify the retriever is pointed at the correct cluster and index
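The first two checks in the list above can be scripted. This sketch assumes es is an Elasticsearch client like the one created in step 1; diagnose_empty_results is a hypothetical helper, while indices.exists, indices.refresh, and count are standard client calls.

```python
def diagnose_empty_results(es, index_name: str) -> list:
    """Return human-readable findings for the index-level checks."""
    findings = []
    if not es.indices.exists(index=index_name):
        findings.append(f"index '{index_name}' does not exist")
        return findings
    # Refresh so recently bulk-indexed documents are visible to count and search.
    es.indices.refresh(index=index_name)
    count = es.count(index=index_name)["count"]
    if count == 0:
        findings.append(f"index '{index_name}' exists but holds no documents")
    return findings
```

An empty list means the index exists and contains documents, which points the investigation toward query wording or retriever configuration instead.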
Real-World Use Cases
- •Policy Q&A for bankers and support agents: Retrieve approved answers from internal policy docs instead of relying on model memory.
- •AML/KYC assistant: Ground responses in compliance procedures, onboarding checklists, and escalation paths.
- •Customer service copilot: Let agents ask natural-language questions like “What’s our chargeback window?” and get sourced answers from Elasticsearch-backed banking knowledge bases.
By Cyprian Aarons, AI Consultant at Topiax.