How to Integrate Haystack for fintech with Elasticsearch for RAG
Combining Haystack for fintech with Elasticsearch gives you a practical RAG stack for regulated data: document ingestion, retrieval, and grounded answer generation over policies, statements, claims, KYC files, and internal runbooks. The value is simple: Haystack handles the agent workflow, while Elasticsearch gives you fast lexical + vector retrieval across large financial corpora.
Prerequisites
- •Python 3.10+
- •An Elasticsearch cluster running locally or in the cloud
- •An Elasticsearch API key or username/password
- •Haystack installed with the Elasticsearch integration
- •Access to your fintech documents in PDF, text, HTML, or JSON
- •An embedding model available locally or via API
Install the packages:
pip install haystack-ai elasticsearch-haystack sentence-transformers
If you are using a managed Elasticsearch deployment, make sure:
- •http://localhost:9200 or your cloud endpoint is reachable
- •your index can store dense vectors
- •security settings allow indexing and search requests
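To make the dense-vector requirement concrete, here is a minimal sketch of an index mapping written as a Python dict. It assumes a vector field named `embedding` and the 384-dimensional all-MiniLM-L6-v2 model used later in this guide. The field names and the `embedding_field_ok` helper are illustrative assumptions, and the Haystack document store normally creates a suitable mapping on its own, so treat this as a reference for inspecting an existing index rather than a required step.

```python
# Illustrative mapping only: field names are assumptions, not a required schema.
EMBEDDING_DIM = 384  # output size of sentence-transformers/all-MiniLM-L6-v2

index_mapping = {
    "mappings": {
        "properties": {
            "content": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": EMBEDDING_DIM,
                "index": True,
                "similarity": "cosine",
            },
        }
    }
}


def embedding_field_ok(mapping: dict, expected_dim: int) -> bool:
    """Return True if the mapping has a dense_vector field of the expected size."""
    field = mapping.get("mappings", {}).get("properties", {}).get("embedding", {})
    return field.get("type") == "dense_vector" and field.get("dims") == expected_dim


print(embedding_field_ok(index_mapping, EMBEDDING_DIM))
```

If the check fails against the mapping of a real index, the usual cause is an index created earlier for a model with a different output size.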
Integration Steps
- •Set up the Elasticsearch connection.
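In Haystack, the ElasticsearchDocumentStore is the main bridge between your documents and the search backend. The store creates its own Elasticsearch client, so pass connection and authentication options to the store itself; a separate raw client is only useful for a quick connectivity check.
from elasticsearch import Elasticsearch
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

# Optional: raw client used only to confirm the cluster is reachable.
es_client = Elasticsearch(
    "http://localhost:9200",
    basic_auth=("elastic", "changeme"),
)
print(es_client.info()["version"]["number"])

# Extra keyword arguments such as basic_auth are forwarded to the store's own client.
# The store takes no embedding_dim argument; the similarity metric is set with
# embedding_similarity_function.
document_store = ElasticsearchDocumentStore(
    hosts=["http://localhost:9200"],
    basic_auth=("elastic", "changeme"),
    index="fintech_rag_docs",
    embedding_similarity_function="cosine",
)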
- •Prepare and write fintech documents into the store.
Haystack uses Document objects. For fintech workloads, keep metadata tight so you can filter by product line, jurisdiction, or document type later.
from haystack import Document
docs = [
    Document(
        content="AML policy requires enhanced due diligence for transactions over $10,000.",
        meta={"source": "policy", "jurisdiction": "US", "doc_type": "aml"},
    ),
    Document(
        content="Customer onboarding requires identity verification before account activation.",
        meta={"source": "ops_manual", "jurisdiction": "EU", "doc_type": "kyc"},
    ),
]
document_store.write_documents(docs)
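The metadata above pays off when you filter. As a sketch, the helpers below build a filter dict in the Haystack 2.x filter syntax (a `field`, `operator`, `value` condition, optionally nested under a logical operator). The helper names `meta_equals` and `all_of` are hypothetical conveniences introduced here, not part of Haystack.

```python
from typing import Any


def meta_equals(field: str, value: Any) -> dict:
    """One equality condition on a metadata field, in Haystack 2.x filter syntax."""
    return {"field": f"meta.{field}", "operator": "==", "value": value}


def all_of(*conditions: dict) -> dict:
    """Combine conditions with a logical AND."""
    return {"operator": "AND", "conditions": list(conditions)}


# Restrict results to US AML policy documents.
us_aml_filter = all_of(
    meta_equals("jurisdiction", "US"),
    meta_equals("doc_type", "aml"),
)
print(us_aml_filter)
```

A dict of this shape can be passed as `filters` to `document_store.filter_documents(...)`, and the Elasticsearch retrievers should accept the same argument at run time.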
- •Generate embeddings and index them in Elasticsearch.
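For RAG, you want vector search enabled. Use an embedder component to turn text into vectors before writing to the document store. Document IDs do not change when embeddings are added, so if the same documents are already in the index, write the embedded versions with an overwrite policy to avoid a duplicate-document error.
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.document_stores.types import DuplicatePolicy

document_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"  # produces 384-dimensional vectors
)
document_embedder.warm_up()  # load the model before the first run
embedded_docs = document_embedder.run(documents=docs)["documents"]
# Overwrite the unembedded copies written in the previous step.
document_store.write_documents(embedded_docs, policy=DuplicatePolicy.OVERWRITE)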
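Before writing a large batch, it can help to confirm that every document actually carries an embedding of the expected size. The sketch below uses a hypothetical `StubDoc` class as a stand-in for Haystack's `Document`, so it runs without any dependencies; with real documents you would read `doc.embedding` in the same way.

```python
from dataclasses import dataclass
from typing import List, Optional

EXPECTED_DIM = 384  # assumption: all-MiniLM-L6-v2 embeddings


@dataclass
class StubDoc:
    """Hypothetical stand-in for haystack.Document (content and embedding only)."""
    content: str
    embedding: Optional[List[float]] = None


def find_bad_embeddings(docs, expected_dim=EXPECTED_DIM):
    """Return the content of documents whose embedding is missing or wrongly sized."""
    return [
        d.content
        for d in docs
        if d.embedding is None or len(d.embedding) != expected_dim
    ]


docs_to_check = [
    StubDoc("AML policy", embedding=[0.0] * 384),
    StubDoc("Onboarding manual", embedding=None),
    StubDoc("Fee schedule", embedding=[0.0] * 768),
]
print(find_bad_embeddings(docs_to_check))
```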
- •Build a retrieval pipeline with Haystack.
This is where Haystack orchestrates the query flow. The retriever queries Elasticsearch, then passes top matches to your generator or agent layer.
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchEmbeddingRetriever
query_embedder = SentenceTransformersTextEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"
)
retriever = ElasticsearchEmbeddingRetriever(document_store=document_store)
rag_pipeline = Pipeline()
rag_pipeline.add_component("query_embedder", query_embedder)
rag_pipeline.add_component("retriever", retriever)
rag_pipeline.connect("query_embedder.embedding", "retriever.query_embedding")
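Conceptually, the embedding retriever ranks stored vectors by their similarity to the query vector and returns the best `top_k`. The toy sketch below does this by brute force with cosine similarity over three made-up 3-dimensional vectors. It is an illustration of the idea only: Elasticsearch uses an approximate nearest-neighbour index rather than an exhaustive scan, and the document names and vectors are invented for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, score) pairs with the highest cosine similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors standing in for real 384-dimensional embeddings.
toy_index = {
    "aml_policy": [0.9, 0.1, 0.0],
    "kyc_onboarding": [0.1, 0.9, 0.0],
    "fee_schedule": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.2, 0.0]

ranked = top_k(query_vec, toy_index, k=2)
print([doc_id for doc_id, _ in ranked])
```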
- •Add generation on top of retrieved context.
For a real agent system, retrieval alone is not enough. You need a generator that uses the retrieved passages as grounded context.
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
template = """
Use only the following context to answer the question.
Context:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}
Question: {{ question }}
Answer:
"""
prompt_builder = PromptBuilder(template=template)
# OpenAIGenerator reads the API key from the OPENAI_API_KEY environment variable by default.
llm = OpenAIGenerator(model="gpt-4o-mini")
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.prompt")
Run a full query through the pipeline:
question = "What checks are required before activating a new customer account?"
result = rag_pipeline.run({
    "query_embedder": {"text": question},
    "retriever": {"top_k": 3},
    "prompt_builder": {"question": question},
})
print(result["llm"]["replies"][0])
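When an answer looks wrong, it helps to see the prompt the model actually received. The function below is a plain-Python approximation of the template used in this step, assuming the retrieved passages are available as a list of strings. The name `render_prompt` is hypothetical, and whitespace may differ slightly from what the Jinja-based PromptBuilder produces.

```python
def render_prompt(contents, question):
    """Approximate the RAG prompt: instruction, bulleted context, then the question."""
    context = "\n".join(f"- {text}" for text in contents)
    return (
        "Use only the following context to answer the question.\n"
        "Context:\n"
        f"{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


preview = render_prompt(
    ["Customer onboarding requires identity verification before account activation."],
    "What checks are required before activating a new customer account?",
)
print(preview)
```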
Testing the Integration
Run a direct retrieval test before wiring this into an agent loop. If it fails or returns irrelevant documents, check the embeddings and the index configuration first.
test_query = "When do we apply enhanced due diligence?"
query_embedding = query_embedder.run(text=test_query)["embedding"]
hits = retriever.run(query_embedding=query_embedding, top_k=2)["documents"]
for doc in hits:
    print(doc.content)
    print(doc.meta)
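Expected output (the AML document should rank first; with only two documents indexed and top_k=2, the onboarding document is printed after it):
AML policy requires enhanced due diligence for transactions over $10,000.
{'source': 'policy', 'jurisdiction': 'US', 'doc_type': 'aml'}
Customer onboarding requires identity verification before account activation.
{'source': 'ops_manual', 'jurisdiction': 'EU', 'doc_type': 'kyc'}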
If you get relevant docs back with matching metadata, your Haystack + Elasticsearch RAG path is working.
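The manual check above can be turned into a small automated smoke test. The sketch below assumes hits are represented as plain dicts with `content` and `meta` keys so it runs on its own; with real Haystack documents you would read `doc.meta` instead. The helper name `top_hit_matches` is hypothetical.

```python
def top_hit_matches(hits, expected_meta):
    """True if the first hit carries every expected metadata key/value pair."""
    if not hits:
        return False
    meta = hits[0]["meta"]
    return all(meta.get(key) == value for key, value in expected_meta.items())


sample_hits = [
    {
        "content": "AML policy requires enhanced due diligence for transactions over $10,000.",
        "meta": {"source": "policy", "jurisdiction": "US", "doc_type": "aml"},
    },
]
print(top_hit_matches(sample_hits, {"doc_type": "aml", "jurisdiction": "US"}))
```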
Real-World Use Cases
- •Compliance copilots that answer questions from AML/KYC policies with citations from approved internal documents.
- •Claims and underwriting assistants that retrieve product rules, fraud indicators, and process notes from indexed knowledge bases.
- •Bank operations agents that look up escalation playbooks, transaction monitoring guidance, and customer support procedures with low-latency retrieval.
The pattern here is stable: use Haystack for orchestration and prompt control, use Elasticsearch for retrieval at scale. Once that foundation works, you can add reranking, metadata filters, audit logging, and human review without changing the core architecture.
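As one example of such an extension, audit logging can start as a small record written per query. The sketch below is an assumption about what such a record might contain (a timestamp, a hash of the question, the retrieved document IDs, and the answer length); the function name and fields are hypothetical and should be adapted to your own retention and privacy requirements.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(question, doc_ids, answer):
    """Build a JSON-serializable audit entry without storing the raw question."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question_sha256": hashlib.sha256(question.encode("utf-8")).hexdigest(),
        "retrieved_doc_ids": list(doc_ids),
        "answer_chars": len(answer),
    }


record = audit_record(
    "When do we apply enhanced due diligence?",
    ["doc-1", "doc-2"],  # placeholder IDs for illustration
    "For transactions over $10,000.",
)
print(json.dumps(record))
```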
By Cyprian Aarons, AI Consultant at Topiax.