How to Integrate Haystack for fintech with Elasticsearch for production AI
Combining Haystack for fintech with Elasticsearch gives you a practical retrieval layer for production AI agents: Haystack handles the orchestration, document pipelines, and answer generation, while Elasticsearch gives you low-latency full-text search, filtering, and scalable indexing.
For fintech systems, that means your agent can answer questions over policies, KYC docs, transaction notes, product terms, and support tickets without falling back to brittle keyword search or dumping everything into a vector store.
Prerequisites
- Python 3.10+
- An Elasticsearch cluster running locally or in your environment
- The elasticsearch Python client installed
- Haystack installed with the Elasticsearch integration
- API keys or credentials for any LLM component you plan to use
- A document source ready for ingestion:
  - PDFs
  - policy docs
  - internal knowledge base articles
  - compliance procedures
Install the packages:
pip install haystack-ai elasticsearch-haystack elasticsearch
The ElasticsearchDocumentStore and its retrievers ship in the separate elasticsearch-haystack integration package, not in haystack-ai itself, so make sure it is installed in your environment. In production, pin versions explicitly.
Integration Steps
1) Connect to Elasticsearch
Start by creating an Elasticsearch client and verifying cluster access. Use TLS and authentication in production.
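from elasticsearch import Elasticsearch

# Local development settings. In production, load credentials from a secret
# store and keep certificate verification enabled.
es = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", "changeme"),
    verify_certs=False,  # only acceptable for a local cluster with self-signed certs
)
print(es.info())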
If this fails, stop here. Your Haystack pipeline will only be as stable as your search backend.
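To keep credentials out of source code, the connection settings can be read from the environment. The following is a minimal sketch, not a prescribed setup: the function name es_settings_from_env and the variable names ES_URL, ES_USER, ES_PASSWORD, and ES_VERIFY_CERTS are hypothetical, so substitute whatever your secret manager exposes.

```python
import os


def es_settings_from_env(env=None):
    """Build keyword arguments for the Elasticsearch client from environment variables.

    The variable names used here are illustrative assumptions, not a convention
    defined by Elasticsearch or Haystack.
    """
    env = os.environ if env is None else env
    password = env.get("ES_PASSWORD")
    if not password:
        # Fail loudly instead of silently falling back to a default password.
        raise RuntimeError("ES_PASSWORD is not set")
    # Certificate verification stays on unless it is explicitly disabled.
    verify = env.get("ES_VERIFY_CERTS", "true").strip().lower() not in {"0", "false", "no"}
    return {
        "hosts": env.get("ES_URL", "https://localhost:9200"),
        "basic_auth": (env.get("ES_USER", "elastic"), password),
        "verify_certs": verify,
    }
```

With this in place, the client can be created as Elasticsearch(**es_settings_from_env()), and verification is only turned off when someone asks for it on purpose.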
2) Create a Haystack document store backed by Elasticsearch
Haystack uses a DocumentStore abstraction. For Elasticsearch-backed retrieval, create an ElasticsearchDocumentStore and point it at the same cluster.
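from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

# Extra keyword arguments such as basic_auth and verify_certs are passed on to
# the underlying Elasticsearch client.
document_store = ElasticsearchDocumentStore(
    hosts="https://localhost:9200",
    basic_auth=("elastic", "changeme"),
    index="fintech_docs",
    verify_certs=False,  # local development only
)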
This index will hold your fintech documents. Use separate indices per domain if you need strict access boundaries, like retail_banking_docs or aml_policies.
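If you do split indices per domain, it helps to derive the index name in one place so every service agrees on it. The helper below is a hypothetical sketch (index_for_domain is not part of Haystack or Elasticsearch); it only applies the documented Elasticsearch naming rules: lowercase, none of the reserved characters, no leading -, _, or +, and at most 255 bytes.

```python
import re

# Characters Elasticsearch does not allow in index names, plus whitespace.
_FORBIDDEN = re.compile(r'[\\/*?"<>|\s,#:]')


def index_for_domain(domain, suffix="docs"):
    """Derive a valid Elasticsearch index name from a business domain label."""
    cleaned = domain.strip().lower()
    if not cleaned:
        raise ValueError("domain must not be empty")
    name = _FORBIDDEN.sub("_", f"{cleaned}_{suffix}")
    # Index names may not start with '-', '_' or '+'; a leading '.' is also avoided here.
    name = name.lstrip("-_+.")
    if not name or len(name.encode("utf-8")) > 255:
        raise ValueError(f"cannot derive a valid index name from {domain!r}")
    return name
```

The result can be passed as the index argument when constructing the document store for that domain.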
3) Write fintech documents into Elasticsearch through Haystack
Use Haystack Document objects and write them through the document store. Keep metadata structured so you can filter by product, region, or compliance domain.
from haystack import Document

docs = [
    Document(
        content="Customers can dispute card transactions within 60 days.",
        meta={"source": "card_policy", "product": "cards", "region": "US"},
    ),
    Document(
        content="KYC refresh is required every 24 months for retail customers.",
        meta={"source": "compliance_playbook", "product": "onboarding", "region": "EU"},
    ),
]
document_store.write_documents(docs)
At this point, Elasticsearch contains searchable fintech knowledge. You now have the raw material for retrieval-augmented generation.
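Structured metadata pays off when you filter. Haystack 2.x expresses filters as nested dictionaries of conditions on fields such as meta.product. As a rough sketch, the hypothetical helper below (meta_filter is an illustrative name, not a Haystack function) builds an AND filter from exact-match metadata values.

```python
def meta_filter(**equals):
    """Build a Haystack 2.x filter that requires every given metadata field to match exactly."""
    if not equals:
        raise ValueError("provide at least one metadata field to filter on")
    conditions = [
        {"field": f"meta.{key}", "operator": "==", "value": value}
        for key, value in sorted(equals.items())
    ]
    return {"operator": "AND", "conditions": conditions}
```

For example, document_store.filter_documents(filters=meta_filter(product="cards", region="US")) should return only the US card policy document, and the same dictionary can be passed as the filters argument when running a retriever.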
4) Build a retrieval pipeline in Haystack
Create a retriever that queries Elasticsearch and returns the most relevant documents. For production AI agents, keep retrieval narrow and deterministic.
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever

# BM25 retrieval served by Elasticsearch. An in-memory retriever would reject
# this document store, so use the retriever from the Elasticsearch integration.
retriever = ElasticsearchBM25Retriever(document_store=document_store, top_k=5)

# OpenAIChatGenerator consumes a list of ChatMessage objects, so the prompt is
# built with ChatPromptBuilder rather than the string-based PromptBuilder.
template = [
    ChatMessage.from_user(
        """
You are a fintech support assistant.
Answer only from the provided documents.
Question: {{question}}
Documents:
{% for doc in documents %}
- {{ doc.content }} (source={{ doc.meta.source }})
{% endfor %}
Answer:
"""
    )
]
prompt_builder = ChatPromptBuilder(template=template)

# Reads OPENAI_API_KEY from the environment.
generator = OpenAIChatGenerator(model="gpt-4o-mini")

pipeline = Pipeline()
pipeline.add_component("retriever", retriever)
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm", generator)
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.messages")
ElasticsearchBM25Retriever runs BM25 scoring inside Elasticsearch rather than in process memory. If you later index embeddings, you can swap in ElasticsearchEmbeddingRetriever from the same package. The pattern stays the same: retrieve from Elasticsearch, build grounded prompts, generate answers.
5) Run a query end-to-end
Now wire the question into the pipeline and inspect the response.
question = "How long do customers have to dispute card transactions?"
result = pipeline.run({
    "retriever": {"query": question},
    "prompt_builder": {"question": question},
})
print(result["llm"]["replies"][0].text)
If your indexing and retrieval are correct, the model should answer from the policy text instead of hallucinating a finance rule.
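For auditability it is often useful to return the sources alongside the answer. The sketch below assumes the component names used in this walkthrough ("retriever", "prompt_builder", "llm") and relies on the include_outputs_from argument of Pipeline.run to expose the retriever's documents. The ask function itself is a hypothetical wrapper, not part of Haystack.

```python
def ask(pipeline, question):
    """Run the RAG pipeline and return the answer together with the sources it was given."""
    result = pipeline.run(
        {
            "retriever": {"query": question},
            "prompt_builder": {"question": question},
        },
        # Also return the retriever's output, which is otherwise consumed internally.
        include_outputs_from={"retriever"},
    )
    documents = result.get("retriever", {}).get("documents", [])
    sources = sorted({doc.meta.get("source", "unknown") for doc in documents})
    return {"answer": result["llm"]["replies"][0].text, "sources": sources}
```

Logging the sources list next to each answer gives reviewers a quick way to see which policy documents the model was shown.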
Testing the Integration
Run a direct search against Elasticsearch first, then validate Haystack retrieval on top of it. This isolates backend issues from prompt issues.
query_result = es.search(
    index="fintech_docs",
    query={
        "match": {
            "content": "dispute card transactions"
        }
    },
)
hits = query_result["hits"]["hits"]
print(f"Hits: {len(hits)}")
for hit in hits[:3]:
    print(hit["_source"]["content"])
Expected output:
Hits: 1
Customers can dispute card transactions within 60 days.
If that works but Haystack doesn’t return useful results, check:
- document store index name
- metadata mapping
- retriever configuration
- prompt template grounding
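One way to narrow down which layer is at fault is to compare what Elasticsearch matched directly with what the Haystack retriever returned for the same query. The helper below is a hypothetical sketch: it assumes documents are stored with a content field, as in the direct search above, and that retrieved documents expose a content attribute.

```python
def missing_from_retrieval(es_response, retrieved_docs):
    """Return contents that Elasticsearch matched directly but the retriever did not return."""
    direct = [hit["_source"].get("content", "") for hit in es_response["hits"]["hits"]]
    retrieved = {doc.content for doc in retrieved_docs}
    return [text for text in direct if text not in retrieved]
```

An empty list suggests retrieval is consistent with the backend, which points toward the prompt template. A non-empty list points toward the index name, filters, or retriever settings.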
Real-World Use Cases
- Customer support copilots: Answer questions about card disputes, fee schedules, loan terms, and account policies using indexed internal docs.
- Compliance assistants: Retrieve KYC/AML procedures by region or product line and keep answers grounded in approved policy text.
- Ops knowledge agents: Search incident runbooks, payment failure playbooks, and reconciliation procedures with fast filtered retrieval from Elasticsearch.
The main pattern here is simple: let Elasticsearch handle retrieval at scale, let Haystack orchestrate the agent flow, and keep every answer tied to indexed source material. That’s how you build something usable in production instead of a demo that breaks on real fintech data.
By Cyprian Aarons, AI Consultant at Topiax.