How to Integrate Haystack for fintech with Elasticsearch for production AI

By Cyprian Aarons | Updated 2026-04-21
Tags: haystack-for-fintech, elasticsearch, production-ai

Combining Haystack for fintech with Elasticsearch gives you a practical retrieval layer for production AI agents: Haystack handles the orchestration, document pipelines, and answer generation, while Elasticsearch gives you low-latency full-text search, filtering, and scalable indexing.

For fintech systems, that means your agent can answer questions over policies, KYC docs, transaction notes, product terms, and support tickets without falling back to brittle keyword search or dumping everything into a vector store.

Prerequisites

  • Python 3.10+
  • An Elasticsearch cluster running locally or in your environment
  • elasticsearch Python client installed
  • Haystack installed with the Elasticsearch integration
  • API keys or credentials for any LLM component you plan to use
  • A document source ready for ingestion:
    • PDFs
    • policy docs
    • internal knowledge base articles
    • compliance procedures

Install the packages:

pip install haystack-ai elasticsearch-haystack elasticsearch

The ElasticsearchDocumentStore and its retrievers ship in the separate elasticsearch-haystack package rather than in haystack-ai itself, so install it alongside the core library. In production, pin versions explicitly.
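To make pinning concrete, here is a minimal sketch that reports what is installed in the current environment and formats it as requirements-style pins. It uses only the standard library. The helper names installed_versions and as_pins are made up for this example, and the package list assumes elasticsearch-haystack is the distribution that provides the integration.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for the given distribution names."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


def as_pins(versions):
    """Format installed packages as requirements-style 'name==version' lines."""
    return [f"{name}=={ver}" for name, ver in versions.items() if ver is not None]


if __name__ == "__main__":
    # Assumed package list; adjust to whatever your project actually depends on.
    found = installed_versions(["haystack-ai", "elasticsearch", "elasticsearch-haystack"])
    print("\n".join(as_pins(found)))
```

Copy the printed lines into your requirements file once you have a combination that passes your tests.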

Integration Steps

1) Connect to Elasticsearch

Start by creating an Elasticsearch client and verifying cluster access. Use TLS and authentication in production.

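from elasticsearch import Elasticsearch

# Local development settings only: replace the placeholder password and
# enable certificate verification before pointing this at a real cluster.
es = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", "changeme"),
    verify_certs=False,  # disables TLS certificate checks; never use in production
)

# Raises if the cluster is unreachable or the credentials are rejected.
print(es.info())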

If this fails, stop here. Your Haystack pipeline will only be as stable as your search backend.
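For production, one option is to keep credentials out of source code and build the client arguments from the environment. The sketch below assumes hypothetical environment variable names (ES_URL, ES_API_KEY, ES_USERNAME, ES_PASSWORD, ES_CA_CERTS). The returned keys (hosts, verify_certs, ca_certs, api_key, basic_auth) are keyword arguments of the Elasticsearch client constructor.

```python
import os


def es_client_kwargs(env=None):
    """Build keyword arguments for elasticsearch.Elasticsearch from environment variables.

    The variable names used here are hypothetical; rename them to match your deployment.
    """
    env = os.environ if env is None else env
    kwargs = {
        "hosts": env.get("ES_URL", "https://localhost:9200"),
        "verify_certs": True,  # keep certificate verification on outside local dev
    }
    if env.get("ES_CA_CERTS"):
        kwargs["ca_certs"] = env["ES_CA_CERTS"]  # path to the cluster's CA bundle
    if env.get("ES_API_KEY"):
        kwargs["api_key"] = env["ES_API_KEY"]
    elif env.get("ES_USERNAME") and env.get("ES_PASSWORD"):
        kwargs["basic_auth"] = (env["ES_USERNAME"], env["ES_PASSWORD"])
    else:
        raise ValueError("Set ES_API_KEY, or both ES_USERNAME and ES_PASSWORD")
    return kwargs
```

With this in place, the client can be created as Elasticsearch(**es_client_kwargs()), and a missing credential fails loudly at startup instead of at the first query.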

2) Create a Haystack document store backed by Elasticsearch

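Haystack uses a DocumentStore abstraction. For Elasticsearch-backed retrieval, create an ElasticsearchDocumentStore and point it at the same cluster. Extra keyword arguments such as basic_auth and verify_certs are passed through to the underlying Elasticsearch client, so use the same connection settings as in step 1.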

from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(
    hosts="https://localhost:9200",
    basic_auth=("elastic", "changeme"),
    index="fintech_docs",
    verify_certs=False,
)

This index will hold your fintech documents. Use separate indices per domain if you need strict access boundaries, like retail_banking_docs or aml_policies.
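If you do split indices by domain, it helps to resolve the index name in one place rather than scattering string literals. The following is a minimal sketch: the DOMAIN_INDEX mapping and the index_for helper are hypothetical, and the caller_domains argument stands in for whatever entitlement data your application already has.

```python
# Hypothetical mapping from business domain to Elasticsearch index name.
DOMAIN_INDEX = {
    "retail_banking": "retail_banking_docs",
    "aml": "aml_policies",
}


def index_for(domain, caller_domains):
    """Return the index name for `domain` if the caller is entitled to read it."""
    if domain not in DOMAIN_INDEX:
        raise KeyError(f"Unknown domain: {domain!r}")
    if domain not in caller_domains:
        raise PermissionError(f"Caller may not read domain {domain!r}")
    return DOMAIN_INDEX[domain]
```

The result can be passed as the index argument when constructing the document store. An application-level check like this is a convenience, not a security boundary; index-level privileges on the Elasticsearch side should still enforce the separation.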

3) Write fintech documents into Elasticsearch through Haystack

Use Haystack Document objects and write them through the document store. Keep metadata structured so you can filter by product, region, or compliance domain.

from haystack import Document

docs = [
    Document(
        content="Customers can dispute card transactions within 60 days.",
        meta={"source": "card_policy", "product": "cards", "region": "US"}
    ),
    Document(
        content="KYC refresh is required every 24 months for retail customers.",
        meta={"source": "compliance_playbook", "product": "onboarding", "region": "EU"}
    ),
]

document_store.write_documents(docs)

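At this point, Elasticsearch contains searchable fintech knowledge. You now have the raw material for retrieval-augmented generation. Note that writing the same documents a second time raises a duplicate-document error by default; pass policy=DuplicatePolicy.OVERWRITE (imported from haystack.document_stores.types) to write_documents if you want ingestion to be safely re-runnable.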
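Because later filtering depends on consistent metadata, a small check before ingestion can catch gaps early. This is a sketch only: the required keys mirror the ones used in the example documents above, and the upper-case region rule is an assumed convention, not something Haystack or Elasticsearch enforces.

```python
REQUIRED_META = ("source", "product", "region")  # keys used in this guide's examples


def meta_problems(meta):
    """Return a list of human-readable problems; an empty list means the metadata is usable."""
    problems = []
    for key in REQUIRED_META:
        value = meta.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty '{key}'")
    region = meta.get("region")
    # Assumed convention: region codes are stored upper-case so exact-match filters work.
    if isinstance(region, str) and region and region != region.upper():
        problems.append("region should be upper-case, e.g. 'US'")
    return problems
```

Running meta_problems(doc.meta) over each document before write_documents lets you reject or fix records that would otherwise be invisible to filtered queries.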

4) Build a retrieval pipeline in Haystack

Create a retriever that queries Elasticsearch and returns the most relevant documents. For production AI agents, keep retrieval narrow and deterministic.

from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever

# BM25 scoring runs inside Elasticsearch; a small top_k keeps the prompt context tight.
# (InMemoryBM25Retriever only works with InMemoryDocumentStore, so it cannot be used here.)
retriever = ElasticsearchBM25Retriever(document_store=document_store, top_k=3)

# OpenAIChatGenerator expects a list of ChatMessage objects, so build the prompt
# with ChatPromptBuilder rather than the string-based PromptBuilder.
template = [
    ChatMessage.from_user(
        """
You are a fintech support assistant.
Answer only from the provided documents.

Question: {{question}}

Documents:
{% for doc in documents %}
- {{ doc.content }} (source={{ doc.meta.source }})
{% endfor %}

Answer:
"""
    )
]
prompt_builder = ChatPromptBuilder(template=template)

# Reads the API key from the OPENAI_API_KEY environment variable by default.
generator = OpenAIChatGenerator(model="gpt-4o-mini")

pipeline = Pipeline()
pipeline.add_component("retriever", retriever)
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm", generator)

pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.messages")

ElasticsearchBM25Retriever runs keyword (BM25) scoring inside the cluster. If you also index embeddings, the same integration package provides ElasticsearchEmbeddingRetriever for vector search. Either way the pattern stays the same: retrieve from Elasticsearch, build grounded prompts, generate answers.

5) Run a query end-to-end

Now wire the question into the pipeline and inspect the response.

question = "How long do customers have to dispute card transactions?"

result = pipeline.run({
    "retriever": {"query": question},
    "prompt_builder": {"question": question},
})

print(result["llm"]["replies"][0].text)

If your indexing and retrieval are correct, the model should answer from the policy text instead of hallucinating a finance rule.
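To keep retrieval narrow, you can restrict the query to documents whose metadata matches the caller's product or region. The sketch below builds a filter in the Haystack 2.x dictionary syntax (a logical operator wrapping field comparisons). The meta_filter helper is hypothetical and only covers exact-match conditions combined with AND.

```python
def meta_filter(**equals):
    """Build a Haystack 2.x filter requiring every given metadata field to match exactly."""
    if not equals:
        raise ValueError("Provide at least one metadata field to filter on")
    conditions = [
        {"field": f"meta.{key}", "operator": "==", "value": value}
        for key, value in sorted(equals.items())  # sorted for a stable, testable order
    ]
    return {"operator": "AND", "conditions": conditions}


filters = meta_filter(product="cards", region="US")
```

The resulting dictionary can be passed at run time next to the query, for example "retriever": {"query": question, "filters": filters}, so that only US card documents are candidates for the prompt.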

Testing the Integration

Run a direct search against Elasticsearch first, then validate Haystack retrieval on top of it. This isolates backend issues from prompt issues.

query_result = es.search(
    index="fintech_docs",
    query={
        "match": {
            "content": "dispute card transactions"
        }
    }
)

hits = query_result["hits"]["hits"]
print(f"Hits: {len(hits)}")
for hit in hits[:3]:
    print(hit["_source"]["content"])

Expected output:

Hits: 1
Customers can dispute card transactions within 60 days.

If that works but Haystack doesn’t return useful results, check:

  • document store index name
  • metadata mapping
  • retriever configuration
  • prompt template grounding

Real-World Use Cases

  • Customer support copilots
    • Answer questions about card disputes, fee schedules, loan terms, and account policies using indexed internal docs.
  • Compliance assistants
    • Retrieve KYC/AML procedures by region or product line and keep answers grounded in approved policy text.
  • Ops knowledge agents
    • Search incident runbooks, payment failure playbooks, and reconciliation procedures with fast filtered retrieval from Elasticsearch.

The main pattern here is simple: let Elasticsearch handle retrieval at scale, let Haystack orchestrate the agent flow, and keep every answer tied to indexed source material. That’s how you build something usable in production instead of a demo that breaks on real fintech data.



By Cyprian Aarons, AI Consultant at Topiax.
