How to Integrate Haystack for fintech with Elasticsearch for startups

By Cyprian Aarons. Updated 2026-04-21
Tags: haystack-for-fintech, elasticsearch, startups

Combining Haystack for fintech with Elasticsearch gives you a practical pattern for building agent systems that can search, retrieve, and reason over financial data without stuffing everything into a prompt. For startups, this is the difference between a brittle chatbot and an assistant that can answer questions from transaction logs, policy docs, KYC records, or market research with traceable retrieval.

The division of labor is simple: Elasticsearch handles indexed search at scale, while Haystack orchestrates retrieval and generation around that index. In a fintech agent, that means faster answers, better grounding, and fewer hallucinated responses when users ask about accounts, claims, fraud signals, or compliance text.

Prerequisites

  • Python 3.10+
  • An Elasticsearch cluster running locally or in the cloud
  • An index populated with your fintech documents
  • Haystack 2.x (haystack-ai) installed with the Elasticsearch integration (elasticsearch-haystack)
  • Access to an LLM provider if you want generation on top of retrieval
  • API keys or credentials for your model backend

Install the packages:

pip install haystack-ai elasticsearch-haystack elasticsearch sentence-transformers

If you’re using Elastic Cloud, keep these ready:

  • ELASTICSEARCH_URL
  • ELASTICSEARCH_API_KEY

Integration Steps

  1. Set up the Elasticsearch client and create a document index

Start by connecting to Elasticsearch and creating an index for your fintech corpus. Use a mapping that supports both text search and vector retrieval if you plan to do hybrid search.

from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://your-cluster.es.europe-west1.gcp.cloud.es.io:9243",
    api_key="YOUR_ELASTIC_API_KEY",
)

index_name = "fintech_docs"

if not es.indices.exists(index=index_name):
    es.indices.create(
        index=index_name,
        mappings={
            "properties": {
                "content": {"type": "text"},
                "title": {"type": "text"},
                "doc_type": {"type": "keyword"},
                # Keyword type so exact-match filters on customer_id work.
                "customer_id": {"type": "keyword"},
            }
        },
    )

print(es.info())
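
The mapping above covers keyword search only. If hybrid search is on the roadmap, a vector field can be declared when the index is created. The following is a minimal sketch, assuming a field named `embedding` (the attribute name Haystack documents use for vectors) and 384 dimensions (the output size of small sentence-transformers models such as all-MiniLM-L6-v2). `build_hybrid_mapping` is a hypothetical helper, not part of Haystack or Elasticsearch.

```python
def build_hybrid_mapping(embedding_dims: int = 384) -> dict:
    """Return an index mapping with text, keyword, and dense-vector fields."""
    if embedding_dims <= 0:
        raise ValueError("embedding_dims must be positive")
    return {
        "properties": {
            "content": {"type": "text"},
            "title": {"type": "text"},
            "doc_type": {"type": "keyword"},
            # Assumed field name and size; both must match the embedding model in use.
            "embedding": {
                "type": "dense_vector",
                "dims": embedding_dims,
                "index": True,
                "similarity": "cosine",
            },
        }
    }


mapping = build_hybrid_mapping()
print(mapping["properties"]["embedding"])
```

The result can be passed as `mappings=build_hybrid_mapping()` to `es.indices.create`. The vector size is fixed once the index exists, so it is worth settling on an embedding model first.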

  2. Write documents into Elasticsearch in Haystack-compatible form

Haystack works best when your stored documents have stable IDs and metadata. For startup use cases, keep fields like doc_type, customer_id, or policy_id so downstream filters are easy.

from haystack import Document

docs = [
    Document(
        content="Transaction 98321 was flagged for unusual card-not-present activity.",
        meta={"title": "Fraud Alert", "doc_type": "incident", "customer_id": "CUST-1001"},
        id="txn-98321",
    ),
    Document(
        content="KYC review completed for business account ACME Ltd on 2024-03-11.",
        meta={"title": "KYC Review", "doc_type": "compliance", "customer_id": "CUST-2004"},
        id="kyc-acme",
    ),
]

for doc in docs:
    es.index(
        index=index_name,
        id=doc.id,
        document={
            # Keep the ID in the body too so Haystack can restore it on retrieval.
            "id": doc.id,
            "content": doc.content,
            **doc.meta,
        },
    )

es.indices.refresh(index=index_name)
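
For larger batches, stable IDs matter more, because re-running an ingest job should overwrite records rather than duplicate them. The sketch below shows one possible approach: derive a deterministic ID from the document type plus a natural key, and shape each record as an action dict in the format `elasticsearch.helpers.bulk` accepts. The helper names and the `natural_key` field are hypothetical, and copying the ID into `_source` is an assumption about what the Haystack integration needs to rebuild the same document ID.

```python
import hashlib


def stable_doc_id(doc_type: str, natural_key: str) -> str:
    """Derive a deterministic ID so re-indexing overwrites instead of duplicating."""
    digest = hashlib.sha256(f"{doc_type}:{natural_key}".encode("utf-8")).hexdigest()
    return f"{doc_type}-{digest[:16]}"


def to_bulk_action(index: str, record: dict) -> dict:
    """Shape one record as an action dict for elasticsearch.helpers.bulk."""
    doc_id = stable_doc_id(record["doc_type"], record["natural_key"])
    # Drop the helper-only key and store the ID alongside the other fields.
    source = {k: v for k, v in record.items() if k != "natural_key"}
    source["id"] = doc_id
    return {"_op_type": "index", "_index": index, "_id": doc_id, "_source": source}


record = {
    "content": "Transaction 98321 was flagged for unusual card-not-present activity.",
    "title": "Fraud Alert",
    "doc_type": "incident",
    "customer_id": "CUST-1001",
    "natural_key": "98321",  # hypothetical field: the business identifier
}
action = to_bulk_action("fintech_docs", record)
print(action["_id"])
```

A list of such actions can then be sent in one call with `elasticsearch.helpers.bulk(es, actions)`, followed by the same index refresh used above.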

  3. Connect Haystack to Elasticsearch as a retriever

Haystack's Elasticsearch integration (the elasticsearch-haystack package) provides ElasticsearchBM25Retriever, which runs keyword queries against an ElasticsearchDocumentStore. This is the simplest path if you want keyword-based retrieval over financial text.

from haystack import Document
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(
    hosts=["https://your-cluster.es.europe-west1.gcp.cloud.es.io:9243"],
    api_key="YOUR_ELASTIC_API_KEY",
    index=index_name,
)

retriever = ElasticsearchBM25Retriever(document_store=document_store)

query = "Which customer had unusual card activity?"
results = retriever.run(query=query)

for doc in results["documents"]:
    print(doc.id, doc.meta.get("title"), doc.content)

If you’re building an AI agent for finance operations, this is enough to ground answers in indexed records before handing them to an LLM.

  4. Build a Haystack pipeline that retrieves from Elasticsearch and generates an answer

Now wire retrieval into a pipeline. This is the part that turns search into an agent-ready workflow.

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

prompt_template = """
You are a fintech support assistant.
Answer only using the documents below.

Documents:
{% for doc in documents %}
- {{ doc.content }} ({{ doc.meta.title }})
{% endfor %}

Question: {{ question }}
Answer:
"""

pipe = Pipeline()
pipe.add_component("retriever", retriever)
pipe.add_component("prompt_builder", PromptBuilder(template=prompt_template))
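# OpenAIGenerator expects a Secret rather than a plain string for api_key;
# by default it reads the key from the OPENAI_API_KEY environment variable.
pipe.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))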

pipe.connect("retriever.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.prompt")

response = pipe.run({
    "retriever": {"query": "What happened with transaction 98321?"},
    "prompt_builder": {"question": "What happened with transaction 98321?"},
})

print(response["llm"]["replies"][0])
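
The run call above repeats the question string for two components, which is easy to get out of sync. A small wrapper, sketched here under the assumption that the components keep the names `retriever` and `prompt_builder`, builds both inputs from a single value. `build_pipeline_inputs` is a hypothetical helper, not a Haystack function.

```python
def build_pipeline_inputs(question: str) -> dict:
    """Build the run() payload so retriever and prompt see the same question."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    return {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
    }


inputs = build_pipeline_inputs("What happened with transaction 98321?")
print(inputs)
```

With this in place, the pipeline can be invoked as `pipe.run(build_pipeline_inputs(user_question))`.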

  5. Add metadata filters for startup-grade control

In production, you rarely want every user to see every record. Use filters so your agent only retrieves the right tenant, region, or document class.

results = retriever.run(
    query="Show compliance notes for ACME Ltd",
    filters={
        "operator": "AND",
        "conditions": [
            {"field": "doc_type", "operator": "==", "value": "compliance"},
            {"field": "customer_id", "operator": "==", "value": "CUST-2004"},
        ],
    },
)

for doc in results["documents"]:
    print(doc.meta["title"], doc.content)
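
Hand-writing the filter dict for every call invites mistakes, especially when the tenant condition must never be omitted. The following sketch builds the same filter structure from arguments and refuses to run without a customer ID. `tenant_filter` is a hypothetical helper, and the field names are the ones used in this guide.

```python
from typing import Optional


def tenant_filter(customer_id: str, doc_type: Optional[str] = None) -> dict:
    """Build a Haystack-style filter that always scopes results to one customer."""
    if not customer_id:
        raise ValueError("customer_id is required")
    conditions = [{"field": "customer_id", "operator": "==", "value": customer_id}]
    if doc_type is not None:
        conditions.append({"field": "doc_type", "operator": "==", "value": doc_type})
    return {"operator": "AND", "conditions": conditions}


print(tenant_filter("CUST-2004", doc_type="compliance"))
```

The result can be passed straight to the retriever, for example `retriever.run(query=..., filters=tenant_filter("CUST-2004"))`.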

Testing the Integration

Run a direct retrieval test first. You want to verify that Haystack can pull back the right record from Elasticsearch before adding generation on top.

test_query = "unusual card-not-present activity"
result = retriever.run(query=test_query)

docs = result["documents"]
print(f"Retrieved {len(docs)} document(s)")
for d in docs[:3]:
    print(f"- {d.id}: {d.meta.get('title')} -> {d.content}")

Expected output:

Retrieved 1 document(s)
- txn-98321: Fraud Alert -> Transaction 98321 was flagged for unusual card-not-present activity.

If you get zero hits, check these first:

  • The index name matches on both sides
  • The index was refreshed after the documents were written
  • Your query terms exist in the stored content
  • Filters are not excluding all records
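
The first two checks can be automated. Below is a minimal sketch of a diagnostic helper; it assumes `client` behaves like the `elasticsearch.Elasticsearch` client used earlier and relies only on `indices.exists`, `indices.refresh`, and `count`. `diagnose_empty_results` is a hypothetical name.

```python
def diagnose_empty_results(client, index: str) -> list:
    """Return findings that may explain why a search returns nothing.

    `client` is expected to behave like elasticsearch.Elasticsearch.
    """
    if not client.indices.exists(index=index):
        return [f"index '{index}' does not exist; check the name on both sides"]

    # Make recently indexed documents visible to search before counting.
    client.indices.refresh(index=index)
    total = client.count(index=index)["count"]

    findings = []
    if total == 0:
        findings.append(f"index '{index}' is empty; indexing may have failed")
    else:
        findings.append(
            f"index '{index}' holds {total} document(s); "
            "check query terms and filters next"
        )
    return findings
```

Calling `diagnose_empty_results(es, index_name)` narrows the problem to either the index itself or the query and filters.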

Real-World Use Cases

  • Fraud ops assistant that searches incident notes and returns grounded summaries for analysts.
  • Compliance copilot that retrieves KYC/AML policies plus case history before drafting responses.
  • Customer support agent that answers account questions from internal runbooks and transaction annotations.

This pattern scales well for startups because it starts simple with BM25 search and grows into hybrid retrieval later. Once your corpus gets larger, you can add embeddings, reranking, and tool routing without replacing the core Elasticsearch-backed retrieval layer.
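
When a vector retriever is added next to BM25, the two result lists need to be merged. One common technique is reciprocal rank fusion, sketched below in plain Python as an illustration rather than as the method this guide prescribes. The constant `k=60` is the conventional default, and the document IDs `pol-7` and `faq-2` are made up for the example. Haystack's DocumentJoiner component offers a reciprocal rank fusion join mode, which is likely the better choice inside a pipeline.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k: int = 60, top_k: int = 5) -> list:
    """Merge ranked lists of document IDs using reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks higher.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by ID for a deterministic order on ties.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_k]]


bm25_ids = ["txn-98321", "kyc-acme", "pol-7"]    # keyword ranking
vector_ids = ["txn-98321", "faq-2", "kyc-acme"]  # embedding ranking
print(reciprocal_rank_fusion([bm25_ids, vector_ids]))
# → ['txn-98321', 'kyc-acme', 'faq-2', 'pol-7']
```

Documents that appear in both lists rise to the top, which is the behavior hybrid search usually aims for.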



By Cyprian Aarons, AI Consultant at Topiax.
