How to Integrate Haystack for fintech with Elasticsearch for AI agents

By Cyprian Aarons | Updated 2026-04-21
Tags: haystack-for-fintech, elasticsearch, ai-agents

Haystack for fintech gives you the retrieval and agent orchestration layer. Elasticsearch gives you durable, low-latency search over structured and unstructured financial data.

Combined, they let an AI agent answer questions like “show me all high-risk merchants with chargeback spikes in the last 30 days” using indexed documents, filters, and retrieval pipelines instead of brittle prompt-only logic.

Prerequisites

  • Python 3.10+
  • An Elasticsearch cluster running locally or in your VPC
  • A Haystack-compatible setup for your fintech agent project
  • API credentials for your embedding model or LLM provider
  • Financial documents ready to index:
    • transaction records
    • policy docs
    • KYC/AML notes
    • support tickets
  • Installed packages:
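    • haystack-ai
    • elasticsearch-haystack (provides the haystack_integrations Elasticsearch document store and retrievers)
    • elasticsearch
    • sentence-transformers or your embedding provider SDK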

Integration Steps

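  1. Install the dependencies and verify Elasticsearch connectivity.
pip install haystack-ai elasticsearch-haystack elasticsearch sentence-transformers

from elasticsearch import Elasticsearch

# Point the client at your cluster; add authentication arguments if security is enabled.
es = Elasticsearch("http://localhost:9200")

# info() returns cluster metadata (name, version, and so on) when the node is reachable.
print(es.info())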

If that returns cluster metadata, your search backend is reachable.
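A freshly started cluster can take a few seconds to accept connections, so a retry loop is often more useful than a single call. The sketch below is one way to do that, assuming the client exposes a ping() method that returns a boolean, as the official Elasticsearch Python client does. The helper name wait_for_cluster and its defaults are illustrative, not part of any library.

```python
import time
from typing import Protocol


class SupportsPing(Protocol):
    """Anything with a ping() method, e.g. elasticsearch.Elasticsearch."""

    def ping(self) -> bool: ...


def wait_for_cluster(client: SupportsPing, attempts: int = 5, delay_seconds: float = 2.0) -> bool:
    """Return True as soon as the cluster answers a ping, False once all attempts fail."""
    for attempt in range(1, attempts + 1):
        if client.ping():
            return True
        if attempt < attempts:
            # Pause between attempts, but not after the last one.
            time.sleep(delay_seconds)
    return False
```

With the client from this step, you might call wait_for_cluster(es) and abort ingestion if it returns False.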

  2. Create the Haystack Document objects that carry your fintech data into the index.
from haystack import Document

docs = [
    Document(
        content="Merchant ABC showed a 42% increase in chargebacks over 14 days.",
        meta={"customer_id": "CUST-1001", "risk_score": 87, "doc_type": "risk_report"}
    ),
    Document(
        content="KYC review completed for customer CUST-1002 with no adverse findings.",
        meta={"customer_id": "CUST-1002", "risk_score": 12, "doc_type": "kyc_note"}
    ),
]

In production, these Document objects usually come from a parser or ETL job, not hardcoded strings.
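As a rough illustration of that ETL step, the sketch below maps raw records to content/meta payloads. The record layout, including the summary field, is a hypothetical example and not something Haystack prescribes. Rows with no usable text are skipped so they never reach the embedder.

```python
from typing import Any, Iterable


def to_document_payloads(records: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Map raw records to {"content": ..., "meta": ...} payloads, skipping empty text."""
    payloads = []
    for record in records:
        # "summary" is an assumed field name for the free-text part of a record.
        text = str(record.get("summary") or "").strip()
        if not text:
            continue  # nothing to index or embed
        meta = {key: value for key, value in record.items() if key != "summary"}
        payloads.append({"content": text, "meta": meta})
    return payloads


rows = [
    {"summary": "Merchant ABC showed a 42% increase in chargebacks over 14 days.",
     "customer_id": "CUST-1001", "risk_score": 87, "doc_type": "risk_report"},
    {"summary": "   ", "customer_id": "CUST-1003", "risk_score": 40, "doc_type": "risk_report"},
]
payloads = to_document_payloads(rows)
print(len(payloads))  # 1 (the whitespace-only row is dropped)
```

Each payload can then be turned into a Haystack object with Document(**payload).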

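  3. Index the documents into Elasticsearch using Haystack’s document store.
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

# Extra keyword arguments are forwarded to the Elasticsearch client,
# so pass only options that client accepts (no embedding size here).
document_store = ElasticsearchDocumentStore(
    hosts=["http://localhost:9200"],
    index="fintech_docs",
)

# write_documents returns the number of documents written.
document_store.write_documents(docs)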

This is the core integration point. Haystack manages document storage semantics, while Elasticsearch handles indexing and retrieval.

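  4. Add embeddings so semantic retrieval works for agent queries.
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack.document_stores.types import DuplicatePolicy

doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
query_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")

# Load the model weights before the first run() call.
doc_embedder.warm_up()

embedded_docs = doc_embedder.run(documents=docs)["documents"]
# These IDs were already written without embeddings, so overwrite rather than fail on duplicates.
document_store.write_documents(embedded_docs, policy=DuplicatePolicy.OVERWRITE)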

In a real system, compute embeddings inside the ingestion pipeline so each document is written once, with its vector already attached.
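One way to structure that ingestion step is to embed and write in fixed-size batches. The sketch below keeps the embed and write steps as plain callables so it runs without a cluster; the names ingest and batched and the default batch size are illustrative assumptions. In a Haystack setup, embed would wrap doc_embedder.run(...) and write would wrap document_store.write_documents(...).

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def ingest(
    items: Sequence[T],
    embed: Callable[[Sequence[T]], Sequence[T]],
    write: Callable[[Sequence[T]], int],
    batch_size: int = 64,
) -> int:
    """Embed, then write, one batch at a time; return the total count reported by write."""
    written = 0
    for batch in batched(items, batch_size):
        written += write(embed(batch))
    return written


# Hypothetical wiring with the objects from the steps above:
#   embed = lambda batch: doc_embedder.run(documents=list(batch))["documents"]
#   write = lambda batch: document_store.write_documents(list(batch), policy=DuplicatePolicy.OVERWRITE)
#   ingest(docs, embed, write)
```

Batching bounds memory use during embedding and keeps each bulk request to Elasticsearch at a predictable size.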

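  5. Build a retrieval step your AI agent can call during tool use.
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchEmbeddingRetriever

# The Elasticsearch retriever runs a vector search against the index behind document_store.
retriever = ElasticsearchEmbeddingRetriever(document_store=document_store)

# Load the query model before embedding the first question.
query_embedder.warm_up()

result = retriever.run(
    query_embedding=query_embedder.run(text="Which merchant has rising chargeback risk?")["embedding"],
    top_k=3,
)

for doc in result["documents"]:
    print(doc.content, doc.meta)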

If you need hybrid search, run a keyword retriever such as ElasticsearchBM25Retriever alongside the embedding retriever, apply the same metadata filters to both, and merge the two ranked lists before handing results to the agent.
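A common way to merge keyword and vector results is reciprocal rank fusion (RRF). Haystack's DocumentJoiner offers a join mode for it, but the idea is small enough to sketch in plain Python. The function below is a minimal illustration that works on document IDs only; the constant k=60 is a conventional default, and the sample IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["doc-a", "doc-b", "doc-c"]  # e.g. IDs from a BM25 retriever
vector_hits = ["doc-b", "doc-d", "doc-a"]   # e.g. IDs from an embedding retriever
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc-b', 'doc-a', 'doc-d', 'doc-c']
```

Documents that rank well in both lists rise to the top, which is why doc-b and doc-a lead here even though each list disagrees on the order.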

Testing the Integration

Run a simple end-to-end check: write one document, query it semantically, and confirm the right record comes back.

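from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchEmbeddingRetriever
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

store = ElasticsearchDocumentStore(
    hosts=["http://localhost:9200"],
    index="fintech_test",
)

doc = Document(
    content="AML alert generated for customer CUST-2001 due to unusual wire transfer volume.",
    meta={"customer_id": "CUST-2001", "alert_type": "aml"}
)

# Embed the document before writing it; embedding retrieval cannot match documents stored without vectors.
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()
embedded_doc = doc_embedder.run(documents=[doc])["documents"][0]

# OVERWRITE keeps the check repeatable when the test index already holds this document.
store.write_documents([embedded_doc], policy=DuplicatePolicy.OVERWRITE)

text_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
text_embedder.warm_up()
query_embedding = text_embedder.run(text="Why was AML triggered for this customer?")["embedding"]

retriever = ElasticsearchEmbeddingRetriever(document_store=store)
response = retriever.run(query_embedding=query_embedding, top_k=1)

print(response["documents"][0].content)
print(response["documents"][0].meta)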

Expected output:

AML alert generated for customer CUST-2001 due to unusual wire transfer volume.
{'customer_id': 'CUST-2001', 'alert_type': 'aml'}

If that matches, your agent can now retrieve relevant fintech context from Elasticsearch through Haystack.

Real-World Use Cases

  • Fraud investigation assistant

    • Retrieve suspicious transactions, linked entities, and prior analyst notes.
    • Let the agent summarize evidence with citations from indexed records.
  • AML/KYC compliance copilot

    • Search customer files, adverse media notes, and review history.
    • Use filters like customer_id, jurisdiction, and risk_score to narrow results fast.
  • Customer support escalation agent

    • Pull payment failure logs, dispute history, and policy snippets.
    • Give support teams grounded answers instead of asking them to dig through dashboards manually.
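Metadata filters like the ones mentioned for the compliance copilot can be expressed in Haystack's filter dictionary format and passed to a retriever's filters argument. The helper below is a sketch: the function name is hypothetical, and it assumes customer_id, jurisdiction, and risk_score are stored as document metadata.

```python
from typing import Any, Optional


def build_filters(
    customer_id: Optional[str] = None,
    jurisdiction: Optional[str] = None,
    min_risk_score: Optional[int] = None,
) -> Optional[dict[str, Any]]:
    """Build an AND filter over whichever criteria were supplied; None means no filtering."""
    conditions: list[dict[str, Any]] = []
    if customer_id is not None:
        conditions.append({"field": "meta.customer_id", "operator": "==", "value": customer_id})
    if jurisdiction is not None:
        conditions.append({"field": "meta.jurisdiction", "operator": "==", "value": jurisdiction})
    if min_risk_score is not None:
        conditions.append({"field": "meta.risk_score", "operator": ">=", "value": min_risk_score})
    if not conditions:
        return None
    return {"operator": "AND", "conditions": conditions}


filters = build_filters(jurisdiction="UK", min_risk_score=80)
print(filters)
```

The result can be supplied at query time, for example retriever.run(query_embedding=..., filters=filters, top_k=5), so the agent only sees records that match the analyst's scope.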

By Cyprian Aarons, AI Consultant at Topiax.
