How to Integrate Haystack for fintech with Elasticsearch for AI agents
Haystack for fintech gives you the retrieval and agent orchestration layer. Elasticsearch gives you durable, low-latency search over structured and unstructured financial data.
Combined, they let an AI agent answer questions like “show me all high-risk merchants with chargeback spikes in the last 30 days” using indexed documents, filters, and retrieval pipelines instead of brittle prompt-only logic.
Prerequisites
- •Python 3.10+
- •An Elasticsearch cluster running locally or in your VPC
- •A Haystack-compatible setup for your fintech agent project
- •API credentials for your embedding model or LLM provider
- •Financial documents ready to index:
  - •transaction records
  - •policy docs
  - •KYC/AML notes
  - •support tickets
- •Installed packages:
  - •haystack-ai
  - •elasticsearch
  - •elasticsearch-haystack (provides the haystack_integrations imports used in the steps below)
  - •sentence-transformers or your embedding provider SDK
Integration Steps
- •Install the dependencies and verify Elasticsearch connectivity.
pip install haystack-ai elasticsearch elasticsearch-haystack sentence-transformers
from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")
print(es.info())
If that returns cluster metadata, your search backend is reachable.
- •Create Haystack Document objects that carry your fintech content and metadata for indexing.
from haystack import Document
docs = [
Document(
content="Merchant ABC showed a 42% increase in chargebacks over 14 days.",
meta={"customer_id": "CUST-1001", "risk_score": 87, "doc_type": "risk_report"}
),
Document(
content="KYC review completed for customer CUST-1002 with no adverse findings.",
meta={"customer_id": "CUST-1002", "risk_score": 12, "doc_type": "kyc_note"}
),
]
In production, these Document objects usually come from a parser or ETL job, not hardcoded strings.
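As a rough illustration of that ETL step, the sketch below turns rows from a CSV export into dictionaries shaped like the Document arguments used above. The CSV layout, its column names, and the helper `rows_to_records` are assumptions made for this example, not part of Haystack.

```python
import csv
import io

# Hypothetical CSV export; the column names are assumptions for illustration.
RAW_CSV = """customer_id,risk_score,doc_type,note
CUST-1001,87,risk_report,Merchant ABC showed a 42% increase in chargebacks over 14 days.
CUST-1002,12,kyc_note,KYC review completed for customer CUST-1002 with no adverse findings.
"""


def rows_to_records(raw_csv: str) -> list[dict]:
    """Turn CSV rows into {"content", "meta"} dicts shaped like Document kwargs."""
    records = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        note = row["note"].strip()
        if not note:
            continue  # skip rows with nothing to index
        records.append(
            {
                "content": note,
                "meta": {
                    "customer_id": row["customer_id"],
                    # Keep the score numeric so range filters can use it later.
                    "risk_score": int(row["risk_score"]),
                    "doc_type": row["doc_type"],
                },
            }
        )
    return records


records = rows_to_records(RAW_CSV)
print(records[0]["meta"])
```

With Haystack installed, each record can then be turned into a document with `Document(**record)`.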
- •Index the documents into Elasticsearch using Haystack’s document store.
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore
document_store = ElasticsearchDocumentStore(
    hosts=["http://localhost:9200"],
    index="fintech_docs",
)
document_store.write_documents(docs)
This is the core integration point. Haystack manages document storage semantics, while Elasticsearch handles indexing and retrieval.
- •Add embeddings so semantic retrieval works for agent queries.
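from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack.document_stores.types import DuplicatePolicy
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
query_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()  # load the model before the first run
embedded_docs = doc_embedder.run(documents=docs)["documents"]
# Overwrite the copies that were written earlier without embeddings.
document_store.write_documents(embedded_docs, policy=DuplicatePolicy.OVERWRITE)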
For a real system, run this as part of your ingestion pipeline before indexing.
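To make the role of embeddings concrete, here is a toy sketch of how vector similarity ranks documents against a query. The three-dimensional vectors and the names `toy_index` and `rank_by_similarity` are invented for illustration; all-MiniLM-L6-v2 produces 384-dimensional vectors, and in the real setup Elasticsearch performs the similarity search.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy vectors standing in for real document embeddings (assumed values).
toy_index = {
    "chargeback risk report": [0.9, 0.1, 0.0],
    "kyc note": [0.1, 0.9, 0.1],
}

# Toy vector standing in for the embedded question about chargeback risk.
query_vec = [0.8, 0.2, 0.0]


def rank_by_similarity(query: list[float], index: dict[str, list[float]]) -> list[str]:
    """Return document names ordered from most to least similar to the query."""
    return sorted(index, key=lambda name: cosine(query, index[name]), reverse=True)


ranked = rank_by_similarity(query_vec, toy_index)
print(ranked)
```

The document whose vector points in nearly the same direction as the query ranks first, even when the two texts share no exact keywords.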
- •Build a retrieval step your AI agent can call during tool use.
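from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchEmbeddingRetriever
# Use the Elasticsearch retriever; the in-memory retriever only works with InMemoryDocumentStore.
retriever = ElasticsearchEmbeddingRetriever(document_store=document_store)
query_embedder.warm_up()  # load the model before the first run
result = retriever.run(
    query_embedding=query_embedder.run(text="Which merchant has rising chargeback risk?")["embedding"],
    top_k=3,
)
for doc in result["documents"]:
    print(doc.content, doc.meta)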
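If you need hybrid search, run ElasticsearchBM25Retriever for keyword matching alongside ElasticsearchEmbeddingRetriever for vector matching, pass the same metadata filters to both, and merge the two result lists with Haystack's DocumentJoiner.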
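One common way to merge a keyword result list with a vector result list is reciprocal rank fusion. The plain-Python sketch below only illustrates the idea; the document IDs are invented, and `k=60` is a conventional default rather than a tuned value. Haystack's DocumentJoiner offers a reciprocal rank fusion join mode, so in a pipeline you would likely rely on that component instead.

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; each appearance contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical ranked IDs from a keyword retriever and a vector retriever.
keyword_hits = ["doc-chargeback", "doc-policy", "doc-kyc"]
vector_hits = ["doc-aml", "doc-chargeback", "doc-kyc"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear in both lists rise to the top, which is usually what you want when a query mixes exact identifiers with fuzzy intent.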
Testing the Integration
Run a simple end-to-end check: write one document, query it semantically, and confirm the right record comes back.
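from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchEmbeddingRetriever
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore
MODEL = "sentence-transformers/all-MiniLM-L6-v2"
store = ElasticsearchDocumentStore(
    hosts=["http://localhost:9200"],
    index="fintech_test",
)
doc = Document(
    content="AML alert generated for customer CUST-2001 due to unusual wire transfer volume.",
    meta={"customer_id": "CUST-2001", "alert_type": "aml"}
)
# Embed the document before writing; the embedding retriever only matches documents that carry vectors.
doc_embedder = SentenceTransformersDocumentEmbedder(model=MODEL)
doc_embedder.warm_up()
embedded_docs = doc_embedder.run(documents=[doc])["documents"]
# Overwrite on conflict so the check can be re-run against the same index.
store.write_documents(embedded_docs, policy=DuplicatePolicy.OVERWRITE)
text_embedder = SentenceTransformersTextEmbedder(model=MODEL)
text_embedder.warm_up()
query_embedding = text_embedder.run(text="Why was AML triggered for this customer?")["embedding"]
retriever = ElasticsearchEmbeddingRetriever(document_store=store)
response = retriever.run(query_embedding=query_embedding, top_k=1)
print(response["documents"][0].content)
print(response["documents"][0].meta)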
Expected output:
AML alert generated for customer CUST-2001 due to unusual wire transfer volume.
{'customer_id': 'CUST-2001', 'alert_type': 'aml'}
If that matches, your agent can now retrieve relevant fintech context from Elasticsearch through Haystack.
Real-World Use Cases
- •Fraud investigation assistant
  - •Retrieve suspicious transactions, linked entities, and prior analyst notes.
  - •Let the agent summarize evidence with citations from indexed records.
- •AML/KYC compliance copilot
  - •Search customer files, adverse media notes, and review history.
  - •Use filters like customer_id, jurisdiction, and risk_score to narrow results fast.
- •Customer support escalation agent
  - •Pull payment failure logs, dispute history, and policy snippets.
  - •Give support teams grounded answers instead of asking them to dig through dashboards manually.
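The metadata filters mentioned in the use cases above can be expressed in Haystack 2.x filter syntax, where metadata fields are addressed with a `meta.` prefix. The helper below is a sketch: `build_risk_filter` and its parameters are hypothetical names, and the fields assume documents were indexed with `risk_score`, `doc_type`, and `customer_id` metadata.

```python
from typing import Optional


def build_risk_filter(
    min_risk_score: int,
    doc_type: Optional[str] = None,
    customer_id: Optional[str] = None,
) -> dict:
    """Build a Haystack 2.x style filter dict; all conditions must hold (AND)."""
    conditions = [
        {"field": "meta.risk_score", "operator": ">=", "value": min_risk_score}
    ]
    if doc_type is not None:
        conditions.append(
            {"field": "meta.doc_type", "operator": "==", "value": doc_type}
        )
    if customer_id is not None:
        conditions.append(
            {"field": "meta.customer_id", "operator": "==", "value": customer_id}
        )
    return {"operator": "AND", "conditions": conditions}


high_risk_reports = build_risk_filter(80, doc_type="risk_report")
print(high_risk_reports)
```

A dict like this can be passed as the `filters` argument when running an Elasticsearch retriever, so the agent narrows by metadata before ranking by similarity.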
By Cyprian Aarons, AI Consultant at Topiax.