How to Integrate Haystack for fintech with Elasticsearch for multi-agent systems

By Cyprian Aarons · Updated 2026-04-21
Tags: haystack-for-fintech, elasticsearch, multi-agent-systems

Combining Haystack for fintech with Elasticsearch gives you a practical retrieval layer for agent systems that need to answer questions over regulated, high-volume financial data. Haystack handles pipeline orchestration and document retrieval, while Elasticsearch gives you fast indexing, filtering, and hybrid search across filings, policies, transaction notes, and support knowledge.

For multi-agent systems, this matters because each agent can query the same indexed corpus with different intent: one agent can fetch compliance evidence, another can summarize customer risk, and another can pull transaction context for investigations.

Prerequisites

  • Python 3.10+
  • An Elasticsearch cluster running locally or in the cloud
  • Haystack 2.x (the haystack-ai package) and its Elasticsearch integration installed
  • Access to your fintech documents:
    • PDFs
    • policy docs
    • KYC/AML notes
    • support tickets
    • transaction metadata
  • Environment variables configured:
    • ELASTICSEARCH_URL
    • ELASTICSEARCH_API_KEY if using Elastic Cloud
  • Basic familiarity with:
    • Haystack pipelines
    • document ingestion
    • embeddings or keyword search
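The snippets in this guide fall back to http://localhost:9200 when ELASTICSEARCH_URL is unset, so a misconfigured deployment can quietly talk to the wrong cluster. Below is a minimal sketch of a fail-fast config loader that uses only the standard library. The function name load_es_config and the require_tls flag are hypothetical and not part of Haystack or elasticsearch-py:

```python
import os


def load_es_config(env=None, require_tls=False):
    """Read Elasticsearch settings from the environment and fail fast if they are unusable.

    `env` defaults to os.environ; pass a plain dict to test without touching the real environment.
    """
    env = os.environ if env is None else env
    url = env.get("ELASTICSEARCH_URL", "").strip()
    if not url:
        raise RuntimeError("ELASTICSEARCH_URL is not set")
    # Optional production guard: refuse plain-HTTP URLs when TLS is required.
    if require_tls and not url.startswith("https://"):
        raise RuntimeError("ELASTICSEARCH_URL must use https:// when TLS is required")
    # The API key is optional; a local cluster often runs without one.
    api_key = env.get("ELASTICSEARCH_API_KEY") or None
    return {"url": url, "api_key": api_key}
```

You could then build the client with Elasticsearch(cfg["url"], api_key=cfg["api_key"]) where cfg = load_es_config(require_tls=True).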

Install the core packages:

pip install haystack-ai elasticsearch sentence-transformers

Then install the Haystack Elasticsearch integration, which provides ElasticsearchDocumentStore and the Elasticsearch retrievers:

pip install elasticsearch-haystack

Integration Steps

1. Connect to Elasticsearch

Start by creating a client and verifying the cluster is reachable. In production, use API keys and TLS.

import os
from elasticsearch import Elasticsearch

es = Elasticsearch(
    os.getenv("ELASTICSEARCH_URL", "http://localhost:9200"),
    api_key=os.getenv("ELASTICSEARCH_API_KEY")
)

print(es.info())

If this fails, do not move on. Your Haystack pipeline will only be as reliable as the search backend.
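To make that check explicit in scripts or at container start-up, you could retry a ping before indexing. The sketch below assumes a client that exposes ping() returning a boolean, which elasticsearch.Elasticsearch does. wait_for_cluster is a hypothetical helper:

```python
import time


def wait_for_cluster(client, attempts=5, delay_seconds=2.0):
    """Return True once client.ping() succeeds, retrying up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            ok = bool(client.ping())
        except Exception:
            # Treat connection errors the same as a failed ping.
            ok = False
        if ok:
            return True
        if attempt < attempts:
            time.sleep(delay_seconds)
    return False
```

Usage: if not wait_for_cluster(es): raise SystemExit("Elasticsearch is unreachable; fix connectivity before indexing.")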

2. Create a Haystack document store backed by Elasticsearch

Haystack uses a document store to persist and query documents. For Elasticsearch-backed retrieval, wire the store directly to your cluster.

import os
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(
    hosts=os.getenv("ELASTICSEARCH_URL", "http://localhost:9200"),
    index="fintech_docs",
    basic_auth=None,
    api_key=os.getenv("ELASTICSEARCH_API_KEY"),
    embedding_similarity_function="cosine"
)

This index will hold your fintech corpus. Keep one index per domain if you need strict access boundaries, for example aml_docs, support_docs, and product_docs.
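One way to enforce those boundaries in a multi-agent setup is an application-level allow-list that maps each agent to the domains it may read. This is a sketch under assumptions: the agent names, the DOMAIN_INDEXES and AGENT_DOMAINS tables, and index_for are all hypothetical.

```python
# Hypothetical domain -> index mapping, following the index names suggested above.
DOMAIN_INDEXES = {
    "aml": "aml_docs",
    "support": "support_docs",
    "product": "product_docs",
}

# Hypothetical allow-list: which domains each agent may read.
AGENT_DOMAINS = {
    "risk_agent": {"aml"},
    "ops_agent": {"support", "product"},
}


def index_for(agent: str, domain: str) -> str:
    """Return the index an agent may query for a domain, or raise."""
    if domain not in DOMAIN_INDEXES:
        raise ValueError(f"unknown domain {domain!r}")
    if domain not in AGENT_DOMAINS.get(agent, set()):
        raise PermissionError(f"{agent!r} may not read {domain!r} documents")
    return DOMAIN_INDEXES[domain]
```

You would pass the returned name as index= when constructing each agent's ElasticsearchDocumentStore. This guard complements the index-level privileges on the Elasticsearch API key each agent uses. It does not replace them.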

3. Write documents into the index

Use Haystack Document objects and write them into Elasticsearch through the document store.

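from haystack import Document
from haystack.document_stores.types import DuplicatePolicy

docs = [
    Document(
        content="AML policy requires enhanced due diligence for high-risk jurisdictions.",
        meta={"source": "compliance_manual", "doc_type": "policy"}
    ),
    Document(
        content="Customer dispute escalation must be resolved within 5 business days.",
        meta={"source": "ops_playbook", "doc_type": "support"}
    ),
]

# Document IDs are derived from content and metadata, so re-running this script
# writes the same IDs again. OVERWRITE keeps re-runs idempotent; the default policy
# raises on duplicates.
written = document_store.write_documents(docs, policy=DuplicatePolicy.OVERWRITE)
print(f"Indexed {written} documents.")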

At this point, your agents have something real to retrieve from instead of prompting against raw files.
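Later filters and citations depend on metadata such as source and doc_type, so it can help to reject incomplete documents before you write them. Below is a minimal sketch. The required keys and the ALLOWED_DOC_TYPES set are assumptions based on the two example documents above:

```python
REQUIRED_META = ("source", "doc_type")
# Hypothetical vocabulary; extend it to match your corpus.
ALLOWED_DOC_TYPES = {"policy", "support"}


def validate_meta(meta: dict) -> list[str]:
    """Return a list of problems with a document's metadata (empty if it is valid)."""
    problems = []
    for key in REQUIRED_META:
        if not meta.get(key):
            problems.append(f"missing {key}")
    doc_type = meta.get("doc_type")
    if doc_type and doc_type not in ALLOWED_DOC_TYPES:
        problems.append(f"unknown doc_type {doc_type!r}")
    return problems
```

For example, [d for d in docs if validate_meta(d.meta)] lists the documents to fix before you call write_documents.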

4. Build a retrieval pipeline for agent use

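For multi-agent systems, keep retrieval isolated in its own pipeline so agents can call it as a tool. The pipeline below uses ElasticsearchBM25Retriever for keyword search. You can later add an embedding retriever or an LLM generator downstream without changing how agents call it.

from haystack import Pipeline
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever

# InMemoryBM25Retriever only works with InMemoryDocumentStore; use the Elasticsearch retriever here.
retriever = ElasticsearchBM25Retriever(document_store=document_store)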

pipeline = Pipeline()
pipeline.add_component("retriever", retriever)

query = "What is the policy for high-risk jurisdiction customers?"
result = pipeline.run({
    "retriever": {
        "query": query,
        "top_k": 3
    }
})

for doc in result["retriever"]["documents"]:
    print(doc.content)

For fintech workloads, I usually start with keyword retrieval for compliance text because auditors care about exact wording. If you need semantic matching later, swap in ElasticsearchEmbeddingRetriever with a text embedder in front of it, embed your documents at indexing time, and keep the same interface to your agents.
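Elasticsearch's metadata filtering lets each agent narrow the same index regardless of which retriever you use. For example, you can restrict a compliance agent to doc_type "policy". The sketch below builds filters in Haystack 2.x's dictionary syntax (field, operator, value, combined with AND). build_filters itself is hypothetical:

```python
def build_filters(doc_type=None, source=None):
    """Build a Haystack 2.x metadata filter, or None when no constraint is given."""
    conditions = []
    if doc_type is not None:
        conditions.append({"field": "meta.doc_type", "operator": "==", "value": doc_type})
    if source is not None:
        conditions.append({"field": "meta.source", "operator": "==", "value": source})
    if not conditions:
        return None
    if len(conditions) == 1:
        # A single comparison can be passed on its own.
        return conditions[0]
    return {"operator": "AND", "conditions": conditions}
```

Usage with the pipeline above: pipeline.run({"retriever": {"query": query, "top_k": 3, "filters": build_filters(doc_type="policy")}}).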

5. Expose retrieval as a tool for multiple agents

Each agent should call the same retrieval function rather than duplicating search logic. That keeps ranking behavior consistent across workflows.

def retrieve_fintech_context(query: str, top_k: int = 3) -> list[str]:
    result = pipeline.run({
        "retriever": {
            "query": query,
            "top_k": top_k
        }
    })
    return [doc.content for doc in result["retriever"]["documents"]]

risk_agent_context = retrieve_fintech_context(
    "Show me policies related to enhanced due diligence"
)

ops_agent_context = retrieve_fintech_context(
    "What is the SLA for customer dispute resolution?"
)

print(risk_agent_context)
print(ops_agent_context)

That function becomes the shared retrieval tool inside your orchestrator. In practice, one agent might enrich KYC cases while another drafts internal responses using the same backend.
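Most LLM tool-calling APIs need a name and a description for each tool. If your orchestrator does too, a small registry can keep that metadata next to the shared function. This is a framework-agnostic sketch. ToolRegistry and the tool name used below are hypothetical and not part of Haystack:

```python
from typing import Callable

# Any retrieval function that returns a list of strings, like retrieve_fintech_context.
ToolFn = Callable[..., list[str]]


class ToolRegistry:
    """Hypothetical registry mapping tool names to (description, function)."""

    def __init__(self) -> None:
        self._tools: dict[str, tuple[str, ToolFn]] = {}

    def register(self, name: str, description: str, fn: ToolFn) -> None:
        # Refuse silent overrides so two agents never see different tools under one name.
        if name in self._tools:
            raise ValueError(f"tool {name!r} is already registered")
        self._tools[name] = (description, fn)

    def describe(self) -> dict[str, str]:
        """Name -> description, e.g. for building an LLM tool list."""
        return {name: desc for name, (desc, _) in self._tools.items()}

    def call(self, name: str, **kwargs) -> list[str]:
        if name not in self._tools:
            raise KeyError(f"unknown tool {name!r}")
        _, fn = self._tools[name]
        return fn(**kwargs)
```

You might register the shared function once with registry.register("search_fintech_docs", "Search indexed fintech policies and playbooks", retrieve_fintech_context). Every agent then calls registry.call("search_fintech_docs", query=..., top_k=3).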

Testing the Integration

Run a simple verification query after indexing. You want to confirm three things:

  • Elasticsearch accepts writes
  • Haystack can read from the index
  • The retrieved content matches your expected domain text

test_query = "enhanced due diligence high-risk jurisdictions"
results = retrieve_fintech_context(test_query, top_k=1)

assert len(results) == 1
assert "high-risk jurisdictions" in results[0]

print("Integration test passed.")
print(results[0])

Expected output:

Integration test passed.
AML policy requires enhanced due diligence for high-risk jurisdictions.

If you get empty results, check:

  • index name mismatch
  • wrong host or auth config
  • documents not written successfully
  • retriever type not aligned with your stored data
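You can automate some of those checks. The sketch below assumes only that the store implements count_documents() from Haystack's DocumentStore protocol, which ElasticsearchDocumentStore does. diagnose_empty_results and its hint strings are hypothetical:

```python
def diagnose_empty_results(store, configured_index: str, expected_index: str) -> list[str]:
    """Return human-readable hints for why retrieval returned nothing."""
    hints = []
    if configured_index != expected_index:
        hints.append(f"index mismatch: store uses {configured_index!r}, expected {expected_index!r}")
    try:
        count = store.count_documents()
    except Exception as exc:  # connection and auth errors surface here
        hints.append(f"could not count documents ({type(exc).__name__}); check host and auth")
        return hints
    if count == 0:
        hints.append("index is empty; confirm write_documents() succeeded")
    if not hints:
        hints.append("documents are present; check that the retriever type matches how they were indexed")
    return hints
```

Usage: print(diagnose_empty_results(document_store, "fintech_docs", "fintech_docs")).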

Real-World Use Cases

  • Compliance copilot

    • One agent retrieves policy clauses from Elasticsearch.
    • Another agent drafts regulator-ready responses using those clauses as grounded context.
  • Fraud investigation assistant

    • An investigator agent searches case notes, transaction narratives, and internal playbooks.
    • A second agent summarizes suspicious patterns across related records.
  • Customer operations triage

    • Support agents pull SLA rules and product terms from indexed docs.
    • Escalation agents use the same search layer to decide whether a case needs manual review.
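In the compliance copilot, traceability depends on passing each clause's source along with its text. The retrieve_fintech_context function above returns content only. This sketch therefore assumes a variant, a hypothetical search callable, that returns (content, source) pairs built from doc.content and doc.meta["source"]. The prompt wording is illustrative:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evidence:
    content: str
    source: str


# Hypothetical search callable: (question, top_k) -> list of (content, source) pairs.
SearchFn = Callable[[str, int], list[tuple[str, str]]]


def retrieval_agent(search: SearchFn, question: str, top_k: int = 3) -> list[Evidence]:
    """First agent: fetch policy clauses and keep their sources for citation."""
    return [Evidence(content, source) for content, source in search(question, top_k)]


def drafting_prompt(question: str, evidence: list[Evidence]) -> str:
    """Second agent's input: numbered, sourced clauses plus the question."""
    if not evidence:
        # Refuse rather than let the drafting agent answer without grounding.
        raise ValueError("no evidence retrieved; cannot draft a grounded answer")
    clauses = [f"[{i}] ({e.source}) {e.content}" for i, e in enumerate(evidence, start=1)]
    return (
        "Answer using only the numbered clauses below and cite them as [n].\n"
        + "\n".join(clauses)
        + f"\n\nQuestion: {question}"
    )
```

Because each clause carries its source, a reviewer can trace every [n] citation in the draft back to a specific indexed document.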

The main pattern here is simple: let Haystack own orchestration and let Elasticsearch own retrieval at scale. That separation gives you a clean foundation for multi-agent fintech systems that need traceable answers and predictable search behavior.



By Cyprian Aarons, AI Consultant at Topiax.
