How to Integrate Haystack for wealth management with Elasticsearch for AI agents

By Cyprian Aarons | Updated 2026-04-21

Tags: haystack-for-wealth-management, elasticsearch, ai-agents

Combining Haystack for wealth management with Elasticsearch gives you a clean path from document retrieval to agent reasoning. In practice, this is what lets an AI agent answer advisor questions from policy docs, prospectuses, client notes, and market research without doing dumb keyword search over a pile of PDFs.

The pattern is simple: Elasticsearch handles indexing and retrieval at scale, while Haystack for wealth management turns those retrieved documents into structured context your agent can reason over.

Prerequisites

•Python 3.10+
•An Elasticsearch cluster running locally or in Elastic Cloud
•API credentials for Elasticsearch if you’re using a managed deployment
•Haystack installed in your environment
•A corpus of wealth management documents:
  - portfolio reports
  - fund fact sheets
  - suitability policies
  - advisor notes
•Basic familiarity with Python async/sync execution

Install the packages:

pip install haystack-ai elasticsearch-haystack elasticsearch sentence-transformers

Integration Steps

•Set up Elasticsearch and create an index for wealth management content.

Use a dense vector field so you can support semantic retrieval, not just exact term matching.

from elasticsearch import Elasticsearch

es = Elasticsearch(
    "http://localhost:9200",
    basic_auth=("elastic", "changeme")
)

index_name = "wealth-management-docs"

mapping = {
    "mappings": {
        "properties": {
            "content": {"type": "text"},
            "title": {"type": "text"},
            "source": {"type": "keyword"},
            "embedding": {
                "type": "dense_vector",
                "dims": 384,
                "index": True,
                "similarity": "cosine"
            }
        }
    }
}

if not es.indices.exists(index=index_name):
    es.indices.create(index=index_name, **mapping)

•Load documents into Elasticsearch with embeddings.

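Here I’m using a SentenceTransformer embedder because it’s straightforward and runs locally, which suits controlled deployments. The all-MiniLM-L6-v2 model produces 384-dimensional vectors, which is why the mapping above sets dims to 384.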

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    {
        "title": "Suitability Policy",
        "content": "Advisors must assess risk tolerance, time horizon, liquidity needs, and investment objectives before recommending products.",
        "source": "policy.pdf"
    },
    {
        "title": "Portfolio Review Notes",
        "content": "Client has moderate risk tolerance and requires income generation over capital appreciation.",
        "source": "notes.txt"
    }
]

for doc in docs:
    embedding = model.encode(doc["content"]).tolist()
    es.index(
        index=index_name,
        document={
            **doc,
            "embedding": embedding
        }
    )

es.indices.refresh(index=index_name)
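
Indexing one document per request is fine for two records, but a real corpus is usually sent in batches. The sketch below builds the action dictionaries that `elasticsearch.helpers.bulk` accepts. `build_bulk_actions` and the `embed` callable are hypothetical names, not part of either library, and the stand-in embedder exists only so the example runs without downloading a model.

```python
from typing import Callable, Iterable, Iterator


def build_bulk_actions(
    docs: Iterable[dict],
    index_name: str,
    embed: Callable[[str], list[float]],
) -> Iterator[dict]:
    """Yield one bulk action per document, adding an embedding of its content."""
    for doc in docs:
        yield {
            "_index": index_name,
            "_source": {**doc, "embedding": embed(doc["content"])},
        }


# Stand-in embedder (assumption). In the real flow you would pass something
# like: lambda text: model.encode(text).tolist()
def fake_embed(text: str) -> list[float]:
    return [float(len(text))]


sample_docs = [
    {
        "title": "Suitability Policy",
        "content": "Assess risk tolerance.",
        "source": "policy.pdf",
    },
]

actions = list(build_bulk_actions(sample_docs, "wealth-management-docs", fake_embed))
print(actions[0]["_index"])
```

With a live cluster, passing these actions to `helpers.bulk(es, actions)` from the `elasticsearch` package sends them in batches instead of one request per document.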

•Connect Haystack for wealth management to Elasticsearch as the retriever layer.

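In Haystack 2.x, use ElasticsearchDocumentStore with ElasticsearchEmbeddingRetriever, both from the elasticsearch-haystack integration package. The retriever expects a query embedding rather than raw text, so put a SentenceTransformersTextEmbedder in front of it. That way your agent can ask natural language questions and get relevant context back.

from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchEmbeddingRetriever
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(
    hosts=["http://localhost:9200"],
    index=index_name,
    basic_auth=("elastic", "changeme"),
)

# Embed queries with the same model that produced the stored embeddings.
text_embedder = SentenceTransformersTextEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"
)
retriever = ElasticsearchEmbeddingRetriever(document_store=document_store, top_k=3)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", text_embedder)
query_pipeline.add_component("retriever", retriever)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")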

If you’re using Haystack for wealth management specifically, keep your document schema aligned with advisory workflows:

•client_id
•product_type
•risk_band
•jurisdiction
•review_date

That makes filtering much more useful than raw semantic search alone.
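
Filters only help if every indexed document actually carries these fields. As a minimal sketch, a pre-indexing check could look like the following. `REQUIRED_FIELDS` and `missing_advisory_fields` are hypothetical names, and the sample values are illustrative only.

```python
REQUIRED_FIELDS = (
    "client_id",
    "product_type",
    "risk_band",
    "jurisdiction",
    "review_date",
)


def missing_advisory_fields(doc: dict) -> list[str]:
    """Return the required metadata fields that are absent or empty in doc."""
    return [name for name in REQUIRED_FIELDS if doc.get(name) in (None, "")]


note = {
    "content": "Client has moderate risk tolerance.",
    "client_id": "C-1001",          # illustrative value
    "product_type": "income_fund",  # illustrative value
    "jurisdiction": "UK",           # illustrative value
}

print(missing_advisory_fields(note))  # ['risk_band', 'review_date']
```

Rejecting or quarantining documents with missing fields at ingestion time is usually simpler than discovering at query time that a filter silently excluded them.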

•Build the agent query flow with retrieval plus response synthesis.

The agent should retrieve first, then generate. Don’t let the model answer from memory when compliance-sensitive data is involved.

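from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.generators import OpenAIGenerator
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchEmbeddingRetriever

template = """
You are an assistant for wealth management advisors.
Use only the provided documents to answer the question.

Question: {{question}}

Documents:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}

Answer:
"""

# A component instance can belong to only one pipeline, so create a fresh
# embedder and retriever here instead of reusing earlier instances.
rag_embedder = SentenceTransformersTextEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"
)
rag_retriever = ElasticsearchEmbeddingRetriever(document_store=document_store, top_k=3)
prompt_builder = PromptBuilder(template=template)
# OpenAIGenerator reads the API key from the OPENAI_API_KEY environment variable.
generator = OpenAIGenerator(model="gpt-4o-mini")

rag_pipeline = Pipeline()
rag_pipeline.add_component("text_embedder", rag_embedder)
rag_pipeline.add_component("retriever", rag_retriever)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", generator)

rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.prompt")

Now run a question through the pipeline:

question = "Can we recommend an income-focused product for this client?"

result = rag_pipeline.run({
    "text_embedder": {"text": question},
    "prompt_builder": {"question": question},
})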

print(result["llm"]["replies"][0])
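
To enforce the retrieve-first rule, one option is a small guard that refuses to call the model when retrieval comes back empty. The sketch below is one way to do that and is not part of Haystack. `answer_from_context`, `NO_CONTEXT_REPLY`, and the stand-in generator are hypothetical, and the documents are plain objects with a `content` attribute so the example runs without a cluster or an LLM.

```python
from types import SimpleNamespace
from typing import Callable, Sequence

NO_CONTEXT_REPLY = "No approved documents were found for this question."


def answer_from_context(
    question: str,
    documents: Sequence,
    generate: Callable[[str, Sequence], str],
) -> str:
    """Call the generator only when retrieval returned usable documents."""
    usable = [d for d in documents if getattr(d, "content", None)]
    if not usable:
        # Nothing retrieved: return a fixed reply instead of letting the
        # model answer from memory.
        return NO_CONTEXT_REPLY
    return generate(question, usable)


# Stand-in generator (assumption) so the sketch runs without an LLM.
def fake_generate(question: str, documents: Sequence) -> str:
    return f"Answered from {len(documents)} document(s)."


docs_found = [SimpleNamespace(content="Client requires income generation.")]

print(answer_from_context("Income product?", [], fake_generate))
print(answer_from_context("Income product?", docs_found, fake_generate))
```

In a real deployment the `generate` callable would wrap the prompt builder and LLM steps, with retrieval run separately so its output can be inspected first.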

•Add metadata filters for advisor-grade retrieval.

This is where most implementations get weak. Wealth management search usually needs constraints by region, product type, or policy version.

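# Haystack 2.x filter syntax. Metadata fields are addressed as "meta.<name>".
filtered_docs = document_store.filter_documents(
    filters={
        "operator": "AND",
        "conditions": [
            {"field": "meta.source", "operator": "==", "value": "policy.pdf"},
        ],
    }
)

# Metadata filters have no substring operator, so check the text in Python.
for doc in filtered_docs:
    if "risk tolerance" in doc.content:
        print(doc.content)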

If you have client-specific fields indexed in Elasticsearch, apply those filters before generation so the agent only sees approved context.
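
Hand-writing filter dictionaries for every request gets error-prone. A minimal sketch of a builder for the Haystack 2.x filter format follows. `build_advisor_filter` is a hypothetical helper, and the field names and values are illustrative and assume those fields exist on your indexed documents.

```python
def build_advisor_filter(**criteria) -> dict:
    """Build a Haystack 2.x filter requiring every given metadata value."""
    conditions = [
        {"field": f"meta.{name}", "operator": "==", "value": value}
        for name, value in criteria.items()
        if value is not None
    ]
    if not conditions:
        # Refuse to build an unrestricted filter by accident.
        raise ValueError("at least one filter criterion is required")
    return {"operator": "AND", "conditions": conditions}


advisor_filter = build_advisor_filter(jurisdiction="UK", product_type="income_fund")
print(advisor_filter)
```

The resulting dictionary can be passed as the `filters` argument to `document_store.filter_documents(...)`, or to the retriever at run time so the restriction applies before any text reaches the prompt.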

Testing the Integration

Run a direct retrieval test first. If this fails, don’t move on to generation.

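from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchEmbeddingRetriever

question = "What factors must advisors assess before recommending a product?"

# Embed the question with the same model used at indexing time, then call
# a standalone retriever directly so no generation step is involved.
test_retriever = ElasticsearchEmbeddingRetriever(document_store=document_store, top_k=3)
query_result = test_retriever.run(query_embedding=model.encode(question).tolist())

for d in query_result["documents"]:
    print(d.content)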

Expected output:

Advisors must assess risk tolerance, time horizon, liquidity needs, and investment objectives before recommending products.
Client has moderate risk tolerance and requires income generation over capital appreciation.

If you want a stronger test, check that the top result matches the policy text and not just a random note. In production, I also assert on metadata fields like source and jurisdiction.
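
A stronger check along those lines might look like the sketch below. `check_top_hit` is a hypothetical helper. It assumes each retrieved document exposes `content` and a `meta` dictionary, as Haystack documents do, and the stand-in objects exist only so the example runs without a cluster.

```python
from types import SimpleNamespace


def check_top_hit(documents, expected_source: str, must_contain: str) -> None:
    """Raise AssertionError unless the first document matches expectations."""
    assert documents, "retriever returned no documents"
    top = documents[0]
    found_source = top.meta.get("source")
    assert found_source == expected_source, f"unexpected source: {found_source}"
    assert must_contain in top.content, "top hit does not contain expected text"


# Stand-in results (assumption) mirroring the two sample documents.
hits = [
    SimpleNamespace(
        content="Advisors must assess risk tolerance, time horizon, liquidity needs, "
        "and investment objectives before recommending products.",
        meta={"source": "policy.pdf"},
    ),
    SimpleNamespace(
        content="Client has moderate risk tolerance and requires income generation "
        "over capital appreciation.",
        meta={"source": "notes.txt"},
    ),
]

check_top_hit(hits, expected_source="policy.pdf", must_contain="risk tolerance")
print("retrieval check passed")
```

The same function can take the real list of retrieved documents, and further assertions on fields such as jurisdiction follow the same pattern.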

Real-World Use Cases

•Advisor copilot that answers suitability and product-policy questions from internal knowledge bases.
•Client review assistant that summarizes portfolio notes, flags mismatches against risk profiles, and drafts follow-up actions.
•Compliance search layer that finds only approved language across archived research, disclosures, and meeting notes before the agent responds.


By Cyprian Aarons, AI Consultant at Topiax.
