How to Integrate Haystack for lending with Elasticsearch for AI agents

By Cyprian Aarons | Updated 2026-04-21

Haystack for lending gives you the workflow layer for loan-specific AI tasks: document ingestion, retrieval, and answer generation around credit files, underwriting notes, and policy docs. Elasticsearch gives you the retrieval backbone: fast indexing, filtering, and hybrid search over structured and unstructured lending data.

Put them together and you get an agent that can answer borrower questions, pull relevant policy clauses, and surface loan file evidence with low latency.

Prerequisites

  • Python 3.10+
  • An Elasticsearch cluster running locally or in your cloud environment
  • Access to your lending corpus:
    • loan policies
    • underwriting guidelines
    • borrower documents
    • servicing notes
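  • Installed packages:
    • haystack-ai
    • elasticsearch-haystack (provides the haystack_integrations Elasticsearch document store and retrievers used below)
    • elasticsearch
    • sentence-transformers if you want semantic retrieval
  • Environment variables configured:
    • ELASTICSEARCH_URL
    • ELASTICSEARCH_API_KEY if your cluster requires API-key auth
    • ELASTICSEARCH_USERNAME and ELASTICSEARCH_PASSWORD if you use basic auth for the document store
    • OPENAI_API_KEY for the generation step

Install the dependencies:

pip install haystack-ai elasticsearch-haystack elasticsearch sentence-transformers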

Integration Steps

1) Connect to Elasticsearch

Start by creating a client and verifying the cluster is reachable. For lending systems, keep this connection separate from your app logic so you can rotate credentials without touching the agent code.

import os
from elasticsearch import Elasticsearch

es = Elasticsearch(
    os.environ["ELASTICSEARCH_URL"],
    api_key=os.getenv("ELASTICSEARCH_API_KEY"),
)

print(es.info())

If this fails, fix connectivity first. Don’t move on until the cluster responds cleanly.
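A cluster that has just started may need a few seconds before it answers, so a small retry loop can make this check less brittle. The following is a minimal sketch, not part of either library: wait_for_cluster is a hypothetical helper that accepts any zero-argument callable returning a boolean, such as the client's es.ping method.

```python
import time
from typing import Callable


def wait_for_cluster(
    ping: Callable[[], bool], attempts: int = 5, delay_s: float = 1.0
) -> bool:
    """Call `ping` until it returns True or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            if ping():
                return True
        except Exception as exc:  # treat any client error as "not ready yet"
            print(f"attempt {attempt} failed: {exc}")
        if attempt < attempts:
            time.sleep(delay_s)
    return False


# Usage with the client from this step (commented out so the sketch runs offline):
# if not wait_for_cluster(es.ping):
#     raise SystemExit("Elasticsearch is not reachable")
```

Passing the ping function in as an argument keeps the helper independent of the client, which also makes it easy to exercise in a unit test with a fake.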

2) Create a Haystack document store backed by Elasticsearch

Haystack’s ElasticsearchDocumentStore is the bridge between your agent pipeline and the search index. This is where loan docs, policy docs, and case notes will live.

from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(
    hosts=[os.environ["ELASTICSEARCH_URL"]],
    index="lending_docs",
    basic_auth=(
        os.getenv("ELASTICSEARCH_USERNAME"),
        os.getenv("ELASTICSEARCH_PASSWORD"),
    ) if os.getenv("ELASTICSEARCH_USERNAME") else None,
)

The example above writes everything to a single lending_docs index. In production, use a dedicated index per domain:

  • lending_policies
  • loan_files
  • servicing_notes

That keeps access control and retention rules clean.
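One way to apply the per-domain split is to route each record to an index based on its doc_type before writing. The sketch below is illustrative only: the INDEX_BY_DOC_TYPE mapping, the borrower_document and servicing_note type names, and the helper functions are assumptions, not part of Haystack or Elasticsearch.

```python
from collections import defaultdict

# Hypothetical mapping from a record's doc_type to its dedicated index.
INDEX_BY_DOC_TYPE = {
    "underwriting_policy": "lending_policies",
    "credit_policy": "lending_policies",
    "borrower_document": "loan_files",
    "servicing_note": "servicing_notes",
}


def index_for(doc_type: str) -> str:
    """Return the index name for a doc_type, failing loudly on unknown types."""
    try:
        return INDEX_BY_DOC_TYPE[doc_type]
    except KeyError:
        raise ValueError(f"no index configured for doc_type={doc_type!r}") from None


def group_by_index(records: list[dict]) -> dict[str, list[dict]]:
    """Group records (dicts with a 'meta' dict) by their target index."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        grouped[index_for(record["meta"]["doc_type"])].append(record)
    return dict(grouped)
```

Each group could then be written through its own ElasticsearchDocumentStore instance configured with the matching index name. Raising on an unknown doc_type stops a mislabeled record from landing in the wrong index.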

3) Write lending documents into the index

Convert your source records into Haystack Document objects. In lending workflows, include metadata like product type, jurisdiction, risk band, and document source. Those fields are what make filtering useful later.

from haystack import Document

docs = [
    Document(
        content="A borrower must provide two recent pay stubs and one month of bank statements.",
        meta={
            "doc_type": "underwriting_policy",
            "product": "personal_loan",
            "jurisdiction": "US",
        },
    ),
    Document(
        content="Debt-to-income ratio must not exceed 43% for standard approval.",
        meta={
            "doc_type": "credit_policy",
            "product": "mortgage",
            "jurisdiction": "US",
        },
    ),
]

document_store.write_documents(docs)

For production ingestion:

  • chunk large PDFs before writing
  • preserve source page numbers in metadata
  • normalize dates and numeric fields for filters
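The first two points can be sketched in plain Python. This example assumes the PDF text has already been extracted into one string per page. The Chunk dataclass, the chunk_pages function, and the word-count limits are hypothetical choices for illustration. Haystack also provides a DocumentSplitter component that may fit better in a real pipeline.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    content: str
    meta: dict


def chunk_pages(
    pages: list[str], source: str, max_words: int = 120, overlap: int = 20
) -> list[Chunk]:
    """Split each page into overlapping word windows, keeping the page number."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    step = max_words - overlap
    chunks: list[Chunk] = []
    for page_number, text in enumerate(pages, start=1):
        words = text.split()
        for start in range(0, len(words), step):
            window = words[start:start + max_words]
            chunks.append(
                Chunk(" ".join(window), {"source": source, "page": page_number})
            )
            if start + max_words >= len(words):
                break  # this window already reached the end of the page
    return chunks


# Each chunk maps directly onto a Haystack document, for example:
# Document(content=chunk.content, meta=chunk.meta)
```

Chunking per page rather than across pages keeps every chunk traceable to a single page number, at the cost of some short chunks at page ends.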

4) Build a retriever for your AI agent

Now wire Elasticsearch into Haystack retrieval. For lending agents, use metadata filters aggressively so you don’t retrieve irrelevant policy text from the wrong product line.

from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever

retriever = ElasticsearchBM25Retriever(document_store=document_store)

results = retriever.run(
    query="What documents are required for a personal loan application?",
    filters={
        "operator": "AND",
        "conditions": [
            {"field": "meta.product", "operator": "==", "value": "personal_loan"},
            {"field": "meta.doc_type", "operator": "==", "value": "underwriting_policy"},
        ],
    },
)

for doc in results["documents"]:
    print(doc.content)
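The filter dictionary above gets verbose once several call sites need it. A small builder can keep the structure consistent. This is a sketch under the assumption that every condition is an equality check on a metadata field. equals_filter is a hypothetical helper, not a Haystack function.

```python
def equals_filter(**fields: str) -> dict:
    """Build an AND filter of equality conditions over metadata fields."""
    if not fields:
        raise ValueError("at least one field is required")
    return {
        "operator": "AND",
        "conditions": [
            {"field": f"meta.{name}", "operator": "==", "value": value}
            for name, value in fields.items()
        ],
    }


# Produces the same structure as the filters passed to retriever.run above.
filters = equals_filter(product="personal_loan", doc_type="underwriting_policy")
```

Generating the operator string in one place also removes a common source of typos in hand-written filters.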

If you need semantic search as well, pair BM25 with embeddings in a hybrid pipeline. That’s usually better for borrower-facing Q&A where wording varies across policies.
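A common way to merge a keyword ranking with an embedding ranking is reciprocal rank fusion (RRF). The sketch below illustrates the idea in plain Python on lists of document IDs. It is not the article's pipeline, and the constant k=60 is a conventional default rather than a tuned value. In Haystack itself, the DocumentJoiner component offers a reciprocal rank fusion join mode that plays this role inside a pipeline.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by multiple retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["a", "b", "c"]        # hypothetical IDs from keyword search
embedding_ranking = ["b", "d", "a"]   # hypothetical IDs from vector search
fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
```

RRF only uses rank positions, so it avoids having to normalize BM25 scores against cosine similarities, which live on different scales.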

5) Add the retriever to an agent pipeline

Use Haystack pipelines to connect retrieval to generation. The pattern is simple: query comes in, retriever pulls evidence from Elasticsearch, generator produces an answer grounded in those docs.

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

prompt = PromptBuilder(
    template="""
You are a lending operations assistant.
Answer only using the provided documents.

Question: {{question}}

Documents:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}

Answer:
"""
)

# OpenAIGenerator reads the OPENAI_API_KEY environment variable by default.
generator = OpenAIGenerator(model="gpt-4o-mini")

pipe = Pipeline()
pipe.add_component("retriever", retriever)
pipe.add_component("prompt_builder", prompt)
pipe.add_component("generator", generator)

pipe.connect("retriever.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "generator.prompt")

response = pipe.run(
    {
        "retriever": {
            "query": "What documents are required for a personal loan application?",
            "filters": {
                "operator": "AND",
                "conditions": [
                    {"field": "meta.product", "operator": "==", "value": "personal_loan"}
                ],
            },
        },
        "prompt_builder": {"question": "What documents are required for a personal loan application?"},
    }
)

print(response["generator"]["replies"][0])

Testing the Integration

Run a smoke test that retrieves the mortgage credit policy written in step 3 through Elasticsearch and confirms that the metadata filters match it.

# Filters for the smoke test; note that the equality operator is "==", not "=".
test_filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.product", "operator": "==", "value": "mortgage"},
        {"field": "meta.doc_type", "operator": "==", "value": "credit_policy"},
    ],
}

result = retriever.run(query="What is the DTI limit?", filters=test_filters)

assert len(result["documents"]) > 0
print("Retrieved:", result["documents"][0].content)

Expected output:

Retrieved: Debt-to-income ratio must not exceed 43% for standard approval.

If you get zero results:

  • check index name consistency
  • confirm metadata field names match exactly
  • verify your documents were written to Elasticsearch before querying
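The second check can be partly automated by comparing the fields a filter references against the metadata of a document you know was indexed. The following is a rough sketch. missing_filter_fields is a hypothetical helper that assumes the nested filter format used in this guide, with field names prefixed by "meta.".

```python
def missing_filter_fields(filters: dict, sample_meta: dict) -> list[str]:
    """Return filter fields that do not exist in a sample document's metadata."""
    if "conditions" in filters:
        # Logical group (AND / OR / NOT): check each nested condition.
        missing: list[str] = []
        for condition in filters["conditions"]:
            missing.extend(missing_filter_fields(condition, sample_meta))
        return missing
    # Single comparison: strip the "meta." prefix before looking up the key.
    name = filters["field"].removeprefix("meta.")
    return [] if name in sample_meta else [filters["field"]]


sample_meta = {"doc_type": "credit_policy", "product": "mortgage", "jurisdiction": "US"}
suspect_filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.product", "operator": "==", "value": "mortgage"},
        {"field": "meta.doctype", "operator": "==", "value": "credit_policy"},
    ],
}
print(missing_filter_fields(suspect_filters, sample_meta))
```

Here the misspelled meta.doctype field is reported, while meta.product passes. The check only catches wrong field names, not wrong values or index names.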

Real-World Use Cases

  • Loan policy assistant

    • Let agents answer questions like “What income proof do we need for this product?”
    • Ground responses in indexed underwriting manuals and compliance docs
  • Case file summarization

    • Retrieve borrower statements, servicing notes, and exception approvals
    • Generate concise case summaries for underwriters or ops teams
  • Compliance lookup

    • Search jurisdiction-specific rules across lending products
    • Filter by state, product type, or policy version before answering

By Cyprian Aarons, AI Consultant at Topiax.
