How to Integrate Haystack for lending with Elasticsearch for startups

By Cyprian Aarons
Updated 2026-04-21

Combining Haystack for lending with Elasticsearch gives you a practical retrieval layer for loan workflows: borrower documents, policy docs, underwriting notes, and transaction history can all be indexed and queried from one place. In an AI agent system, that means your agent can answer lending questions with grounded context instead of guessing.

For startups, this is the right shape of integration: Haystack handles the orchestration and retrieval pipeline, while Elasticsearch gives you fast full-text search, filtering, and scalable indexing.

Prerequisites

•Python 3.10+
•An Elasticsearch cluster running locally or in the cloud
•A Haystack installation with the Elasticsearch integration package
•Access to your lending data sources:
  - PDF loan applications
  - KYC/AML notes
  - credit policy documents
  - customer support transcripts
•Environment variables configured:
  - ELASTICSEARCH_URL
  - ELASTICSEARCH_USERNAME
  - ELASTICSEARCH_PASSWORD

Install the packages:

pip install haystack-ai elasticsearch-haystack elasticsearch

Integration Steps

•Set up your Elasticsearch connection.

Use the official client first so you can validate connectivity before wiring Haystack into the pipeline.

import os
from elasticsearch import Elasticsearch

es = Elasticsearch(
    os.environ["ELASTICSEARCH_URL"],
    basic_auth=(
        os.environ["ELASTICSEARCH_USERNAME"],
        os.environ["ELASTICSEARCH_PASSWORD"],
    ),
)

print(es.info())

If this fails, fix auth or network access before moving on.
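A missing or malformed environment variable is a common reason the connection check fails. The sketch below is a preflight check you could run before creating the client. The function name `preflight_problems` and the error wording are illustrative, not part of either library.

```python
import os
from urllib.parse import urlparse

# The three variables listed in the prerequisites.
REQUIRED_VARS = (
    "ELASTICSEARCH_URL",
    "ELASTICSEARCH_USERNAME",
    "ELASTICSEARCH_PASSWORD",
)


def preflight_problems(environ):
    """Return a list of human-readable configuration problems (empty if none)."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not environ.get(name)]
    url = environ.get("ELASTICSEARCH_URL", "")
    if url:
        parsed = urlparse(url)
        # A bare "localhost:9200" has no http/https scheme and is flagged here.
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            problems.append("ELASTICSEARCH_URL should look like https://host:9200")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems(os.environ):
        print("config problem:", problem)
```

An empty list means the configuration at least has the right shape. It says nothing about whether the credentials are valid, so `es.info()` is still the real test.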

•Create a document store in Haystack backed by Elasticsearch.

Haystack’s ElasticsearchDocumentStore is the bridge between your lending content and search index.

from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(
    hosts=os.environ["ELASTICSEARCH_URL"],
    # Extra keyword arguments such as basic_auth are forwarded to the Elasticsearch client.
    basic_auth=(
        os.environ["ELASTICSEARCH_USERNAME"],
        os.environ["ELASTICSEARCH_PASSWORD"],
    ),
    index="lending-documents",
)

For startup systems, keep the index name explicit per domain. Don’t mix underwriting docs with support tickets unless you want noisy retrieval.
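One way to keep index names explicit is to build them from a domain and a content type in a single place. This is a minimal sketch. The helper `index_name` is hypothetical, and the character list follows Elasticsearch's documented index naming restrictions (lowercase only, no spaces or characters such as `/`, `*`, `?`, `#`, `:`), so check it against the version you run.

```python
import re

# Characters Elasticsearch rejects in index names, plus any whitespace.
_INVALID_CHARS = re.compile(r'[\\/*?"<>|,#:\s]')


def index_name(domain, content_type):
    """Build a lowercase, per-domain index name such as 'lending-underwriting'."""
    name = f"{domain}-{content_type}".lower()
    name = _INVALID_CHARS.sub("-", name)
    # Index names may not start with '-', '_' or '+'.
    name = name.lstrip("-_+")
    if not name or name in (".", ".."):
        raise ValueError("index name is empty after cleaning")
    return name


print(index_name("Lending", "Underwriting"))
print(index_name("lending", "support tickets"))
```

You would then pass the result as the `index` argument of the document store, one store per content type.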

•Convert lending files into Haystack documents and write them to Elasticsearch.

In production, your ingestion layer will usually parse PDFs or HTML into text chunks first. Here’s the core pattern using Document and write_documents().

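from haystack import Document
from haystack.document_stores.types import DuplicatePolicy

documents = [
    Document(
        content="Applicant income verification shows stable monthly deposits for 18 months.",
        meta={"doc_type": "income_verification", "loan_id": "LN-1001"}
    ),
    Document(
        content="Policy requires debt-to-income ratio below 43 percent for unsecured personal loans.",
        meta={"doc_type": "policy", "department": "underwriting"}
    ),
]

# OVERWRITE keeps re-runs idempotent; the default policy raises on duplicate document IDs.
# write_documents() returns the number of documents written.
written = document_store.write_documents(documents, policy=DuplicatePolicy.OVERWRITE)
print("Indexed:", written)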

If you already have chunked text from OCR or parsing, keep each chunk small enough for precise retrieval. That usually means 200 to 500 tokens per chunk.
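If your parser hands you long passages instead, a simple splitter gets you into that range. The sketch below counts whitespace-separated words as a rough stand-in for tokens, which is an assumption: real tokenizers usually produce more tokens than words, so pick `max_words` conservatively. The function name and defaults are illustrative.

```python
def chunk_words(text, max_words=300, overlap=30):
    """Split text into overlapping chunks, counting whitespace-separated words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far each new chunk starts after the previous one
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(700))
for chunk in chunk_words(sample):
    print(len(chunk.split()), chunk.split()[0], chunk.split()[-1])
```

The overlap keeps a sentence that straddles a boundary retrievable from either side. Haystack also ships a `DocumentSplitter` component with word-based splitting and overlap settings, which is likely the better choice once ingestion runs inside a pipeline.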

•Add retrieval so your AI agent can query lending knowledge.

Use Haystack’s retriever to fetch relevant context from Elasticsearch before generation or decision support.

from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever

retriever = ElasticsearchBM25Retriever(document_store=document_store)

results = retriever.run(query="What is the DTI policy for unsecured personal loans?")
for item in results["documents"]:
    print(item.content)
    print(item.meta)

This is the simplest path to production when you need lexical search over policy language, borrower notes, and compliance text. For many lending workflows, BM25 holds up well against a naive vector-only setup because exact terms such as loan IDs and policy wording matter.
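Lending queries are often scoped to one document type or one loan, and the metadata written in the previous step makes that possible. The sketch below builds a filter in the Haystack 2.x filter syntax (a `field` / `operator` / `value` condition, combined under `AND`). The helper names `meta_equals` and `all_of` are hypothetical conveniences, not library functions.

```python
def meta_equals(field, value):
    """One comparison condition on a metadata field, e.g. meta.doc_type == 'policy'."""
    return {"field": f"meta.{field}", "operator": "==", "value": value}


def all_of(*conditions):
    """Combine conditions so that every one of them must match."""
    return {"operator": "AND", "conditions": list(conditions)}


policy_only = all_of(
    meta_equals("doc_type", "policy"),
    meta_equals("department", "underwriting"),
)
print(policy_only)

# Usage with the retriever from this step (needs a running cluster):
# results = retriever.run(
#     query="What is the DTI policy for unsecured personal loans?",
#     filters=policy_only,
#     top_k=3,
# )
```

Filtering before scoring keeps borrower notes out of a policy question, which matters more as the index grows.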

•Wire retrieval into an agent pipeline.

If you’re building an AI agent, feed retrieved context into your prompt builder or generator component. The key is to keep retrieval separate from generation so auditability stays intact.

from haystack import Pipeline

pipeline = Pipeline()
pipeline.add_component("retriever", retriever)

query = "Can we approve LN-1001 based on current policy?"
response = pipeline.run({
    "retriever": {
        "query": query,
        "top_k": 3,
    }
})

print(response)

In a real lending agent, this output becomes structured context passed to a rules engine or LLM prompt. Don’t let the model invent approval logic; let it summarize retrieved evidence.
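Here is one way that structured context could look, as a rough sketch. `RetrievedDoc` is a stand-in for a Haystack `Document` (it only mirrors the `content` and `meta` attributes), and the prompt wording is an assumption you should adapt to your own review process.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievedDoc:
    """Stand-in for a Haystack Document: only the fields this sketch reads."""
    content: str
    meta: dict = field(default_factory=dict)


def build_evidence_prompt(question, docs):
    """Number each retrieved passage so the answer can cite it as [1], [2], ..."""
    lines = [
        "Answer using only the evidence below and cite passages by number.",
        "If the evidence is insufficient, say so. Do not make an approval decision.",
        "",
        "Evidence:",
    ]
    for number, doc in enumerate(docs, start=1):
        # Label each passage with the most specific identifier available.
        source = doc.meta.get("loan_id") or doc.meta.get("doc_type") or "unknown"
        lines.append(f"[{number}] ({source}) {doc.content}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


docs = [
    RetrievedDoc(
        "Applicant income verification shows stable monthly deposits for 18 months.",
        {"doc_type": "income_verification", "loan_id": "LN-1001"},
    ),
    RetrievedDoc(
        "Policy requires debt-to-income ratio below 43 percent for unsecured personal loans.",
        {"doc_type": "policy", "department": "underwriting"},
    ),
]
print(build_evidence_prompt("Can we approve LN-1001 based on current policy?", docs))
```

Because every passage is numbered and labelled, a reviewer can trace each sentence of the summary back to a stored document.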

Testing the Integration

Run a quick end-to-end check: write one document, retrieve it back, and confirm metadata is preserved.

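from haystack import Document
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever

test_doc = Document(
    content="Loan application LN-9009 has verified employment and no missed payments in the last 24 months.",
    meta={"loan_id": "LN-9009", "status": "verified"},
)

# OVERWRITE lets you re-run the check; the default policy raises on a duplicate document ID.
document_store.write_documents([test_doc], policy=DuplicatePolicy.OVERWRITE)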

retriever = ElasticsearchBM25Retriever(document_store=document_store)
result = retriever.run(query="verified employment missed payments")

for doc in result["documents"][:1]:
    print(doc.content)
    print(doc.meta)

Expected output:

Loan application LN-9009 has verified employment and no missed payments in the last 24 months.
{'loan_id': 'LN-9009', 'status': 'verified'}

If you get empty results:

•confirm the index name matches
•check that documents were actually written
•verify analyzer settings if your queries are too strict

Real-World Use Cases

•Loan officer copilot
  - Retrieve policy clauses, borrower history, and supporting docs while drafting approval notes.
  - Useful when underwriters need evidence-backed summaries fast.
•Compliance Q&A agent
  - Search KYC/AML procedures and audit trails with exact-term matching.
  - Good fit for internal assistants that must cite source text.
•Application triage workflow
  - Classify incoming loan applications by searching for missing fields or risk indicators.
  - Route cases to manual review when retrieval surfaces conflicting documents.
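For the triage case, "conflicting documents" needs a concrete definition before you can route on it. The sketch below uses a deliberately narrow one, stated as an assumption: two documents for the same `loan_id` disagree on a metadata key such as `status`. The function name and the choice of keys are hypothetical, and it works on the `meta` dictionaries of retrieved documents.

```python
from collections import defaultdict


def conflicting_fields(metas, keys=("status",)):
    """Map each loan_id to the keys whose values disagree across its documents."""
    seen = defaultdict(lambda: defaultdict(set))
    for meta in metas:
        loan_id = meta.get("loan_id")
        if loan_id is None:
            continue  # policy documents and the like are not tied to a loan
        for key in keys:
            if key in meta:
                seen[loan_id][key].add(meta[key])  # values must be hashable
    conflicts = {}
    for loan_id, by_key in seen.items():
        disputed = sorted(key for key, values in by_key.items() if len(values) > 1)
        if disputed:
            conflicts[loan_id] = disputed
    return conflicts


metas = [
    {"loan_id": "LN-9009", "status": "verified"},
    {"loan_id": "LN-9009", "status": "unverified"},
    {"loan_id": "LN-1001", "status": "verified"},
    {"doc_type": "policy", "department": "underwriting"},
]
print(conflicting_fields(metas))
```

Any loan that appears in the result would go to manual review. Conflicts buried in free text need a different approach, so treat this as a first filter only.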

By Cyprian Aarons, AI Consultant at Topiax.
