How to Integrate Haystack for lending with Elasticsearch for production AI
Combining Haystack for lending with Elasticsearch gives you a practical production stack for loan document search, retrieval, and agent-assisted decision support. Haystack handles the orchestration layer for RAG-style workflows, while Elasticsearch gives you durable full-text search, filters, and fast retrieval over underwriting docs, policy PDFs, customer records, and loan application histories.
Prerequisites
- •Python 3.10+
- •An Elasticsearch cluster running locally or in your environment
- •An Elasticsearch index with documents already loaded
- •Access to your Haystack for lending package and its retriever components
- •pip installed
- •Environment variables set for Elasticsearch credentials if you are not using local dev mode
Install the Python dependencies:
pip install haystack-ai elasticsearch-haystack elasticsearch
If you are using a managed Elasticsearch deployment, make sure you have:
- •ELASTICSEARCH_URL
- •ELASTICSEARCH_API_KEY or username/password
- •Network access from your app runtime to the cluster
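One way to read these settings at runtime is a small helper that turns environment variables into keyword arguments for the Elasticsearch client. This is a minimal sketch, not a required pattern: `es_client_kwargs` is a hypothetical helper, `ELASTICSEARCH_USERNAME` and `ELASTICSEARCH_PASSWORD` are assumed variable names, and falling back to `http://localhost:9200` is an assumption for local development.

```python
import os
from typing import Mapping, Optional


def es_client_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for Elasticsearch(...) from environment variables."""
    env = os.environ if env is None else env

    # Assumption: fall back to a local node when no URL is configured.
    url = env.get("ELASTICSEARCH_URL", "http://localhost:9200")
    kwargs: dict = {"hosts": [url]}

    api_key = env.get("ELASTICSEARCH_API_KEY")
    username = env.get("ELASTICSEARCH_USERNAME")  # hypothetical variable name
    password = env.get("ELASTICSEARCH_PASSWORD")  # hypothetical variable name

    # Prefer an API key; use basic auth only when both parts are present.
    if api_key:
        kwargs["api_key"] = api_key
    elif username and password:
        kwargs["basic_auth"] = (username, password)
    return kwargs
```

With the client library installed, you would pass the result straight through, for example `Elasticsearch(**es_client_kwargs())`.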
Integration Steps
- •Connect to Elasticsearch and define the index
Start by creating a client and confirming the target index exists. In production, keep this separate from your app startup logic so connection failures are explicit.
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "http://localhost:9200",
    basic_auth=("elastic", "changeme"),
)

index_name = "loan-documents"

# Create the index with an explicit mapping only if it does not exist yet.
if not es.indices.exists(index=index_name):
    es.indices.create(
        index=index_name,
        mappings={
            "properties": {
                "content": {"type": "text"},
                "doc_type": {"type": "keyword"},
                "borrower_id": {"type": "keyword"},
                "created_at": {"type": "date"},
            }
        },
    )

# Printing the server version confirms the connection works.
print(es.info()["version"]["number"])
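To keep connection failures explicit, the check can live in its own function that you call before serving traffic. The sketch below assumes a client object exposing `ping()` and `indices.exists(index=...)`, as the official Python client does; `IntegrationNotReady` and `require_index` are hypothetical names introduced for illustration.

```python
class IntegrationNotReady(RuntimeError):
    """Raised when the cluster or the target index is not usable."""


def require_index(client, index_name: str) -> None:
    """Fail loudly if the cluster is unreachable or the index is missing."""
    # ping() returns False instead of raising when the node cannot be reached.
    if not client.ping():
        raise IntegrationNotReady("Elasticsearch did not respond to ping")
    if not client.indices.exists(index=index_name):
        raise IntegrationNotReady(f"Index {index_name!r} does not exist")
```

Calling `require_index(es, index_name)` from a health check or a deploy hook, rather than burying it in app startup, gives you one clear error when the cluster is down.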
- •Load loan content into Elasticsearch
For lending workflows, your source data is usually PDF extractions, KYC notes, underwriting memos, or policy clauses. Store them as searchable documents with metadata that agents can filter on later.
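from datetime import datetime, timezone

from elasticsearch import helpers

# Timezone-aware timestamp; datetime.utcnow() is deprecated as of Python 3.12.
now = datetime.now(timezone.utc).isoformat()

docs = [
    {
        "_index": index_name,
        "_id": "loan-001",
        "_source": {
            "content": "Borrower has 18 months of business history and stable cash flow.",
            "doc_type": "underwriting_note",
            "borrower_id": "B1001",
            "created_at": now,
        },
    },
    {
        "_index": index_name,
        "_id": "loan-002",
        "_source": {
            "content": "Collateral includes commercial equipment valued at 250000 USD.",
            "doc_type": "collateral_report",
            "borrower_id": "B1001",
            "created_at": now,
        },
    },
]

# helpers.bulk accepts the _index/_id/_source action format directly.
# es.index() would instead need index=, id=, and document= keyword arguments.
helpers.bulk(es, docs)

# Make the new documents searchable right away (fine for a demo or a test).
es.indices.refresh(index=index_name)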
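When the source is a stream of extracted records rather than a hand-written list, a generator can produce the same action format lazily. This is a sketch under assumptions: `to_bulk_actions` is a hypothetical helper, and each record is assumed to be a dict with an `id` key plus the fields from the mapping above.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator


def to_bulk_actions(records: Iterable[dict], index_name: str) -> Iterator[dict]:
    """Yield bulk actions in the _index/_id/_source format, one per record."""
    for record in records:
        # Everything except the id becomes the stored document body.
        source = {key: value for key, value in record.items() if key != "id"}
        # Stamp records that arrive without a timestamp.
        source.setdefault("created_at", datetime.now(timezone.utc).isoformat())
        yield {"_index": index_name, "_id": record["id"], "_source": source}
```

You could then pass the generator to the bulk helper, for example `helpers.bulk(es, to_bulk_actions(records, index_name))`, so large extraction batches never need to sit in memory as one list.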
- •Wire Haystack to Elasticsearch for retrieval
Haystack's ElasticsearchDocumentStore is the bridge here. It lets Haystack read from and write to Elasticsearch without you hand-rolling search logic. In Haystack 2.x (the haystack-ai package) it ships in the separate elasticsearch-haystack integration package and is imported from haystack_integrations.
# Requires: pip install elasticsearch-haystack
from haystack import Document
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

# Extra keyword arguments such as basic_auth are forwarded to the Elasticsearch client.
# There is no embedding_dim argument, so it is not passed here.
document_store = ElasticsearchDocumentStore(
    hosts=["http://localhost:9200"],
    basic_auth=("elastic", "changeme"),
    index=index_name,
)

haystack_docs = [
    Document(
        content="Borrower has 18 months of business history and stable cash flow.",
        meta={"doc_type": "underwriting_note", "borrower_id": "B1001"},
    ),
    Document(
        content="Collateral includes commercial equipment valued at 250000 USD.",
        meta={"doc_type": "collateral_report", "borrower_id": "B1001"},
    ),
]

document_store.write_documents(haystack_docs)
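Because the same loan content can now reach the index through two paths (the native client and Haystack), it may help to derive document IDs from business keys so a re-ingested document replaces its earlier copy instead of sitting next to it. The sketch below is one possible scheme, not something the libraries require: `stable_doc_id` is a hypothetical helper, and hashing the borrower ID, document type, and content is an assumption about what makes a document unique in your system.

```python
import hashlib


def stable_doc_id(borrower_id: str, doc_type: str, content: str) -> str:
    """Derive a deterministic ID from the fields that identify a document."""
    # A separator that cannot appear in the fields avoids ambiguous joins.
    key = "\x1f".join([borrower_id, doc_type, content.strip()])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

The same value can serve as `_id` in a bulk action and as the `id` argument of a Haystack `Document`, which keeps both write paths pointing at one record.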
- •Build a retriever-backed query path
For production AI agents, use a retriever so the model gets grounded context instead of guessing. The exact retriever class depends on whether you want keyword search or vector search; for lending systems, hybrid retrieval is usually the right default. This step uses BM25 keyword retrieval against the Elasticsearch-backed document store, which needs no embeddings.
# Requires: pip install elasticsearch-haystack
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever

# InMemoryBM25Retriever only accepts an InMemoryDocumentStore, so use the
# Elasticsearch BM25 retriever with the document store from the previous step.
# Keep retrieval logic explicit and testable.
retriever = ElasticsearchBM25Retriever(document_store=document_store)

query = "What collateral does borrower B1001 have?"
results = retriever.run(query=query)

for doc in results["documents"]:
    print(doc.content)
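If you do move to hybrid retrieval, you need a way to merge the keyword ranking with the vector ranking. Reciprocal rank fusion is one common choice; the sketch below is a plain-Python illustration of that technique and not part of the Haystack or Elasticsearch API. The function name, the `k = 60` constant, and the `loan-007` and `loan-009` IDs are assumptions for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list contribute more.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["loan-002", "loan-001", "loan-007"]
vector_hits = ["loan-002", "loan-009", "loan-001"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# prints ['loan-002', 'loan-001', 'loan-009', 'loan-007']
```

Documents that appear in both lists rise to the top, which is usually what you want when a borrower ID matches lexically and the surrounding text matches semantically.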
If you need direct Elasticsearch query control for compliance filters, use the native client alongside Haystack:
response = es.search(
    index=index_name,
    query={
        "bool": {
            # Scored full-text match on the document body.
            "must": [{"match": {"content": "collateral"}}],
            # Exact, unscored restriction to a single borrower.
            "filter": [{"term": {"borrower_id": "B1001"}}],
        }
    },
)

for hit in response["hits"]["hits"]:
    print(hit["_source"]["content"])
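Compliance filters tend to repeat across endpoints, so it can be worth building the query body in one place and unit testing it without a cluster. A minimal sketch, assuming the field names from the mapping in step one; `build_compliance_query` is a hypothetical helper.

```python
from typing import List, Optional


def build_compliance_query(
    text: str,
    borrower_id: str,
    doc_types: Optional[List[str]] = None,
) -> dict:
    """Build a bool query: full-text match on content, exact filters on metadata."""
    filters: List[dict] = [{"term": {"borrower_id": borrower_id}}]
    if doc_types:
        # terms matches any of the listed keyword values.
        filters.append({"terms": {"doc_type": doc_types}})
    return {
        "bool": {
            "must": [{"match": {"content": text}}],
            "filter": filters,
        }
    }
```

The result plugs into the native client as `es.search(index=index_name, query=build_compliance_query("collateral", "B1001"))`. Keeping the borrower restriction in `filter` means it never affects relevance scoring and can never be dropped by a low score.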
- •Put both together in an agent workflow
In production, your agent should first retrieve evidence from Elasticsearch through Haystack, then pass only relevant chunks into the LLM layer. That keeps answers auditable and reduces hallucinations.
from haystack import Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder

template = """
Answer the lending question using only these documents:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}
Question: {{ question }}
"""

prompt_builder = PromptBuilder(template=template)

pipeline = Pipeline()
pipeline.add_component("retriever", retriever)
pipeline.add_component("prompt_builder", prompt_builder)

# Feed the retrieved documents into the prompt template.
pipeline.connect("retriever.documents", "prompt_builder.documents")

result = pipeline.run({
    "retriever": {"query": query},
    "prompt_builder": {"question": query},
})

print(result["prompt_builder"]["prompt"])
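Passing only relevant chunks usually means applying a score floor and a size budget before the prompt is built, and keeping document IDs attached so answers can be traced back. The sketch below shows one way to do that in plain Python. `Chunk`, `select_context`, and `format_evidence` are hypothetical names, and the thresholds are placeholders; in a real pipeline you would map each retrieved Haystack document's `id`, `content`, and `score` onto this structure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    doc_id: str
    content: str
    score: float


def select_context(chunks: List[Chunk], min_score: float = 0.0, max_chars: int = 2000) -> List[Chunk]:
    """Keep the best-scoring chunks that clear the floor and fit the budget."""
    selected: List[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if chunk.score < min_score:
            break  # everything after this scores lower still
        if used + len(chunk.content) > max_chars:
            continue  # too large for the remaining budget; try smaller ones
        selected.append(chunk)
        used += len(chunk.content)
    return selected


def format_evidence(chunks: List[Chunk]) -> str:
    """Render chunks with their IDs so the answer can cite its sources."""
    return "\n".join(f"[{chunk.doc_id}] {chunk.content}" for chunk in chunks)
```

Logging the selected IDs alongside the generated answer gives reviewers a direct path from a recommendation back to the loan file that supported it.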
Testing the Integration
Run a simple end-to-end check: write one lending document into Elasticsearch, retrieve it through Haystack, and verify the content comes back.
test_query = "business history stable cash flow"

# Check the raw Elasticsearch path first.
search_result = es.search(
    index=index_name,
    query={"match": {"content": test_query}},
)
assert search_result["hits"]["total"]["value"] > 0
hit_text = search_result["hits"]["hits"][0]["_source"]["content"]
print("Elasticsearch hit:", hit_text)

# Then check the same query through the Haystack retriever.
retrieved = retriever.run(query=test_query)["documents"]
print("Haystack hit:", retrieved[0].content)
Expected output:
Elasticsearch hit: Borrower has 18 months of business history and stable cash flow.
Haystack hit: Borrower has 18 months of business history and stable cash flow.
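Elasticsearch makes new documents searchable only after a refresh, so an automated version of this check can fail if it runs immediately after a write. One option is to poll briefly before asserting. This is a sketch with assumed defaults; `wait_for_hits` is a hypothetical helper that takes any zero-argument function returning a hit count.

```python
import time
from typing import Callable


def wait_for_hits(count_hits: Callable[[], int], attempts: int = 5, delay_s: float = 0.2) -> int:
    """Poll until the count is positive, or fail after the last attempt."""
    for attempt in range(attempts):
        hits = count_hits()
        if hits > 0:
            return hits
        # Sleep between attempts, but not after the final one.
        if attempt < attempts - 1:
            time.sleep(delay_s)
    raise AssertionError(f"No hits after {attempts} attempts")
```

In a test you might wrap the native client's count call, for example `wait_for_hits(lambda: es.count(index=index_name, query={"match": {"content": test_query}})["count"])`.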
Real-World Use Cases
- •Underwriting copilots that answer questions like “show all adverse findings for this borrower” with citations pulled from indexed loan files.
- •Loan policy assistants that retrieve relevant clauses from credit policy manuals before generating recommendations.
- •Servicing support agents that search payment histories, collateral notes, and exception logs to help ops teams resolve cases faster.
By Cyprian Aarons, AI Consultant at Topiax.