How to Build an Underwriting Agent Using LlamaIndex in Python for Retail Banking

By Cyprian Aarons · Updated 2026-04-21
underwriting · llamaindex · python · retail-banking

An underwriting agent in retail banking helps case workers and credit teams gather customer data, check policy rules, summarize risk, and produce a decision-ready recommendation. It matters because loan decisions must be fast, consistent, auditable, and compliant with internal policy and with regulations such as fair lending and data residency rules.

Architecture

  • Document ingestion layer
    • Pulls KYC docs, payslips, bank statements, bureau reports, and product policy PDFs into a controlled index.
  • Retrieval layer
    • Uses VectorStoreIndex and metadata filters to fetch the right policy clauses, underwriting rules, and historical cases.
  • Decision orchestration layer
    • A workflow or service wrapper that turns retrieved evidence into a structured underwriting recommendation.
  • Policy guardrail layer
    • Enforces hard rules like minimum income thresholds, DTI caps, excluded geographies, and required disclosures.
  • Audit and traceability layer
    • Stores prompts, retrieved chunks, model outputs, timestamps, and final decisions for compliance review.
  • Human-in-the-loop review layer
    • Routes borderline or high-risk cases to an underwriter instead of auto-deciding.

Implementation

1) Install dependencies and define your data model

Use LlamaIndex for retrieval plus a lightweight schema for the underwriting output. Keep the decision object structured from day one; banks need deterministic fields for audit and downstream systems.

from pydantic import BaseModel
from typing import List

class UnderwritingDecision(BaseModel):
    approved: bool
    risk_band: str
    reasons: List[str]
    required_actions: List[str]
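
Downstream code can lean on that schema to reject malformed payloads early. A quick sanity check (assuming Pydantic v2):

from pydantic import ValidationError

payload = {"approved": True, "risk_band": "medium", "reasons": ["Income verified"]}
try:
    UnderwritingDecision.model_validate(payload)
except ValidationError as exc:
    # required_actions is missing, so validation fails instead of guessing a default
    print(exc.errors()[0]["loc"])  # ('required_actions',)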

Install the core packages:

pip install llama-index llama-index-llms-openai llama-index-embeddings-openai

2) Build an index over policy documents and past cases

For retail banking, separate policy content from customer evidence. You usually want policy docs in one index and customer files in another namespace or store so you can enforce access controls and residency rules.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

policy_docs = SimpleDirectoryReader("./data/policies").load_data()
case_docs = SimpleDirectoryReader("./data/cases").load_data()

policy_index = VectorStoreIndex.from_documents(policy_docs)
case_index = VectorStoreIndex.from_documents(case_docs)

policy_retriever = policy_index.as_retriever(similarity_top_k=3)
case_retriever = case_index.as_retriever(similarity_top_k=3)

This is the core pattern: retrieve policy evidence first, then compare the applicant file against it. In underwriting, that order matters because you want the model grounded in bank rules before it reasons about the customer.
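
If both kinds of content end up sharing one backing store, you can still scope retrieval with metadata filters. A minimal sketch, assuming each document was tagged with a doc_type metadata key at ingestion time (that key is our own convention, and not every vector store backend supports metadata filtering):

from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter

# Only retrieve nodes tagged as policy content, regardless of what else is indexed.
policy_only_retriever = policy_index.as_retriever(
    similarity_top_k=3,
    filters=MetadataFilters(filters=[ExactMatchFilter(key="doc_type", value="policy")]),
)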

3) Create an underwriting prompt with strict output constraints

Don’t ask for free-form prose. Ask for a structured decision with explicit reasons tied to retrieved evidence. That makes it easier to audit and easier to reject invalid outputs.

from llama_index.core.prompts import PromptTemplate

template = PromptTemplate(
    """You are a retail banking underwriting assistant.
Use only the provided context.

Policy context:
{policy_context}

Applicant context:
{applicant_context}

Return a JSON-like decision with:
- approved: true/false
- risk_band: low/medium/high
- reasons: list of short bullets
- required_actions: list of short bullets

Rules:
- If income is missing or debt-to-income exceeds policy limits, do not approve.
- If any required document is missing, require manual review.
- Do not mention unsupported facts.
"""
)

4) Run retrieval + synthesis + validation

Here’s the actual agent flow. It retrieves relevant chunks from both indexes, composes them into a prompt, gets a structured answer back from the LLM, then validates it before returning anything to your loan origination system.

def build_context(nodes):
    return "\n\n".join([n.get_content() for n in nodes])

def underwrite(applicant_text: str) -> UnderwritingDecision:
    policy_nodes = policy_retriever.retrieve(applicant_text)
    applicant_nodes = case_retriever.retrieve(applicant_text)

    policy_context = build_context(policy_nodes)
    applicant_context = build_context(applicant_nodes)

    prompt = template.format(
        policy_context=policy_context,
        applicant_context=applicant_context,
    )

    response = Settings.llm.complete(prompt).text

    # Replace this with a stricter parser in production.
    if "approved: true" in response.lower():
        return UnderwritingDecision(
            approved=True,
            risk_band="medium",
            reasons=["Policy criteria met based on retrieved evidence"],
            required_actions=[]
        )

    return UnderwritingDecision(
        approved=False,
        risk_band="high",
        reasons=["One or more underwriting conditions were not satisfied"],
        required_actions=["Manual review required"]
    )

decision = underwrite(
    "Applicant has stable employment for 4 years, monthly income of 4500 USD,"
    " existing debt obligations of 1200 USD, requesting personal loan."
)

print(decision.model_dump())

For production, use Pydantic parsing or LlamaIndex's structured-output patterns instead of string matching. The example above shows the end-to-end shape; your bank systems should reject malformed outputs and never trust raw text.
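
One option, assuming a recent LlamaIndex release and an LLM that supports structured outputs, is structured_predict, which fills the Pydantic model directly instead of returning raw text:

def underwrite_structured(applicant_text: str) -> UnderwritingDecision:
    policy_context = build_context(policy_retriever.retrieve(applicant_text))
    applicant_context = build_context(case_retriever.retrieve(applicant_text))

    # Asks the LLM for output matching UnderwritingDecision and validates it with
    # Pydantic before returning; raises if the output cannot be parsed into the model.
    return Settings.llm.structured_predict(
        UnderwritingDecision,
        template,
        policy_context=policy_context,
        applicant_context=applicant_context,
    )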

Production Considerations

  • Deployment isolation
    • Keep customer PII inside your approved network boundary. If your bank has regional residency requirements, deploy embeddings storage and vector search in-region too.
  • Monitoring
    • Track retrieval hit rate, empty-context rate, manual-review rate, false approvals, and policy override frequency.
    • Log every retrieved chunk ID plus model version so compliance can reconstruct decisions later.
  • Guardrails
    • Add hard checks before model output is accepted: missing documents, DTI thresholds, adverse bureau flags, age-of-income-statement limits (see the sketch after this list).
    • Never let the LLM override deterministic policy rules.
  • Human review routing
    • Auto-decide only low-risk files with complete data.
    • Send thin-file applicants, exceptions, or contradictory documents to an underwriter queue.
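
A minimal sketch of such a deterministic pre-check; the thresholds and field names here are placeholders for your own credit policy, not values from any real bank:

def passes_hard_rules(applicant: dict) -> tuple[bool, list[str]]:
    """Deterministic checks that run before any LLM output is considered."""
    failures = []
    if applicant.get("monthly_income") in (None, 0):
        failures.append("Missing income evidence")
    else:
        dti = applicant.get("monthly_debt", 0) / applicant["monthly_income"]
        if dti > 0.40:  # placeholder DTI cap; substitute your policy value
            failures.append(f"DTI {dti:.0%} exceeds policy cap")
    if not applicant.get("kyc_complete", False):
        failures.append("KYC documents incomplete")
    return (len(failures) == 0, failures)

In this flow, underwrite() only runs when passes_hard_rules returns no failures; anything else goes straight to the underwriter queue.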

Common Pitfalls

  1. Mixing policy text with customer data in one undifferentiated index

    • This creates leakage risk and weakens access control.
    • Keep separate indexes or namespaces so retrieval can be permissioned properly.
  2. Letting the model make final credit decisions without rule checks

    • LLMs are good at summarizing evidence; they are not your credit policy engine.
    • Run deterministic validation first, then use the model for explanation and recommendation.
  3. Skipping audit artifacts

    • If you cannot reproduce why a loan was approved or declined, you will fail internal review fast.
    • Store prompt text, retrieved node IDs, timestamps, model name/version, and final decision payload (a minimal record shape is sketched after this list).
  4. Ignoring residency and retention constraints

    • Banking data often cannot leave specific jurisdictions.
    • Make sure embeddings providers, vector stores, logs, and backups all comply with local data handling rules.
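
As referenced in pitfall 3, here is a minimal shape for that audit record; the field names are illustrative and should be adapted to your compliance schema and log store:

import json
from datetime import datetime, timezone

def write_audit_record(prompt: str, node_ids: list[str], model: str,
                       decision: UnderwritingDecision, path: str = "audit.jsonl") -> None:
    """Append one decision's full trail so it can be reconstructed later."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt": prompt,
        "retrieved_node_ids": node_ids,
        "decision": decision.model_dump(),
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")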

By Cyprian Aarons, AI Consultant at Topiax.
