How to Build an Underwriting Agent Using LangChain in Python for Investment Banking

By Cyprian Aarons. Updated 2026-04-21

Tags: underwriting, langchain, python, investment-banking

An underwriting agent in investment banking takes a deal package, extracts the key facts, checks them against policy and risk rules, and drafts a recommendation for a human underwriter or credit committee. It matters because most underwriting work is document-heavy, time-sensitive, and highly repetitive; the agent reduces manual triage while keeping compliance, auditability, and decision quality intact.

Architecture

• Document ingestion layer
  - Pulls PDFs, term sheets, financial statements, KYC files, and sponsor decks from approved storage.
  - Uses PyPDFLoader, UnstructuredFileLoader, or internal parsers depending on document type.
• Normalization and extraction layer
  - Converts messy deal docs into structured fields like borrower name, facility size, leverage ratio, covenants, maturity, and jurisdiction.
  - Uses LangChain chat models with structured outputs via PydanticOutputParser or tool/function calling.
• Policy and risk rules layer
  - Applies bank-specific underwriting policy: sector exclusions, concentration limits, minimum DSCR, leverage thresholds, sanctions checks.
  - Keeps deterministic logic outside the LLM.
• Retrieval layer
  - Retrieves internal credit policy excerpts, prior committee memos, and approved precedent transactions.
  - Uses FAISS or another vector store through LangChain retrievers.
• Decision drafting layer
  - Produces a memo-style recommendation with rationale, exceptions, and required approvals.
  - Uses an LCEL pipeline with a controlled prompt (LLMChain still appears in older code but is deprecated in current LangChain releases).
• Audit and logging layer
  - Stores inputs, retrieved sources, model outputs, and rule evaluations for review.
  - Critical for model governance and post-trade / post-deal audit trails.
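
The layering above can be sketched as a plain function that wires the stages together. This is a minimal sketch rather than the article's implementation: `run_underwriting_pipeline`, `PipelineResult`, and the stub stages are hypothetical names, and the stubs stand in for the LangChain steps built in the Implementation section. What it illustrates is that the rule check is ordinary code sitting between two model-backed steps, and that each step leaves a trace for the audit layer.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineResult:
    """Everything needed to reconstruct one run (hypothetical container)."""
    facts: dict
    exceptions: list[str]
    memo: str
    audit_trail: list[str] = field(default_factory=list)


def run_underwriting_pipeline(
    deal_text: str,
    extract: Callable[[str], dict],             # LLM-backed extraction step
    check_policy: Callable[[dict], list[str]],  # deterministic rules, no LLM
    draft: Callable[[dict, list[str]], str],    # LLM-backed memo drafting step
) -> PipelineResult:
    trail = []
    facts = extract(deal_text)
    trail.append(f"extracted fields: {sorted(facts)}")
    exceptions = check_policy(facts)
    trail.append(f"policy exceptions: {len(exceptions)}")
    memo = draft(facts, exceptions)
    trail.append("memo drafted")
    return PipelineResult(facts, exceptions, memo, trail)


# Stub stages so the sketch runs without a model or any documents.
def fake_extract(text: str) -> dict:
    return {"borrower": "Example Holdings LLC", "leverage_ratio": 5.6}


def fake_policy(facts: dict) -> list[str]:
    return ["Leverage ratio above policy limit"] if facts["leverage_ratio"] > 5.0 else []


def fake_draft(facts: dict, exceptions: list[str]) -> str:
    status = "Refer to credit committee" if exceptions else "Within policy"
    return f"{facts['borrower']}: {status}"


result = run_underwriting_pipeline("deal text", fake_extract, fake_policy, fake_draft)
print(result.memo)
```

Passing the stages in as callables keeps the deterministic rule step testable on its own, without a model call.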

Implementation

1. Install dependencies and define the data model

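For underwriting you want structured output first. If the model cannot emit valid fields consistently, the rest of the pipeline becomes brittle. The examples below assume the langchain-openai, langchain-community, langchain-text-splitters, faiss-cpu, pypdf, and pydantic packages are installed, and that an OpenAI API key is available in the OPENAI_API_KEY environment variable.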

from typing import List
from pydantic import BaseModel, Field

class UnderwritingSummary(BaseModel):
    borrower: str = Field(..., description="Legal entity name")
    facility_amount_usd: float = Field(..., description="Requested facility amount in USD")
    industry: str = Field(..., description="Borrower industry")
    jurisdiction: str = Field(..., description="Primary legal jurisdiction")
    leverage_ratio: float = Field(..., description="Net debt / EBITDA")
    dscr: float = Field(..., description="Debt service coverage ratio")
    risks: List[str] = Field(default_factory=list)
    recommendation: str = Field(..., description="Approve / Approve with conditions / Reject")

2. Load documents and extract facts with LangChain

This example uses real LangChain classes: PyPDFLoader, RecursiveCharacterTextSplitter, FAISS, OpenAIEmbeddings, ChatOpenAI, PromptTemplate, and PydanticOutputParser. The pattern is to retrieve the relevant text first, then force structured extraction.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

loader = PyPDFLoader("deal_package.pdf")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1200, chunk_overlap=150)
chunks = splitter.split_documents(docs)

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

parser = PydanticOutputParser(pydantic_object=UnderwritingSummary)

prompt = PromptTemplate(
    template="""
You are an underwriting analyst for an investment bank.
Extract only facts supported by the context.

Context:
{context}

Return JSON matching this schema:
{format_instructions}
""",
    input_variables=["context"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

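# Build the chain once: prompt -> model -> parser yields an UnderwritingSummary.
extraction_chain = prompt | llm | parser

def extract_summary(query: str) -> UnderwritingSummary:
    # Retrievers are Runnables; invoke() replaces the deprecated get_relevant_documents().
    relevant_docs = retriever.invoke(query)
    context = "\n\n".join(d.page_content for d in relevant_docs)
    return extraction_chain.invoke({"context": context})

summary = extract_summary("Summarize borrower financial metrics and facility terms.")
print(summary.model_dump())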
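
Schema-valid output can still be wrong in substance, for example a leverage ratio read as a percentage instead of a multiple. A minimal sketch of a sanity check that could run between extraction and the policy rules follows. The function name `sanity_check` and every bound in it are assumptions for illustration, not bank policy; it takes a plain dict such as the one returned by `summary.model_dump()`.

```python
def sanity_check(fields: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing looked off.

    The bounds below are illustrative assumptions, not policy thresholds.
    A hit should send the deal to a person, not reject it automatically.
    """
    problems = []

    if not str(fields.get("borrower", "")).strip():
        problems.append("borrower is empty")

    amount = fields.get("facility_amount_usd")
    if amount is None or amount <= 0:
        problems.append("facility_amount_usd must be positive")

    leverage = fields.get("leverage_ratio")
    if leverage is None or not (0 <= leverage <= 20):
        problems.append("leverage_ratio outside plausible range 0-20x")

    dscr = fields.get("dscr")
    if dscr is None or not (0 < dscr <= 10):
        problems.append("dscr outside plausible range 0-10x")

    return problems


sample = {
    "borrower": "Example Holdings LLC",
    "facility_amount_usd": 250_000_000.0,
    "leverage_ratio": 450.0,  # likely a percentage mistaken for a multiple
    "dscr": 1.4,
}
print(sanity_check(sample))
```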

3. Add deterministic underwriting rules

Do not ask the LLM to decide policy compliance on its own. Use code for thresholds and reserve the model for interpretation and drafting.

def apply_policy(summary: UnderwritingSummary) -> list[str]:
    exceptions = []

    if summary.facility_amount_usd > 500_000_000:
        exceptions.append("Facility exceeds standard approval threshold")

    if summary.leverage_ratio > 5.0:
        exceptions.append("Leverage ratio above policy limit")

    if summary.dscr < 1.25:
        exceptions.append("DSCR below minimum covenant requirement")

    high_risk_industries = {"crypto", "gambling", "adult entertainment"}
    if summary.industry.lower() in high_risk_industries:
        exceptions.append("Restricted industry requires enhanced review")

    return exceptions
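
One way to make these thresholds reviewable is to lift them into a small, versioned configuration object, so that a log entry can record which policy version produced a given exception. The sketch below is a variant of `apply_policy` under stated assumptions: `PolicyConfig`, `DealFacts`, and `apply_policy_with_config` are hypothetical names, `DealFacts` is a stand-in dataclass for `UnderwritingSummary` so the block runs on its own, and the defaults simply mirror the thresholds used above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyConfig:
    version: str = "example-v1"  # hypothetical version label
    max_facility_usd: float = 500_000_000
    max_leverage: float = 5.0
    min_dscr: float = 1.25
    restricted_industries: frozenset = frozenset(
        {"crypto", "gambling", "adult entertainment"}
    )


@dataclass
class DealFacts:
    """Stand-in for UnderwritingSummary with only the fields the rules read."""
    facility_amount_usd: float
    leverage_ratio: float
    dscr: float
    industry: str


def apply_policy_with_config(facts: DealFacts, policy: PolicyConfig) -> list[str]:
    exceptions = []
    if facts.facility_amount_usd > policy.max_facility_usd:
        exceptions.append("Facility exceeds standard approval threshold")
    if facts.leverage_ratio > policy.max_leverage:
        exceptions.append("Leverage ratio above policy limit")
    if facts.dscr < policy.min_dscr:
        exceptions.append("DSCR below minimum covenant requirement")
    if facts.industry.strip().lower() in policy.restricted_industries:
        exceptions.append("Restricted industry requires enhanced review")
    # Tag each exception with the policy version that produced it.
    return [f"[{policy.version}] {e}" for e in exceptions]


deal = DealFacts(
    facility_amount_usd=300_000_000, leverage_ratio=5.8, dscr=1.1, industry="Gambling"
)
for line in apply_policy_with_config(deal, PolicyConfig()):
    print(line)
```

Because the config is frozen, a policy change has to produce a new object with a new version label, which keeps old decisions reproducible.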

4. Draft the underwriting memo with sources attached

The final step is a controlled generation step that cites extracted facts and policy exceptions. This keeps the output usable for bankers and reviewers.

from langchain_core.prompts import ChatPromptTemplate

memo_prompt = ChatPromptTemplate.from_messages([
    ("system", "You draft underwriting memos for an investment bank. Be precise."),
    ("user", """
Borrower: {borrower}
Facility Amount (USD): {facility_amount_usd}
Industry: {industry}
Jurisdiction: {jurisdiction}
Leverage Ratio: {leverage_ratio}
DSCR: {dscr}
Risks: {risks}
Policy Exceptions: {exceptions}

Write a short underwriting recommendation with rationale.
""")
])

memo_chain = memo_prompt | llm

def generate_memo(summary: UnderwritingSummary) -> str:
    exceptions = apply_policy(summary)
    result = memo_chain.invoke({
        "borrower": summary.borrower,
        "facility_amount_usd": summary.facility_amount_usd,
        "industry": summary.industry,
        "jurisdiction": summary.jurisdiction,
        "leverage_ratio": summary.leverage_ratio,
        "dscr": summary.dscr,
        # Send an explicit "None" so the prompt never contains a blank field.
        "risks": ", ".join(summary.risks) if summary.risks else "None",
        "exceptions": ", ".join(exceptions) if exceptions else "None",
    })
    return result.content

print(generate_memo(summary))
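
The memo chain above does not yet carry its sources. A minimal sketch of attaching them is to collect the metadata of the chunks used for extraction and append a source list to the memo text. It assumes each chunk's metadata has `source` and `page` keys with a zero-based page index, which is what `PyPDFLoader` typically produces. `format_sources` and `attach_sources` are hypothetical helper names, and plain dicts stand in for LangChain `Document.metadata` so the block runs standalone.

```python
def format_sources(chunk_metadata: list[dict]) -> list[str]:
    """Deduplicate (source, page) pairs while preserving retrieval order."""
    seen = set()
    lines = []
    for meta in chunk_metadata:
        key = (meta.get("source", "unknown"), meta.get("page"))
        if key in seen:
            continue
        seen.add(key)
        source, page = key
        # Assumes zero-based page numbers, so display page + 1.
        page_label = f"p. {page + 1}" if isinstance(page, int) else "page unknown"
        lines.append(f"{source}, {page_label}")
    return lines


def attach_sources(memo_text: str, chunk_metadata: list[dict]) -> str:
    sources = format_sources(chunk_metadata)
    if not sources:
        return memo_text + "\n\nSources: none recorded"
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, start=1))
    return f"{memo_text}\n\nSources:\n{numbered}"


# In the pipeline this list would come from the metadata of the retrieved chunks.
metadata = [
    {"source": "deal_package.pdf", "page": 3},
    {"source": "deal_package.pdf", "page": 3},
    {"source": "deal_package.pdf", "page": 11},
]
print(attach_sources("Recommendation: approve with conditions.", metadata))
```

Appending sources in code, rather than asking the model to cite them, means the list always reflects what was actually retrieved.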

Production Considerations

• Keep sensitive data inside your approved boundary
  - For investment banking this means data residency matters.
  - Use region-locked infrastructure, private networking to model endpoints where possible, and avoid sending raw deal docs to unmanaged services.
• Log every decision path
  - Store retrieved document IDs, extracted fields, policy checks passed/failed, prompt version, model version, and final recommendation.
  - Auditors will ask why a facility was flagged or approved; you need traceability down to source paragraphs.
• Use guardrails around output scope
  - Restrict the agent to summarization and recommendation drafting.
  - Do not let it execute trades, alter records in core systems without approval flow, or infer missing financials without explicit disclosure.
• Set human approval gates
  - Any exception case should route to a banker or credit officer.
  - High-value facilities, restricted sectors, sanctions-adjacent jurisdictions, or missing KYC should never auto-approve.
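
The logging and approval-gate points above can be sketched together as one audit record per run. This is a minimal illustration using only the standard library. `build_audit_record` and `needs_human_approval` are hypothetical names, the field names and the `DEAL-0001` identifier are assumptions, and a real deployment would write the record to whatever store the bank's model governance process requires.

```python
import hashlib
import json
from datetime import datetime, timezone


def needs_human_approval(exceptions: list[str], kyc_complete: bool) -> bool:
    """Any policy exception or incomplete KYC routes the deal to a person."""
    return bool(exceptions) or not kyc_complete


def build_audit_record(
    deal_id: str,
    extracted_fields: dict,
    exceptions: list[str],
    chunk_ids: list[str],
    prompt_version: str,
    model_name: str,
    kyc_complete: bool,
) -> dict:
    gated = needs_human_approval(exceptions, kyc_complete)
    record = {
        "deal_id": deal_id,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "extracted_fields": extracted_fields,
        "policy_exceptions": exceptions,
        "retrieved_chunk_ids": chunk_ids,
        "prompt_version": prompt_version,
        "model_name": model_name,
        "routing": "human_review" if gated else "standard_queue",
    }
    # Hash the canonical JSON so later changes to the stored record are detectable.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["sha256"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record


record = build_audit_record(
    deal_id="DEAL-0001",  # hypothetical identifier
    extracted_fields={"borrower": "Example Holdings LLC", "dscr": 1.1},
    exceptions=["DSCR below minimum covenant requirement"],
    chunk_ids=["deal_package.pdf#p3-c2"],  # hypothetical chunk ID format
    prompt_version="memo-prompt-v1",
    model_name="gpt-4o-mini",
    kyc_complete=True,
)
print(json.dumps(record, indent=2))
```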

Common Pitfalls

• Letting the LLM make policy decisions
  - Wrong pattern: “decide if this loan is approvable.”
  - Better pattern: use code for policy rules and LLMs only for extraction and narrative synthesis.
• Skipping provenance
  - If you do not attach source documents or chunk IDs to each extracted field, you will not survive model risk review.
  - Always keep citations at least at document-chunk level.
• Using one generic prompt for all deal types
  - Leveraged finance, project finance, CRE lending, and acquisition finance have different metrics.
  - Build separate templates or schemas per product so your agent does not confuse DSCR with leverage or ignore sponsor support terms.
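
The per-product point can be sketched as a small registry of required metrics, checked before any memo is drafted. The product keys and metric names below are illustrative assumptions, not taken from any bank's credit policy, and `REQUIRED_METRICS` and `missing_metrics` are hypothetical names.

```python
REQUIRED_METRICS = {
    # Illustrative assumptions only; a real credit policy defines these.
    "leveraged_finance": {"leverage_ratio", "interest_coverage"},
    "project_finance": {"dscr", "llcr"},
    "cre_lending": {"ltv", "dscr", "debt_yield"},
    "acquisition_finance": {"leverage_ratio", "pro_forma_ebitda"},
}


def missing_metrics(product: str, extracted: dict) -> list[str]:
    """List required metrics that are absent or None for the given product."""
    if product not in REQUIRED_METRICS:
        raise ValueError(f"No schema registered for product type: {product!r}")
    required = REQUIRED_METRICS[product]
    return sorted(m for m in required if extracted.get(m) is None)


extracted = {"dscr": 1.35, "leverage_ratio": 4.2}
print(missing_metrics("project_finance", extracted))
print(missing_metrics("cre_lending", extracted))
```

Raising on an unregistered product type is deliberate: a deal the registry does not recognize should stop the pipeline, not fall back to a generic schema.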

By Cyprian Aarons, AI Consultant at Topiax.
