How to Build an Underwriting Agent Using LlamaIndex in Python for Insurance
An underwriting agent takes an insurance submission, pulls the right policy rules and risk documents, evaluates the case against underwriting guidelines, and returns a recommendation with evidence. That matters because underwriters spend too much time on document triage and policy lookup; a well-built agent reduces manual review while keeping decisions auditable, consistent, and compliant.
Architecture
- Submission intake layer
  - Accepts broker emails, ACORD forms, loss runs, SOVs, and supplemental docs.
  - Normalizes files into text chunks for retrieval.
- Policy and guideline index
  - Stores underwriting manuals, appetite guides, exclusions, referral thresholds, and state-specific rules.
  - Built with VectorStoreIndex for semantic retrieval.
- Risk extraction layer
  - Uses an LLM to extract structured fields like NAICS code, limits requested, prior losses, occupancy type, and jurisdiction.
  - Feeds the decision engine with clean inputs.
- Retrieval + reasoning layer
  - Uses RetrieverQueryEngine or an agent workflow to ground decisions in retrieved policy text.
  - Produces a recommendation plus citations.
- Audit trail store
  - Persists inputs, retrieved chunks, model output, and final decision.
  - Required for compliance review and adverse action traceability.
- Guardrails layer
  - Blocks unsupported recommendations, PII leakage, and out-of-scope cases.
  - Forces human referral when confidence or rule checks fall below threshold.
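Before committing to a framework, the layers above can be sketched as a thin pipeline of swappable callables. Everything in this sketch is illustrative scaffolding (none of the names are LlamaIndex APIs); it just marks where each layer plugs in:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class UnderwritingPipeline:
    """Illustrative pipeline skeleton: each stage is a swappable callable."""
    intake: Callable[[bytes], str]            # submission intake layer
    extract: Callable[[str], dict]            # risk extraction layer
    retrieve: Callable[[str], list]           # policy/guideline index lookup
    decide: Callable[[dict, list], str]       # retrieval + reasoning layer
    audit_log: list = field(default_factory=list)  # audit trail store

    def run(self, raw_submission: bytes) -> str:
        text = self.intake(raw_submission)
        fields = self.extract(text)
        evidence = self.retrieve(text)
        decision = self.decide(fields, evidence)
        # Guardrail: never return a recommendation with no retrieved evidence.
        if not evidence:
            decision = "refer"
        self.audit_log.append(
            {"fields": fields, "evidence": evidence, "decision": decision}
        )
        return decision
```

Each callable can start as a stub and be replaced with a real LlamaIndex component as you build out the corresponding section below.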
Implementation
1) Install dependencies and load underwriting documents
Use LlamaIndex for document ingestion and indexing. In production you will usually load from SharePoint, S3, or a document management system; here we use local files to keep the pattern concrete.
from llama_index.core import SimpleDirectoryReader
# Load underwriting guidelines, appetite docs, and referral rules
documents = SimpleDirectoryReader(
    input_dir="./underwriting_docs",
    recursive=True,
).load_data()
print(f"Loaded {len(documents)} documents")
If your documents include scanned PDFs, run OCR upstream before indexing. Underwriting quality depends on clean text extraction more than clever prompting.
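One cheap upstream check worth wiring in: flag documents whose extracted text is suspiciously short, since those are usually scans that were never OCR'd. The 200-character threshold below is an arbitrary assumption; tune it for your corpus.

```python
def needs_ocr(doc_text: str, min_chars: int = 200) -> bool:
    """Heuristic: a 'loaded' document with almost no extractable text
    is probably a scanned image that skipped OCR."""
    return len(doc_text.strip()) < min_chars

def split_by_quality(texts: list) -> tuple:
    """Partition extracted texts into index-ready and OCR-needed piles.
    With LlamaIndex documents, pass in [doc.text for doc in documents]."""
    clean = [t for t in texts if not needs_ocr(t)]
    suspect = [t for t in texts if needs_ocr(t)]
    return clean, suspect
```

Route the suspect pile to your OCR tool of choice and re-ingest, rather than letting near-empty chunks dilute retrieval quality.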
2) Build a policy index with VectorStoreIndex
This is the retrieval backbone. The agent should answer from approved underwriting material only; do not let it freewheel across arbitrary internet context.
from llama_index.core import VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI
# Configure the LLM used by LlamaIndex
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
# Build the underwriting knowledge index
index = VectorStoreIndex.from_documents(documents)
# Create a retriever for grounded lookup
retriever = index.as_retriever(similarity_top_k=4)
For insurance use cases, keep the corpus scoped:
- underwriting guidelines by line of business
- state filings and endorsements
- referral authority matrix
- exclusions and appetite exceptions
Do not mix claims notes or customer service transcripts into the same index unless you have strict access controls.
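One way to enforce that scoping is to tag each document with its line of business at ingest time, derived from folder layout. The folder convention and approved-lines set below are assumptions for illustration; a function like this can be passed as SimpleDirectoryReader's `file_metadata` callback so the tags travel with each document into the index.

```python
from pathlib import PurePath

# Assumed layout: ./underwriting_docs/<line_of_business>/<file>
# e.g. ./underwriting_docs/commercial_property/appetite_guide.pdf
APPROVED_LINES = {"commercial_property", "general_liability", "workers_comp"}

def tag_line_of_business(file_path: str) -> dict:
    """Derive per-document metadata from the folder the file lives in."""
    parts = PurePath(file_path).parts
    lob = parts[-2] if len(parts) >= 2 else "unknown"
    return {"line_of_business": lob, "approved": lob in APPROVED_LINES}
```

Anything tagged `approved: False` (claims notes, transcripts, stray uploads) can then be excluded before indexing, or filtered at retrieval time with metadata filters.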
3) Create a query engine that returns recommendations with citations
A practical pattern is: retrieve evidence first, then ask the LLM to produce a recommendation. RetrieverQueryEngine gives you grounded answers without building a full custom agent loop on day one.
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import get_response_synthesizer
response_synthesizer = get_response_synthesizer(response_mode="compact")
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)
submission = """
Risk summary:
- Commercial property in Texas
- Frame construction
- Fire protection class: 8
- Prior losses: two water damage claims in last 24 months
- Requested limit: $5M
- Occupancy: light manufacturing
Question: Is this within appetite or should it be referred?
"""
response = query_engine.query(submission)
print(response)
That gets you grounded retrieval. For production underwriting you usually want structured output too, so wrap this with a schema extractor before scoring the risk.
4) Add structured extraction before decisioning
Use PydanticProgramExtractor or an LLM prompt that extracts fields into a schema. Then apply deterministic rules outside the model. That keeps referral logic explainable.
from pydantic import BaseModel, Field
from typing import Optional
class UnderwritingSubmission(BaseModel):
    line_of_business: str = Field(...)
    state: str = Field(...)
    occupancy: str = Field(...)
    construction_type: Optional[str] = None
    requested_limit: float = Field(...)
    prior_losses_24m: int = Field(...)
extract_prompt = """
Extract underwriting fields from this submission text.
Return only valid JSON matching the schema.
"""
# Example downstream rule check after extraction:
def requires_referral(data: UnderwritingSubmission) -> bool:
    if data.state == "TX" and data.requested_limit > 3000000:
        return True
    if data.prior_losses_24m >= 2:
        return True
    return False
In practice you would feed extracted fields plus retrieved guideline snippets into your final decision prompt. The model should explain why it recommended bind/refer/decline using citations from the index.
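A sketch of that final decisioning step, assuming hard rules run first and the LLM is only consulted for the narrative on cases that survive them. The thresholds and outcome labels are illustrative, and a plain dataclass stands in for the pydantic schema above so the example is self-contained:

```python
from dataclasses import dataclass

@dataclass
class ExtractedFields:
    """Minimal stand-in for the extracted submission schema."""
    state: str
    requested_limit: float
    prior_losses_24m: int

@dataclass
class Decision:
    outcome: str   # "bind" | "refer" | "decline" (illustrative labels)
    reasons: list  # rule hits and/or guideline citations

def recommend(fields: ExtractedFields, evidence: list) -> Decision:
    reasons = []
    # Deterministic rules first -- these never defer to the model.
    if fields.prior_losses_24m >= 2:
        reasons.append("loss-frequency referral rule")
    if fields.state == "TX" and fields.requested_limit > 3000000:
        reasons.append("TX limit above authority")
    if reasons:
        return Decision("refer", reasons)
    if not evidence:
        # Never recommend bind without retrieved guideline support.
        return Decision("refer", ["no supporting guideline retrieved"])
    # Only here would you ask the LLM for a bind/decline narrative,
    # passing the extracted fields plus `evidence` into the final prompt.
    return Decision("bind", [f"supported by {len(evidence)} guideline snippets"])
```

Note the ordering: rules gate the model, not the other way around, so every referral reason is reproducible in an audit.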
Production Considerations
- Keep data residency explicit
  - If you operate in regulated markets, pin storage and model endpoints to approved regions.
  - Separate EU/UK/US corpora if your compliance team requires it.
- Log every decision path
  - Store submission hash, retrieved node IDs, model version, prompt version, and final recommendation.
  - Underwriters need traceability for audits and disputes.
- Add hard guardrails before the LLM output reaches users
  - Enforce deterministic rules for authority limits, prohibited classes, sanctions screening hits, and minimum documentation requirements.
  - Route anything ambiguous to human review.
- Monitor drift by line of business
  - Track bind rate, referral rate, and override rate by state/class/agent/broker.
  - A spike in overrides usually means your guidelines changed or your retrieval corpus is stale.
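The logging point above maps naturally to one immutable record per case. A minimal sketch using only the standard library (the field names are illustrative; in production this would land in an append-only store, not an in-memory structure):

```python
import hashlib
import json
from datetime import datetime, timezone

def build_audit_record(
    submission_text: str,
    retrieved_node_ids: list,
    model_version: str,
    prompt_version: str,
    recommendation: str,
) -> dict:
    """One self-describing record per decision, keyed by content hash."""
    return {
        "submission_sha256": hashlib.sha256(submission_text.encode()).hexdigest(),
        "retrieved_node_ids": retrieved_node_ids,
        "model_version": model_version,
        "prompt_version": prompt_version,
        "recommendation": recommendation,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }

def to_jsonl_line(record: dict) -> str:
    """Serialize for an append-only log; JSON Lines works well here."""
    return json.dumps(record, sort_keys=True)
```

Hashing the submission instead of storing raw text in every log line also helps with PII minimization, while still letting you prove which inputs produced which decision.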
Common Pitfalls
- Letting the model decide without grounding
  - Mistake: asking an LLM "should we bind this risk?" with no policy context.
  - Fix: always retrieve from approved underwriting docs using VectorStoreIndex + RetrieverQueryEngine.
- Encoding business rules only in prompts
  - Mistake: putting referral thresholds inside natural language instructions.
  - Fix: implement hard checks in Python after structured extraction. Prompts are not control systems.
- Ignoring auditability
  - Mistake: returning a recommendation without storing evidence.
  - Fix: persist retrieved chunks, prompt versions, extracted fields, and final output for every case.
A good underwriting agent does not replace the underwriter’s judgment. It handles document-heavy triage fast enough to matter while keeping every recommendation tied back to policy text, rule logic, and an audit trail you can defend in front of compliance.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.