How to Build a Policy Q&A Agent Using LangGraph in Python for Insurance
A policy Q&A agent answers questions about coverage, exclusions, limits, deductibles, endorsements, and claims handling by retrieving the right policy text and turning it into a controlled answer. For insurance teams, that matters because policy language is dense, customer-facing accuracy is non-negotiable, and every response needs an audit trail.
Architecture
- **Chat input layer**
  - Accepts the user question and session metadata such as policy ID, jurisdiction, and line of business.
- **Policy retrieval tool**
  - Pulls the relevant policy sections from a document store or vector index.
  - For insurance, this should be scoped to the customer's exact policy version, not a generic product brochure.
- **State machine with LangGraph**
  - Orchestrates the flow: classify the question, retrieve context, generate an answer, validate the output.
  - This is where `StateGraph`, `add_node`, `add_edge`, and `compile` do the real work.
- **LLM answer generator**
  - Produces a grounded answer using only the retrieved policy text.
  - Should be constrained to avoid hallucinating exclusions or coverages.
- **Compliance and safety layer**
  - Checks for unsupported claims, legal-advice leakage, missing citations, and jurisdiction issues.
- **Audit logging**
  - Stores the question, retrieved chunks, model output, policy version, timestamps, and approval path for later review.
Implementation
1) Define the graph state and nodes
You want a small state object that carries the question, retrieved policy text, draft answer, and any validation flags. Keep it explicit so you can log every step later.
```python
from typing import TypedDict, Annotated

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI


class PolicyQAState(TypedDict):
    messages: Annotated[list, add_messages]
    policy_text: str
    draft_answer: str
    needs_review: bool


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)


def retrieve_policy(state: PolicyQAState) -> dict:
    question = state["messages"][-1].content
    # Replace with vector search / document lookup filtered by policy_id + jurisdiction
    policy_text = (
        "Section 2.1: Water damage is covered if sudden and accidental. "
        "Section 4.3: Damage from flooding is excluded unless endorsed."
    )
    return {"policy_text": policy_text}
```
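The stub above returns hardcoded text. A minimal sketch of what scoped retrieval could look like in practice is below; `PolicyChunk` and `search_policy_chunks` are illustrative stand-ins for a real vector store with metadata filters, not a real library API. The key point is that filtering by `policy_id` and `jurisdiction` happens before any relevance ranking.

```python
from dataclasses import dataclass


@dataclass
class PolicyChunk:
    policy_id: str
    jurisdiction: str
    section: str
    text: str


def search_policy_chunks(chunks, policy_id, jurisdiction, query_terms):
    """Filter to the insured's exact policy first, then rank by naive term overlap."""
    scoped = [
        c for c in chunks
        if c.policy_id == policy_id and c.jurisdiction == jurisdiction
    ]
    scored = [(sum(t in c.text.lower() for t in query_terms), c) for c in scoped]
    # Keep only chunks that matched at least one term, best matches first
    return [c for score, c in sorted(scored, key=lambda sc: -sc[0]) if score > 0]


chunks = [
    PolicyChunk("HO-123", "TX", "2.1", "Water damage is covered if sudden and accidental."),
    PolicyChunk("HO-123", "TX", "4.3", "Damage from flooding is excluded unless endorsed."),
    PolicyChunk("HO-999", "CA", "2.1", "Water damage is excluded entirely."),
]
hits = search_policy_chunks(chunks, "HO-123", "TX", ["water", "damage"])
```

In a real deployment the term-overlap scoring would be replaced by embedding similarity, but the metadata filter stays exactly where it is: a chunk from the wrong policy or jurisdiction must never reach the ranking step.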
```python
def generate_answer(state: PolicyQAState) -> dict:
    prompt = f"""
You are answering an insurance policy question.
Use only the policy text below. If the answer is not supported, say so.

Policy text:
{state['policy_text']}

Question:
{state['messages'][-1].content}
"""
    response = llm.invoke([HumanMessage(content=prompt)])
    return {"draft_answer": response.content}
```
2) Add a compliance check node
Insurance answers need more than “sounds right.” You need a deterministic gate for unsupported statements and risky phrasing like “guaranteed coverage” or “legal advice.”
```python
def compliance_check(state: PolicyQAState) -> dict:
    answer = state["draft_answer"].lower()
    risky_terms = ["guaranteed", "always covered", "legal advice", "will be approved"]
    unsupported = any(term in answer for term in risky_terms)
    too_short = len(state["draft_answer"].strip()) < 20
    return {"needs_review": unsupported or too_short}
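A deterministic gate like this can also enforce that every answer cites a policy clause. The variant below is one possible sketch; the regex assumes your clause labels look like "Section 2.1", which you would adapt to your own document numbering.

```python
import re


def has_clause_citation(answer: str) -> bool:
    """Check that the answer cites at least one policy section like 'Section 2.1'."""
    return re.search(r"[Ss]ection\s+\d+(\.\d+)?", answer) is not None


def compliance_check_with_citations(state: dict) -> dict:
    answer = state["draft_answer"]
    risky_terms = ["guaranteed", "always covered", "legal advice", "will be approved"]
    unsupported = any(term in answer.lower() for term in risky_terms)
    missing_citation = not has_clause_citation(answer)
    # Either risky phrasing or a missing citation routes the answer to review
    return {"needs_review": unsupported or missing_citation}
```

This keeps the gate deterministic: the model can phrase the answer however it likes, but an answer with no traceable clause reference never reaches the customer unreviewed.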
3) Wire the workflow with StateGraph
This is the core LangGraph pattern. The graph retrieves context first, then generates an answer, then routes to review or final output.
```python
def route_after_check(state: PolicyQAState) -> str:
    return "review" if state["needs_review"] else "final"


def human_review(state: PolicyQAState) -> dict:
    # In production this could create a task in your case management system
    reviewed = (
        "This response requires human review because it may contain unsupported wording."
    )
    return {"draft_answer": reviewed}


graph = StateGraph(PolicyQAState)
graph.add_node("retrieve_policy", retrieve_policy)
graph.add_node("generate_answer", generate_answer)
graph.add_node("compliance_check", compliance_check)
graph.add_node("human_review", human_review)

graph.add_edge(START, "retrieve_policy")
graph.add_edge("retrieve_policy", "generate_answer")
graph.add_edge("generate_answer", "compliance_check")
graph.add_conditional_edges(
    "compliance_check",
    route_after_check,
    {
        "review": "human_review",
        "final": END,
    },
)
graph.add_edge("human_review", END)

app = graph.compile()
```
4) Run it with an insurance-specific query
Pass metadata alongside the user message in your real app. In production you would include policy ID, tenant ID, jurisdiction, and document version in retrieval filters.
```python
result = app.invoke(
    {
        "messages": [HumanMessage(content="Is water damage from a burst pipe covered?")],
        "policy_text": "",
        "draft_answer": "",
        "needs_review": False,
    }
)
print(result["draft_answer"])
```
Production Considerations
- **Lock retrieval to the insured's exact policy version**
  - A homeowner in Texas may have different endorsements than one in California.
  - Filter by `policy_id`, `effective_date`, `jurisdiction`, and line of business before you retrieve anything.
- **Keep audit logs immutable**
  - Store the user question, retrieved passages, final answer, model version, prompt template hash, and reviewer decisions.
  - This helps with complaints handling, regulator requests, and internal QA.
- **Add guardrails before final output**
  - Block answers that mention unverified coverage guarantees.
  - Require citations to specific clauses or mark the response as needing human review.
- **Respect data residency**
  - If policies or claims data must stay in-region, pin your vector store and model endpoint accordingly.
  - Do not route EU customer data through a US-only inference path unless your legal team has signed off.
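The audit-log bullet above can be sketched as a single append-only record per answered question. The field names here are illustrative, not a fixed schema; hashing the prompt template with `hashlib` lets you prove later exactly which prompt version produced a given answer.

```python
import hashlib
import json
import time


def build_audit_record(question, retrieved_chunks, final_answer,
                       model_version, prompt_template, reviewer_decision=None):
    """Assemble one audit entry; hash the prompt template for traceability."""
    return {
        "timestamp": time.time(),
        "question": question,
        "retrieved_chunks": retrieved_chunks,
        "final_answer": final_answer,
        "model_version": model_version,
        "prompt_template_sha256": hashlib.sha256(prompt_template.encode()).hexdigest(),
        "reviewer_decision": reviewer_decision,
    }


record = build_audit_record(
    question="Is water damage from a burst pipe covered?",
    retrieved_chunks=["Section 2.1: Water damage is covered if sudden and accidental."],
    final_answer="Yes, per Section 2.1, if the damage is sudden and accidental.",
    model_version="gpt-4o-mini",
    prompt_template="You are answering an insurance policy question...",
)
# Serialize as one JSON line for append-only storage (e.g. a WORM bucket)
line = json.dumps(record)
```

Write these records to storage that cannot be edited after the fact; an audit trail you can silently rewrite is not an audit trail a regulator will accept.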
Common Pitfalls
- •
Using one generic prompt for every policy
- •That produces vague answers and wrong assumptions.
- •Fix it by injecting product line context and retrieving only from the correct policy corpus.
- •
Skipping conditional routing
- •If every answer goes straight to the user, you will ship hallucinations into regulated workflows.
- •Use
add_conditional_edgesto send uncertain outputs to human review.
- •
Treating compliance as a post-processing regex
- •Regex checks catch obvious bad phrases but miss unsupported coverage claims.
- •Combine deterministic rules with retrieval grounding and reviewer escalation.
- •
Ignoring traceability
- •If you cannot show which clause supported an answer, you will struggle during disputes.
- •Persist the retrieved chunks and graph state transitions for every conversation.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.