How to Build a Policy Q&A Agent Using CrewAI in Python for Insurance
A policy Q&A agent answers customer or internal staff questions about insurance policies by retrieving the right policy language, summarizing it in plain English, and flagging when a question needs human review. In insurance, that matters because a wrong answer is not just a bad UX issue; it can create compliance risk, claim disputes, and audit headaches.
Architecture
Build this agent with a small set of components that each do one job well:
- Question intake layer: receives the user's question plus context like policy ID, product line, jurisdiction, and customer type (a minimal intake schema is sketched after this list).
- Policy retrieval layer: pulls the relevant policy documents, endorsements, exclusions, and state-specific riders from a controlled source.
- Answering agent: uses an LLM to synthesize an answer from retrieved text only.
- Compliance guardrail: detects when the question crosses into legal advice, claims adjudication, or unsupported interpretation.
- Audit logger: stores the question, retrieved passages, final answer, model version, and decision path for later review.
- Human handoff path: routes ambiguous or high-risk questions to a licensed adjuster, underwriter, or compliance reviewer.
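As a concrete starting point, the intake payload can be a typed structure so every downstream layer sees the same fields. This is a minimal sketch; the exact field set is an assumption you would adapt to your own systems:

from typing import TypedDict

class QuestionIntake(TypedDict):
    """Everything downstream layers need to scope and trace the answer."""
    question: str
    policy_id: str
    product_line: str
    jurisdiction: str    # e.g. "CA" -- drives state-specific riders
    customer_type: str   # e.g. "policyholder" or "internal_staff"
    policy_version: str  # pin the exact document version for the audit trail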
Implementation
1) Install CrewAI and define your inputs
Use CrewAI’s Agent, Task, and Crew classes. For insurance work, keep inputs explicit so you can trace exactly which policy version was used.
pip install crewai crewai-tools
from crewai import Agent, Task, Crew, Process
question = "Does this homeowners policy cover water damage from a burst pipe?"
policy_context = {
    "policy_id": "HO-48291",
    "jurisdiction": "CA",
    "product_line": "homeowners",
    "policy_version": "2025-01",
}
2) Create a retrieval tool for policy text
In production you would point this at your document store or vector index. The important part is that the agent does not answer from memory; it answers from retrieved policy language.
from crewai_tools import BaseTool

class PolicyLookupTool(BaseTool):
    name: str = "Policy Lookup Tool"
    description: str = "Fetch relevant policy clauses by policy ID and question"

    def _run(self, policy_id: str, question: str) -> str:
        # Replace with vector DB / document service lookup
        if "burst pipe" in question.lower():
            return (
                "Section 4 - Perils Insured Against: sudden and accidental direct physical loss "
                "caused by discharge of water from plumbing system. "
                "Section 8 - Exclusions: long-term seepage, wear and tear."
            )
        return "No relevant clause found."
3) Define agents with narrow responsibilities
Keep the answering agent constrained. Add a second agent for compliance review so risky outputs get checked before they reach the user.
policy_lookup = PolicyLookupTool()

retriever = Agent(
    role="Policy Retriever",
    goal="Find exact policy language relevant to the user's question.",
    backstory="You retrieve only from approved insurance policy sources.",
    tools=[policy_lookup],
    verbose=True,
)

answerer = Agent(
    role="Policy Q&A Assistant",
    goal="Answer insurance policy questions using retrieved text only.",
    backstory=(
        "You explain coverage clearly without guessing. "
        "If the clause is ambiguous or jurisdiction-specific, escalate."
    ),
    verbose=True,
)

compliance_reviewer = Agent(
    role="Compliance Reviewer",
    goal="Check answers for regulatory risk, unsupported claims, and ambiguity.",
    backstory=(
        "You ensure responses avoid legal advice and include escalation when needed."
    ),
    verbose=True,
)
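One production detail worth adding here: the audit trail needs a known model version, so pin each agent to an explicit model rather than relying on the library default. A minimal sketch, assuming your installed CrewAI version accepts a model name string for the Agent llm parameter (the exact model id below is illustrative):

answerer = Agent(
    role="Policy Q&A Assistant",
    goal="Answer insurance policy questions using retrieved text only.",
    backstory=(
        "You explain coverage clearly without guessing. "
        "If the clause is ambiguous or jurisdiction-specific, escalate."
    ),
    llm="gpt-4o-2024-08-06",  # illustrative pinned model id; record it with every audit entry
    verbose=True,
)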
4) Chain retrieval, drafting, and compliance review
This pattern gives you traceability. The retriever gets the clause first, the answerer drafts from that clause only, then compliance reviews the draft before release. In CrewAI, each task's context parameter feeds the prior task's output into the next prompt, and the {policy_id} and {question} placeholders are filled from the kickoff inputs.
retrieve_task = Task(
    description=(
        "Given policy_id={policy_id} and question='{question}', "
        "retrieve the exact policy clauses relevant to answering coverage."
    ),
    expected_output="Relevant clauses quoted verbatim with section names.",
    agent=retriever,
)

draft_task = Task(
    description=(
        "Using only the clauses retrieved in the previous task, draft a concise "
        "answer for the user. Do not add assumptions or legal interpretations."
    ),
    expected_output="A plain-English coverage answer with caveats if needed.",
    agent=answerer,
    context=[retrieve_task],
)

review_task = Task(
    description=(
        "Review the drafted answer for compliance issues. Ensure it does not "
        "overstate coverage or provide legal advice. If risky or ambiguous, "
        "mark for human review."
    ),
    expected_output="Approved answer or escalation note.",
    agent=compliance_reviewer,
    context=[draft_task],
)
Then execute the crew:
crew = Crew(
    agents=[retriever, answerer, compliance_reviewer],
    tasks=[retrieve_task, draft_task, review_task],
    process=Process.sequential,
)

result = crew.kickoff(inputs={
    "question": question,
    "policy_id": policy_context["policy_id"],
})
print(result)
For production insurance workflows, I would add structured output parsing around result so you can separate (a sketch follows this list):
- final answer
- confidence / escalation flag
- cited clauses
- audit metadata
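One way to get that structure, assuming a CrewAI version recent enough to support output_pydantic on a Task and a .pydantic attribute on the crew result (verify both against your installed release):

from pydantic import BaseModel

class ReviewedAnswer(BaseModel):
    final_answer: str
    needs_human_review: bool
    cited_clauses: list[str]
    model_version: str  # audit metadata; extend with timestamp, reviewer outcome, etc.

review_task = Task(
    description="Review the drafted answer for compliance issues...",  # same instructions as above
    expected_output="A ReviewedAnswer object.",
    agent=compliance_reviewer,
    context=[draft_task],
    output_pydantic=ReviewedAnswer,
)

result = crew.kickoff(inputs={
    "question": question,
    "policy_id": policy_context["policy_id"],
})
reviewed = result.pydantic
if reviewed.needs_human_review:
    route_to_reviewer(reviewed)  # route_to_reviewer is a hypothetical handoff function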
Production Considerations
- Put data residency first: policy docs often contain PII and regulated content. Keep retrieval and inference inside the correct region if your book of business requires it.
- Log every decision path: store question text, retrieved clauses, final response, model name/version, timestamp, and reviewer outcome. That gives you auditability during complaints or regulator reviews.
- Add hard guardrails: block answers that drift into legal advice or claims determination. If the model cannot cite exact clauses or sees conflicting endorsements, force escalation (a minimal sketch of logging and the citation guardrail follows this list).
- Monitor false certainty: track cases where the assistant answered without strong clause support. In insurance support flows, overconfident wrong answers are worse than "I need human review."
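A minimal sketch of the logging and guardrail bullets above, standard library only; the record fields and the "must cite a Section" heuristic are assumptions to replace with your own schema and checks:

import json
import re
from datetime import datetime, timezone

def log_decision(path, question, clauses, answer, model_version, outcome):
    # Append-only JSON lines file; swap for your real audit store in production.
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "retrieved_clauses": clauses,
        "final_response": answer,
        "model_version": model_version,
        "reviewer_outcome": outcome,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

def must_escalate(answer: str) -> bool:
    # Hard guardrail: an answer that cites no policy section is forced to human review.
    return re.search(r"Section \d+", answer) is None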
Common Pitfalls
- Letting the model answer from general knowledge: fix this by forcing retrieval-first behavior and rejecting answers without cited clauses.
- Ignoring jurisdiction and endorsement hierarchy: a standard form may be overridden by state riders or endorsements. Always pass jurisdiction and effective date into retrieval (see the sketch after this list).
- Skipping human handoff on ambiguous questions: questions about exclusions, claim eligibility, cancellation rights, or coverage disputes should route to a licensed reviewer when the source text is unclear.
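To make the earlier PolicyLookupTool jurisdiction-aware, extend its signature so retrieval can filter on state and effective date; the fetch_clauses call below is a hypothetical document-service helper, not a crewai-tools API:

class JurisdictionAwarePolicyLookupTool(BaseTool):
    name: str = "Jurisdiction-Aware Policy Lookup Tool"
    description: str = "Fetch policy clauses scoped by policy ID, jurisdiction, and effective date"

    def _run(self, policy_id: str, question: str, jurisdiction: str, effective_date: str) -> str:
        # fetch_clauses is hypothetical; it must apply state riders and endorsements
        # so they override the standard form for the given effective date.
        return fetch_clauses(
            policy_id=policy_id,
            query=question,
            jurisdiction=jurisdiction,
            effective_date=effective_date,
        )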
If you build this as retrieval-first plus compliance review plus audit logging, you get something usable in an insurance environment instead of a demo that looks good in testing and fails in production.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.