How to Integrate OpenAI with Pinecone for Insurance Multi-Agent Systems

By Cyprian Aarons · Updated 2026-04-21
Tags: openai-for-insurance, pinecone, multi-agent-systems

Combining OpenAI with Pinecone gives you a clean pattern for building insurance agent systems that can answer policy questions, retrieve claim context, and keep multiple specialized agents on the same source of truth. OpenAI handles reasoning and response generation, while Pinecone gives your agents low-latency semantic retrieval over policy docs, claims notes, underwriting guidelines, and customer interactions.

Prerequisites

  • Python 3.10+
  • An OpenAI API key
  • A Pinecone API key
  • Access to your insurance knowledge base:
    • policy PDFs or text exports
    • underwriting rules
    • claims procedures
    • FAQ content
  • Installed packages:
    • openai
    • pinecone
    • tiktoken or your preferred chunking/tokenization library
  • Environment variables set:
    • OPENAI_API_KEY
    • PINECONE_API_KEY
    • PINECONE_INDEX_NAME

Install dependencies:

pip install openai pinecone tiktoken

Integration Steps

1) Initialize both clients

Start by creating the OpenAI client for generation and the Pinecone client for retrieval. In a multi-agent system, this usually lives in a shared infrastructure module so every agent uses the same index and model settings.

import os
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

index_name = os.environ["PINECONE_INDEX_NAME"]
index = pc.Index(index_name)

If you are using separate agents for claims, underwriting, and customer service, keep this initialization in a shared package. That avoids drift in embeddings, model versions, and index names.
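
As a sketch, that shared module might pin the clients and model choices together; the module name shared_infra.py and the EMBED_MODEL/CHAT_MODEL constants are illustrative conventions, not part of either SDK:

# shared_infra.py -- hypothetical name for the shared module
import os

from openai import OpenAI
from pinecone import Pinecone

# Pin model choices in one place so every agent embeds and generates
# with identical settings.
EMBED_MODEL = "text-embedding-3-small"
CHAT_MODEL = "gpt-4o-mini"

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index(os.environ["PINECONE_INDEX_NAME"])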

2) Chunk insurance documents and create embeddings

You need to turn policy content into searchable vectors before Pinecone can retrieve anything useful. Use OpenAI embeddings for consistent semantic matching across all agents.

from typing import List

def chunk_text(text: str, chunk_size: int = 800) -> List[str]:
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]

policy_text = """
Coverage applies to accidental water damage caused by burst pipes.
Exclusions include gradual leaks, mold-related damage, and wear-and-tear.
Claims must be filed within 30 days of discovery.
"""

chunks = chunk_text(policy_text)

embeddings_response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=chunks
)

vectors = []
for i, item in enumerate(embeddings_response.data):
    vectors.append({
        "id": f"policy-chunk-{i}",
        "values": item.embedding,
        "metadata": {
            "source": "policy_guide",
            "chunk": chunks[i]
        }
    })
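
The word-based chunker above is the simplest option. If you want chunk sizes that map directly to the embedding model's token limit, a minimal sketch using tiktoken (listed in the prerequisites) might look like this; chunk_text_by_tokens and the 500-token size are illustrative choices:

import tiktoken

def chunk_text_by_tokens(text: str, max_tokens: int = 500) -> List[str]:
    # cl100k_base is the tokenizer used by OpenAI's current embedding models.
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    return [
        enc.decode(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]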

For insurance use cases, metadata matters as much as the vector. Store document type, product line, jurisdiction, effective date, and claim category so downstream agents can filter correctly.
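
As a sketch, the earlier loop with richer metadata attached; the field names and values (doc_type, product_line, jurisdiction, effective_date, claim_category) are illustrative conventions, not a required schema:

vectors = []
for i, item in enumerate(embeddings_response.data):
    vectors.append({
        "id": f"homeowners-2026-chunk-{i}",
        "values": item.embedding,
        "metadata": {
            "doc_type": "policy",              # illustrative: policy, claims_note, underwriting_rule, faq
            "product_line": "homeowners",      # illustrative value
            "jurisdiction": "US-CA",           # illustrative value
            "effective_date": "2026-01-01",    # illustrative value
            "claim_category": "water_damage",  # illustrative value
            "chunk": chunks[i]
        }
    })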

3) Upsert vectors into Pinecone

Once you have embeddings, push them into your Pinecone index. This is the retrieval layer your agents will query later.

upsert_response = index.upsert(vectors=vectors)

print(upsert_response)

For production systems, batch upserts by document type or line of business.
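
A minimal batching sketch; the batch size of 100 is an illustrative choice, so tune it to your payload sizes and index limits:

BATCH_SIZE = 100  # illustrative; tune to your payload sizes and index limits

for start in range(0, len(vectors), BATCH_SIZE):
    index.upsert(vectors=vectors[start:start + BATCH_SIZE])

How you organize the index itself depends on your deployment: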

Pattern                  | When to use                                | Why
Single index             | Small-to-medium insurance knowledge base   | Simpler ops
Namespace per tenant     | Multi-carrier or multi-region deployments  | Isolation
Namespace per agent type | Claims vs. underwriting vs. servicing      | Cleaner retrieval boundaries

A good default is one index with namespaces like claims, underwriting, and customer_service.
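
Namespaces are set per request rather than at index creation. A sketch of writing to and querying the claims namespace, with an illustrative metadata filter on product_line (the filter field assumes the metadata schema sketched earlier):

# Write claims content into its own namespace.
index.upsert(vectors=vectors, namespace="claims")

# Scope retrieval to that namespace, optionally filtering on metadata.
query_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=["burst pipe water damage claim"]
).data[0].embedding

results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True,
    namespace="claims",
    filter={"product_line": {"$eq": "homeowners"}}
)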

4) Build the retrieval + generation flow

This is where the integration becomes useful. The agent asks a question, Pinecone returns relevant chunks, and OpenAI turns that context into an answer grounded in policy text.

question = "Does this policy cover water damage from a burst pipe?"

question_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=[question]
).data[0].embedding

search_results = index.query(
    vector=question_embedding,
    top_k=3,
    include_metadata=True
)

context_blocks = [
    match["metadata"]["chunk"]
    for match in search_results["matches"]
]
context_text = "\n".join(context_blocks)

prompt = f"""
You are an insurance assistant.
Answer only using the provided context.

Context:
{context_text}

Question:
{question}
"""

response = openai_client.responses.create(
    model="gpt-4o-mini",
    input=prompt
)

print(response.output_text)

That pattern is what you want across agents:

  • retrieval agent finds evidence
  • reasoning agent drafts answer
  • policy/compliance agent checks output before sending it to a user

5) Wire it into a multi-agent workflow

In multi-agent systems, each agent should have a narrow job. One agent retrieves facts from Pinecone; another uses OpenAI to summarize; another validates compliance language.

def retrieve_context(query: str):
    emb = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=[query]
    ).data[0].embedding

    results = index.query(
        vector=emb,
        top_k=5,
        include_metadata=True,
        namespace="claims"
    )

    return [m["metadata"]["chunk"] for m in results["matches"]]

def answer_claim_question(query: str):
    context = retrieve_context(query)

    # Join outside the f-string: backslashes inside f-string expressions
    # are a syntax error before Python 3.12.
    context_text = "\n".join(context)

    messages = [
        {
            "role": "system",
            "content": "You are a claims assistant. Use only retrieved context."
        },
        {
            "role": "user",
            "content": f"Context:\n{context_text}\n\nQuestion: {query}"
        }
    ]

    result = openai_client.responses.create(
        model="gpt-4o-mini",
        input=messages
    )
    return result.output_text

This keeps your architecture clean. Retrieval stays deterministic enough to audit; generation stays flexible enough to handle natural language.
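
The compliance agent from the earlier list follows the same narrow-job principle. A minimal sketch, assuming the compliance rules live in the system prompt; check_compliance and its prompt wording are illustrative, not a prescribed pattern:

def check_compliance(draft: str) -> str:
    # Reviewer agent: vets language only; it never retrieves or adds facts.
    result = openai_client.responses.create(
        model="gpt-4o-mini",
        input=[
            {
                "role": "system",
                "content": (
                    "You review draft insurance answers. Flag or soften any "
                    "coverage promise not supported by the draft itself, and "
                    "remove definitive legal or payout commitments."
                )
            },
            {"role": "user", "content": draft}
        ]
    )
    return result.output_text

draft = answer_claim_question("Is a burst pipe covered?")
final_answer = check_compliance(draft)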

Testing the Integration

Run a simple end-to-end test with a known policy question.

test_question = "Is mold damage covered if it comes from a gradual leak?"

answer = answer_claim_question(test_question)
print("ANSWER:", answer)

Expected output (exact wording will vary between runs):

ANSWER: Based on the retrieved policy context, mold-related damage is excluded when it results from gradual leaks. Coverage applies to accidental water damage such as burst pipes.

If you get an empty or vague response:

  • check that embeddings were upserted into the right namespace
  • verify your query text matches the domain language in your documents
  • inspect top_k results to confirm relevant chunks are being returned (a quick inspection snippet follows)
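
For that last check, a quick inspection sketch that prints similarity scores next to each match; the claims namespace and the 80-character preview are illustrative:

emb = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=[test_question]
).data[0].embedding

results = index.query(vector=emb, top_k=5, include_metadata=True, namespace="claims")

for match in results["matches"]:
    # Uniformly low scores usually mean the wrong namespace or query
    # phrasing that does not match the documents' language.
    print(round(match["score"], 3), match["metadata"]["chunk"][:80])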

Real-World Use Cases

  • Claims triage assistant
    • Retrieves claim history, policy terms, and adjuster notes from Pinecone.
    • Uses OpenAI to draft next-step recommendations and customer-facing explanations.
  • Underwriting copilot
    • Searches underwriting guidelines by product line and jurisdiction.
    • Lets an agent explain risk exceptions and required documentation.
  • Customer service knowledge agent
    • Answers coverage questions using approved policy content only.
    • Routes edge cases to human handlers with retrieved evidence attached.

The main pattern here is simple: Pinecone holds the memory layer, while OpenAI handles reasoning and response generation. For insurance teams building multi-agent systems, that separation is what makes the stack auditable enough for production.

