How to Integrate OpenAI with Pinecone for Pension Fund Startups

By Cyprian Aarons · Updated 2026-04-21

Combining OpenAI with Pinecone gives you a practical pattern for building regulated AI agents for pension funds: agents that can answer policy questions, search internal documents, and retrieve case history with low latency. The useful part is not the model alone; it’s pairing generation with vector search so your agent can ground answers in pension documents, member records, and knowledge base content.

For startups building AI systems around financial services, this is the difference between a chat demo and something you can actually put behind an internal workflow.

Prerequisites

  • Python 3.10+
  • An OpenAI API key with access to the model you want to use
  • A Pinecone account and API key
  • A Pinecone index created ahead of time (a creation sketch follows the install command below)
  • pip installed
  • Basic familiarity with embeddings and vector search
  • Environment variables set for secrets:
    • OPENAI_API_KEY
    • PINECONE_API_KEY

Install the SDKs:

pip install openai pinecone python-dotenv
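
If you haven’t created the index from the prerequisites yet, a minimal sketch using Pinecone’s serverless API looks like the following. The index name, cloud, and region are placeholders; the dimension must match the embedding model you use later (1536 for text-embedding-3-small).

import os
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

# Create the index once; skip if it already exists.
if "pension-fund-docs" not in pc.list_indexes().names():
    pc.create_index(
        name="pension-fund-docs",
        dimension=1536,  # must match text-embedding-3-small
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )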

Integration Steps

  1. Set up your client objects and load secrets.

Use environment variables and initialize both SDKs once at startup. For production systems, keep this in a dedicated config module.

import os
from dotenv import load_dotenv
from openai import OpenAI
from pinecone import Pinecone

load_dotenv()

openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

index_name = "pension-fund-docs"
index = pc.Index(index_name)

  2. Convert pension fund documents into embeddings.

You need embeddings before you can store anything in Pinecone. For example, chunk policy PDFs, FAQ pages, or benefit rules, then embed each chunk using OpenAI.
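
How you chunk is up to you. As one illustration, a naive word-window helper (a hypothetical chunk_text, not part of either SDK) could look like this:

def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    # Split a long document into overlapping word windows so each
    # chunk stays small enough to embed and retrieve precisely.
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        start += max_words - overlap
    return chunks

With chunks in hand, embed each one and carry the source text along as metadata: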

documents = [
    {
        "id": "doc-001",
        "text": "Members can request retirement benefit statements once per quarter.",
        "metadata": {"source": "member-policy", "type": "faq"}
    },
    {
        "id": "doc-002",
        "text": "Early retirement requires approval if the member is under 55.",
        "metadata": {"source": "benefit-rules", "type": "policy"}
    }
]

texts = [doc["text"] for doc in documents]

embeddings_response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)

vectors = []
for doc, item in zip(documents, embeddings_response.data):
    vectors.append({
        "id": doc["id"],
        "values": item.embedding,
        "metadata": {
            **doc["metadata"],
            "text": doc["text"]
        }
    })

  3. Upsert vectors into Pinecone.

Once you have embeddings, store them in your index. This is what gives your agent fast retrieval over pension content.

upsert_response = index.upsert(vectors=vectors)
print(upsert_response)
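
If you partition data by fund or environment, Pinecone namespaces let you do that within a single index. The namespace value here is a placeholder; reads must use the same namespace as writes:

# Writes and reads must target the same namespace, or queries return nothing.
index.upsert(vectors=vectors, namespace="prod")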

  4. Query Pinecone with a user question and pass context to OpenAI.

At runtime, embed the user query, retrieve top matches from Pinecone, then feed those matches into the chat model as grounded context.

user_question = "Can a member under 55 take early retirement benefits?"

query_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=user_question
).data[0].embedding

search_results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True
)

context_chunks = []
for match in search_results["matches"]:
    context_chunks.append(match["metadata"]["text"])

context_text = "\n\n".join(context_chunks)

chat_response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a pension fund assistant. "
                "Answer only using the provided context. "
                "If the context is insufficient, say so."
            )
        },
        {
            "role": "user",
            "content": f"Context:\n{context_text}\n\nQuestion: {user_question}"
        }
    ]
)

print(chat_response.choices[0].message.content)

  5. Wrap retrieval and generation into one reusable function.

This is the pattern you’ll use inside an agent service or API endpoint.

def answer_pension_question(question: str) -> str:
    q_embedding = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=question
    ).data[0].embedding

    results = index.query(
        vector=q_embedding,
        top_k=3,
        include_metadata=True
    )

    context = "\n\n".join(
        match["metadata"]["text"] for match in results["matches"]
    )

    completion = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You answer questions for a pension operations team. "
                    "Use only retrieved context."
                )
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }
        ]
    )

    return completion.choices[0].message.content


print(answer_pension_question("What is the rule for early retirement?"))
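
Inside a service, this function drops straight into an endpoint. Here is a minimal sketch using FastAPI (one option among many; it assumes the clients above are initialized at import time):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AskRequest(BaseModel):
    question: str

@app.post("/ask")
def ask(request: AskRequest) -> dict:
    # Delegate retrieval and generation to answer_pension_question above.
    return {"answer": answer_pension_question(request.question)}

# Run with: uvicorn your_module:app  (your_module is a placeholder)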

Testing the Integration

Run a smoke test that inserts one known document and queries it back. Keep in mind that Pinecone writes are eventually consistent, so a freshly upserted vector can take a few seconds to become queryable.

test_doc = {
    "id": f"test-{os.getpid()}",
    "text": "Benefit statements are available every quarter.",
    "metadata": {"source": "test"}
}

test_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=test_doc["text"]
).data[0].embedding

index.upsert(vectors=[{
    "id": test_doc["id"],
    "values": test_embedding,
    "metadata": {"text": test_doc["text"], **test_doc["metadata"]}
}])

answer = answer_pension_question("When are benefit statements available?")
print(answer)

Expected output (exact wording may vary, since the model paraphrases the retrieved context):

Benefit statements are available every quarter.

If your output mentions unrelated policy text, or the model says it cannot find context when you just inserted a matching document, check the following (the snippet after the list shows how to inspect raw matches):

  • Your index name and namespace
  • Embedding model consistency between insert and query
  • Whether metadata includes the source text
  • Whether Pinecone returned matches with non-empty metadata
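
A quick way to see what Pinecone actually returned, reusing the embedding from the smoke test above:

raw = index.query(vector=test_embedding, top_k=3, include_metadata=True)
for match in raw["matches"]:
    print(match["id"], round(match["score"], 3), match["metadata"])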

Real-World Use Cases

  • Pension member support agent that answers policy questions from indexed plan documents while keeping responses grounded in approved text.
  • Internal ops assistant that helps staff search contribution rules, eligibility criteria, and benefit calculation notes across thousands of document chunks.
  • Compliance review workflow where an agent retrieves prior cases from Pinecone and drafts responses using OpenAI based only on approved pension content.

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
