How to Integrate OpenAI for wealth management with Pinecone for multi-agent systems

By Cyprian Aarons · Updated 2026-04-21
openai-for-wealth-management · pinecone · multi-agent-systems

Why this integration matters

Wealth management agents are only useful if they can answer with firm-specific context, not generic market chatter. Pairing OpenAI with Pinecone gives you a pattern for multi-agent systems where one agent reasons over client intent, another retrieves portfolio policy or research snippets, and both stay grounded in the same memory layer.

The practical win is simple: you can build agents that handle advisor Q&A, investment policy lookup, and client-specific recommendations without stuffing everything into the prompt. Pinecone stores the retrieval layer, OpenAI handles reasoning and response generation.

Prerequisites

  • Python 3.10+
  • An OpenAI API key
  • A Pinecone API key
  • A Pinecone index created in advance (a minimal creation sketch follows the setup commands below)
  • pip installed
  • Basic familiarity with:
    • embeddings
    • vector search
    • function calling or tool-based agent orchestration
  • Python packages:
    • openai
    • pinecone
    • python-dotenv

Install them:

pip install openai pinecone python-dotenv

Set environment variables:

export OPENAI_API_KEY="your-openai-key"
export PINECONE_API_KEY="your-pinecone-key"
export PINECONE_INDEX_NAME="wealth-memory"
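
If your index doesn't exist yet, here is a minimal creation sketch using the serverless spec from the current pinecone client. The cloud and region values are placeholder assumptions; the dimension must match the embedding model you choose in step 2 (1536 for text-embedding-3-small, 3072 for text-embedding-3-large).

import os
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index_name = os.environ["PINECONE_INDEX_NAME"]

# Create the index once; skip if it already exists.
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,  # must match your embedding model's output size
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # assumed values
    )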

Integration Steps

1) Initialize OpenAI and Pinecone clients

Start by loading both clients in the same service. Keep this in one module so every agent uses the same connection setup.

import os
from dotenv import load_dotenv
from openai import OpenAI
from pinecone import Pinecone

load_dotenv()

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

index_name = os.environ["PINECONE_INDEX_NAME"]
index = pc.Index(index_name)

This is the foundation for your multi-agent system. One agent can create embeddings, another can query Pinecone, and a coordinator agent can call OpenAI to generate the final answer.

2) Create embeddings with OpenAI for wealth-management content

For wealth management use cases, embed advisor notes, policy docs, product sheets, and research summaries. Use text-embedding-3-small for cost-sensitive workloads or text-embedding-3-large if retrieval quality matters more.

def embed_text(text: str) -> list[float]:
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding


documents = [
    {
        "id": "policy_001",
        "text": "Client risk profiles must be reviewed quarterly. High-net-worth clients require suitability checks before alternatives are recommended.",
        "metadata": {"type": "policy", "topic": "suitability"}
    },
    {
        "id": "note_001",
        "text": "Client prefers capital preservation over aggressive growth. Consider municipal bonds and dividend-focused strategies.",
        "metadata": {"type": "advisor_note", "client_id": "C102"}
    }
]

In production, chunk long documents before embedding. Don’t send entire policy manuals as a single record.
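
A minimal chunking helper sketch; the window and overlap sizes are arbitrary starting points, not tuned values:

def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    # Split a long document into overlapping character windows.
    # Overlap keeps sentences that straddle a boundary retrievable.
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

Each chunk then gets its own record ID (for example policy_001-0, policy_001-1) with the parent document ID kept in metadata.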

3) Upsert vectors into Pinecone

Now store those embeddings in your index. Keep metadata tight and query-friendly so agents can filter by document type, client ID, or topic.

# Keep the raw text in metadata so agents can read it back straight
# from query results without a second lookup.
vectors = []
for doc in documents:
    vectors.append({
        "id": doc["id"],
        "values": embed_text(doc["text"]),
        "metadata": {
            **doc["metadata"],
            "text": doc["text"]
        }
    })

index.upsert(vectors=vectors)
print("Upsert complete")

This gives each agent a shared memory store. In a multi-agent setup, one agent might write new meeting notes while another retrieves them during a client conversation.
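
When metadata is structured this way, agents can scope retrieval with Pinecone's filter syntax. A sketch using the sample client_id from step 2:

# Restrict retrieval to one client's advisor notes.
results = index.query(
    vector=embed_text("What are this client's stated preferences?"),
    top_k=3,
    include_metadata=True,
    filter={"type": {"$eq": "advisor_note"}, "client_id": {"$eq": "C102"}},
)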

4) Query Pinecone from an agent and pass context to OpenAI

This is the core pattern: retrieve relevant context from Pinecone, then send it to OpenAI as grounded input for response generation.

def retrieve_context(query: str, top_k: int = 3) -> str:
    # Embed the query with the same model used at upsert time;
    # mixing embedding models silently degrades retrieval.
    query_embedding = embed_text(query)

    results = index.query(
        vector=query_embedding,
        top_k=top_k,
        include_metadata=True
    )

    chunks = []
    for match in results["matches"]:
        meta = match["metadata"]
        chunks.append(meta.get("text", ""))

    return "\n\n".join(chunks)


def generate_answer(user_query: str) -> str:
    context = retrieve_context(user_query)

    prompt = f"""
You are a wealth management assistant.
Use only the provided context when answering.
If the context is insufficient, say so clearly.

Context:
{context}

User question:
{user_query}
"""

    response = openai_client.responses.create(
        model="gpt-4o-mini",
        input=prompt,
    )
    return response.output_text


answer = generate_answer("What should we recommend for a client focused on capital preservation?")
print(answer)

For multi-agent systems, this function becomes one tool in a larger orchestration layer. A planner agent can decide whether to call retrieval first or ask clarifying questions.

5) Wire it into a simple multi-agent flow

A clean pattern is: planner agent → retriever agent → writer agent. The retriever uses Pinecone; the writer uses OpenAI to produce the final output.

def planner(user_message: str) -> dict:
    return {
        "needs_retrieval": True,
        "query": user_message,
    }


def retriever_agent(query: str) -> str:
    return retrieve_context(query)


def writer_agent(user_message: str, context: str) -> str:
    response = openai_client.responses.create(
        model="gpt-4o-mini",
        input=f"""
You are an assistant for wealth management operations.
Answer using this context only:

{context}

Question:
{user_message}
"""
    )
    return response.output_text


plan = planner("Summarize suitable options for a conservative client.")
if plan["needs_retrieval"]:
    ctx = retriever_agent(plan["query"])
    final_answer = writer_agent(plan["query"], ctx)
else:
    final_answer = writer_agent(plan["query"], "")

print(final_answer)

That structure scales better than stuffing everything into one monolithic prompt. Each agent has one job and one failure mode.
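
The planner above is a hardcoded stub. One way to make it a real decision point is to ask the model for a structured verdict and fall back to retrieval when parsing fails. A sketch, not a production router:

import json

def llm_planner(user_message: str) -> dict:
    # Ask the model for a routing decision as JSON; fall back to
    # retrieval if the output doesn't parse cleanly.
    response = openai_client.responses.create(
        model="gpt-4o-mini",
        input=f"""
Decide whether this wealth-management question needs document retrieval.
Reply with JSON only: {{"needs_retrieval": true or false, "query": "<search query>"}}

Question: {user_message}
""",
    )
    try:
        return json.loads(response.output_text)
    except json.JSONDecodeError:
        return {"needs_retrieval": True, "query": user_message}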

Testing the Integration

Run a direct retrieval-and-generation test with known seeded data. If Pinecone returns the right text and OpenAI reflects it in the answer, your pipeline is wired correctly.

test_query = "What investment approach fits a client who prioritizes capital preservation?"
context = retrieve_context(test_query)

print("RETRIEVED CONTEXT:")
print(context)

answer = generate_answer(test_query)
print("\nMODEL ANSWER:")
print(answer)

Expected output:

RETRIEVED CONTEXT:
Client prefers capital preservation over aggressive growth. Consider municipal bonds and dividend-focused strategies.

MODEL ANSWER:
Based on the retrieved context, a capital-preservation-focused client may be better suited to municipal bonds and dividend-focused strategies...

If retrieval looks wrong:

  • check embedding model consistency between upsert and query
  • confirm the index dimension matches your embedding model
  • verify metadata filters if you use them
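
A quick dimension check, assuming the v3+ pinecone client:

# Compare the index's configured dimension with your embedding size.
desc = pc.describe_index(index_name)
probe = embed_text("dimension probe")
print(f"index dimension: {desc.dimension}, embedding dimension: {len(probe)}")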

Real-World Use Cases

  • Advisor copilot
    • Retrieves policy docs, suitability rules, and prior meeting notes before drafting responses to client questions.
  • Client briefing generator
    • Summarizes portfolio changes, market commentary, and account-level notes into a concise pre-meeting brief.
  • Multi-agent compliance workflow
    • One agent drafts recommendations, another checks them against stored policy snippets in Pinecone before anything reaches an advisor or client; a minimal sketch follows this list.
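
A minimal sketch of that compliance check, reusing the patterns from steps 2 and 4 (the "type" filter relies on the policy metadata set during upsert):

def compliance_check(draft: str) -> str:
    # Retrieve only policy snippets, then ask the model to flag conflicts.
    results = index.query(
        vector=embed_text(draft),
        top_k=3,
        include_metadata=True,
        filter={"type": {"$eq": "policy"}},
    )
    policies = "\n\n".join(m["metadata"].get("text", "") for m in results["matches"])
    response = openai_client.responses.create(
        model="gpt-4o-mini",
        input=f"""
Check this draft recommendation against the policy snippets.
List any conflicts, or reply "No conflicts found".

Policies:
{policies}

Draft:
{draft}
""",
    )
    return response.output_text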

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

Get the Starter Kit
