How to Integrate OpenAI for investment banking with Pinecone for multi-agent systems

By Cyprian Aarons, updated 2026-04-21

OpenAI gives you the reasoning layer for research, drafting, and decision support in investment banking. Pinecone gives you persistent vector memory so multiple agents can retrieve prior deals, filings, research notes, and internal playbooks without stuffing everything into context.

Combined, they let you build agent systems that can answer banker-style questions with grounded retrieval: one agent finds relevant market comps, another summarizes filings, another drafts an IC memo, and all of them share the same indexed knowledge base.

Prerequisites

  • Python 3.10+
  • An OpenAI API key
  • A Pinecone API key
  • A Pinecone index whose dimension matches your embedding model (1536 for text-embedding-3-small, the model used below)
  • pip installed
  • Access to your source data:
    • SEC filings
    • deal tombstones
    • internal research PDFs
    • analyst notes
  • Python packages:
    • openai
    • pinecone
    • python-dotenv

Install dependencies:

pip install openai pinecone python-dotenv

Integration Steps

  1. Set up your environment variables

Keep secrets out of code. Use a .env file for local development and inject the same variables in your deployment pipeline.

import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_INDEX_NAME = os.getenv("PINECONE_INDEX_NAME")

if not OPENAI_API_KEY or not PINECONE_API_KEY or not PINECONE_INDEX_NAME:
    raise ValueError("Missing required environment variables")
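
When several variables are required, it can help to report exactly which ones are missing. The following is a minimal sketch of that idea; the helper name `require_env` is hypothetical and not part of python-dotenv, and the optional `env` argument exists only so the check can be exercised without real secrets.

```python
import os
from typing import Mapping, Optional


def require_env(names: list[str], env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return the requested variables, or raise an error naming every missing one."""
    source = os.environ if env is None else env
    missing = [name for name in names if not source.get(name)]
    if missing:
        raise ValueError(f"Missing required environment variables: {', '.join(missing)}")
    return {name: source[name] for name in names}


# Example with an explicit mapping (placeholder values, not real keys).
settings = require_env(
    ["OPENAI_API_KEY", "PINECONE_API_KEY"],
    env={"OPENAI_API_KEY": "sk-test", "PINECONE_API_KEY": "pc-test"},
)
print(sorted(settings))
```

In the script above, calling `require_env([...])` without `env` after `load_dotenv()` would read from the process environment instead.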

  2. Create OpenAI embeddings for banking documents

Use OpenAI embeddings to convert deal docs, earnings call transcripts, and memos into vectors. For production, chunk documents before embedding so retrieval stays precise.

from openai import OpenAI

client = OpenAI(api_key=OPENAI_API_KEY)

text_chunks = [
    "Company A reported revenue growth of 18% YoY in FY2024.",
    "The proposed acquisition values the target at 12.4x EBITDA.",
]

embedding_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=text_chunks,
)

vectors = [item.embedding for item in embedding_response.data]
print(len(vectors), len(vectors[0]))
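
The chunking advice above can be sketched as a simple overlapping word window. This is one possible approach under stated assumptions, not a recommended production splitter: the function name `chunk_words` and the default sizes are illustrative, and word counts are only a rough proxy for model tokens.

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of up to max_words words that share `overlap` words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Tiny demonstration with ten placeholder words.
sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, max_words=4, overlap=1))
```

The resulting list can be passed as `input` to the embeddings call in place of the hand-written `text_chunks`. The overlap keeps a sentence that straddles a boundary retrievable from either side.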

  3. Connect to Pinecone and upsert vectors

Connect to the index you created in the prerequisites, then store each chunk with metadata that matters for bankers, such as ticker, deal type, date, source, and document ID. The example below uses a smaller illustrative set of fields.

from pinecone import Pinecone

pc = Pinecone(api_key=PINECONE_API_KEY)
index = pc.Index(PINECONE_INDEX_NAME)

records = []
for i, vector in enumerate(vectors):
    records.append({
        "id": f"doc-{i}",
        "values": vector,
        "metadata": {
            "source": "investment_banking_notes",
            "chunk_text": text_chunks[i],
            "asset_class": "m&a",
            "region": "na"
        }
    })

index.upsert(vectors=records)
print("Upsert complete")
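
IDs of the form `doc-0`, `doc-1` collide as soon as a second document is ingested, because an upsert with an existing ID overwrites the stored vector. A hedged sketch of per-document IDs, banker-oriented metadata, and batching follows. The helper names `build_records` and `batched`, the metadata field names, the example document ID, and the batch size of 100 are all assumptions made for illustration.

```python
from typing import Iterator


def build_records(doc_id: str, chunks: list[str], vectors: list[list[float]],
                  base_metadata: dict) -> list[dict]:
    """Pair each chunk with its vector under a deterministic, per-document ID."""
    if len(chunks) != len(vectors):
        raise ValueError("chunks and vectors must have the same length")
    return [
        {
            "id": f"{doc_id}#chunk-{i}",
            "values": vector,
            # chunk_text is kept so the retrieval code can read it back later.
            "metadata": {**base_metadata, "doc_id": doc_id, "chunk_text": chunk},
        }
        for i, (chunk, vector) in enumerate(zip(chunks, vectors))
    ]


def batched(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


example_records = build_records(
    doc_id="acme-8k-2024-03-01",  # hypothetical document ID
    chunks=["Company A reported revenue growth of 18% YoY in FY2024."],
    vectors=[[0.1, 0.2, 0.3]],  # placeholder; real vectors come from the embeddings call
    base_metadata={"ticker": "ACME", "deal_type": "m&a",
                   "date": "2024-03-01", "source": "8-K"},
)
print(example_records[0]["id"])

# With a live index, each batch could then be sent with:
#     for batch in batched(example_records, 100):
#         index.upsert(vectors=batch)
```

Because the ID depends only on the document ID and chunk position, re-ingesting the same document replaces its earlier vectors rather than duplicating them.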

  4. Query Pinecone from an OpenAI-powered agent

At runtime, an agent embeds the user's question, retrieves the most relevant chunks from Pinecone, then passes that context to OpenAI for synthesis. The question must be embedded with the same model that was used at indexing time.

query = "What valuation multiple was used in the acquisition discussion?"

query_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input=query,
).data[0].embedding

search_results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True,
)

context_blocks = []
for match in search_results["matches"]:
    context_blocks.append(match["metadata"]["chunk_text"])

context = "\n\n".join(context_blocks)

response = client.responses.create(
    model="gpt-4.1-mini",
    input=f"""
You are an investment banking analyst.
Answer only using the provided context.

Question: {query}

Context:
{context}
"""
)

print(response.output_text)
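
Not every match returned for a query is worth sending to the model. As a sketch, weak matches can be dropped by similarity score before the context is assembled. The helper name `select_context` and the 0.3 threshold are illustrative assumptions (a sensible cutoff depends on your index metric and data), and the function assumes each match has been converted to a plain dict with `score` and `metadata` keys.

```python
def select_context(matches: list[dict], min_score: float = 0.3,
                   max_chunks: int = 3) -> list[str]:
    """Return chunk_text for matches scoring at least min_score, best first."""
    usable = [
        m for m in matches
        if m.get("score", 0.0) >= min_score and "chunk_text" in m.get("metadata", {})
    ]
    usable.sort(key=lambda m: m["score"], reverse=True)
    return [m["metadata"]["chunk_text"] for m in usable[:max_chunks]]


# Hand-written matches standing in for a query response.
example_matches = [
    {"id": "doc-0", "score": 0.21,
     "metadata": {"chunk_text": "Company A reported revenue growth of 18% YoY in FY2024."}},
    {"id": "doc-1", "score": 0.78,
     "metadata": {"chunk_text": "The proposed acquisition values the target at 12.4x EBITDA."}},
]
print(select_context(example_matches))

# The search itself can also be narrowed with a metadata filter, for example:
#     index.query(vector=query_embedding, top_k=3, include_metadata=True,
#                 filter={"asset_class": {"$eq": "m&a"}})
```

Filtering on metadata narrows the search to the relevant deal type or region before similarity ranking, while the score threshold guards against padding the prompt with unrelated text.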

  5. Wire this into a multi-agent workflow

In a multi-agent system, you usually split responsibilities:

  • retrieval agent: finds relevant documents in Pinecone
  • analysis agent: interprets the retrieved context using OpenAI
  • drafting agent: writes the final memo or summary

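A simple orchestration pattern looks like this. It covers the retrieval and analysis roles; a drafting agent would consume the analysis output in the same way: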

def retrieve_context(question: str) -> str:
    q_emb = client.embeddings.create(
        model="text-embedding-3-small",
        input=question,
    ).data[0].embedding

    results = index.query(vector=q_emb, top_k=5, include_metadata=True)
    return "\n\n".join(
        match["metadata"]["chunk_text"] for match in results["matches"]
    )

def analyze_with_openai(question: str) -> str:
    context = retrieve_context(question)

    result = client.responses.create(
        model="gpt-4.1-mini",
        input=f"""
You are supporting an investment banking team.
Use only the context below.

Question: {question}

Context:
{context}
"""
    )
    return result.output_text

answer = analyze_with_openai("Summarize the key valuation points from the target discussion.")
print(answer)
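
To make the three roles explicit, each agent can be passed in as a plain callable, which keeps the orchestration testable without API calls. This is a sketch under stated assumptions: the `Pipeline` class and its field names are hypothetical, and the stub lambdas stand in for functions that would wrap the Pinecone query and the OpenAI calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    retrieve: Callable[[str], str]      # question -> context
    analyze: Callable[[str, str], str]  # (question, context) -> findings
    draft: Callable[[str, str], str]    # (question, findings) -> memo text

    def run(self, question: str) -> str:
        context = self.retrieve(question)
        if not context.strip():
            # Stop early rather than let the later agents work from nothing.
            return "No relevant context found."
        findings = self.analyze(question, context)
        return self.draft(question, findings)


# Stub agents so the flow can be exercised offline.
pipeline = Pipeline(
    retrieve=lambda q: "The proposed acquisition values the target at 12.4x EBITDA.",
    analyze=lambda q, ctx: f"Key point: {ctx}",
    draft=lambda q, findings: f"MEMO\nQuestion: {q}\n{findings}",
)
memo = pipeline.run("What multiple was discussed?")
print(memo)
```

In a live setup, `retrieve` could be the `retrieve_context` function above, while `analyze` and `draft` would each wrap their own model call with a role-specific prompt.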

Testing the Integration

Run a smoke test against a known document chunk and confirm retrieval plus generation both work.

test_question = "What was the EBITDA multiple mentioned?"
test_answer = analyze_with_openai(test_question)

print("ANSWER:")
print(test_answer)

Expected output (exact wording will vary between runs):

ANSWER:
The acquisition discussion referenced a valuation of 12.4x EBITDA.

If you get empty or irrelevant answers:

  • check that your Pinecone index dimension matches the embedding model output
  • verify metadata was stored during upsert
  • confirm your query text is close to the language used in your source docs
  • inspect top_k results before sending them to OpenAI
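
The first and last checks in the list above can be sketched as two small helpers. The names `check_dimension` and `describe_matches` are hypothetical, and `describe_matches` assumes matches have been converted to plain dicts with `id`, `score`, and `metadata` keys.

```python
def check_dimension(index_dimension: int, vector: list[float]) -> None:
    """Raise if an embedding's length differs from the index dimension."""
    if len(vector) != index_dimension:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions but the index expects {index_dimension}"
        )


def describe_matches(matches: list[dict]) -> list[str]:
    """Format each match as 'score  id  text preview' for quick inspection."""
    lines = []
    for m in matches:
        text = m.get("metadata", {}).get("chunk_text", "<no chunk_text>")
        lines.append(f"{m['score']:.3f}  {m['id']}  {text[:60]}")
    return lines


check_dimension(3, [0.1, 0.2, 0.3])  # passes silently when the sizes agree
for line in describe_matches([
    {"id": "doc-1", "score": 0.78,
     "metadata": {"chunk_text": "The proposed acquisition values the target at 12.4x EBITDA."}},
    {"id": "doc-9", "score": 0.12, "metadata": {}},
]):
    print(line)
```

With a live index, the expected dimension can be read from the index description (for example `pc.describe_index(PINECONE_INDEX_NAME).dimension`, assuming your SDK version exposes it). A `<no chunk_text>` entry in the printout points to metadata that was not stored during upsert.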

Real-World Use Cases

  • Deal room assistant

    • Search across CIMs, diligence notes, and board materials.
    • Answer banker questions with grounded citations from indexed content.
  • Multi-agent pitchbook builder

    • One agent retrieves market data from Pinecone.
    • Another drafts slides with OpenAI based on retrieved comps and commentary.
  • IC memo copilot

    • Retrieve prior approvals, risk flags, and transaction rationale.
    • Generate a first draft of investment committee notes with consistent internal language.

By Cyprian Aarons, AI Consultant at Topiax.