How to Integrate OpenAI for Banking with Pinecone for Multi-Agent Systems

By Cyprian Aarons · Updated 2026-04-21
openai-for-banking · pinecone · multi-agent-systems

OpenAI for banking gives you the reasoning layer for regulated workflows: classification, summarization, extraction, and policy-aware response generation. Pinecone gives you durable semantic memory across agents, so each agent can retrieve the right customer context, product docs, or case history without stuffing everything into the prompt.

Together, they let you build multi-agent systems that can handle banking tasks like fraud triage, KYC support, loan pre-screening, and advisor copilots with shared retrieval over a governed knowledge base.

Prerequisites

  • Python 3.10+
  • An OpenAI API key with access to the model you plan to use
  • A Pinecone API key and an existing index
  • pip installed
  • A vector embedding strategy
    • Either OpenAI embeddings
    • Or a separate embedding model if your architecture requires it
  • Basic familiarity with:
    • openai Python SDK
    • pinecone Python SDK
    • async or synchronous agent orchestration

Install the packages:

pip install openai pinecone

Set environment variables:

export OPENAI_API_KEY="your-openai-key"
export PINECONE_API_KEY="your-pinecone-key"
export PINECONE_INDEX_NAME="banking-agent-memory"

Integration Steps

  1. Initialize both clients and verify connectivity.
import os
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

index_name = os.environ["PINECONE_INDEX_NAME"]
index = pc.Index(index_name)

print("OpenAI client ready")
print("Pinecone index ready:", index_name)
  2. Create embeddings with OpenAI and store them in Pinecone.

Use one embedding model consistently for all agent memory writes and reads.

from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()
pc = Pinecone()

index = pc.Index("banking-agent-memory")

text = "Customer asked about chargeback status for card ending 4821. Case opened on 2026-04-18."
embedding_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=text
)

vector = embedding_response.data[0].embedding

index.upsert(
    vectors=[
        {
            "id": "case_4821_001",
            "values": vector,
            "metadata": {
                "customer_id": "cust_10492",
                "case_type": "chargeback",
                "source": "support_ticket",
                "text": text
            }
        }
    ]
)

print("Upsert complete")
  3. Build a retrieval function for your agents.

Each agent should query Pinecone before calling OpenAI so it has relevant context from prior cases or policy docs.

from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()
pc = Pinecone()
index = pc.Index("banking-agent-memory")

def embed_query(query: str):
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=query
    )
    return response.data[0].embedding

def retrieve_context(query: str, top_k: int = 3):
    query_vector = embed_query(query)
    results = index.query(
        vector=query_vector,
        top_k=top_k,
        include_metadata=True
    )
    contexts = []
    for match in results.matches:
        contexts.append(match.metadata.get("text", ""))
    return contexts

query = "What is the latest update on the customer's chargeback?"
contexts = retrieve_context(query)
print(contexts)
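In a bank, retrieval should almost always be scoped: a query about one customer must not surface another customer's case notes. Pinecone supports metadata filters at query time; a sketch that restricts retrieval to a single customer, reusing embed_query from above and the customer_id metadata field written in step 2:

def retrieve_customer_context(query: str, customer_id: str, top_k: int = 3):
    results = index.query(
        vector=embed_query(query),
        top_k=top_k,
        include_metadata=True,
        # Only return vectors whose metadata matches this customer
        filter={"customer_id": {"$eq": customer_id}},
    )
    return [m.metadata.get("text", "") for m in results.matches]

print(retrieve_customer_context(
    "What is the latest update on the chargeback?", "cust_10492"
))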
  4. Pass retrieved memory into an OpenAI banking agent call.

This is where the bank-grade workflow starts to matter. The model should answer using retrieved evidence instead of guessing.

from openai import OpenAI

client = OpenAI()

def answer_with_memory(user_question: str):
    # retrieve_context is the function defined in step 3
    contexts = retrieve_context(user_question)

    system_prompt = (
        "You are a banking support agent. "
        "Use only the provided context when answering. "
        "If the context is insufficient, say what is missing."
    )

    messages = [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": f"Question: {user_question}\n\nContext:\n" + "\n".join(contexts)
        }
    ]

    response = client.responses.create(
        model="gpt-4.1-mini",
        input=messages
    )
    return response.output_text

print(answer_with_memory("What is the latest update on the customer's chargeback?"))
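A guardrail worth adding before the model call: if nothing sufficiently similar comes back from Pinecone, refuse and escalate rather than letting the model answer from weak evidence. Each match carries a similarity score; the 0.4 cutoff below is illustrative and should be tuned for your embedding model and data:

MIN_SCORE = 0.4  # illustrative threshold; tune against your own data

def retrieve_context_strict(query: str, top_k: int = 3):
    results = index.query(
        vector=embed_query(query),
        top_k=top_k,
        include_metadata=True
    )
    # Keep only matches above the similarity threshold
    strong = [m for m in results.matches if m.score >= MIN_SCORE]
    return [m.metadata.get("text", "") for m in strong]

contexts = retrieve_context_strict("What is the chargeback status?")
if not contexts:
    print("No sufficiently relevant records; escalate to a human agent.")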
  5. Share memory across multiple agents.

In multi-agent systems, one agent can write summaries to Pinecone while another retrieves them later for compliance checks or customer-facing responses.

from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()
pc = Pinecone()
index = pc.Index("banking-agent-memory")

def write_agent_summary(agent_id: str, conversation_id: str, summary: str):
    emb = client.embeddings.create(
        model="text-embedding-3-small",
        input=summary
    ).data[0].embedding

    index.upsert(
        vectors=[{
            "id": f"{conversation_id}:{agent_id}",
            "values": emb,
            "metadata": {
                "agent_id": agent_id,
                "conversation_id": conversation_id,
                "text": summary,
                "type": "agent_summary"
            }
        }]
    )

def get_shared_memory(query: str):
    qvec = client.embeddings.create(
        model="text-embedding-3-small",
        input=query
    ).data[0].embedding

    return index.query(vector=qvec, top_k=5, include_metadata=True)

write_agent_summary(
    agent_id="fraud_agent",
    conversation_id="conv_7781",
    summary="Transaction flagged as medium risk. Customer confirmed travel notice submitted."
)

shared = get_shared_memory("Was there a travel notice on this account?")
print(shared.matches[0].metadata["text"])
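To make the handoff concrete, here is a small sketch of the read side: a compliance agent pulls the fraud agent's summary from shared memory and drafts a review note with OpenAI. It reuses get_shared_memory from above; the prompt wording is illustrative:

def compliance_review(question: str) -> str:
    results = get_shared_memory(question)
    evidence = "\n".join(
        m.metadata["text"] for m in results.matches
        if m.metadata.get("type") == "agent_summary"
    )
    response = client.responses.create(
        model="gpt-4.1-mini",
        input=[
            {"role": "system", "content": "You are a compliance reviewer. Use only the evidence provided."},
            {"role": "user", "content": f"Evidence:\n{evidence}\n\nQuestion: {question}"}
        ]
    )
    return response.output_text

# The fraud agent wrote its summary above; the compliance agent reads it here
print(compliance_review("Was there a travel notice on this account?"))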

Testing the Integration

Run a full round-trip test: write a record to Pinecone, retrieve it, then ask OpenAI to answer using that record.

test_text = (
    "Customer requested a replacement debit card after suspected fraud on 2026-04-20."
)

# Store test memory (reuses the client and index objects from the steps above)
vec = client.embeddings.create(
    model="text-embedding-3-small",
    input=test_text
).data[0].embedding

index.upsert(vectors=[{
    "id": "test_case_001",
    "values": vec,
    "metadata": {"text": test_text}
}])

# Retrieve and answer
query_vec = client.embeddings.create(
    model="text-embedding-3-small",
    input="Why did the customer request a new card?"
).data[0].embedding

matches = index.query(vector=query_vec, top_k=1, include_metadata=True)
context_text = matches.matches[0].metadata["text"]

response = client.responses.create(
    model="gpt-4.1-mini",
    input=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context: {context_text}\n\nQuestion: Why did the customer request a new card?"}
    ]
)

print(response.output_text)

Expected output:

The customer requested a replacement debit card after suspected fraud on 2026-04-20.
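Because the test writes a synthetic record into your live index, delete it once the round trip passes so it cannot surface in real retrievals:

index.delete(ids=["test_case_001"])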

Real-World Use Cases

  • Fraud investigation copilot:
    • One agent retrieves similar historical fraud cases from Pinecone.
    • Another agent uses OpenAI to draft analyst notes and next-step recommendations.
  • KYC and onboarding assistant:
    • Store policy docs, checklist items, and prior onboarding conversations in Pinecone.
    • Use OpenAI to summarize missing documents and generate compliant follow-up messages.
  • Relationship manager workspace:
    • Persist customer interactions, product interests, and meeting summaries.
    • Let multiple agents retrieve shared memory for personalized outreach and portfolio review.

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

