How to Integrate OpenAI for banking with Pinecone for RAG

By Cyprian Aarons · Updated 2026-04-21
Tags: openai-for-banking, pinecone, rag

Combining OpenAI for banking with Pinecone gives you a practical RAG stack for regulated workflows: the model handles reasoning and response generation, while Pinecone stores the bank’s policy docs, product guides, KYC procedures, and support knowledge in a retrievable format. That means your agent can answer from approved internal sources instead of guessing, which is the difference between a useful assistant and a compliance risk.

Prerequisites

  • Python 3.10+
  • An OpenAI for banking API key
  • A Pinecone API key
  • A Pinecone index created with the right dimension for your embedding model (see the creation sketch just after this list)
  • Internal documents you want to retrieve from:
    • PDF policy docs
    • FAQ pages
    • onboarding playbooks
    • product manuals
  • Installed packages:
    • openai
    • pinecone
    • python-dotenv

pip install openai pinecone python-dotenv
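
If you still need to create the index, the sketch below shows one way, assuming a serverless index on AWS us-east-1 and text-embedding-3-small (which outputs 1536-dimensional vectors); the index name and region are placeholders, not requirements.

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder; see step 1 for env handling

# text-embedding-3-small produces 1536-dimensional vectors,
# so the index dimension must match the embedding model.
pc.create_index(
    name="banking-rag",  # placeholder index name
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)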

Integration Steps

  1. Set up your environment variables.

Keep secrets out of source control. Use a .env file or your secret manager.

import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_INDEX_NAME = os.getenv("PINECONE_INDEX_NAME")
  2. Initialize the OpenAI client and Pinecone client.

For OpenAI, use the official client and create embeddings with client.embeddings.create(). For Pinecone, initialize the client and connect to your existing index.

from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI(api_key=OPENAI_API_KEY)
pc = Pinecone(api_key=PINECONE_API_KEY)

index = pc.Index(PINECONE_INDEX_NAME)
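
Before loading any data, a quick sanity check is worthwhile: describe_index_stats() confirms the client can reach the index and reports its configured dimension and current vector count.

# Optional: verify connectivity and inspect the index configuration.
print(index.describe_index_stats())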
  3. Chunk documents and generate embeddings.

RAG only works if retrieval is clean. Split documents into small chunks, then embed each chunk before storing it in Pinecone.

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50):
    """Split text into fixed-size character chunks that overlap,
    so content cut at a boundary still appears intact in a neighbor."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars of context
    return chunks

document_id = "banking_policy_001"
text = """
Customers can reset their password after verifying identity through MFA.
For account disputes above $5000, escalate to the fraud review queue.
"""

chunks = chunk_text(text)

embeddings_response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=chunks,
)

vectors = []
for i, item in enumerate(embeddings_response.data):
    vectors.append({
        "id": f"{document_id}-chunk-{i}",
        "values": item.embedding,
        "metadata": {
            "doc_id": document_id,
            "chunk_index": i,
            "text": chunks[i],
            "source": "internal_policy"
        }
    })

index.upsert(vectors=vectors)
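
A single upsert is fine for a toy document like this one. For a real corpus, upsert in batches so individual requests stay small; the batch size of 100 below is an assumption to illustrate the pattern, not a Pinecone limit.

BATCH_SIZE = 100  # assumed batch size; tune for your payload sizes

for start in range(0, len(vectors), BATCH_SIZE):
    index.upsert(vectors=vectors[start:start + BATCH_SIZE])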
  4. Retrieve relevant context from Pinecone for a user query.

When a user asks a question, embed the query with the same embedding model and run a similarity search against Pinecone.

query = "How do we handle disputes over $5000?"

query_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=[query],
).data[0].embedding

results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True
)

contexts = []
for match in results["matches"]:
    contexts.append(match["metadata"]["text"])

context_block = "\n\n".join(contexts)
print(context_block)
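
top_k alone can let weak matches through. If you want to filter on relevance, each match carries a similarity score you can threshold; the 0.3 cutoff below is purely illustrative, and a sensible value depends on your metric and data.

MIN_SCORE = 0.3  # illustrative cutoff; tune against real queries

contexts = [
    match["metadata"]["text"]
    for match in results["matches"]
    if match["score"] >= MIN_SCORE
]
context_block = "\n\n".join(contexts)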
  5. Send retrieved context to OpenAI for grounded generation.

This is where the agent becomes useful. Pass the retrieved policy text into the chat completion request and force the model to answer only from that context.

system_prompt = (
    "You are a banking assistant. Answer only using the provided context. "
    "If the context does not contain the answer, say you don't have enough information."
)

user_prompt = f"""
Context:
{context_block}

Question:
{query}
"""

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    temperature=0.1,
)

print(response.choices[0].message.content)
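
In an application you would wrap retrieval and generation behind a single function. A minimal sketch that reuses the clients, index, and system prompt defined above:

def answer_from_policies(question: str, top_k: int = 3) -> str:
    # Embed the question with the same model used for the documents.
    q_emb = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=[question],
    ).data[0].embedding

    # Pull the closest policy chunks from Pinecone.
    res = index.query(vector=q_emb, top_k=top_k, include_metadata=True)
    context = "\n\n".join(m["metadata"]["text"] for m in res["matches"])

    # Generate an answer grounded only in the retrieved context.
    completion = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion:\n{question}"},
        ],
        temperature=0.1,
    )
    return completion.choices[0].message.content

print(answer_from_policies("How do we handle disputes over $5000?"))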

Testing the Integration

Run an end-to-end test with a known policy question. You want to verify three things:

  • The query embedding call succeeds
  • Pinecone returns relevant chunks
  • OpenAI generates an answer grounded in those chunks

test_query = "What happens when a dispute is above $5000?"

test_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=[test_query],
).data[0].embedding

test_results = index.query(
    vector=test_embedding,
    top_k=2,
    include_metadata=True
)

test_contexts = [m["metadata"]["text"] for m in test_results["matches"]]
test_context_block = "\n".join(test_contexts)

test_response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer only from context."},
        {"role": "user", "content": f"Context:\n{test_context_block}\n\nQuestion:\n{test_query}"},
    ],
)

print(test_response.choices[0].message.content)

Expected output:

Disputes above $5000 should be escalated to the fraud review queue.

If you get an answer that mentions unsupported details, tighten your prompt and reduce temperature. If retrieval is off, fix chunking or re-check your index dimension against the embedding model output size.
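
One quick way to catch a dimension mismatch is to compare the index's configured dimension against the length of an embedding from your model; a sketch assuming the v3+ Pinecone client:

# The index dimension must equal the embedding model's output size.
index_dim = pc.describe_index(PINECONE_INDEX_NAME).dimension
sample = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=["dimension check"],
).data[0].embedding

assert index_dim == len(sample), f"index={index_dim}, embedding={len(sample)}"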

Real-World Use Cases

  • Policy assistant for operations teams

    • Staff ask questions about card disputes, AML escalation paths, loan servicing rules, or KYC checks.
    • The agent retrieves approved policy text from Pinecone and answers with OpenAI.
  • Customer support copilot

    • Agents get suggested responses based on product docs and internal runbooks.
    • This reduces handle time and keeps answers aligned with current bank policy.
  • Compliance knowledge search

    • Analysts query procedures across multiple document sets.
    • Pinecone handles semantic retrieval; OpenAI turns results into readable summaries or action steps.
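
For the compliance case, the source field stored in each chunk's metadata in step 3 lets you scope retrieval to a specific document set using Pinecone's metadata filters; a sketch reusing the query embedding from step 4:

# Restrict retrieval to chunks tagged as internal policy documents.
scoped_results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True,
    filter={"source": {"$eq": "internal_policy"}},
)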

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

