How to Integrate OpenAI for Banking with Pinecone for AI Agents

By Cyprian Aarons · Updated 2026-04-21
openai-for-banking · pinecone · ai-agents

OpenAI gives your banking agent the reasoning layer for regulated workflows, while Pinecone gives it durable retrieval over policy docs, product manuals, KYC playbooks, and case notes. Put them together and you get an agent that answers customer questions with grounded context instead of guessing.

This matters in banking because most useful agent behavior is not pure generation. It’s retrieval plus policy-aware response generation, with traceable context pulled from your own indexed knowledge base.

Prerequisites

  • Python 3.10+
  • An OpenAI API key with access to the models you want to use
  • A Pinecone account and API key
  • A Pinecone index created with the correct vector dimension for your embedding model (a creation sketch follows the install command below)
  • pip installed
  • Basic familiarity with Python async or sync HTTP clients
  • Internal documents ready to embed:
    • product FAQs
    • compliance policies
    • support scripts
    • fee schedules
    • escalation procedures

Install the SDKs:

pip install openai pinecone
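
If you have not created the index yet, here is a minimal creation sketch. It assumes text-embedding-3-small, which returns 1536-dimensional vectors; the index name, cloud, and region are placeholders.

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

# Run once: the dimension must match your embedding model's output size.
pc.create_index(
    name="banking-agent-kb",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)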

Integration Steps

  1. Set up your clients and environment variables.

Use separate keys and keep them out of source control.

import os
from openai import OpenAI
from pinecone import Pinecone

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]
PINECONE_INDEX_NAME = os.environ["PINECONE_INDEX_NAME"]

client = OpenAI(api_key=OPENAI_API_KEY)
pc = Pinecone(api_key=PINECONE_API_KEY)
index = pc.Index(PINECONE_INDEX_NAME)

  2. Create embeddings with OpenAI and store them in Pinecone.

For banking agents, chunk documents by policy section or FAQ entry; a chunking sketch follows the upsert below. Store metadata so you can filter by document type, jurisdiction, or product line.

docs = [
    {
        "id": "faq_001",
        "text": "International wire transfers submitted after 3 PM ET are processed on the next business day.",
        "metadata": {"source": "payments_faq", "product": "wires", "jurisdiction": "US"}
    },
    {
        "id": "policy_014",
        "text": "If a customer reports suspected fraud, freeze card access immediately and escalate to the fraud operations queue.",
        "metadata": {"source": "fraud_policy", "product": "cards", "jurisdiction": "US"}
    }
]

embeddings = client.embeddings.create(
    model="text-embedding-3-small",
    input=[d["text"] for d in docs]
)

vectors = []
for doc, emb in zip(docs, embeddings.data):
    vectors.append({
        "id": doc["id"],
        "values": emb.embedding,
        "metadata": {**doc["metadata"], "text": doc["text"]}
    })

index.upsert(vectors=vectors)
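
If your source material arrives as whole documents rather than pre-split entries, a helper like this can produce chunks in the same shape as the docs list above. The blank-line splitting rule and the chunk_policy_document name are assumptions; adapt the section detection to however your policy documents are structured.

import re

def chunk_policy_document(doc_id: str, text: str, base_metadata: dict) -> list[dict]:
    # Assumption: sections are separated by blank lines. Swap in heading or
    # section-number detection for real policy documents.
    sections = [s.strip() for s in re.split(r"\n\s*\n", text) if s.strip()]
    return [
        {
            "id": f"{doc_id}_{i:03d}",
            "text": section,
            "metadata": {**base_metadata, "chunk": i}
        }
        for i, section in enumerate(sections)
    ]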

  3. Retrieve relevant context from Pinecone at query time.

Your agent should never answer banking questions from memory alone if policy text exists. Retrieve top-k matches first, then pass them into the model as grounded context.

query = "What happens if a wire transfer is submitted after cutoff?"
query_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input=query
).data[0].embedding

results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True,
    filter={"product": {"$eq": "wires"}}
)

contexts = []
for match in results["matches"]:
    contexts.append(match["metadata"]["text"])

print(contexts)

  4. Generate the final answer with OpenAI using retrieved context.

This is the core RAG pattern: retrieve first, generate second. For banking workflows, keep the answer constrained to the retrieved content and instruct the model to say when it does not have enough information.

context_block = "\n\n".join([f"- {c}" for c in contexts])

response = client.responses.create(
    model="gpt-4.1-mini",
    input=f"""
You are a banking support assistant.
Answer only using the provided context.
If the context is insufficient, say you do not have enough information.

Context:
{context_block}

Question:
{query}
"""
)

print(response.output_text)

  5. Wrap retrieval and generation into a reusable agent function.

This is what you actually ship: one function that your API layer calls for every user message.

def answer_banking_question(question: str) -> str:
    q_emb = client.embeddings.create(
        model="text-embedding-3-small",
        input=question
    ).data[0].embedding

    matches = index.query(
        vector=q_emb,
        top_k=3,
        include_metadata=True
    )["matches"]

    context_text = "\n".join(
        m["metadata"]["text"] for m in matches if m.get("metadata")
    )

    result = client.responses.create(
        model="gpt-4.1-mini",
        input=f"""
Use only this context to answer:

{context_text}

Question: {question}
"""
    )

    return result.output_text


print(answer_banking_question("Can I send a wire transfer after 3 PM ET?"))
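
If the API layer mentioned above is, for example, FastAPI, a minimal wrapper might look like this sketch; the route path and request model are placeholders rather than part of the integration itself.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AgentRequest(BaseModel):
    message: str

@app.post("/agent/answer")
def answer_endpoint(req: AgentRequest) -> dict:
    # Every user message goes through the same retrieve-then-generate function.
    return {"answer": answer_banking_question(req.message)}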

Testing the Integration

Run a simple end-to-end check: with the sample documents from step 2 already upserted, ask a question whose answer lives in the index and confirm the response reflects the stored policy text.

test_question = "When are international wire transfers processed if submitted after 3 PM ET?"
answer = answer_banking_question(test_question)
print(answer)

Expected output:

International wire transfers submitted after 3 PM ET are processed on the next business day.
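
For a stricter check, a sketch like the one below first confirms the known policy sentence is the top retrieval hit before asserting on the generated answer. The asserted phrase is an assumption; model wording varies, so in practice you may prefer a looser match or a small evaluation set.

probe = "When are international wire transfers processed if submitted after 3 PM ET?"
probe_emb = client.embeddings.create(
    model="text-embedding-3-small",
    input=probe
).data[0].embedding

# Retrieval check: the stored FAQ entry should come back as the top match.
top_match = index.query(vector=probe_emb, top_k=1, include_metadata=True)["matches"][0]
assert "next business day" in top_match["metadata"]["text"]

# Generation check: the grounded answer should carry the same policy fact.
assert "next business day" in answer_banking_question(probe).lower()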

If your output starts drifting into unsupported claims, fix your prompt and tighten retrieval filters before shipping. In banking systems, hallucinated policy is a production incident.
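
One way to tighten retrieval is to drop low-similarity matches before they reach the prompt. This sketch reuses the query embedding from step 3; the 0.4 cutoff is an assumption to tune against your own evaluation data.

MIN_SCORE = 0.4  # assumed threshold; tune on your own data

results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True,
    filter={"product": {"$eq": "wires"}}
)

# Keep only matches that clear the similarity threshold and carry metadata.
contexts = [
    m["metadata"]["text"]
    for m in results["matches"]
    if m.get("metadata") and m["score"] >= MIN_SCORE
]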

Real-World Use Cases

  • Customer support agents that answer fee, cutoff time, and account-policy questions using indexed bank documentation.
  • Fraud triage assistants that retrieve escalation playbooks and generate step-by-step operator guidance.
  • Compliance copilots that search internal controls, summarize relevant procedures, and draft responses for review teams.

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

Get the Starter Kit
