How to Integrate OpenAI for fintech with Pinecone for RAG

By Cyprian Aarons · Updated 2026-04-21
openai-for-fintech · pinecone · rag

If you’re building an AI agent for fintech, you need two things: a model that can reason over financial questions, and a retrieval layer that can ground answers in your own documents. OpenAI gives you the generation and tool-use layer; Pinecone gives you fast semantic retrieval over policies, product docs, filings, and support knowledge.

That combo is what turns a generic chatbot into a useful assistant for KYC workflows, investment ops, customer support, and internal compliance search.

Prerequisites

  • Python 3.10+
  • An OpenAI API key
  • A Pinecone API key
  • A Pinecone index created with an embedding dimension that matches your embedding model (see the index-creation sketch just after the install command)
  • pip installed
  • Access to your fintech knowledge base:
    • policy PDFs
    • product documentation
    • FAQ articles
    • compliance notes
  • These packages:
    • openai
    • pinecone
    • python-dotenv

Install them:

pip install openai pinecone python-dotenv
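If you haven't created the Pinecone index yet, here is a minimal sketch using the serverless spec. The index name, cloud, and region are placeholders, and the 1536 dimension assumes text-embedding-3-small — adjust all of them to your setup:

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

# 1536 matches text-embedding-3-small; use your model's dimension if it differs.
pc.create_index(
    name="fintech-rag",  # placeholder index name
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)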

Integration Steps

  1. Set up environment variables

Keep secrets out of code. For fintech systems, this is non-negotiable.

import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_INDEX_NAME = os.getenv("PINECONE_INDEX_NAME")

if not OPENAI_API_KEY or not PINECONE_API_KEY or not PINECONE_INDEX_NAME:
    raise ValueError("Missing required environment variables")
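For reference, a matching .env file looks like this — the values are placeholders, and the file should never be committed:

OPENAI_API_KEY=sk-...
PINECONE_API_KEY=your-pinecone-key
PINECONE_INDEX_NAME=fintech-rag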

  2. Create clients for OpenAI and Pinecone

Use OpenAI for embeddings and answer generation. Use Pinecone for vector storage and retrieval.

from openai import OpenAI
from pinecone import Pinecone

client = OpenAI(api_key=OPENAI_API_KEY)
pc = Pinecone(api_key=PINECONE_API_KEY)

index = pc.Index(PINECONE_INDEX_NAME)
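Before upserting anything, a quick sanity check is worthwhile: describe_index_stats() reports the index dimension and vector count, so you can confirm the dimension matches your embedding model.

stats = index.describe_index_stats()
print(stats)  # dimension should be 1536 for text-embedding-3-small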

  3. Embed your documents and store them in Pinecone

For RAG, each chunk of text becomes an embedding vector plus metadata. In fintech, keep metadata tight: source, document type, version, and access scope.

documents = [
    {
        "id": "doc-001",
        "text": "AML review requires escalation when transaction patterns exceed internal thresholds.",
        "metadata": {"source": "aml_policy_v3", "type": "policy"}
    },
    {
        "id": "doc-002",
        "text": "KYC onboarding requires government ID verification and proof of address.",
        "metadata": {"source": "kyc_playbook", "type": "process"}
    }
]

texts = [d["text"] for d in documents]

embeddings_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)

vectors = []
for doc, item in zip(documents, embeddings_response.data):
    vectors.append({
        "id": doc["id"],
        "values": item.embedding,
        "metadata": {
            **doc["metadata"],
            "text": doc["text"]
        }
    })

index.upsert(vectors=vectors)
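The sample documents above are short enough to embed whole. Real policy PDFs aren't, so you'll want to chunk them first. A minimal, hypothetical chunker using a fixed character window with overlap is enough to get started — tune the sizes for your corpus:

def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    # Slide a fixed-size window across the text; the overlap keeps sentences
    # near a boundary present in two adjacent chunks.
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks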

  4. Retrieve relevant context from Pinecone

When a user asks a question, embed the query with the same model, then query Pinecone for the nearest matches.

query = "When should AML cases be escalated?"

query_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input=query
).data[0].embedding

results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True
)

contexts = []
for match in results.matches:
    contexts.append(match.metadata["text"])

print(contexts)
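Because each vector carries metadata, you can also scope retrieval with Pinecone's metadata filter — handy when, say, only policy documents should ground a compliance answer. The filter value below assumes the sample metadata from step 3:

results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True,
    filter={"type": {"$eq": "policy"}}  # only match vectors tagged as policy docs
)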

  5. Generate the final answer with OpenAI using retrieved context

This is the actual RAG step. Pass the retrieved snippets into the model and force it to answer from context.

context_block = "\n\n".join(contexts)

response = client.responses.create(
    model="gpt-4.1-mini",
    input=[
        {
            "role": "system",
            "content": (
                "You are a fintech assistant. Answer only using the provided context. "
                "If the context is insufficient, say you do not have enough information."
            )
        },
        {
            "role": "user",
            "content": f"Context:\n{context_block}\n\nQuestion: {query}"
        }
    ]
)

print(response.output_text)
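For auditability you may want answers that cite their sources. One simple variation — using the metadata fields from the sample documents above — prefixes each retrieved snippet with its source before building the context block:

labeled_contexts = [
    f"[{m.metadata['source']}] {m.metadata['text']}"
    for m in results.matches
]
context_block = "\n\n".join(labeled_contexts)
# Then ask the model in the system prompt to reference the bracketed source IDs.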

Testing the Integration

Run a full end-to-end check: embed a query, retrieve from Pinecone, then generate an answer with OpenAI.

def rag_answer(question: str) -> str:
    q_emb = client.embeddings.create(
        model="text-embedding-3-small",
        input=question
    ).data[0].embedding

    matches = index.query(vector=q_emb, top_k=2, include_metadata=True).matches
    context = "\n\n".join([m.metadata["text"] for m in matches])

    resp = client.responses.create(
        model="gpt-4.1-mini",
        input=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
        ]
    )
    return resp.output_text

print(rag_answer("What triggers AML escalation?"))

Expected output:

AML cases should be escalated when transaction patterns exceed internal thresholds or show suspicious behavior that requires further review.

If you get an answer grounded in your policy text instead of a generic model response, the integration is working.

Real-World Use Cases

  • Compliance assistant
    • Answer questions about AML, KYC, sanctions screening, and internal controls using your policy corpus.
  • Customer support copilot
    • Retrieve product-specific account rules, fee schedules, and onboarding steps before generating responses.
  • Ops knowledge agent
    • Help analysts search internal runbooks, incident procedures, and escalation paths without manually digging through docs.

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

