How to Integrate OpenAI with Pinecone for Fintech RAG
If you’re building an AI agent for fintech, you need two things: a model that can reason over financial questions, and a retrieval layer that can ground answers in your own documents. OpenAI gives you the generation and tool-use layer; Pinecone gives you fast semantic retrieval over policies, product docs, filings, and support knowledge.
That combo is what turns a generic chatbot into a useful assistant for KYC workflows, investment ops, customer support, and internal compliance search.
Prerequisites
- Python 3.10+
- An OpenAI API key
- A Pinecone API key
- A Pinecone index created with the correct embedding dimension
- Access to your fintech knowledge base: policy PDFs, product documentation, FAQ articles, compliance notes
- These packages: openai, pinecone, python-dotenv

Install them:
pip install openai pinecone python-dotenv
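The index dimension must match the embedding model: text-embedding-3-small produces 1536-dimensional vectors. A minimal sketch for creating the index if it doesn't exist yet, assuming Pinecone's serverless tier on aws/us-east-1 (the helper name, cloud, and region are illustrative, so adjust to your account):

```python
# Embedding dimensions per OpenAI embedding model, used to size the index.
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def ensure_index(name: str, model: str = "text-embedding-3-small"):
    """Create the Pinecone index for `model` if it does not already exist."""
    import os
    from pinecone import Pinecone, ServerlessSpec  # imported lazily here

    pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
    existing = [i.name for i in pc.list_indexes()]
    if name not in existing:
        pc.create_index(
            name=name,
            dimension=EMBEDDING_DIMS[model],  # must match the embedding model
            metric="cosine",
            spec=ServerlessSpec(cloud="aws", region="us-east-1"),
        )
    return pc.Index(name)
```

If the dimension is wrong, upserts will fail at runtime, so it's worth pinning the model name and dimension together as above.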
Integration Steps
1. Set up environment variables
Keep secrets out of code. For fintech systems, this is non-negotiable.
import os
from dotenv import load_dotenv
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_INDEX_NAME = os.getenv("PINECONE_INDEX_NAME")
if not OPENAI_API_KEY or not PINECONE_API_KEY or not PINECONE_INDEX_NAME:
    raise ValueError("Missing required environment variables")
2. Create clients for OpenAI and Pinecone
Use OpenAI for embeddings and answer generation. Use Pinecone for vector storage and retrieval.
from openai import OpenAI
from pinecone import Pinecone
client = OpenAI(api_key=OPENAI_API_KEY)
pc = Pinecone(api_key=PINECONE_API_KEY)
index = pc.Index(PINECONE_INDEX_NAME)
3. Embed your documents and store them in Pinecone
For RAG, each chunk of text becomes an embedding vector plus metadata. In fintech, keep metadata tight: source, document type, version, and access scope.
documents = [
    {
        "id": "doc-001",
        "text": "AML review requires escalation when transaction patterns exceed internal thresholds.",
        "metadata": {"source": "aml_policy_v3", "type": "policy"}
    },
    {
        "id": "doc-002",
        "text": "KYC onboarding requires government ID verification and proof of address.",
        "metadata": {"source": "kyc_playbook", "type": "process"}
    }
]
texts = [d["text"] for d in documents]
embeddings_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)
vectors = []
for doc, item in zip(documents, embeddings_response.data):
    vectors.append({
        "id": doc["id"],
        "values": item.embedding,
        "metadata": {
            **doc["metadata"],
            "text": doc["text"]
        }
    })
index.upsert(vectors=vectors)
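The sample documents above are short enough to embed whole, but real policy PDFs need to be split into chunks first. A simple character-window chunker with overlap is enough to get started (a sketch; the window and overlap sizes are illustrative, not tuned):

```python
def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows.

    Overlap keeps a sentence that straddles a boundary retrievable
    from at least one of the two adjacent chunks.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back to create the overlap
    return chunks
```

Each chunk then becomes one entry in the `documents` list, with the parent document's source and type copied into its metadata.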
4. Retrieve relevant context from Pinecone
When a user asks a question, embed the query with the same model, then query Pinecone for the nearest matches.
query = "When should AML cases be escalated?"
query_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input=query
).data[0].embedding
results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True
)
contexts = []
for match in results.matches:
    contexts.append(match.metadata["text"])
print(contexts)
5. Generate the final answer with OpenAI using retrieved context
This is the actual RAG step. Pass the retrieved snippets into the model and force it to answer from context.
context_block = "\n\n".join(contexts)
response = client.responses.create(
    model="gpt-4.1-mini",
    input=[
        {
            "role": "system",
            "content": (
                "You are a fintech assistant. Answer only using the provided context. "
                "If the context is insufficient, say you do not have enough information."
            )
        },
        {
            "role": "user",
            "content": f"Context:\n{context_block}\n\nQuestion: {query}"
        }
    ]
)
print(response.output_text)
Testing the Integration
Run a full end-to-end check: embed a query, retrieve from Pinecone, then generate an answer with OpenAI.
def rag_answer(question: str) -> str:
    q_emb = client.embeddings.create(
        model="text-embedding-3-small",
        input=question
    ).data[0].embedding
    matches = index.query(vector=q_emb, top_k=2, include_metadata=True).matches
    context = "\n\n".join([m.metadata["text"] for m in matches])
    resp = client.responses.create(
        model="gpt-4.1-mini",
        input=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
        ]
    )
    return resp.output_text

print(rag_answer("What triggers AML escalation?"))
Expected output (exact wording will vary from run to run):
AML cases should be escalated when transaction patterns exceed internal thresholds or show suspicious behavior that requires further review.
If you get an answer grounded in your policy text instead of a generic model response, the integration is working.
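One cheap way to spot-check grounding in an automated test is a word-overlap heuristic: a grounded answer should share several non-trivial words with the retrieved chunks. This is a rough sketch, not a substitute for proper evaluation, and the function name and stop-word list are my own:

```python
def looks_grounded(answer: str, contexts: list[str], min_overlap: int = 3) -> bool:
    """Heuristic: does the answer share at least `min_overlap` distinct
    non-trivial words with the retrieved context?"""
    stop = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "are",
            "be", "that", "when", "should", "show", "for"}
    answer_words = {w.strip(".,").lower() for w in answer.split()} - stop
    context_words = {w.strip(".,").lower() for w in " ".join(contexts).split()} - stop
    return len(answer_words & context_words) >= min_overlap
```

Run it against the answer and the `contexts` list from the retrieval step; an answer with near-zero overlap is a signal the model fell back to generic knowledge.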
Real-World Use Cases
- Compliance assistant: answer questions about AML, KYC, sanctions screening, and internal controls using your policy corpus.
- Customer support copilot: retrieve product-specific account rules, fee schedules, and onboarding steps before generating responses.
- Ops knowledge agent: help analysts search internal runbooks, incident procedures, and escalation paths without manually digging through docs.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.