How to Integrate OpenAI for healthcare with Pinecone for RAG
Combining OpenAI for healthcare with Pinecone gives you a practical RAG stack for clinical and operational workflows. The pattern is simple: store approved medical content, policies, or care pathways in Pinecone, then use OpenAI to answer questions grounded in that retrieved context instead of relying on model memory alone.
This is the setup you want when accuracy, traceability, and controlled retrieval matter. It works well for patient-support assistants, internal clinical knowledge search, and document-heavy workflows where the agent needs to cite the right source before generating a response.
Prerequisites
- Python 3.10+
- An OpenAI API key with access to the model you plan to use
- A Pinecone account and API key
- A Pinecone index created with the correct vector dimension for your embedding model
- `pip` installed
- Basic familiarity with embeddings and retrieval-augmented generation
- Environment variables configured: `OPENAI_API_KEY` and `PINECONE_API_KEY`

Install the SDKs:
pip install openai pinecone python-dotenv
Integration Steps
1) Initialize OpenAI and Pinecone clients
Start by loading credentials and creating both clients. Keep this in one place so your agent layer can reuse them.
import os
from dotenv import load_dotenv
from openai import OpenAI
from pinecone import Pinecone

load_dotenv()

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

index_name = "healthcare-rag"
index = pc.Index(index_name)
If you are running this in production, keep secrets out of code and rotate keys through your secret manager.
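The prerequisites assume an index whose dimension matches your embedding model. As a hedged sketch for first-time setup (the helper name, cloud, and region here are illustrative choices, not requirements; check your Pinecone plan), you can create the index if it does not exist yet:

```python
# Map embedding model -> vector dimension so the index always matches
# the model you embed with (dimensions per OpenAI's published defaults).
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def ensure_index(pc, name, model="text-embedding-3-small"):
    """Create the index if missing, sized for the embedding model."""
    from pinecone import ServerlessSpec  # imported lazily in this sketch
    dimension = EMBEDDING_DIMENSIONS[model]
    if name not in pc.list_indexes().names():
        pc.create_index(
            name=name,
            dimension=dimension,
            metric="cosine",
            spec=ServerlessSpec(cloud="aws", region="us-east-1"),
        )
    return pc.Index(name)
```

Run this once at startup instead of assuming the index exists; a dimension mismatch here is the most common cause of broken retrieval later.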
2) Create embeddings for your healthcare documents
For RAG, your documents need embeddings before they can be stored in Pinecone. Use an embedding model from OpenAI and keep each chunk small enough to retrieve cleanly.
documents = [
    {
        "id": "doc-001",
        "text": "Hypertension management includes lifestyle changes, sodium reduction, exercise, and medication when indicated."
    },
    {
        "id": "doc-002",
        "text": "Type 2 diabetes care often includes HbA1c monitoring, diet changes, activity, and medication adherence."
    }
]

texts = [d["text"] for d in documents]

embedding_response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)

vectors = []
for doc, emb in zip(documents, embedding_response.data):
    vectors.append({
        "id": doc["id"],
        "values": emb.embedding,
        "metadata": {"text": doc["text"]}
    })
Use metadata aggressively. In healthcare systems, you usually want source type, version, specialty, approval status, and last-reviewed date attached to every chunk.
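As a sketch of what "aggressive" metadata can look like, the helper below attaches those governance fields to each chunk. The field names are illustrative, not a standard schema; adapt them to your own review process:

```python
from datetime import date

def build_vector(doc_id, text, embedding, *, source_type, version,
                 specialty, approved, last_reviewed):
    # Attach governance fields to every chunk so retrieval can later be
    # filtered (e.g. approved-only) and audited. Field names are examples.
    return {
        "id": doc_id,
        "values": embedding,
        "metadata": {
            "text": text,
            "source_type": source_type,
            "version": version,
            "specialty": specialty,
            "approved": approved,
            "last_reviewed": last_reviewed,
        },
    }

vector = build_vector(
    "doc-001",
    "Hypertension management includes lifestyle changes...",
    [0.0] * 1536,  # placeholder; real code uses the OpenAI embedding
    source_type="clinical-guideline",
    version="2024.1",
    specialty="cardiology",
    approved=True,
    last_reviewed=str(date(2024, 6, 1)),
)
```

With fields like these in place, you can pass a Pinecone metadata `filter` at query time to retrieve only approved, current content.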
3) Upsert vectors into Pinecone
Now push those vectors into your index. This makes them retrievable by semantic similarity later.
upsert_response = index.upsert(vectors=vectors)
print(upsert_response)
A practical pattern is to namespace by tenant or content domain.
index.upsert(
    vectors=vectors,
    namespace="clinical-guidelines"
)
That keeps retrieval isolated between departments, customers, or care programs.
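At real document volumes you also want to upsert in batches rather than one giant call. A minimal sketch, assuming 100 vectors per batch as a starting point (not a hard Pinecone limit; tune for your payload size):

```python
def batches(vectors, size=100):
    # Yield successive slices so each upsert request stays small.
    for i in range(0, len(vectors), size):
        yield vectors[i:i + size]

# Usage sketch with the index and vectors from the steps above:
# for batch in batches(vectors):
#     index.upsert(vectors=batch, namespace="clinical-guidelines")
```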
4) Retrieve relevant context for a user query
When a user asks a question, embed the query and search Pinecone for the most relevant chunks.
query = "How should we manage high blood pressure in a patient?"

query_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=[query]
).data[0].embedding

search_results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True,
    namespace="clinical-guidelines"
)

contexts = [
    match["metadata"]["text"]
    for match in search_results["matches"]
]
At this point you have the retrieval side of RAG. The key is not to return raw search results directly to users; instead, pass them into the model as grounded context.
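Before handing matches to the model, it is also worth dropping weak ones so low-similarity noise never reaches the prompt. A sketch, where the 0.35 cutoff is an illustrative starting point for cosine similarity and should be tuned on your own data:

```python
def grounded_contexts(matches, min_score=0.35):
    # Keep only matches above a similarity threshold; weak matches tend
    # to pull the model toward unrelated content.
    return [
        m["metadata"]["text"]
        for m in matches
        if m["score"] >= min_score
    ]

# Example with mock matches shaped like Pinecone query results:
sample_matches = [
    {"score": 0.82, "metadata": {"text": "Hypertension guidance..."}},
    {"score": 0.12, "metadata": {"text": "Unrelated cafeteria memo"}},
]
contexts = grounded_contexts(sample_matches)
```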
5) Generate a grounded answer with OpenAI
Use the retrieved passages as context in a chat completion call. Keep the prompt strict so the model answers only from supplied sources.
context_block = "\n\n".join([f"- {c}" for c in contexts])

messages = [
    {
        "role": "system",
        "content": (
            "You are a healthcare assistant. Answer only using the provided context. "
            "If the context is insufficient, say so clearly."
        )
    },
    {
        "role": "user",
        "content": f"Context:\n{context_block}\n\nQuestion: {query}"
    }
]

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    temperature=0.2
)

print(response.choices[0].message.content)
For regulated workflows, add guardrails:
- refuse diagnosis claims unless your workflow explicitly supports them
- include citations from metadata
- log retrieved document IDs for auditability
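The audit-logging guardrail can be as simple as recording which document IDs and scores grounded each answer. A minimal sketch; the log schema here is a hypothetical example, not a compliance standard:

```python
import json
import logging

logger = logging.getLogger("rag-audit")

def log_retrieval(query, matches):
    # Record the query and every source that grounded the answer, so an
    # auditor can reconstruct why the model said what it said.
    record = {
        "query": query,
        "sources": [{"id": m["id"], "score": m["score"]} for m in matches],
    }
    logger.info(json.dumps(record))
    return record
```

Call this with the Pinecone matches right before the chat completion call, and ship the log to whatever audit store your organization already uses.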
Testing the Integration
Run an end-to-end test that inserts one known record, queries it back, and checks whether the answer uses that record.
test_query = "What are common first-line lifestyle changes for hypertension?"

q_emb = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=[test_query]
).data[0].embedding

result = index.query(
    vector=q_emb,
    top_k=1,
    include_metadata=True,
    namespace="clinical-guidelines"
)

retrieved_text = result["matches"][0]["metadata"]["text"]

final_answer = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer using only provided context."},
        {"role": "user", "content": f"Context: {retrieved_text}\n\nQuestion: {test_query}"}
    ],
).choices[0].message.content

print("Retrieved:", retrieved_text)
print("Answer:", final_answer)
Expected output:
Retrieved: Hypertension management includes lifestyle changes, sodium reduction, exercise, and medication when indicated.
Answer: Common first-line lifestyle changes include sodium reduction and regular exercise.
If you get an unrelated answer:
- check embedding dimensions match your Pinecone index configuration
- verify you upserted into the same namespace you query from
- confirm your prompt restricts generation to retrieved context only
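The first check above can be automated. A sketch that compares the index's configured dimension (as reported by `index.describe_index_stats()`) with the length of a query embedding:

```python
def dimensions_match(index_stats, embedding):
    # index_stats is the payload returned by index.describe_index_stats();
    # its "dimension" field must equal the length of your embedding vector.
    expected = index_stats["dimension"]
    actual = len(embedding)
    return expected == actual, expected, actual

# Example against a mocked stats payload (1536 matches text-embedding-3-small):
ok, expected, actual = dimensions_match({"dimension": 1536}, [0.0] * 1536)
```

Run this once at startup and fail fast with a clear error instead of silently getting unrelated matches.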
Real-World Use Cases
- Clinical knowledge assistant for staff: Retrieve approved treatment guidance, triage protocols, or medication references before generating responses.
- Patient support agent: Answer questions about appointment prep, post-discharge instructions, or insurance-facing care navigation using approved content only.
- Internal policy search: Let operations teams query benefits rules, prior authorization steps, or compliance docs with grounded answers and source tracking.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit