How to Integrate OpenAI for healthcare with Pinecone for AI agents
Combining OpenAI for healthcare with Pinecone gives you a practical pattern for building agent systems that can answer clinical support questions from your own approved knowledge base. The OpenAI side handles reasoning, extraction, and response generation, while Pinecone gives the agent fast semantic retrieval over policies, care pathways, formulary docs, and internal medical content.
This is the difference between a generic chatbot and a production-grade healthcare assistant: the model answers from grounded context instead of guessing.
Prerequisites
- Python 3.10+
- An OpenAI API key with access to the healthcare-capable model you plan to use
- A Pinecone account and API key
- A Pinecone index created with the right vector dimension for your embedding model
- The openai and pinecone Python SDKs installed via pip
- Basic familiarity with Python async or sync APIs
- A local .env file or secret manager for OPENAI_API_KEY and PINECONE_API_KEY
Install the SDKs:
pip install openai pinecone python-dotenv
Integration Steps
1) Initialize both clients
Use environment variables and keep credentials out of code. For healthcare workloads, treat prompts and retrieved documents as sensitive data and keep access scoped.
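For local development, the .env file only needs the two keys (placeholder values shown; never commit real keys to version control):

```
OPENAI_API_KEY=sk-your-key-here
PINECONE_API_KEY=your-pinecone-key-here
```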
import os
from dotenv import load_dotenv
from openai import OpenAI
from pinecone import Pinecone
load_dotenv()
openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("healthcare-agent-index")
2) Create embeddings for your healthcare content
You need embeddings for policies, care guidelines, discharge instructions, or benefits docs. OpenAI’s embedding API turns text into vectors that Pinecone can store and search.
documents = [
    {
        "id": "doc-001",
        "text": "For chest pain triage, escalate immediately if symptoms include shortness of breath, diaphoresis, or radiating pain."
    },
    {
        "id": "doc-002",
        "text": "Prior authorization is required for MRI imaging unless the patient has documented emergency indications."
    }
]

embed_response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=[doc["text"] for doc in documents]
)

vectors = []
for doc, item in zip(documents, embed_response.data):
    vectors.append({
        "id": doc["id"],
        "values": item.embedding,
        "metadata": {"text": doc["text"]}
    })
If your index does not exist yet, create it with the correct dimension for the embedding model you chose. The default output dimension is 1536 for text-embedding-3-small and 3072 for text-embedding-3-large; the index dimension must match exactly or upserts will fail.
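A small helper can tie the model choice to the index dimension. This is a sketch assuming recent versions of the Pinecone SDK (which expose `has_index`, `create_index`, and `ServerlessSpec`); the cloud, region, and index name are placeholders you would adapt:

```python
# Default output dimensions for the current OpenAI embedding models.
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def ensure_index(pc, name: str, model: str = "text-embedding-3-small"):
    """Create the index if it does not exist, sized for the chosen embedding model."""
    from pinecone import ServerlessSpec  # imported lazily so this module stays importable

    if not pc.has_index(name):
        pc.create_index(
            name=name,
            dimension=EMBEDDING_DIMENSIONS[model],
            metric="cosine",
            spec=ServerlessSpec(cloud="aws", region="us-east-1"),
        )
    return pc.Index(name)
```

With this in place, `index = ensure_index(pc, "healthcare-agent-index")` replaces the bare `pc.Index(...)` call from step 1.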
3) Upsert vectors into Pinecone
Store the embedded content in Pinecone so your agent can retrieve it later by semantic similarity.
index.upsert(vectors=vectors)
print("Upsert complete")
At this point, your knowledge base is searchable. In a real system, you would chunk long clinical documents before embedding them so each chunk stays focused.
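A minimal chunker for that purpose might use overlapping word windows. This is an illustrative sketch: the window size and overlap below are arbitrary defaults, and production systems often chunk on sentence or section boundaries instead:

```python
def chunk_text(text: str, max_words: int = 120, overlap: int = 20) -> list[str]:
    """Split a long document into overlapping word-window chunks."""
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks = []
    step = max_words - overlap  # each window starts `step` words after the last
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already covers the tail of the document
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from either chunk.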
4) Retrieve context for a user question
When the agent receives a question, embed that question and query Pinecone for the most relevant passages.
user_question = "When should a patient with chest pain be escalated?"
query_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=user_question
).data[0].embedding

search_results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True
)

context_chunks = []
for match in search_results["matches"]:
    context_chunks.append(match["metadata"]["text"])

context = "\n".join(context_chunks)
print(context)
This is the core retrieval step. Your agent should only answer using retrieved context plus approved system instructions.
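One way to enforce that is to drop low-similarity matches before they ever reach the prompt. The helper and threshold below are assumptions for illustration (0.35 is an arbitrary cutoff; tune it against your own evaluation set), not part of either SDK:

```python
SCORE_THRESHOLD = 0.35  # arbitrary cutoff; tune against your own evaluation set

def grounded_context(matches: list[dict], threshold: float = SCORE_THRESHOLD) -> str:
    """Keep only matches above the similarity threshold; empty string means 'refuse'."""
    kept = [m["metadata"]["text"] for m in matches if m.get("score", 0.0) >= threshold]
    return "\n".join(kept)
```

If `grounded_context` returns an empty string, the agent should answer "I do not have enough information" without calling the model at all.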
5) Generate a grounded answer with OpenAI
Now pass the retrieved context into the chat/completions API and force the model to answer from that context. If you are using a healthcare-specific deployment or policy-controlled endpoint in your environment, keep the same pattern: retrieve first, then generate.
messages = [
    {
        "role": "system",
        "content": (
            "You are a healthcare support assistant. "
            "Answer only from the provided context. "
            "If the context is insufficient, say you do not have enough information."
        )
    },
    {
        "role": "user",
        "content": f"Context:\n{context}\n\nQuestion: {user_question}"
    }
]

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    temperature=0.2
)
answer = response.choices[0].message.content
print(answer)
That gives you a clean RAG flow: embed → store → retrieve → answer.
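That flow can be sketched end to end with an in-memory store and cosine similarity. The three-dimensional vectors below are hand-made stand-ins for real embeddings, so the whole pipeline shape is visible without API calls:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# "store": toy stand-in for the Pinecone index, mapping id -> (vector, text)
store = {
    "doc-001": ([0.9, 0.1, 0.0], "Escalate chest pain with shortness of breath."),
    "doc-002": ([0.0, 0.2, 0.9], "Prior authorization is required for MRI imaging."),
}

def retrieve(query_vec, top_k=1):
    """Return the text of the top_k most similar stored vectors."""
    ranked = sorted(store.values(), key=lambda v: cosine(query_vec, v[0]), reverse=True)
    return [text for _, text in ranked[:top_k]]

# A query vector "about chest pain" lands nearest doc-001.
context = retrieve([1.0, 0.0, 0.0])
```

In the real system, `store` is the Pinecone index and the query vector comes from the embeddings API, but the ranking logic is the same.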
Testing the Integration
Run an end-to-end check with one known document and one question that should hit it.
test_question = "What symptoms require immediate escalation for chest pain?"
q_emb = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=test_question
).data[0].embedding

result = index.query(vector=q_emb, top_k=1, include_metadata=True)
retrieved_text = result["matches"][0]["metadata"]["text"]

final_resp = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer only from context."},
        {"role": "user", "content": f"Context:\n{retrieved_text}\n\nQuestion: {test_question}"}
    ]
)
print("Retrieved:", retrieved_text)
print("Answer:", final_resp.choices[0].message.content)
Expected output:
Retrieved: For chest pain triage, escalate immediately if symptoms include shortness of breath, diaphoresis, or radiating pain.
Answer: Immediate escalation is required when chest pain includes shortness of breath, diaphoresis, or radiating pain.
If retrieval returns irrelevant text, fix chunking, metadata filters, or index dimension mismatch before touching prompts.
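A cheap guard against the dimension-mismatch case is to validate vector lengths before upserting. This helper is hypothetical (not part of either SDK) and assumes the vector dicts from step 2:

```python
def check_dimensions(vectors: list[dict], index_dimension: int) -> None:
    """Raise early if any embedding does not match the index dimension."""
    for v in vectors:
        if len(v["values"]) != index_dimension:
            raise ValueError(
                f"Vector {v['id']!r} has dimension {len(v['values'])}, "
                f"but the index expects {index_dimension}."
            )
```

Calling `check_dimensions(vectors, 1536)` right before `index.upsert(...)` turns a silent retrieval-quality problem into a loud, immediate error.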
Real-World Use Cases
- Clinical policy assistant: answer staff questions about triage rules, prior authorization requirements, and internal care protocols from indexed policy docs.
- Patient support agent: retrieve approved discharge instructions or symptom guidance and generate consistent responses without hallucinating medical advice.
- Claims and utilization review helper: let agents search plan documents and summarize coverage criteria before escalating to human reviewers.
The production pattern is simple: keep sensitive source material in Pinecone, use OpenAI to reason over retrieved context, and never let generation run without grounding. That’s how you build an AI agent system that is actually usable in healthcare environments.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.