How to Integrate OpenAI for healthcare with Pinecone for RAG

By Cyprian Aarons · Updated 2026-04-21
Tags: openai-for-healthcare, pinecone, rag

Combining OpenAI for healthcare with Pinecone gives you a practical RAG stack for clinical and operational workflows. The pattern is simple: store approved medical content, policies, or care pathways in Pinecone, then use OpenAI to answer questions grounded in that retrieved context instead of relying on model memory alone.

This is the setup you want when accuracy, traceability, and controlled retrieval matter. It works well for patient-support assistants, internal clinical knowledge search, and document-heavy workflows where the agent needs to cite the right source before generating a response.

Prerequisites

  • Python 3.10+
  • An OpenAI API key with access to the embedding and chat models you plan to use
  • A Pinecone account and API key
  • A Pinecone index created with the correct vector dimension for your embedding model
  • pip installed
  • Basic familiarity with embeddings and retrieval-augmented generation
  • Environment variables configured:
    • OPENAI_API_KEY
    • PINECONE_API_KEY

Install the SDKs:

pip install openai pinecone python-dotenv

Integration Steps

1) Initialize OpenAI and Pinecone clients

Start by loading credentials and creating both clients. Keep this in one place so your agent layer can reuse them.

import os
from dotenv import load_dotenv
from openai import OpenAI
from pinecone import Pinecone

load_dotenv()

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

index_name = "healthcare-rag"
index = pc.Index(index_name)

If you are running this in production, keep secrets out of code and rotate keys through your secret manager.

2) Create embeddings for your healthcare documents

For RAG, your documents need embeddings before they can be stored in Pinecone. Use an embedding model from OpenAI and keep each chunk small enough to retrieve cleanly.

documents = [
    {
        "id": "doc-001",
        "text": "Hypertension management includes lifestyle changes, sodium reduction, exercise, and medication when indicated."
    },
    {
        "id": "doc-002",
        "text": "Type 2 diabetes care often includes HbA1c monitoring, diet changes, activity, and medication adherence."
    }
]

texts = [d["text"] for d in documents]

embedding_response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)

vectors = []
for doc, emb in zip(documents, embedding_response.data):
    vectors.append({
        "id": doc["id"],
        "values": emb.embedding,
        "metadata": {"text": doc["text"]}
    })

Use metadata aggressively. In healthcare systems, you usually want source type, version, specialty, approval status, and last-reviewed date attached to every chunk.
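Those fields can be attached with a small helper. The field names and example values below are illustrative, not a fixed schema; pick whatever your review and audit processes require.

```python
from datetime import date

def build_vector(doc_id, text, embedding, *, source, specialty, version, approved):
    """Package a chunk with the audit metadata healthcare systems usually need."""
    return {
        "id": doc_id,
        "values": embedding,
        "metadata": {
            "text": text,
            "source": source,              # e.g. "clinical-guideline", "policy"
            "specialty": specialty,
            "version": version,
            "approved": approved,          # only approved content should be retrievable
            "last_reviewed": date.today().isoformat(),
        },
    }

vector = build_vector(
    "doc-001#chunk-0",
    "Hypertension management includes lifestyle changes, sodium reduction, exercise, and medication when indicated.",
    [0.0] * 1536,  # placeholder; use the real embedding in practice
    source="clinical-guideline",
    specialty="cardiology",
    version="2024.1",
    approved=True,
)
```

Filtering on these fields at query time (for example, `approved=True` only) is how you keep unvetted drafts out of responses.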

3) Upsert vectors into Pinecone

Now push those vectors into your index. This makes them retrievable by semantic similarity later.

upsert_response = index.upsert(vectors=vectors)

print(upsert_response)

A practical pattern is to namespace by tenant or content domain.

index.upsert(
    vectors=vectors,
    namespace="clinical-guidelines"
)

That keeps retrieval isolated between departments, customers, or care programs.

4) Retrieve relevant context for a user query

When a user asks a question, embed the query and search Pinecone for the most relevant chunks.

query = "How should we manage high blood pressure in a patient?"

query_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=[query]
).data[0].embedding

search_results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True,
    namespace="clinical-guidelines"
)

contexts = [
    match["metadata"]["text"]
    for match in search_results["matches"]
]

At this point you have the retrieval side of RAG. The key is to avoid returning raw search results directly to users; instead, pass them to the model as grounded context.


5) Generate a grounded answer with OpenAI

Use the retrieved passages as context in a chat completion call. Keep the prompt strict so the model answers only from supplied sources.

context_block = "\n\n".join([f"- {c}" for c in contexts])

messages = [
    {
        "role": "system",
        "content": (
            "You are a healthcare assistant. Answer only using the provided context. "
            "If the context is insufficient, say so clearly."
        )
    },
    {
        "role": "user",
        "content": f"Context:\n{context_block}\n\nQuestion: {query}"
    }
]

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    temperature=0.2
)

print(response.choices[0].message.content)

For regulated workflows, add guardrails:

  • refuse diagnosis claims unless your workflow explicitly supports them
  • include citations from metadata
  • log retrieved document IDs for auditability
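The citation and logging guardrails can be sketched as one helper. This uses a hand-built `matches` list in place of a real Pinecone response; the ID-in-brackets citation format is just one convention.

```python
import logging

logger = logging.getLogger("rag-audit")

def build_context_with_citations(matches):
    """Format retrieved chunks with their source IDs and log the IDs for audit."""
    retrieved_ids = [m["id"] for m in matches]
    logger.info("retrieved_docs=%s", retrieved_ids)  # persist this for auditability
    lines = [f"[{m['id']}] {m['metadata']['text']}" for m in matches]
    return "\n".join(lines), retrieved_ids

# Illustrative stand-in for search_results["matches"]
matches = [
    {"id": "doc-001", "metadata": {"text": "Hypertension management includes lifestyle changes."}},
]
context, ids = build_context_with_citations(matches)
```

Because each context line carries its source ID, you can instruct the model to cite `[doc-001]`-style markers in its answer and verify them against the logged IDs.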

Testing the Integration

Run an end-to-end test that inserts one known record, queries it back, and checks whether the answer uses that record.

test_query = "What are common first-line lifestyle changes for hypertension?"

q_emb = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=[test_query]
).data[0].embedding

result = index.query(
    vector=q_emb,
    top_k=1,
    include_metadata=True,
    namespace="clinical-guidelines"
)

retrieved_text = result["matches"][0]["metadata"]["text"]

final_answer = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer using only provided context."},
        {"role": "user", "content": f"Context: {retrieved_text}\n\nQuestion: {test_query}"}
    ],
).choices[0].message.content

print("Retrieved:", retrieved_text)
print("Answer:", final_answer)

Expected output:

Retrieved: Hypertension management includes lifestyle changes, sodium reduction, exercise, and medication when indicated.
Answer: Common first-line lifestyle changes include sodium reduction and regular exercise.

If you get an unrelated answer:

  • check embedding dimensions match your Pinecone index configuration
  • verify you upserted into the same namespace you query from
  • confirm your prompt restricts generation to retrieved context only
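The first check can be made explicit in code. This sketch assumes you fetch the index dimension via `pc.describe_index(...)` from the Pinecone SDK; the helper itself is plain Python.

```python
def check_dimensions(index_dimension: int, query_embedding: list) -> None:
    """Fail fast when the query embedding does not match the index dimension."""
    if len(query_embedding) != index_dimension:
        raise ValueError(
            f"Embedding has {len(query_embedding)} dimensions but the index "
            f"expects {index_dimension}; use the same embedding model for "
            "ingestion and queries."
        )

# In practice: check_dimensions(pc.describe_index("healthcare-rag").dimension, q_emb)
check_dimensions(1536, [0.0] * 1536)  # matching dimensions pass silently
```

A mismatch here almost always means ingestion and querying used different embedding models.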

Real-World Use Cases

  • Clinical knowledge assistant for staff
    Retrieve approved treatment guidance, triage protocols, or medication references before generating responses.

  • Patient support agent
    Answer questions about appointment prep, post-discharge instructions, or insurance-facing care navigation using approved content only.

  • Internal policy search
    Let operations teams query benefits rules, prior authorization steps, or compliance docs with grounded answers and source tracking.



By Cyprian Aarons, AI Consultant at Topiax.
