How to Integrate OpenAI for healthcare with Pinecone for AI agents

By Cyprian Aarons · Updated 2026-04-21
Tags: openai-for-healthcare, pinecone, ai-agents

Combining OpenAI for healthcare with Pinecone gives you a practical pattern for building agent systems that can answer clinical support questions from your own approved knowledge base. The OpenAI side handles reasoning, extraction, and response generation, while Pinecone gives the agent fast semantic retrieval over policies, care pathways, formulary docs, and internal medical content.

This is the difference between a generic chatbot and a production-grade healthcare assistant: the model answers from grounded context instead of guessing.

Prerequisites

  • Python 3.10+
  • An OpenAI API key with access to the models you plan to use
  • A Pinecone account and API key
  • A Pinecone index created with the right vector dimension for your embedding model
  • pip installed
  • Basic familiarity with Python async or sync APIs
  • A local .env file or secret manager for:
    • OPENAI_API_KEY
    • PINECONE_API_KEY
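A minimal .env file for local development might look like this (placeholder values only; in production, prefer a secret manager over a file on disk):

```
OPENAI_API_KEY=your-openai-key
PINECONE_API_KEY=your-pinecone-key
```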

Install the SDKs:

pip install openai pinecone python-dotenv

Integration Steps

1) Initialize both clients

Use environment variables and keep credentials out of code. For healthcare workloads, treat prompts and retrieved documents as sensitive data and keep access scoped.

import os
from dotenv import load_dotenv
from openai import OpenAI
from pinecone import Pinecone

load_dotenv()

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

index = pc.Index("healthcare-agent-index")

2) Create embeddings for your healthcare content

You need embeddings for policies, care guidelines, discharge instructions, or benefits docs. OpenAI’s embedding API turns text into vectors that Pinecone can store and search.

documents = [
    {
        "id": "doc-001",
        "text": "For chest pain triage, escalate immediately if symptoms include shortness of breath, diaphoresis, or radiating pain."
    },
    {
        "id": "doc-002",
        "text": "Prior authorization is required for MRI imaging unless the patient has documented emergency indications."
    }
]

embed_response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=[doc["text"] for doc in documents]
)

vectors = []
for doc, item in zip(documents, embed_response.data):
    vectors.append({
        "id": doc["id"],
        "values": item.embedding,
        "metadata": {"text": doc["text"]}
    })

If your index does not exist yet, create it with the correct dimension for the embedding model you chose. For text-embedding-3-small, the default output dimension is 1536 (text-embedding-3-large uses 3072); the index dimension must match exactly or upserts and queries will fail.
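If you want to create the index from code, a sketch like the following works with recent Pinecone SDKs. The serverless cloud/region values are assumptions; adjust them for your account, and note that ensure_index is a hypothetical helper name:

```python
# Known default output dimensions for OpenAI embedding models
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def ensure_index(pc, name, model="text-embedding-3-small"):
    """Create the Pinecone index if it does not already exist."""
    from pinecone import ServerlessSpec  # imported here so the lookup table is usable on its own
    if name not in pc.list_indexes().names():
        pc.create_index(
            name=name,
            dimension=EMBEDDING_DIMS[model],  # must match the embedding model's output size
            metric="cosine",
            spec=ServerlessSpec(cloud="aws", region="us-east-1"),
        )
```

Call it once at startup, e.g. `ensure_index(pc, "healthcare-agent-index")`, before taking the index handle.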

3) Upsert vectors into Pinecone

Store the embedded content in Pinecone so your agent can retrieve it later by semantic similarity.

index.upsert(vectors=vectors)
print("Upsert complete")

At this point, your knowledge base is searchable. In a real system, you would chunk long clinical documents before embedding them so each chunk stays focused.
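Chunking can be as simple as a sliding character window. A minimal sketch; the 800-character size and 100-character overlap are arbitrary starting points to tune, not recommendations:

```python
def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows so each chunk stays focused."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context across chunk boundaries
    return chunks
```

Each chunk would then be embedded and upserted with its own id and metadata, just like the whole documents above.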

4) Retrieve context for a user question

When the agent receives a question, embed that question and query Pinecone for the most relevant passages.

user_question = "When should a patient with chest pain be escalated?"

query_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=user_question
).data[0].embedding

search_results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True
)

context_chunks = []
for match in search_results["matches"]:
    context_chunks.append(match["metadata"]["text"])

context = "\n".join(context_chunks)
print(context)

This is the core retrieval step. Your agent should only answer using retrieved context plus approved system instructions.
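One practical guard: drop low-similarity matches before building the context, so the model is not fed loosely related passages. A sketch below; the 0.35 threshold is an assumption to tune against your own data, and it assumes match dicts carry a similarity score field, as Pinecone query results do:

```python
def select_context(matches, min_score=0.35, max_chunks=3):
    """Keep only matches above a similarity threshold, then join their text."""
    kept = [m for m in matches if m.get("score", 0.0) >= min_score]
    return "\n".join(m["metadata"]["text"] for m in kept[:max_chunks])
```

With this in place, an empty context string becomes an explicit signal that the agent should decline to answer.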

5) Generate a grounded answer with OpenAI

Now pass the retrieved context into the Chat Completions API and instruct the model to answer only from that context. If you are using a healthcare-specific deployment or policy-controlled endpoint in your environment, keep the same pattern: retrieve first, then generate.

messages = [
    {
        "role": "system",
        "content": (
            "You are a healthcare support assistant. "
            "Answer only from the provided context. "
            "If the context is insufficient, say you do not have enough information."
        )
    },
    {
        "role": "user",
        "content": f"Context:\n{context}\n\nQuestion: {user_question}"
    }
]

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    temperature=0.2
)

answer = response.choices[0].message.content
print(answer)

That gives you a clean RAG flow: embed → store → retrieve → answer.
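The whole flow can be folded into one helper that reuses the clients from step 1. This is a sketch, not a definitive implementation; build_messages and answer_question are hypothetical names, and the model names are the ones used above:

```python
def build_messages(context: str, question: str) -> list[dict]:
    """Assemble the grounded prompt: system rules plus a context-bearing user turn."""
    return [
        {"role": "system", "content": (
            "You are a healthcare support assistant. Answer only from the provided "
            "context. If the context is insufficient, say you do not have enough information."
        )},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

def answer_question(openai_client, index, question: str, top_k: int = 3) -> str:
    """Embed the question, retrieve context from Pinecone, and generate a grounded answer."""
    emb = openai_client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding
    results = index.query(vector=emb, top_k=top_k, include_metadata=True)
    context = "\n".join(m["metadata"]["text"] for m in results["matches"])
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini", messages=build_messages(context, question), temperature=0.2
    )
    return response.choices[0].message.content
```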

Testing the Integration

Run an end-to-end check with one known document and one question that should hit it.

test_question = "What symptoms require immediate escalation for chest pain?"

q_emb = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=test_question
).data[0].embedding

result = index.query(vector=q_emb, top_k=1, include_metadata=True)
retrieved_text = result["matches"][0]["metadata"]["text"]

final_resp = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer only from context."},
        {"role": "user", "content": f"Context:\n{retrieved_text}\n\nQuestion: {test_question}"}
    ]
)

print("Retrieved:", retrieved_text)
print("Answer:", final_resp.choices[0].message.content)

Expected output (the answer's exact wording will vary between runs):

Retrieved: For chest pain triage, escalate immediately if symptoms include shortness of breath, diaphoresis, or radiating pain.
Answer: Immediate escalation is required when chest pain includes shortness of breath, diaphoresis, or radiating pain.

If retrieval returns irrelevant text, fix chunking, metadata filters, or index dimension mismatch before touching prompts.
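A quick way to rule out a dimension mismatch is to compare the index's reported dimension against your embedding length. A small sketch, assuming the stats object exposes a dimension attribute as in current Pinecone SDKs:

```python
def dimensions_match(index, embedding) -> bool:
    """Return True when the index dimension equals the embedding vector length."""
    stats = index.describe_index_stats()
    return stats.dimension == len(embedding)
```

Run it once with any embedding you produce; a False here means queries will silently return garbage or errors, so fix the index before debugging prompts.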

Real-World Use Cases

  • Clinical policy assistant: answer staff questions about triage rules, prior authorization requirements, and internal care protocols from indexed policy docs.
  • Patient support agent: retrieve approved discharge instructions or symptom guidance and generate consistent responses without hallucinating medical advice.
  • Claims and utilization review helper: let agents search plan documents and summarize coverage criteria before escalating to human reviewers.

The production pattern is simple: keep sensitive source material in Pinecone, use OpenAI to reason over retrieved context, and never let generation run without grounding. That’s how you build an AI agent system that is actually usable in healthcare environments.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.
