How to Integrate OpenAI for healthcare with Pinecone for multi-agent systems

By Cyprian Aarons · Updated 2026-04-21

Tags: openai-for-healthcare, pinecone, multi-agent-systems

Combining OpenAI for healthcare with Pinecone gives you a clean pattern for building multi-agent systems that can retrieve patient context, clinical notes, policy docs, and care pathways without stuffing everything into the prompt. The result is a setup where one agent can reason over current conversation state while another agent pulls the right medical context from vector storage in milliseconds.

This is useful when you need grounded answers, case summarization, triage support, or care coordination workflows where different agents own different tasks. OpenAI handles reasoning and generation, while Pinecone handles long-term semantic memory and retrieval at scale.

Prerequisites

  • Python 3.10+
  • An OpenAI API key with access to the chat and embedding models you plan to use
  • A Pinecone account and API key
  • A Pinecone index created with the correct embedding dimension
  • pip install openai pinecone
  • A basic multi-agent design in mind:
    • one agent for retrieval
    • one agent for clinical reasoning or response drafting
  • Environment variables set:
    • OPENAI_API_KEY
    • PINECONE_API_KEY
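Before running any of the snippets below, install the SDKs and export the credentials so the code can read them via `os.environ`. A minimal sketch; the key values shown are placeholders, not real keys:

```shell
# Install the two SDKs used throughout this guide
pip install openai pinecone

# Export credentials (replace the placeholders with your own keys)
export OPENAI_API_KEY="your-openai-key"
export PINECONE_API_KEY="your-pinecone-key"
```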

Integration Steps

  1. Install dependencies and initialize both clients

    Start by wiring up the SDK clients. Keep credentials out of code and load them from environment variables.

    import os
    from openai import OpenAI
    from pinecone import Pinecone
    
    openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    
  2. Create or connect to your Pinecone index

    Your index must match the dimension of the embedding model you use; text-embedding-3-large, used below, produces 3072-dimensional vectors. If you embed chunked clinical notes with OpenAI embeddings, store those vectors in Pinecone with metadata like patient_id, document_type, and encounter_date.

    index_name = "healthcare-agents"
    index = pc.Index(index_name)
    
    # Example record upsert format
    vectors = [
        {
            "id": "note-001",
            "values": [0.12, 0.08, 0.91],  # replace with a real embedding vector (length must match the index dimension)
            "metadata": {
                "patient_id": "p123",
                "document_type": "discharge_summary",
                "text": "Patient discharged on metformin with follow-up in 2 weeks."
            }
        }
    ]
    
    index.upsert(vectors=vectors)
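A mismatched index dimension is the most common setup failure with this stack. The dimensions of OpenAI's embedding models are fixed and documented, so you can pin them in code; the commented `pc.create_index` call is illustrative only (its `spec` argument depends on your Pinecone plan and region):

```python
# Output dimensions of OpenAI's current embedding models.
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def index_dimension_for(model: str) -> int:
    """Return the vector dimension a Pinecone index needs for `model`."""
    if model not in EMBEDDING_DIMS:
        raise ValueError(f"Unknown embedding model: {model}")
    return EMBEDDING_DIMS[model]

# When creating the index (name and spec are illustrative):
# pc.create_index(
#     name="healthcare-agents",
#     dimension=index_dimension_for("text-embedding-3-large"),
#     metric="cosine",
#     spec=...,  # depends on your Pinecone plan/region
# )

print(index_dimension_for("text-embedding-3-large"))  # → 3072
```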
    
  3. Embed a query and retrieve relevant medical context

    In a multi-agent flow, one agent typically handles retrieval. It embeds the user query, queries Pinecone, then passes the top matches to the reasoning agent.

    query = "What follow-up care is recommended after discharge for this diabetic patient?"
    
    embedding_response = openai_client.embeddings.create(
        model="text-embedding-3-large",
        input=query
    )
    
    query_vector = embedding_response.data[0].embedding
    
    results = index.query(
        vector=query_vector,
        top_k=3,
        include_metadata=True
    )
    
    retrieved_context = []
    for match in results.matches:
        retrieved_context.append(match.metadata["text"])
    
    print(retrieved_context)
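Pinecone also returns a similarity score for each match, and dropping low-score matches before they reach the reasoning agent is a cheap relevance guardrail. A minimal sketch; the 0.75 cutoff is an assumption you should tune against your own data, and the dict-shaped demo stands in for `results.matches`:

```python
def filter_matches(matches, min_score=0.75):
    """Keep only the text of matches whose similarity score clears the threshold.

    Accepts dicts (as below) or Pinecone match objects with .score / .metadata.
    """
    kept = []
    for m in matches:
        score = m["score"] if isinstance(m, dict) else m.score
        if score >= min_score:
            meta = m["metadata"] if isinstance(m, dict) else m.metadata
            kept.append(meta["text"])
    return kept

demo = [
    {"score": 0.91, "metadata": {"text": "Discharged on metformin."}},
    {"score": 0.42, "metadata": {"text": "Unrelated billing note."}},
]
print(filter_matches(demo))  # → ['Discharged on metformin.']
```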
    
  4. Use OpenAI for healthcare to generate a grounded response

    Now pass the retrieved context into your generation step. The important part is not asking the model to guess; give it the retrieved notes and instruct it to answer only from that context.

    context_block = "\n\n".join(retrieved_context)
    
    messages = [
        {
            "role": "system",
            "content": (
                "You are a healthcare assistant. "
                "Use only the provided clinical context. "
                "If information is missing, say so clearly."
            )
        },
        {
            "role": "user",
            "content": f"Clinical context:\n{context_block}\n\nQuestion: {query}"
        }
    ]
    
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        temperature=0.2
    )
    
    print(response.choices[0].message.content)
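The prompt assembly above can be factored into a small helper so every agent builds its message list the same way. A sketch; `build_messages` is a name introduced here, not part of either SDK:

```python
def build_messages(context_block: str, question: str) -> list:
    """Assemble a chat message list that pins the model to the retrieved context."""
    return [
        {
            "role": "system",
            "content": (
                "You are a healthcare assistant. "
                "Use only the provided clinical context. "
                "If information is missing, say so clearly."
            ),
        },
        {
            "role": "user",
            "content": f"Clinical context:\n{context_block}\n\nQuestion: {question}",
        },
    ]

msgs = build_messages("Patient discharged on metformin.", "What was prescribed?")
print(msgs[1]["content"])
```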


  5. Wrap retrieval and generation into two cooperating agents

    In production, split responsibilities. One agent retrieves evidence from Pinecone; another drafts the response using OpenAI for healthcare. This keeps your system debuggable and makes it easier to add guardrails later.

    def retrieve_agent(question: str):
        emb = openai_client.embeddings.create(
            model="text-embedding-3-large",
            input=question
        )
        qvec = emb.data[0].embedding

        res = index.query(
            vector=qvec,
            top_k=5,
            include_metadata=True
        )

        return [m.metadata["text"] for m in res.matches]

    def reason_agent(question: str, contexts: list[str]):
        prompt = "\n\n".join(contexts)
        completion = openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are a healthcare reasoning agent. "
                        "Answer strictly from supplied evidence."
                    )
                },
                {
                    "role": "user",
                    "content": f"Evidence:\n{prompt}\n\nQuestion:\n{question}"
                }
            ],
            temperature=0.1
        )
        return completion.choices[0].message.content

    question = "What should be monitored after starting metformin?"
    contexts = retrieve_agent(question)
    answer = reason_agent(question, contexts)

    print(answer)
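One guardrail worth adding before any model call: refuse to answer when retrieval comes back empty, rather than letting the model improvise. A minimal sketch; the fallback wording and function names are assumptions, and the stub stands in for `reason_agent`:

```python
NO_EVIDENCE_REPLY = "No relevant clinical context was found; please verify the patient record."

def answer_with_guardrail(question, contexts, reason_fn):
    """Skip the reasoning call entirely when there is no retrieved evidence."""
    if not contexts:
        return NO_EVIDENCE_REPLY
    return reason_fn(question, contexts)

# A stub reasoning function exercises both paths without API calls:
stub = lambda q, c: f"Answer based on {len(c)} evidence chunk(s)."
print(answer_with_guardrail("q?", [], stub))        # falls back to NO_EVIDENCE_REPLY
print(answer_with_guardrail("q?", ["note"], stub))  # → Answer based on 1 evidence chunk(s).
```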

Testing the Integration

Run a simple end-to-end test with one stored note and one question that should match it.

test_question = "What medication was prescribed at discharge?"

contexts = retrieve_agent(test_question)
answer = reason_agent(test_question, contexts)

print("Retrieved contexts:", contexts)
print("Answer:", answer)

Expected output:

Retrieved contexts: ['Patient discharged on metformin with follow-up in 2 weeks.']
Answer: The discharge medication mentioned in the provided context is metformin.

If retrieval returns irrelevant chunks, fix your embedding strategy or metadata filters first before tuning prompts.

Real-World Use Cases

  • Clinical support assistant

    • Retrieve prior encounter notes from Pinecone.
    • Use OpenAI for healthcare to summarize symptoms, medications, and next steps for clinicians.
  • Care coordination multi-agent workflow

    • One agent fetches patient history.
    • Another checks policy or care-plan documents.
    • A third drafts a follow-up message or task list.
  • Insurance claim triage

    • Store claim documents, denial letters, and policy excerpts in Pinecone.
    • Use OpenAI for healthcare to classify cases and draft reviewer-ready summaries grounded in retrieved evidence.
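The care-coordination workflow above reduces to a sequential pipeline of agent calls. A sketch with stubbed agents; all three functions are hypothetical placeholders for real Pinecone retrieval and OpenAI generation calls:

```python
def fetch_history(patient_id: str) -> list:
    # Placeholder: would query Pinecone filtered by patient_id
    return [f"history for {patient_id}"]

def check_policies(history: list) -> list:
    # Placeholder: would retrieve matching care-plan documents
    return ["policy: follow-up within 14 days"]

def draft_followup(history: list, policies: list) -> str:
    # Placeholder: would call the chat model with both evidence sets
    return f"Draft follow-up using {len(history)} history and {len(policies)} policy chunk(s)."

def care_coordination_pipeline(patient_id: str) -> str:
    """Chain the three agents: history -> policies -> drafted follow-up."""
    history = fetch_history(patient_id)
    policies = check_policies(history)
    return draft_followup(history, policies)

print(care_coordination_pipeline("p123"))
```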

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
