How to Integrate Azure OpenAI for healthcare with CosmosDB for RAG

By Cyprian Aarons · Updated 2026-04-21
Tags: azure-openai-for-healthcare, cosmosdb, rag

Combining Azure OpenAI for healthcare with Cosmos DB gives you a practical RAG stack for clinical and operational workloads. You get a model layer that can answer grounded questions from approved medical content, plus a durable vector store for policies, care pathways, discharge instructions, and internal knowledge.

For healthcare agents, this matters because you need retrieval, traceability, and data isolation. Cosmos DB handles the document lifecycle and similarity search; Azure OpenAI handles summarization, extraction, and answer generation from retrieved context.

Prerequisites

  • An Azure subscription with:
    • Azure OpenAI resource provisioned
    • Azure Cosmos DB for NoSQL account provisioned
  • An Azure OpenAI deployment name for:
    • chat/completions model
    • embeddings model
  • A Cosmos DB database and container created
  • Python 3.10+
  • Installed packages:
    • openai
    • azure-cosmos
    • python-dotenv
  • Environment variables set:
    • AZURE_OPENAI_ENDPOINT
    • AZURE_OPENAI_API_KEY
    • AZURE_OPENAI_API_VERSION
    • AZURE_OPENAI_CHAT_DEPLOYMENT
    • AZURE_OPENAI_EMBEDDING_DEPLOYMENT
    • COSMOS_ENDPOINT
    • COSMOS_KEY
    • COSMOS_DATABASE_NAME
    • COSMOS_CONTAINER_NAME

Integration Steps

1) Install the SDKs and load configuration

Start by wiring your app to both services. Keep secrets in environment variables; don’t hardcode them in notebooks or agent code.

pip install openai azure-cosmos python-dotenv
import os
from dotenv import load_dotenv

load_dotenv()

AZURE_OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]
AZURE_OPENAI_API_KEY = os.environ["AZURE_OPENAI_API_KEY"]
AZURE_OPENAI_API_VERSION = os.environ["AZURE_OPENAI_API_VERSION"]
CHAT_DEPLOYMENT = os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT"]
EMBEDDING_DEPLOYMENT = os.environ["AZURE_OPENAI_EMBEDDING_DEPLOYMENT"]

COSMOS_ENDPOINT = os.environ["COSMOS_ENDPOINT"]
COSMOS_KEY = os.environ["COSMOS_KEY"]
COSMOS_DATABASE_NAME = os.environ["COSMOS_DATABASE_NAME"]
COSMOS_CONTAINER_NAME = os.environ["COSMOS_CONTAINER_NAME"]
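
A matching .env file keeps these out of notebooks and source control. The values below are placeholders, and the API version shown is only an example, so use one your resource supports:

AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=your-azure-openai-key
AZURE_OPENAI_API_VERSION=2024-06-01
AZURE_OPENAI_CHAT_DEPLOYMENT=your-chat-deployment
AZURE_OPENAI_EMBEDDING_DEPLOYMENT=your-embedding-deployment
COSMOS_ENDPOINT=https://your-account.documents.azure.com:443/
COSMOS_KEY=your-cosmos-key
COSMOS_DATABASE_NAME=healthcare-rag
COSMOS_CONTAINER_NAME=knowledge-chunks

Remember to add .env to your .gitignore.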

2) Create clients for Azure OpenAI and Cosmos DB

Use the official SDKs. For Azure OpenAI, the AzureOpenAI client gives you .embeddings.create() and .chat.completions.create(). For Cosmos DB, use CosmosClient to create or access your database and container.

from openai import AzureOpenAI
from azure.cosmos import CosmosClient, PartitionKey

aoai_client = AzureOpenAI(
    api_key=AZURE_OPENAI_API_KEY,
    api_version=AZURE_OPENAI_API_VERSION,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
)

cosmos_client = CosmosClient(COSMOS_ENDPOINT, credential=COSMOS_KEY)
database = cosmos_client.get_database_client(COSMOS_DATABASE_NAME)
container = database.get_container_client(COSMOS_CONTAINER_NAME)

If you are provisioning from code during development, create the container with a partition key that matches your document grouping strategy.

database = cosmos_client.create_database_if_not_exists(COSMOS_DATABASE_NAME)
container = database.create_container_if_not_exists(
    id=COSMOS_CONTAINER_NAME,
    partition_key=PartitionKey(path="/tenantId"),
    offer_throughput=400,
)
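
Vector queries on the NoSQL API also require a vector embedding policy and a vector index on the container. The sketch below assumes azure-cosmos 4.7+ and 1536-dimension vectors (the output size of text-embedding-ada-002 and the default for text-embedding-3-small); adjust the path, dimensions, distance function, and index type to your configuration.

vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": "/embedding",   # must match the field your documents store vectors in
            "dataType": "float32",
            "distanceFunction": "cosine",
            "dimensions": 1536,     # must match your embedding model's output size
        }
    ]
}

indexing_policy = {
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/embedding/*"}],  # keep the raw vector out of the regular index
    "vectorIndexes": [{"path": "/embedding", "type": "quantizedFlat"}],
}

container = database.create_container_if_not_exists(
    id=COSMOS_CONTAINER_NAME,
    partition_key=PartitionKey(path="/tenantId"),
    vector_embedding_policy=vector_embedding_policy,
    indexing_policy=indexing_policy,
    offer_throughput=400,
)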

3) Generate embeddings for healthcare documents

For RAG, every chunk needs an embedding before it goes into Cosmos DB. Use the embeddings deployment in Azure OpenAI and store the vector alongside the source text and metadata.

def embed_text(text: str) -> list[float]:
    response = aoai_client.embeddings.create(
        model=EMBEDDING_DEPLOYMENT,
        input=text,
    )
    return response.data[0].embedding


clinical_chunk = {
    "id": "policy-001",
    "tenantId": "hospital-a",
    "docType": "discharge-policy",
    "title": "Adult Discharge Instructions",
    "content": "Patients should receive medication reconciliation, follow-up instructions, and red-flag symptoms before discharge.",
}

clinical_chunk["embedding"] = embed_text(clinical_chunk["content"])

4) Store chunks in Cosmos DB with vector-ready metadata

Persist each chunk as a JSON document. In production, include fields for source control, timestamps, access scope, and clinical review status.

from datetime import datetime, timezone

clinical_chunk.update({
    "createdAt": datetime.now(timezone.utc).isoformat(),
    "reviewStatus": "approved",
})

container.upsert_item(clinical_chunk)

If you have multiple documents, batch them through a simple ingestion loop.

documents = [
    {
        "id": "policy-002",
        "tenantId": "hospital-a",
        "docType": "triage-guideline",
        "title": "Triage Escalation Rules",
        "content": "Escalate chest pain with shortness of breath immediately.",
    },
]

for doc in documents:
    doc["embedding"] = embed_text(doc["content"])
    doc["createdAt"] = datetime.utcnow().isoformat()
    doc["reviewStatus"] = "approved"
    container.upsert_item(doc)
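
For larger ingestion runs, the embeddings endpoint also accepts a list of inputs per request, which cuts round trips. A sketch (results are matched back by index to be safe about ordering):

def embed_texts(texts: list[str]) -> list[list[float]]:
    # One request for many chunks; sort by index so vectors line up with inputs.
    response = aoai_client.embeddings.create(
        model=EMBEDDING_DEPLOYMENT,
        input=texts,
    )
    return [item.embedding for item in sorted(response.data, key=lambda d: d.index)]

vectors = embed_texts([doc["content"] for doc in documents])
for doc, vector in zip(documents, vectors):
    doc["embedding"] = vector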

5) Retrieve context from Cosmos DB and call Azure OpenAI

For RAG, embed the user query first, then retrieve the closest chunks from Cosmos DB. The query below uses the VectorDistance function, which requires a vector index on the container (see the container setup in step 2); adjust it to match your configuration. After retrieval, pass the top chunks into Azure OpenAI chat completions.

def answer_question(question: str):
    question_embedding = embed_text(question)

    # Example query shape; adjust to your Cosmos vector index configuration.
    query = """
    SELECT TOP 3 c.id, c.title, c.content
    FROM c
    ORDER BY VectorDistance(c.embedding, @embedding)
    """

    params = [{"name": "@embedding", "value": question_embedding}]

    matches = list(container.query_items(
        query=query,
        parameters=params,
        enable_cross_partition_query=True,
    ))

    context_block = "\n\n".join(
        f"Title: {m['title']}\nContent: {m['content']}"
        for m in matches
    )

    response = aoai_client.chat.completions.create(
        model=CHAT_DEPLOYMENT,
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a healthcare assistant. Answer only from provided context. "
                    "If the context is insufficient, say so."
                ),
            },
            {
                "role": "user",
                "content": f"Question: {question}\n\nContext:\n{context_block}",
            },
        ],
        temperature=0.2,
    )

    return response.choices[0].message.content


print(answer_question("What should be included before adult discharge?"))
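
For traceability, which the introduction calls out, consider returning the matched document ids alongside the answer so each response can be audited back to approved sources. One option is to change the function's return value:

    return {
        "answer": response.choices[0].message.content,
        "sources": [m["id"] for m in matches],
    }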

Testing the Integration

Run a quick smoke test against both services: store one known policy chunk, retrieve it through your query path, then generate an answer.

test_question = "What instructions should patients receive before discharge?"
result = answer_question(test_question)
print(result)

Expected output (exact wording will vary by model):

Patients should receive medication reconciliation, follow-up instructions,
and red-flag symptom guidance before discharge.

If you get an empty or generic answer:

  • Confirm embeddings are being written to Cosmos DB (a quick check follows this list)
  • Confirm your container supports vector queries/indexing
  • Verify the deployment names match what you created in Azure OpenAI
  • Check that your prompt forces grounded answers only
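
A quick way to run the first check is to query Cosmos DB directly and confirm stored documents actually carry vectors (IS_DEFINED is a built-in Cosmos SQL function; field names follow the examples above):

probe = list(container.query_items(
    query="SELECT TOP 5 c.id, IS_DEFINED(c.embedding) AS hasEmbedding FROM c",
    enable_cross_partition_query=True,
))
for item in probe:
    print(item["id"], "embedding present:", item["hasEmbedding"])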

Real-World Use Cases

  • Clinical policy assistant
    Let staff ask questions about approved hospital policies, triage rules, or discharge procedures with answers grounded in internal documents.

  • Prior authorization support
    Retrieve payer rules and clinical criteria from Cosmos DB, then have Azure OpenAI draft structured responses or missing-document checklists.

  • Patient support agent
    Build an agent that answers common post-visit questions using vetted educational content while keeping responses tied to reviewed medical guidance.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

