How to Integrate Anthropic with pgvector for Production Healthcare AI

By Cyprian Aarons · Updated 2026-04-21
Tags: anthropic-for-healthcare, pgvector, production-ai

Connecting Anthropic for healthcare with pgvector gives you a practical pattern for retrieval-augmented clinical workflows: keep sensitive patient context in your own vector store, then use Anthropic to reason over only the relevant chunks. That combination is useful for chart summarization, prior-auth assistance, symptom triage, and policy-aware clinical support where you need both strong language understanding and controlled retrieval.

Prerequisites

  • Python 3.10+
  • An Anthropic API key with access to the healthcare-capable model you plan to use
  • PostgreSQL 14+ with the pgvector extension installed
  • A working Postgres user/database with permissions to create tables and extensions
  • pip packages:
    • anthropic
    • psycopg[binary]
    • pgvector
    • python-dotenv
  • A document set to index:
    • clinical notes
    • care guidelines
    • payer policy docs
    • internal SOPs

Install the dependencies:

pip install anthropic psycopg[binary] pgvector python-dotenv

Integration Steps

1) Set up PostgreSQL with pgvector

Create the extension and a table that stores embeddings alongside your source text, and use cosine distance for semantic search. Note that ivfflat is an approximate index: it builds its lists from existing rows (so consider creating it after your initial bulk load), and you can raise ivfflat.probes at query time to trade speed for recall.

import os
import psycopg
from pgvector.psycopg import register_vector

DB_URL = os.environ["DATABASE_URL"]

with psycopg.connect(DB_URL) as conn:
    with conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    conn.commit()
    register_vector(conn)  # requires the vector extension, so create it first
    with conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS clinical_chunks (
                id SERIAL PRIMARY KEY,
                patient_id TEXT NOT NULL,
                source TEXT NOT NULL,
                chunk_text TEXT NOT NULL,
                embedding VECTOR(1536)
            );
        """)
        cur.execute("""
            CREATE INDEX IF NOT EXISTS clinical_chunks_embedding_idx
            ON clinical_chunks USING ivfflat (embedding vector_cosine_ops)
            WITH (lists = 100);
        """)
    conn.commit()

2) Generate embeddings for your text

For production, separate embedding generation from generation-time reasoning. Chunk extracted note or document text first, then send each chunk through your embedding pipeline.

Anthropic does not currently ship a first-party embeddings endpoint; its documentation points to third-party providers such as Voyage AI. The pattern below therefore keeps embedding generation behind a single function and uses Anthropic only for downstream reasoning; swap your approved provider's client call into that one interface.

import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def chunk_text(text: str, size: int = 800):
    # Naive fixed-size chunking; production pipelines usually split on
    # sentence or section boundaries and add overlap between chunks.
    return [text[i:i+size] for i in range(0, len(text), size)]

def get_embedding(text: str):
    # Replace with your approved embedding endpoint/provider.
    # Keep this function isolated so the rest of the app does not care.
    raise NotImplementedError("Wire in your production embedding model here")

note = """
Patient reports worsening shortness of breath on exertion.
History of CHF, HTN, diabetes. Recent weight gain of 4 lbs in 3 days.
"""

chunks = chunk_text(note)
embeddings = [get_embedding(chunk) for chunk in chunks]
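Until a production provider is wired in, a deterministic stub keeps the pipeline testable end to end. The hashing scheme below is purely illustrative: it produces stable 1536-dimension vectors for plumbing tests, not semantically meaningful embeddings, and must be replaced before judging retrieval quality.

```python
import hashlib
import math

EMBED_DIM = 1536  # matches the VECTOR(1536) column


def stub_embedding(text: str) -> list[float]:
    """Deterministic placeholder: hash-seeded pseudo-vector, unit-normalized.

    Useful for integration-testing the Postgres plumbing; swap in a real
    embedding model before evaluating retrieval quality.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Repeat the 32-byte digest to fill the vector, then center and normalize.
    raw = [digest[i % len(digest)] - 127.5 for i in range(EMBED_DIM)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]
```

Because the output is deterministic, the same text always maps to the same vector, which makes insert-then-retrieve smoke tests reproducible.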

3) Store chunks and vectors in pgvector

Insert each chunk into Postgres. Keep metadata tight: patient ID, document source, timestamp, and any access-control fields you need.

import os
import psycopg
from pgvector.psycopg import register_vector

DB_URL = os.environ["DATABASE_URL"]

rows = [
    ("patient_123", "admission_note", "Patient reports worsening shortness of breath on exertion.", embeddings[0]),
    ("patient_123", "admission_note", "History of CHF, HTN, diabetes. Recent weight gain of 4 lbs in 3 days.", embeddings[1]),
]

with psycopg.connect(DB_URL) as conn:
    register_vector(conn)
    with conn.cursor() as cur:
        cur.executemany(
            """
            INSERT INTO clinical_chunks (patient_id, source, chunk_text, embedding)
            VALUES (%s, %s, %s, %s)
            """,
            rows,
        )
    conn.commit()

4) Retrieve relevant context with pgvector and pass it to Anthropic

This is the core integration. Search by similarity first, then give Anthropic only the top matches plus instructions that constrain output style and scope.

import os
import psycopg
from anthropic import Anthropic
from pgvector.psycopg import register_vector

DB_URL = os.environ["DATABASE_URL"]
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def retrieve_context(query_embedding, limit=3):
    with psycopg.connect(DB_URL) as conn:
        register_vector(conn)
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT patient_id, source, chunk_text
                FROM clinical_chunks
                ORDER BY embedding <=> %s::vector
                LIMIT %s;
                """,
                (query_embedding.tolist(), limit),  # .tolist() assumes a numpy array; a plain list can be passed as-is
            )
            return cur.fetchall()

query = "What is driving the patient's dyspnea?"
query_embedding = get_embedding(query)
matches = retrieve_context(query_embedding)

context_block = "\n\n".join(
    f"[{patient_id} | {source}] {chunk_text}"
    for patient_id, source, chunk_text in matches
)

response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=300,
    temperature=0,
    system="You are a healthcare assistant. Use only the provided context. Do not invent facts.",
    messages=[
        {
            "role": "user",
            "content": f"Context:\n{context_block}\n\nQuestion: {query}"
        }
    ],
)

print(response.content[0].text)
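Top-k retrieval returns the k nearest rows even when nothing is truly relevant. One guard is to also select the distance (add `embedding <=> %s::vector AS distance` to the SELECT) and drop weak matches before building the prompt. The helper and the 0.5 cutoff below are illustrative starting points to tune on your own data, not fixed recommendations.

```python
def filter_matches(rows_with_distance, max_distance: float = 0.5):
    """Drop rows whose cosine distance exceeds the cutoff.

    Expects (patient_id, source, chunk_text, distance) tuples, e.g. from a
    SELECT that also returns `embedding <=> %s::vector AS distance`.
    Cosine distance is 0 for identical direction and 2 for opposite vectors.
    """
    return [row[:-1] for row in rows_with_distance if row[-1] <= max_distance]
```

If every row is filtered out, that is a useful signal in itself: the model should say it lacks context rather than answer from the nearest irrelevant chunks.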

5) Add guardrails before production traffic

Do not send raw PHI unless your compliance posture allows it. Redact identifiers where possible, log retrieval IDs instead of full note text, and keep tenant or patient filters in every query.

def safe_retrieve(patient_id: str, query_embedding):
    with psycopg.connect(DB_URL) as conn:
        register_vector(conn)
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT source, chunk_text
                FROM clinical_chunks
                WHERE patient_id = %s
                ORDER BY embedding <=> %s::vector
                LIMIT 5;
                """,
                (patient_id, query_embedding.tolist()),  # .tolist() assumes a numpy array
            )
            return cur.fetchall()
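For the redaction side, a minimal sketch is shown below. The regex patterns are illustrative only and nowhere near compliance-grade; a real deployment should run text through a vetted de-identification tool before it leaves your boundary.

```python
import re

# Illustrative patterns only. Production de-identification should use a
# vetted PHI-scrubbing tool, not ad-hoc regexes.
PHI_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
]


def redact_phi(text: str) -> str:
    """Replace obvious identifiers before text leaves your boundary."""
    for pattern, token in PHI_PATTERNS:
        text = pattern.sub(token, text)
    return text
```

Run this on chunks before embedding and on retrieved text before prompting, so neither the vector store payload nor the model request carries raw identifiers.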

Testing the Integration

Run a smoke test that inserts one known chunk, queries it back semantically, and asks Anthropic to summarize it.

test_query = "What changed in this patient's condition?"
test_embedding = get_embedding(test_query)

results = safe_retrieve("patient_123", test_embedding)
context = "\n".join(f"{source}: {chunk}" for source, chunk in results)

resp = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=120,
    temperature=0,
    system="Answer only from retrieved clinical context.",
    messages=[{"role": "user", "content": f"{context}\n\nQuestion: {test_query}"}],
)

print(resp.content[0].text)

Example output (exact wording varies by model and run):

The patient’s dyspnea worsened recently. Supporting details include a history of CHF and a 4 lb weight gain over 3 days.
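Because wording varies between runs, it is more robust to assert on grounding than on exact text. One cheap heuristic, purely illustrative, is to flag any number in the answer that never appears in the retrieved context:

```python
import re


def ungrounded_numbers(answer: str, context: str) -> list[str]:
    """Return numbers that appear in the answer but not in the context.

    A cheap smoke check for hallucinated quantities (doses, lab values,
    timelines); it complements, not replaces, human review.
    """
    context_numbers = set(re.findall(r"\d+(?:\.\d+)?", context))
    return [n for n in re.findall(r"\d+(?:\.\d+)?", answer)
            if n not in context_numbers]
```

Wire this into the smoke test: if the list is non-empty, fail the run and inspect the prompt and retrieved chunks before shipping.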

Real-World Use Cases

  • Clinical chart summarization that pulls only relevant note sections from pgvector before asking Anthropic to draft a concise assessment.
  • Prior authorization assistants that retrieve payer policy snippets and compare them against encounter documentation.
  • Nurse triage copilots that combine recent symptom history with protocol documents to generate structured next-step suggestions.

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
