How to Integrate Anthropic for healthcare with pgvector for multi-agent systems

By Cyprian Aarons · Updated 2026-04-21

Tags: anthropic-for-healthcare, pgvector, multi-agent-systems

Combining Anthropic for healthcare with pgvector gives you a practical pattern for clinical-grade multi-agent systems: one agent can reason over patient context, while another retrieves the most relevant prior notes, care plans, or policy snippets from vector search. That matters when you need grounded responses, traceable retrieval, and shared memory across agents without stuffing everything into the prompt.

Prerequisites

  • Python 3.10+
  • A running PostgreSQL instance
  • pgvector installed in PostgreSQL
  • An Anthropic API key with access to the relevant healthcare-capable model
  • A Postgres user with permissions to create extensions and tables
  • Basic familiarity with embeddings and multi-agent orchestration

Install the Python packages:

pip install anthropic psycopg[binary] pgvector python-dotenv

Enable the extension in Postgres:

CREATE EXTENSION IF NOT EXISTS vector;

Integration Steps

  1. Create a shared vector store for clinical memory

    In a multi-agent system, each agent should not keep its own isolated context. Store note chunks, care summaries, and policy text in Postgres so every agent can retrieve the same source of truth.

import os
import psycopg
from pgvector.psycopg import register_vector

DB_URL = os.environ["DATABASE_URL"]

with psycopg.connect(DB_URL) as conn:
    register_vector(conn)
    with conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS clinical_memory (
                id SERIAL PRIMARY KEY,
                patient_id TEXT NOT NULL,
                source TEXT NOT NULL,
                content TEXT NOT NULL,
                embedding VECTOR(1536)
            );
        """)
        conn.commit()
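By default, similarity queries against this table will scan every row. For larger stores you can add an approximate nearest-neighbor index; a minimal sketch using pgvector's HNSW index (pgvector 0.5+), with `vector_l2_ops` to match the `<->` L2 operator used during retrieval:

```sql
-- Approximate nearest-neighbor index; vector_l2_ops pairs with the <-> operator.
CREATE INDEX IF NOT EXISTS clinical_memory_embedding_idx
    ON clinical_memory USING hnsw (embedding vector_l2_ops);
```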
  2. Generate embeddings and store them alongside the source text

    Anthropic’s API is used here for the reasoning layer only; Anthropic does not ship a first-party embeddings endpoint, so use a dedicated embedding provider and keep the interface clean so the rest of the system stays model-agnostic. In production, this separation is what lets you swap models without rewriting retrieval.

import os
import psycopg
from pgvector.psycopg import register_vector

DB_URL = os.environ["DATABASE_URL"]

def embed_text(text: str) -> list[float]:
    # Replace with your embedding provider call (Anthropic's docs point to
    # providers such as Voyage AI for embeddings).
    # Keep the output dimension aligned with the VECTOR(1536) column.
    return [0.01] * 1536

clinical_chunks = [
    {
        "patient_id": "p-1001",
        "source": "discharge_summary",
        "content": "Patient discharged on lisinopril 10mg daily. Follow-up in 2 weeks."
    },
    {
        "patient_id": "p-1001",
        "source": "triage_note",
        "content": "Reports mild shortness of breath on exertion. No chest pain."
    }
]

with psycopg.connect(DB_URL) as conn:
    register_vector(conn)
    with conn.cursor() as cur:
        for chunk in clinical_chunks:
            emb = embed_text(chunk["content"])
            cur.execute(
                """
                INSERT INTO clinical_memory (patient_id, source, content, embedding)
                VALUES (%s, %s, %s, %s::vector)  -- cast the Python list to a pgvector value
                """,
                (chunk["patient_id"], chunk["source"], chunk["content"], emb),
            )
        conn.commit()
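Real notes run much longer than the toy strings above, so they are usually split into chunks before embedding. A minimal sketch of a character-window chunker; `chunk_text` and its default sizes are illustrative choices, not part of any library:

```python
def chunk_text(text: str, max_chars: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows for embedding."""
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap  # step forward, keeping some overlap
    return chunks

# Each chunk would then be embedded and inserted as its own clinical_memory row.
```

The overlap preserves context that straddles a chunk boundary, at the cost of some duplicate storage.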
  3. Retrieve relevant context from pgvector before calling Anthropic

    The retriever agent queries similar notes first. Then the reasoning agent gets only the top matches, which keeps prompts small and makes outputs more defensible.

import os
import psycopg
from pgvector.psycopg import register_vector

DB_URL = os.environ["DATABASE_URL"]

def embed_text(text: str) -> list[float]:
    return [0.01] * 1536

def retrieve_context(patient_id: str, query: str, k: int = 3):
    query_embedding = embed_text(query)

    with psycopg.connect(DB_URL) as conn:
        register_vector(conn)
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT source, content
                FROM clinical_memory
                WHERE patient_id = %s
                ORDER BY embedding <-> %s::vector
                LIMIT %s;
                """,
                (patient_id, query_embedding, k),
            )
            return cur.fetchall()
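Raising `k` can blow past your prompt budget, so it helps to trim the retrieved rows before formatting them. A hypothetical helper, `fit_context`, with an assumed 2,000-character budget:

```python
def fit_context(rows: list[tuple[str, str]], max_chars: int = 2000) -> list[tuple[str, str]]:
    """Keep rows in ranked order until the formatted context would exceed max_chars."""
    kept: list[tuple[str, str]] = []
    used = 0
    for source, content in rows:
        line_len = len(f"[{source}] {content}") + 1  # +1 for the joining newline
        if used + line_len > max_chars:
            break  # rows are already ranked by similarity, so stop here
        kept.append((source, content))
        used += line_len
    return kept
```

Because pgvector returns rows ordered by distance, truncating from the tail drops the least relevant matches first.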
  4. Call Anthropic with retrieved context to produce a grounded answer

    Use client.messages.create(...) to generate a response that cites only what was retrieved. For healthcare workflows, keep the prompt constrained and ask for structured output.

import os
import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def generate_clinical_response(patient_id: str, query: str) -> str:
    context_rows = retrieve_context(patient_id, query)

    context_block = "\n".join(
        [f"[{source}] {content}" for source, content in context_rows]
    )

    message = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=400,
        temperature=0,
        messages=[
            {
                "role": "user",
                "content": f"""
You are a clinical assistant working from retrieved patient context only.

Patient ID: {patient_id}
Question: {query}

Retrieved context:
{context_block}

Return:
- likely interpretation
- missing data needed before action
- concise next step recommendation
"""
            }
        ],
    )

    return message.content[0].text

print(generate_clinical_response("p-1001", "What follow-up should be scheduled?"))
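Because the prompt requests labeled sections, downstream agents can parse the reply instead of passing raw text around. A naive sketch; `parse_sections` is illustrative and treats any line ending in a colon as a header, so it will misfire if body text happens to end with one:

```python
def parse_sections(text: str) -> dict[str, str]:
    """Parse 'header:' lines followed by body lines into a dict."""
    sections: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):
            # Treat the line as a section header; tolerate a leading bullet.
            current = stripped.rstrip(":").lstrip("- ").strip()
            sections[current] = ""
        elif current is not None and stripped:
            sections[current] = (sections[current] + " " + stripped).strip()
    return sections
```

For stricter guarantees, ask the model for JSON and validate it instead.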
  5. Wire it into a multi-agent flow

    A practical setup is a router agent plus specialist agents. The router decides whether to retrieve from pgvector, then the clinician-facing agent uses Anthropic to synthesize an answer.

def route_request(query: str) -> str:
    if any(term in query.lower() for term in ["follow-up", "medication", "symptom", "note"]):
        return "clinical_retrieval"
    return "general_reasoning"

def handle_request(patient_id: str, query: str) -> str:
    route = route_request(query)

    if route == "clinical_retrieval":
        return generate_clinical_response(patient_id, query)

    response = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=200,
        temperature=0,
        messages=[{"role": "user", "content": query}],
    )
    return response.content[0].text
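Hard-coded keyword checks get unwieldy as routes multiply. One way to keep the router configurable is a keyword table; `ROUTE_KEYWORDS` and `route_request_v2` are illustrative names, not part of any framework:

```python
ROUTE_KEYWORDS: dict[str, set[str]] = {
    "clinical_retrieval": {"follow-up", "medication", "symptom", "note", "discharge"},
    # New routes (e.g. "policy_lookup") can be added without touching the logic.
}

def route_request_v2(query: str, default: str = "general_reasoning") -> str:
    """Return the first route whose keywords appear in the query, else the default."""
    lowered = query.lower()
    for route, keywords in ROUTE_KEYWORDS.items():
        if any(term in lowered for term in keywords):
            return route
    return default
```

The table can later be moved into Postgres so routing rules are editable without a deploy.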

Testing the Integration

Run a simple end-to-end check against one patient record:

result = handle_request("p-1001", "What follow-up should be scheduled after discharge?")
print(result)

Expected output (exact wording will vary by model run):

likely interpretation:
The patient needs a 2-week outpatient follow-up after discharge.

missing data needed before action:
No appointment date or specialty is listed in the retrieved notes.

concise next step recommendation:
Schedule primary care or discharge follow-up within 2 weeks and confirm medication adherence.

Real-World Use Cases

  • Clinical chart summarization

    • One agent retrieves prior notes from pgvector.
    • Another agent uses Anthropic to summarize changes across encounters.
  • Care-gap detection

    • A retrieval agent pulls recent labs, discharge notes, and medication lists.
    • A reasoning agent flags missing follow-ups or unresolved symptoms.
  • Policy-aware triage assistants

    • Store internal triage protocols in pgvector.
    • Use Anthropic to answer staff questions while grounding responses in approved policy text.
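The care-gap pattern above can begin as a simple completeness check before any model call: verify that retrieval actually returned the source types the reasoning agent needs. `REQUIRED_SOURCES` and `find_missing_sources` are illustrative:

```python
REQUIRED_SOURCES = {"discharge_summary", "medication_list", "recent_labs"}

def find_missing_sources(rows: list[tuple[str, str]]) -> set[str]:
    """Return the required source types absent from retrieved (source, content) rows."""
    present = {source for source, _ in rows}
    return REQUIRED_SOURCES - present
```

If the set is non-empty, the agent can flag the gap or trigger a broader retrieval instead of reasoning over incomplete context.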

By Cyprian Aarons, AI Consultant at Topiax.