How to Integrate Anthropic for Healthcare with pgvector for RAG
Healthcare RAG is only useful when the model can answer from your approved clinical corpus, not from memory. Pairing Anthropic for healthcare with pgvector gives you a clean pattern: store embeddings from policies, care pathways, discharge notes, or claims docs in Postgres, retrieve the most relevant chunks, then send them to Anthropic for a grounded response.
This is the setup you want for triage assistants, prior-auth helpers, clinical policy lookup, and internal support bots that need citations and controlled outputs.
Prerequisites
- Python 3.10+
- PostgreSQL 14+ with the pgvector extension installed
- An Anthropic account with API access for your healthcare use case
- ANTHROPIC_API_KEY set in your environment
- A Postgres database URL in DATABASE_URL
- Python packages:
  - anthropic
  - psycopg[binary]
  - pgvector
  - sentence-transformers (or another embedding provider)
- A document set to index:
  - clinical guidelines
  - payer policy docs
  - care management SOPs
  - member-facing FAQ content
Integration Steps
1) Install dependencies and enable pgvector
Start by installing the Python packages and enabling the extension in Postgres.
pip install anthropic "psycopg[binary]" pgvector sentence-transformers
import os
import psycopg

DATABASE_URL = os.environ["DATABASE_URL"]

# Enable the pgvector extension once per database.
with psycopg.connect(DATABASE_URL) as conn:
    with conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    conn.commit()
If you are using a managed Postgres service, make sure the extension is allowed on that instance. Without this step, vector similarity search will fail.
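To check availability up front, you can query the standard pg_available_extensions catalog view; pgvector registers under the name vector. A quick sketch, reusing the connection setup above:

# Check whether pgvector is available on this server before relying on it.
with psycopg.connect(DATABASE_URL) as conn:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT default_version, installed_version "
            "FROM pg_available_extensions WHERE name = 'vector';"
        )
        print(cur.fetchone())  # None means pgvector is not installed on the server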
2) Create a table for chunks and embeddings
Store text chunks plus their embeddings in one table. Use a fixed embedding dimension that matches your embedding model.
import os
import psycopg
from pgvector.psycopg import register_vector

DATABASE_URL = os.environ["DATABASE_URL"]

with psycopg.connect(DATABASE_URL) as conn:
    register_vector(conn)
    with conn.cursor() as cur:
        # 384 matches the all-MiniLM-L6-v2 embedding dimension used below.
        cur.execute("""
            CREATE TABLE IF NOT EXISTS rag_chunks (
                id BIGSERIAL PRIMARY KEY,
                source TEXT NOT NULL,
                chunk TEXT NOT NULL,
                embedding vector(384) NOT NULL
            );
        """)
        cur.execute("""
            CREATE INDEX IF NOT EXISTS rag_chunks_embedding_idx
            ON rag_chunks USING ivfflat (embedding vector_cosine_ops)
            WITH (lists = 100);
        """)
    conn.commit()
Use vector_cosine_ops if your retrieval is based on cosine similarity. Note that ivfflat builds its cluster lists from the rows present at index creation, so in production you would typically create the index after loading data. Keep chunk sizes consistent and store metadata like document version, specialty, and effective date.
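If you want those metadata fields on the same table, one possible shape is below. The column names are illustrative, not a fixed schema:

# Illustrative metadata columns; adapt the names to your document pipeline.
with psycopg.connect(DATABASE_URL) as conn:
    with conn.cursor() as cur:
        cur.execute("""
            ALTER TABLE rag_chunks
                ADD COLUMN IF NOT EXISTS doc_version TEXT,
                ADD COLUMN IF NOT EXISTS specialty TEXT,
                ADD COLUMN IF NOT EXISTS effective_date DATE;
        """)
    conn.commit()

Filtering on these columns at query time (for example, WHERE effective_date <= CURRENT_DATE) keeps retired policy text out of retrieval.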
3) Generate embeddings and insert documents into pgvector
For embeddings, use a local model or your existing embedding service. The example below uses sentence-transformers so the pipeline stays self-contained.
import os
import psycopg
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer

DATABASE_URL = os.environ["DATABASE_URL"]
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings

docs = [
    {
        "source": "policy_orthopedics_2025.pdf",
        "chunk": "MRI prior authorization requires documented conservative therapy for six weeks unless red flags are present."
    },
    {
        "source": "care_pathway_diabetes.md",
        "chunk": "For uncontrolled diabetes, escalate to care management when HbA1c remains above target after two follow-up contacts."
    }
]

with psycopg.connect(DATABASE_URL) as conn:
    register_vector(conn)
    with conn.cursor() as cur:
        for doc in docs:
            # register_vector lets psycopg pass numpy arrays straight into vector columns.
            embedding = model.encode(doc["chunk"])
            cur.execute(
                "INSERT INTO rag_chunks (source, chunk, embedding) VALUES (%s, %s, %s)",
                (doc["source"], doc["chunk"], embedding)
            )
    conn.commit()
This is the ingestion path. In a real system, run this through a document pipeline that extracts text, chunks by section boundaries, deduplicates content, and stores provenance fields.
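A full pipeline is out of scope here, but a minimal sketch of the chunk-and-deduplicate step might look like this. Splitting on blank lines is a stand-in for real section detection, and max_chars is an arbitrary cap:

import hashlib

def chunk_document(text: str, max_chars: int = 1200) -> list[str]:
    # Treat blank lines as rough section boundaries, then cap each chunk's size.
    sections = [s.strip() for s in text.split("\n\n") if s.strip()]
    chunks, seen = [], set()
    for section in sections:
        for start in range(0, len(section), max_chars):
            piece = section[start:start + max_chars]
            digest = hashlib.sha256(piece.encode()).hexdigest()
            if digest in seen:
                continue  # drop exact duplicates across documents
            seen.add(digest)
            chunks.append(piece)
    return chunks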
4) Retrieve top-k context from pgvector
At query time, embed the user question and pull the nearest chunks from Postgres.
import os
import psycopg
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer

DATABASE_URL = os.environ["DATABASE_URL"]
model = SentenceTransformer("all-MiniLM-L6-v2")

question = "What are the requirements for MRI prior authorization?"
# Pass the numpy array directly; register_vector handles the vector conversion.
query_embedding = model.encode(question)

with psycopg.connect(DATABASE_URL) as conn:
    register_vector(conn)
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT source, chunk
            FROM rag_chunks
            ORDER BY embedding <=> %s
            LIMIT 3;
            """,
            (query_embedding,)
        )
        rows = cur.fetchall()

context = "\n\n".join(f"[{source}] {chunk}" for source, chunk in rows)
print(context)
The <=> operator computes cosine distance, and the vector_cosine_ops index from step 2 lets Postgres serve this ORDER BY efficiently. This gives you deterministic retrieval before anything touches Anthropic.
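One tuning knob worth knowing: ivfflat is an approximate index, and the ivfflat.probes setting controls how many cluster lists are searched per query (the default is 1). Raising it improves recall at some latency cost. A sketch that sets it per session before the retrieval query:

with psycopg.connect(DATABASE_URL) as conn:
    register_vector(conn)
    with conn.cursor() as cur:
        # Search more ivfflat clusters per query: better recall, slower queries.
        cur.execute("SET ivfflat.probes = 10;")
        cur.execute(
            "SELECT source, chunk FROM rag_chunks ORDER BY embedding <=> %s LIMIT 3;",
            (query_embedding,)
        )
        rows = cur.fetchall()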
5) Send retrieved context to Anthropic for grounded generation
Now pass the retrieved evidence into Anthropic’s Messages API. Keep the prompt strict: answer only from provided context and cite sources.
import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

system_prompt = """
You are a healthcare assistant.
Answer only using the provided context.
If the context is insufficient, say you do not have enough information.
Cite sources inline using bracketed source names.
"""

user_prompt = f"""
Question: {question}

Context:
{context}
"""

response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=400,
    temperature=0,
    system=system_prompt,
    messages=[
        {"role": "user", "content": user_prompt}
    ],
)
print(response.content[0].text)
For healthcare workflows, keep temperature at zero unless you have a specific reason not to. You want stable answers that track policy text closely.
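Because the prompt asks for bracketed citations, you can add a cheap grounding check on the way out. A minimal sketch, assuming rows from step 4 and response from the block above are still in scope:

import re

def check_citations(answer: str, rows) -> set[str]:
    # Compare bracketed citations in the answer against the retrieved sources.
    cited = set(re.findall(r"\[([^\]]+)\]", answer))
    retrieved = {source for source, _ in rows}
    return cited - retrieved  # anything left is an ungrounded citation

unknown = check_citations(response.content[0].text, rows)
if unknown:
    print("Ungrounded citations:", unknown)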
Testing the Integration
Run an end-to-end test with one known question and one known answer source. The goal is to verify retrieval returns relevant chunks and Anthropic produces an answer grounded in those chunks.
def ask(question: str):
    # Embed the question and retrieve the two nearest chunks.
    q_emb = model.encode(question)
    with psycopg.connect(DATABASE_URL) as conn:
        register_vector(conn)
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT source, chunk
                FROM rag_chunks
                ORDER BY embedding <=> %s
                LIMIT 2;
                """,
                (q_emb,)
            )
            rows = cur.fetchall()
    ctx = "\n".join(f"[{s}] {c}" for s, c in rows)
    resp = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=200,
        temperature=0,
        system="Answer only from context.",
        messages=[{"role": "user", "content": f"Question: {question}\n\nContext:\n{ctx}"}],
    )
    return rows, resp.content[0].text

rows, answer = ask("When do we require MRI prior authorization?")
print("Retrieved:", rows)
print("Answer:", answer)
Expected output (with only the two sample chunks indexed, LIMIT 2 returns both, orthopedics first):
Retrieved: [
    ('policy_orthopedics_2025.pdf', 'MRI prior authorization requires documented conservative therapy for six weeks unless red flags are present.'),
    ('care_pathway_diabetes.md', 'For uncontrolled diabetes, escalate to care management when HbA1c remains above target after two follow-up contacts.')
]
Answer: MRI prior authorization requires documented conservative therapy for six weeks unless red flags are present. [policy_orthopedics_2025.pdf]
If retrieval looks wrong, fix chunking or embeddings before touching the prompt. If retrieval is correct but the answer hallucinates beyond the provided context, tighten the system prompt and lower max_tokens.
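When debugging retrieval, it also helps to look at the raw distances rather than just the ranked rows. This sketch prints the cosine distance for the top matches; what counts as "close" depends on your embedding model, so treat any threshold as something to calibrate:

# Inspect raw cosine distances to see how close the top matches really are.
q_emb = model.encode("When do we require MRI prior authorization?")
with psycopg.connect(DATABASE_URL) as conn:
    register_vector(conn)
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT source, embedding <=> %s AS cosine_distance
            FROM rag_chunks
            ORDER BY cosine_distance
            LIMIT 5;
            """,
            (q_emb,)
        )
        for source, distance in cur.fetchall():
            print(f"{distance:.3f}  {source}")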
Real-World Use Cases
- Prior authorization copilots that summarize policy requirements from approved payer documents before a human reviewer signs off.
- Care management assistants that surface relevant pathways from internal protocols and generate next-step recommendations with citations.
- Member support agents that answer coverage questions using plan documents stored in Postgres instead of free-form model memory.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.