How to Integrate FastAPI for healthcare with PostgreSQL for RAG
FastAPI for healthcare gives you the HTTP layer for clinical workflows, while PostgreSQL gives you durable storage for patient records, embeddings, and retrieval metadata. Put them together and you get a clean RAG backend: ingest medical documents, store vectors and structured fields in Postgres, and expose retrieval endpoints through FastAPI.
This is the pattern you want when your AI agent needs to answer questions from policy docs, triage notes, discharge summaries, or claims files without stuffing everything into memory.
Prerequisites
- Python 3.10+
- FastAPI installed and configured in your project
- PostgreSQL 14+ running locally or in your cloud environment
- A PostgreSQL user with permissions to create tables and extensions
- pgvector enabled in PostgreSQL if you want vector search for RAG
- psycopg or asyncpg installed for database access
- uvicorn for running the FastAPI app
- An embedding model provider or local embedding function
- Basic knowledge of async Python
Install the core packages:
```bash
pip install fastapi uvicorn "psycopg[binary]" pgvector sqlalchemy
```
Integration Steps
1. Set up PostgreSQL for document and vector storage
Start by creating a table that stores both structured metadata and embeddings. For RAG, you need stable IDs, source references, chunk text, and a vector column.
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS medical_chunks (
    id UUID PRIMARY KEY,
    patient_id TEXT NOT NULL,
    source_type TEXT NOT NULL,
    source_ref TEXT NOT NULL,
    chunk_text TEXT NOT NULL,
    embedding VECTOR(1536),
    created_at TIMESTAMPTZ DEFAULT NOW()
);
```
If you are using pgvector, this table becomes your retrieval index. Keep metadata fields explicit so you can filter by patient, document type, or encounter.
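At any real document volume, a sequential scan over the embedding column gets slow. If you are on pgvector 0.5 or later, you can add an approximate nearest-neighbor index for cosine distance, which is the metric the retrieval queries in this guide use. A minimal sketch (the index name is illustrative):

```sql
-- Approximate nearest-neighbor index for cosine distance (pgvector 0.5+).
CREATE INDEX IF NOT EXISTS medical_chunks_embedding_idx
    ON medical_chunks
    USING hnsw (embedding vector_cosine_ops);
```

On older pgvector versions, an ivfflat index with the same `vector_cosine_ops` operator class is the equivalent option.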
2. Build the FastAPI healthcare app and database connection
Use FastAPI for healthcare as the API surface. In practice, this is just a FastAPI app with healthcare-specific routes and validation around clinical data.
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import psycopg
from pgvector.psycopg import register_vector

app = FastAPI(title="Healthcare RAG API")

DATABASE_URL = "postgresql://rag_user:rag_pass@localhost:5432/rag_db"

class ChunkIn(BaseModel):
    patient_id: str
    source_type: str
    source_ref: str
    chunk_text: str

def get_conn():
    conn = psycopg.connect(DATABASE_URL)
    register_vector(conn)
    return conn

@app.get("/health")
def health():
    return {"status": "ok"}
```
This gives you a single place to manage connection creation. In production, move to a pool rather than opening a raw connection per request.
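One way to do that is psycopg's companion pool package (`pip install psycopg_pool`). A sketch only, assuming the same `DATABASE_URL` as above; the pool sizes are illustrative:

```python
# Sketch: connection pooling with psycopg_pool instead of one
# raw connection per request.
from psycopg_pool import ConnectionPool
from pgvector.psycopg import register_vector

DATABASE_URL = "postgresql://rag_user:rag_pass@localhost:5432/rag_db"

# configure= runs once for each new connection the pool opens,
# so every pooled connection has the vector type registered.
pool = ConnectionPool(DATABASE_URL, min_size=1, max_size=10,
                      configure=register_vector)

def run_query(sql: str, params: tuple) -> list[tuple]:
    # Borrow a connection; it returns to the pool when the block exits.
    with pool.connection() as conn:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
```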
3. Ingest clinical text into PostgreSQL with embeddings
For RAG, ingestion means splitting documents into chunks, generating embeddings, and storing them in Postgres. The example below uses a placeholder embedding function so the integration stays focused on the plumbing.
```python
from uuid import uuid4

def embed_text(text: str) -> list[float]:
    # Replace with your real embedding call.
    return [0.01] * 1536

@app.post("/chunks")
def create_chunk(payload: ChunkIn):
    embedding = embed_text(payload.chunk_text)
    query = """
        INSERT INTO medical_chunks
            (id, patient_id, source_type, source_ref, chunk_text, embedding)
        VALUES (%s, %s, %s, %s, %s, %s)
        RETURNING id;
    """
    try:
        with get_conn() as conn:
            with conn.cursor() as cur:
                cur.execute(
                    query,
                    (
                        uuid4(),
                        payload.patient_id,
                        payload.source_type,
                        payload.source_ref,
                        payload.chunk_text,
                        embedding,
                    ),
                )
                chunk_id = cur.fetchone()[0]
            conn.commit()
        return {"chunk_id": str(chunk_id), "status": "stored"}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
That route is the ingestion entry point for your agent system. Your upstream pipeline can send discharge summaries, lab interpretations, or claims notes into /chunks.
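The route expects pre-split chunks, so the upstream pipeline needs a splitter. A minimal sketch of one (`split_into_chunks` is a hypothetical helper; the window and overlap sizes are illustrative, not tuned for clinical text):

```python
# Hypothetical upstream helper: slide a fixed-size window over the
# document with some overlap, so sentences cut at a chunk boundary
# still appear intact in at least one chunk.
def split_into_chunks(text: str, chunk_size: int = 500,
                      overlap: int = 50) -> list[str]:
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

Each returned string becomes one POST to /chunks, with the same patient_id and source_ref repeated across the document's chunks.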
4. Add retrieval for RAG using PostgreSQL vector search
Now expose a query endpoint that finds similar chunks by cosine distance. This is the core of retrieval-augmented generation.
```python
class QueryIn(BaseModel):
    question: str
    patient_id: str | None = None
    top_k: int = 5

@app.post("/retrieve")
def retrieve(payload: QueryIn):
    q_embedding = embed_text(payload.question)
    # The ::text cast lets PostgreSQL type the parameter even when it is NULL.
    sql = """
        SELECT id, patient_id, source_type, source_ref, chunk_text,
               1 - (embedding <=> %s::vector) AS similarity
        FROM medical_chunks
        WHERE (%s::text IS NULL OR patient_id = %s)
        ORDER BY embedding <=> %s::vector
        LIMIT %s;
    """
    try:
        with get_conn() as conn:
            with conn.cursor() as cur:
                cur.execute(
                    sql,
                    (q_embedding, payload.patient_id, payload.patient_id,
                     q_embedding, payload.top_k),
                )
                rows = cur.fetchall()
        results = [
            {
                "id": str(r[0]),
                "patient_id": r[1],
                "source_type": r[2],
                "source_ref": r[3],
                "chunk_text": r[4],
                "similarity": float(r[5]),
            }
            for r in rows
        ]
        return {"matches": results}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
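For reference, `<=>` is pgvector's cosine distance operator, so `1 - (embedding <=> query)` is cosine similarity. The same math in plain Python (`cosine_distance` is a helper written here purely for illustration):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    # Mirrors pgvector's <=> operator: 1 minus cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Same direction: distance 0, similarity 1.
assert math.isclose(cosine_distance([1.0, 0.0], [2.0, 0.0]), 0.0, abs_tol=1e-9)
# Orthogonal: distance 1, similarity 0.
assert math.isclose(cosine_distance([1.0, 0.0], [0.0, 1.0]), 1.0)
```

Smaller distance means closer in direction, which is why the query orders ascending by `<=>` but reports `1 - distance` as the similarity score.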
This endpoint is what your agent calls before generating an answer. You can pass the retrieved chunks into your LLM prompt along with strict clinical guardrails.
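How you splice the matches into the prompt is up to you. One sketch (`build_prompt` is a hypothetical helper, and the guardrail wording is illustrative, not a vetted clinical policy):

```python
# Hypothetical prompt builder: turn /retrieve matches into grounded
# LLM context with a source tag per chunk.
def build_prompt(question: str, matches: list[dict]) -> str:
    context = "\n\n".join(
        f"[{m['source_type']} / {m['source_ref']}]\n{m['chunk_text']}"
        for m in matches
    )
    return (
        "Answer using ONLY the context below. If the answer is not in "
        "the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Tagging each chunk with its source_type and source_ref makes it easy to ask the model for citations the clinician can verify.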
5. Run the service and wire it into your agent workflow
Start the API with Uvicorn and call it from your orchestration layer or another service.
```python
# main.py
from fastapi import FastAPI

app = FastAPI()
# include routes here if split across files

# run:
# uvicorn main:app --reload --port 8000
```
If you are building an agent loop outside the API process:
```python
import requests

resp = requests.post(
    "http://localhost:8000/retrieve",
    json={
        "question": "What medications were prescribed at discharge?",
        "patient_id": "P123",
        "top_k": 3,
    },
)
print(resp.json())
```
That gives you a clean separation:
- FastAPI handles request validation now, and authentication when you add it.
- PostgreSQL stores durable clinical knowledge.
- Your agent retrieves context only when it needs it.
Testing the Integration
Use one insert followed by one retrieval call. If both succeed, your RAG path is working end to end.
```python
import requests

base_url = "http://localhost:8000"

insert_resp = requests.post(
    f"{base_url}/chunks",
    json={
        "patient_id": "P123",
        "source_type": "discharge_summary",
        "source_ref": "encounter_456",
        "chunk_text": "Patient discharged on metformin 500mg daily and advised follow-up in two weeks.",
    },
)

query_resp = requests.post(
    f"{base_url}/retrieve",
    json={
        "question": "What medication was prescribed?",
        "patient_id": "P123",
        "top_k": 1,
    },
)

print("Insert:", insert_resp.status_code, insert_resp.json())
print("Retrieve:", query_resp.status_code)
print(query_resp.json())
```
Expected output:
```
Insert: 200 {'chunk_id': '...', 'status': 'stored'}
Retrieve: 200
{'matches': [{'id': '...', 'patient_id': 'P123', 'source_type': 'discharge_summary', 'source_ref': 'encounter_456', 'chunk_text': 'Patient discharged on metformin 500mg daily and advised follow-up in two weeks.', 'similarity': 0.9...}]}
```
Real-World Use Cases
- Clinical copilot that answers questions from discharge summaries, lab reports, and medication histories using retrieved chart context.
- Prior authorization assistant that searches policy documents and claim notes before drafting a decision summary.
- Patient support agent that pulls approved educational content based on diagnosis codes and recent encounter history.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.