How to Integrate FastAPI for healthcare with PostgreSQL for RAG
FastAPI for healthcare gives you the HTTP layer for clinical workflows, while PostgreSQL gives you durable storage for patient records, embeddings, and retrieval metadata. Put them together and you get a clean RAG backend: ingest medical documents, store vectors and structured fields in Postgres, and expose retrieval endpoints through FastAPI.
This is the pattern you want when your AI agent needs to answer questions from policy docs, triage notes, discharge summaries, or claims files without stuffing everything into memory.
Prerequisites
- Python 3.10+
- FastAPI installed and configured in your project
- PostgreSQL 14+ running locally or in your cloud environment
- A PostgreSQL user with permissions to create tables and extensions
- pgvector enabled in PostgreSQL if you want vector search for RAG
- psycopg or asyncpg installed for database access
- uvicorn for running the FastAPI app
- An embedding model provider or local embedding function
- Basic knowledge of async Python
Install the core packages:
```bash
pip install fastapi uvicorn "psycopg[binary]" pgvector sqlalchemy
```
Integration Steps
1. Set up PostgreSQL for document and vector storage
Start by creating a table that stores both structured metadata and embeddings. For RAG, you need stable IDs, source references, chunk text, and a vector column.
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS medical_chunks (
    id UUID PRIMARY KEY,
    patient_id TEXT NOT NULL,
    source_type TEXT NOT NULL,
    source_ref TEXT NOT NULL,
    chunk_text TEXT NOT NULL,
    embedding VECTOR(1536),
    created_at TIMESTAMPTZ DEFAULT NOW()
);
```
If you are using pgvector, this table becomes your retrieval index. Keep metadata fields explicit so you can filter by patient, document type, or encounter.
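At any real document volume, a sequential scan over the embedding column gets slow. If you are on pgvector 0.5 or later, you can add an approximate nearest-neighbor index for cosine distance, which is the metric the retrieval queries in this guide use. A minimal sketch (the index name is illustrative):

```sql
-- Approximate nearest-neighbor index for cosine distance (pgvector 0.5+).
CREATE INDEX IF NOT EXISTS medical_chunks_embedding_idx
    ON medical_chunks
    USING hnsw (embedding vector_cosine_ops);
```

On older pgvector versions, an ivfflat index with the same `vector_cosine_ops` operator class is the equivalent option.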
2. Build the FastAPI healthcare app and database connection
Use FastAPI for healthcare as the API surface. In practice, this is just a FastAPI app with healthcare-specific routes and validation around clinical data.
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import psycopg
from pgvector.psycopg import register_vector

app = FastAPI(title="Healthcare RAG API")

DATABASE_URL = "postgresql://rag_user:rag_pass@localhost:5432/rag_db"

class ChunkIn(BaseModel):
    patient_id: str
    source_type: str
    source_ref: str
    chunk_text: str

def get_conn():
    conn = psycopg.connect(DATABASE_URL)
    register_vector(conn)
    return conn

@app.get("/health")
def health():
    return {"status": "ok"}
```
This gives you a single place to manage connection creation. In production, move to a pool rather than opening a raw connection per request.
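One way to do that is psycopg's companion pool package (`pip install psycopg_pool`). A sketch only, assuming the same `DATABASE_URL` as above; the pool sizes are illustrative:

```python
# Sketch: connection pooling with psycopg_pool instead of one
# raw connection per request.
from psycopg_pool import ConnectionPool
from pgvector.psycopg import register_vector

DATABASE_URL = "postgresql://rag_user:rag_pass@localhost:5432/rag_db"

# configure= runs once for each new connection the pool opens,
# so every pooled connection has the vector type registered.
pool = ConnectionPool(DATABASE_URL, min_size=1, max_size=10,
                      configure=register_vector)

def run_query(sql: str, params: tuple) -> list[tuple]:
    # Borrow a connection; it returns to the pool when the block exits.
    with pool.connection() as conn:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
```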
3. Ingest clinical text into PostgreSQL with embeddings
For RAG, ingestion means splitting documents into chunks, generating embeddings, and storing them in Postgres. The example below uses a placeholder embedding function so the integration stays focused on the plumbing.
```python
from uuid import uuid4

def embed_text(text: str) -> list[float]:
    # Replace with your real embedding call.
    return [0.01] * 1536

@app.post("/chunks")
def create_chunk(payload: ChunkIn):
    embedding = embed_text(payload.chunk_text)
    query = """
        INSERT INTO medical_chunks
            (id, patient_id, source_type, source_ref, chunk_text, embedding)
        VALUES (%s, %s, %s, %s, %s, %s)
        RETURNING id;
    """
    try:
        with get_conn() as conn:
            with conn.cursor() as cur:
                cur.execute(
                    query,
                    (
                        uuid4(),
                        payload.patient_id,
                        payload.source_type,
                        payload.source_ref,
                        payload.chunk_text,
                        embedding,
                    ),
                )
                chunk_id = cur.fetchone()[0]
            conn.commit()
        return {"chunk_id": str(chunk_id), "status": "stored"}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
That route is the ingestion entry point for your agent system. Your upstream pipeline can send discharge summaries, lab interpretations, or claims notes into /chunks.
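The route expects pre-split chunks, so the upstream pipeline needs a splitter. A minimal sketch of one (`split_into_chunks` is a hypothetical helper; the window and overlap sizes are illustrative, not tuned for clinical text):

```python
# Hypothetical upstream helper: slide a fixed-size window over the
# document with some overlap, so sentences cut at a chunk boundary
# still appear intact in at least one chunk.
def split_into_chunks(text: str, chunk_size: int = 500,
                      overlap: int = 50) -> list[str]:
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

Each returned string becomes one POST to /chunks, with the same patient_id and source_ref repeated across the document's chunks.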
4. Add retrieval for RAG using PostgreSQL vector search
Now expose a query endpoint that finds similar chunks by cosine distance. This is the core of retrieval-augmented generation.
```python
class QueryIn(BaseModel):
    question: str
    patient_id: str | None = None
    top_k: int = 5

@app.post("/retrieve")
def retrieve(payload: QueryIn):
    q_embedding = embed_text(payload.question)
    # The ::text cast lets PostgreSQL type the parameter even when it is NULL.
    sql = """
        SELECT id, patient_id, source_type, source_ref, chunk_text,
               1 - (embedding <=> %s::vector) AS similarity
        FROM medical_chunks
        WHERE (%s::text IS NULL OR patient_id = %s)
        ORDER BY embedding <=> %s::vector
        LIMIT %s;
    """
    try:
        with get_conn() as conn:
            with conn.cursor() as cur:
                cur.execute(
                    sql,
                    (q_embedding, payload.patient_id, payload.patient_id,
                     q_embedding, payload.top_k),
                )
                rows = cur.fetchall()
        results = [
            {
                "id": str(r[0]),
                "patient_id": r[1],
                "source_type": r[2],
                "source_ref": r[3],
                "chunk_text": r[4],
                "similarity": float(r[5]),
            }
            for r in rows
        ]
        return {"matches": results}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
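For reference, `<=>` is pgvector's cosine distance operator, so `1 - (embedding <=> query)` is cosine similarity. The same math in plain Python (`cosine_distance` is a helper written here purely for illustration):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    # Mirrors pgvector's <=> operator: 1 minus cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Same direction: distance 0, similarity 1.
assert math.isclose(cosine_distance([1.0, 0.0], [2.0, 0.0]), 0.0, abs_tol=1e-9)
# Orthogonal: distance 1, similarity 0.
assert math.isclose(cosine_distance([1.0, 0.0], [0.0, 1.0]), 1.0)
```

Smaller distance means closer in direction, which is why the query orders ascending by `<=>` but reports `1 - distance` as the similarity score.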
This endpoint is what your agent calls before generating an answer. You can pass the retrieved chunks into your LLM prompt along with strict clinical guardrails.
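How you splice the matches into the prompt is up to you. One sketch (`build_prompt` is a hypothetical helper, and the guardrail wording is illustrative, not a vetted clinical policy):

```python
# Hypothetical prompt builder: turn /retrieve matches into grounded
# LLM context with a source tag per chunk.
def build_prompt(question: str, matches: list[dict]) -> str:
    context = "\n\n".join(
        f"[{m['source_type']} / {m['source_ref']}]\n{m['chunk_text']}"
        for m in matches
    )
    return (
        "Answer using ONLY the context below. If the answer is not in "
        "the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Tagging each chunk with its source_type and source_ref makes it easy to ask the model for citations the clinician can verify.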
5. Run the service and wire it into your agent workflow
Start the API with Uvicorn and call it from your orchestration layer or another service.
```python
# main.py
from fastapi import FastAPI

app = FastAPI()
# include routes here if split across files

# run:
# uvicorn main:app --reload --port 8000
```
If you are building an agent loop outside the API process:
```python
import requests

resp = requests.post(
    "http://localhost:8000/retrieve",
    json={
        "question": "What medications were prescribed at discharge?",
        "patient_id": "P123",
        "top_k": 3,
    },
)
print(resp.json())
```
That gives you a clean separation:
- FastAPI handles request validation now, and authentication when you add it.
- PostgreSQL stores durable clinical knowledge.
- Your agent retrieves context only when it needs it.
Testing the Integration
Use one insert followed by one retrieval call. If both succeed, your RAG path is working end to end.
```python
import requests

base_url = "http://localhost:8000"

insert_resp = requests.post(
    f"{base_url}/chunks",
    json={
        "patient_id": "P123",
        "source_type": "discharge_summary",
        "source_ref": "encounter_456",
        "chunk_text": "Patient discharged on metformin 500mg daily and advised follow-up in two weeks.",
    },
)

query_resp = requests.post(
    f"{base_url}/retrieve",
    json={
        "question": "What medication was prescribed?",
        "patient_id": "P123",
        "top_k": 1,
    },
)

print("Insert:", insert_resp.status_code, insert_resp.json())
print("Retrieve:", query_resp.status_code)
print(query_resp.json())
```
Expected output:
```
Insert: 200 {'chunk_id': '...', 'status': 'stored'}
Retrieve: 200
{'matches': [{'id': '...', 'patient_id': 'P123', 'source_type': 'discharge_summary', 'source_ref': 'encounter_456', 'chunk_text': 'Patient discharged on metformin 500mg daily and advised follow-up in two weeks.', 'similarity': 0.9...}]}
```
Real-World Use Cases
- Clinical copilot that answers questions from discharge summaries, lab reports, and medication histories using retrieved chart context.
- Prior authorization assistant that searches policy documents and claim notes before drafting a decision summary.
- Patient support agent that pulls approved educational content based on diagnosis codes and recent encounter history.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.