pgvector vs Milvus for real-time apps: Which Should You Use?

By Cyprian Aarons. Updated 2026-04-21

Tags: pgvector, milvus, real-time-apps

pgvector is a Postgres extension for vector search. Milvus is a dedicated vector database built for high-scale ANN retrieval. For real-time apps, start with pgvector unless you already know your latency, throughput, or scale will outgrow Postgres.

Quick Comparison

Category	pgvector	Milvus
Learning curve	Low if you already know PostgreSQL, SQL, and migrations	Higher; you need to learn a separate service, collection model, and indexing config
Performance	Strong for small to medium workloads, especially when paired with good Postgres tuning and HNSW/IVFFlat	Better at large-scale similarity search and high QPS retrieval
Ecosystem	Best-in-class if your app already lives in Postgres; easy joins with transactional data	Strong vector-native ecosystem, but it sits outside your primary OLTP database
Pricing	Usually cheaper to start because it reuses existing Postgres infra	Higher operational cost once you factor in cluster management and storage overhead
Best use cases	RAG over moderate corpora, personalization, fraud features near transactional data, MVPs	Large-scale semantic search, multi-tenant retrieval at high volume, billion-vector-style workloads
Documentation	Simple, direct, SQL-first docs and examples using `CREATE EXTENSION vector` and `ORDER BY embedding <-> $1`	More moving parts: collections, partitions, indexes like HNSW/IVF_FLAT, and query APIs such as `search()`

When pgvector Wins

If your app already uses PostgreSQL as the system of record, pgvector is the obvious choice. You can store embeddings next to customer records, tickets, policies, or claims and query them with normal SQL.

A typical pattern looks like this:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id bigserial PRIMARY KEY,
  tenant_id uuid NOT NULL,
  content text NOT NULL,
  embedding vector(1536) NOT NULL
);

CREATE INDEX documents_embedding_hnsw
ON documents
USING hnsw (embedding vector_cosine_ops);

Then retrieve nearest neighbors with a plain SQL query:

SELECT id, content
FROM documents
WHERE tenant_id = $1
ORDER BY embedding <=> $2
LIMIT 10;
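
To make the semantics of that query concrete, here is a minimal pure-Python sketch of what it returns: keep one tenant's rows, rank them by cosine distance (the quantity pgvector's `<=>` operator computes), and take the closest few. This is an exact brute-force scan for illustration only. It says nothing about how Postgres plans the query or uses the HNSW index, and `Row` and `nearest` are hypothetical names, not part of pgvector.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical in-memory stand-in for one row of the documents table.
    id: int
    tenant_id: str
    content: str
    embedding: list[float]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity. Assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(rows: list[Row], tenant_id: str, query: list[float], limit: int = 10) -> list[Row]:
    """Brute-force version of: WHERE tenant_id = ... ORDER BY distance LIMIT n."""
    candidates = [r for r in rows if r.tenant_id == tenant_id]
    candidates.sort(key=lambda r: cosine_distance(r.embedding, query))
    return candidates[:limit]
```

An approximate index such as HNSW trades a little of this exactness for speed, so the SQL query can occasionally return a slightly different top 10 than the brute-force scan would.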

Use pgvector when:

• You need strong transactional consistency between embeddings and business data.
• You want simple deployment with one database instead of a separate vector service.
• Your workload is real-time but not massive: tens of thousands to low millions of vectors.
• You need hybrid filtering that is trivial in SQL:
  - WHERE tenant_id = ...
  - AND status = 'active'
  - ORDER BY embedding <=> ...
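
To put "tens of thousands to low millions of vectors" into storage terms, the arithmetic below estimates the raw size of the embeddings alone. It assumes 4-byte float components plus an 8-byte per-vector header (the figure given in the pgvector README) and ignores row overhead and index size, so treat the result as a lower bound. The function names are hypothetical.

```python
def embedding_bytes(num_vectors: int, dimensions: int) -> int:
    # Assumption: 4 bytes per component plus an 8-byte header per vector.
    return num_vectors * (4 * dimensions + 8)


def to_gib(num_bytes: int) -> float:
    return num_bytes / 2**30


if __name__ == "__main__":
    # Print a rough size for a few corpus sizes at 1536 dimensions.
    for n in (50_000, 1_000_000, 5_000_000):
        size = to_gib(embedding_bytes(n, 1536))
        print(f"{n:>9,} vectors x 1536 dims: about {size:.2f} GiB")
```

At 1536 dimensions, one million vectors come to roughly 5.7 GiB before indexes, which is the kind of number worth comparing against the memory of the Postgres instance you already run.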

pgvector also wins when developer velocity matters more than theoretical peak throughput. If your team knows PostgreSQL admin basics but has never run a distributed vector cluster, pgvector gets you shipping faster.

When Milvus Wins

Milvus wins when vector search is the product surface, not just a feature. If retrieval latency and scale are the main problem you’re solving, a dedicated engine beats bolting vectors onto Postgres.

Milvus gives you proper vector-native primitives: collections, partitions, indexes like HNSW and IVF_FLAT, scalar filtering fields, and bulk-friendly ingestion. The API shape makes sense when embeddings are the core data model rather than one column in an OLTP table.

A basic Milvus flow looks like this:

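from pymilvus import connections, Collection

# Connect to a local Milvus instance and load an existing collection into memory.
connections.connect(alias="default", host="localhost", port="19530")
collection = Collection("customer_docs")
collection.load()

# query_embedding is assumed to be a list of floats produced by your embedding model,
# with the same dimension as the collection's "embedding" field.
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},  # "ef" is the HNSW search-time setting
    limit=10,
    output_fields=["doc_id", "content"]
)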
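
The same search call can also carry a scalar filter and a partition restriction. The sketch below only assembles the keyword arguments, so it runs without a Milvus server. The `tenant_id` scalar field and the partition name are assumptions that do not appear in the collection above, and `build_search_kwargs` is a hypothetical helper. `expr` and `partition_names` are parameters of `Collection.search()` in the pymilvus ORM-style API used here; verify them against the pymilvus version you have installed.

```python
from typing import Any, Dict, Optional, Sequence


def build_search_kwargs(
    query_embedding: Sequence[float],
    tenant_id: str,
    limit: int = 10,
    partition: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for Collection.search() with a tenant filter."""
    # Refuse values that could break out of the quoted filter expression.
    if '"' in tenant_id or "\\" in tenant_id:
        raise ValueError("tenant_id must not contain quotes or backslashes")

    kwargs: Dict[str, Any] = {
        "data": [list(query_embedding)],
        "anns_field": "embedding",
        "param": {"metric_type": "COSINE", "params": {"ef": 64}},
        "limit": limit,
        # Assumed scalar field; Milvus applies this boolean filter to the search.
        "expr": f'tenant_id == "{tenant_id}"',
        "output_fields": ["doc_id", "content"],
    }
    if partition is not None:
        # Restrict the search to one partition (assumed to exist already).
        kwargs["partition_names"] = [partition]
    return kwargs
```

With a loaded collection, the call would then be `collection.search(**build_search_kwargs(query_embedding, "acme"))`.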

Use Milvus when:

• You need high QPS similarity search with predictable latency.
• Your corpus is large enough that Postgres starts becoming expensive or awkward.
• You expect heavy write + read concurrency on embeddings.
• You want partitioning and index tuning that are designed around ANN search from day one.

Milvus is also the better choice if your team already runs distributed infrastructure comfortably. At that point the extra operational surface area buys you headroom that pgvector cannot match.

For Real-Time Apps Specifically

For real-time apps, I recommend pgvector first. Most real-time systems are not actually “massive vector platforms”; they are transactional apps that need fast nearest-neighbor lookup alongside normal relational queries.

Use Milvus only when you have hard evidence that Postgres cannot hold your latency target under load, or that your vector volume is pushing past what a single Postgres-backed architecture should carry. Otherwise you’re paying an operational-complexity tax for capacity you probably do not need yet.
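
One way to gather that evidence is to time the real query repeatedly and compare a tail percentile against your target. The harness below is a minimal sketch: `measure_latencies_ms`, `percentile`, and `meets_target` are hypothetical helpers, the query is any zero-argument callable you supply (for example, a function that runs the pgvector SELECT above), and the measurement is sequential from a single client, so it understates what concurrent load would do.

```python
import math
import time
from typing import Callable, List


def measure_latencies_ms(query: Callable[[], object], runs: int = 200) -> List[float]:
    """Call query() `runs` times and return each wall-clock latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample list, 0 < pct <= 100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def meets_target(samples: List[float], p99_target_ms: float) -> bool:
    """True if the observed p99 latency is within the target."""
    return percentile(samples, 99) <= p99_target_ms
```

If the p99 stays inside the target at production-like data volume, that is a reasonable signal to stay on pgvector; if it does not, you have a measured reason to evaluate Milvus rather than a guess.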

By Cyprian Aarons, AI Consultant at Topiax.
