pgvector vs Milvus for real-time apps: Which Should You Use?

By Cyprian Aarons · Updated 2026-04-21
Tags: pgvector, milvus, real-time-apps

pgvector is a Postgres extension for vector search. Milvus is a dedicated vector database built for high-scale ANN retrieval. For real-time apps, start with pgvector unless you already know your latency, throughput, or scale will outgrow Postgres.

Quick Comparison

| Category | pgvector | Milvus |
|---|---|---|
| Learning curve | Low if you already know PostgreSQL, SQL, and migrations | Higher; you need to learn a separate service, collection model, and indexing config |
| Performance | Strong for small to medium workloads, especially when paired with good Postgres tuning and HNSW/IVFFlat | Better at large-scale similarity search and high QPS retrieval |
| Ecosystem | Best-in-class if your app already lives in Postgres; easy joins with transactional data | Strong vector-native ecosystem, but it sits outside your primary OLTP database |
| Pricing | Usually cheaper to start because it reuses existing Postgres infra | Higher operational cost once you factor in cluster management and storage overhead |
| Best use cases | RAG over moderate corpora, personalization, fraud features near transactional data, MVPs | Large-scale semantic search, multi-tenant retrieval at high volume, billion-vector-style workloads |
| Documentation | Simple, direct, SQL-first docs and examples using CREATE EXTENSION vector and ORDER BY embedding <-> $1 | More moving parts: collections, partitions, indexes like HNSW/IVF_FLAT, and query APIs such as search() |

When pgvector Wins

If your app already uses PostgreSQL as the system of record, pgvector is the obvious choice. You can store embeddings next to customer records, tickets, policies, or claims and query them with normal SQL.

A typical pattern looks like this:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id bigserial PRIMARY KEY,
  tenant_id uuid NOT NULL,
  content text NOT NULL,
  embedding vector(1536) NOT NULL
);

CREATE INDEX documents_embedding_hnsw
ON documents
USING hnsw (embedding vector_cosine_ops);

Then retrieve nearest neighbors with a plain SQL query:

SELECT id, content
FROM documents
WHERE tenant_id = $1
ORDER BY embedding <=> $2
LIMIT 10;
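
To make the query above concrete, here is a minimal Python sketch of what it asks the database to do: filter by tenant, rank by cosine distance (the quantity the `<=>` operator returns, one minus cosine similarity), and keep the closest rows. The toy rows, the three-dimensional vectors, and the `nearest` helper are hypothetical and only for illustration. This brute-force version is exact, while an HNSW index returns approximate neighbors, so results can differ slightly on real data.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(rows, tenant_id, query, limit=10):
    """Exact scan mirroring: WHERE tenant_id = $1 ORDER BY embedding <=> $2 LIMIT n."""
    candidates = [r for r in rows if r["tenant_id"] == tenant_id]
    candidates.sort(key=lambda r: cosine_distance(r["embedding"], query))
    return candidates[:limit]


# Hypothetical toy data standing in for the documents table.
rows = [
    {"id": 1, "tenant_id": "a", "embedding": [1.0, 0.0, 0.0]},
    {"id": 2, "tenant_id": "a", "embedding": [0.0, 1.0, 0.0]},
    {"id": 3, "tenant_id": "b", "embedding": [1.0, 0.0, 0.0]},
    {"id": 4, "tenant_id": "a", "embedding": [0.9, 0.1, 0.0]},
]

top = nearest(rows, "a", [1.0, 0.0, 0.0], limit=2)
print([r["id"] for r in top])  # -> [1, 4]; row 3 matches best but belongs to tenant "b"
```

The tenant filter runs before ranking, which is the behavior you want for multi-tenant isolation: another tenant's perfect match never appears in the result.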

Use pgvector when:

  • You need strong transactional consistency between embeddings and business data.
  • You want simple deployment with one database instead of a separate vector service.
  • Your workload is real-time but not massive: tens of thousands to low millions of vectors.
  • You need hybrid filtering that is trivial in SQL:
    • WHERE tenant_id = ...
    • AND status = 'active'
    • ORDER BY embedding <=> ...
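
The hybrid filtering pattern above can be sketched as a small query builder. This is a sketch under assumptions, not a definitive implementation: `build_search`, `to_vector_literal`, and the `ALLOWED_FILTER_COLUMNS` whitelist are hypothetical names, and the `status` column comes from the bullet list rather than the table defined earlier. It uses the cosine operator from the earlier query and the same `$n` placeholder style. How the vector parameter is bound depends on your driver; passing pgvector's text form with a `::vector` cast is one common approach, and some drivers need a pgvector adapter registered instead.

```python
# Hypothetical whitelist: column names are interpolated into SQL, so never
# accept them from user input without a check like this.
ALLOWED_FILTER_COLUMNS = {"tenant_id", "status"}


def to_vector_literal(embedding):
    """Format floats in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_search(filters, embedding, limit=10):
    """Return (sql, params) for a filtered nearest-neighbor query."""
    params, clauses = [], []
    for column, value in filters.items():
        if column not in ALLOWED_FILTER_COLUMNS:
            raise ValueError(f"unsupported filter column: {column}")
        params.append(value)
        clauses.append(f"{column} = ${len(params)}")

    params.append(to_vector_literal(embedding))
    vector_slot = len(params)
    params.append(limit)
    limit_slot = len(params)

    where = ("WHERE " + " AND ".join(clauses) + " ") if clauses else ""
    sql = (
        f"SELECT id, content FROM documents {where}"
        f"ORDER BY embedding <=> ${vector_slot}::vector LIMIT ${limit_slot}"
    )
    return sql, params


sql, params = build_search({"tenant_id": "t1", "status": "active"}, [0.1, 0.2], limit=5)
print(sql)
print(params)
```

Values always travel as parameters and only whitelisted column names reach the SQL string, so adding a filter never opens an injection path.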

pgvector also wins when developer velocity matters more than theoretical peak throughput. If your team knows PostgreSQL admin basics but has never run a distributed vector cluster, pgvector gets you shipping faster.

When Milvus Wins

Milvus wins when vector search is the product surface, not just a feature. If retrieval latency and scale are the main problem you’re solving, a dedicated engine beats bolting vectors onto Postgres.

Milvus gives you proper vector-native primitives: collections, partitions, indexes like HNSW and IVF_FLAT, scalar filtering fields, and bulk-friendly ingestion. The API shape makes sense when embeddings are the core data model rather than one column in an OLTP table.

A basic Milvus flow looks like this:

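from pymilvus import connections, Collection

# Connect to a local Milvus instance and load an existing collection into memory.
connections.connect(alias="default", host="localhost", port="19530")
collection = Collection("customer_docs")
collection.load()

# query_embedding: a list of floats with the collection's vector dimension,
# produced by the same embedding model used at ingestion time.
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    # "ef" is the HNSW search-time parameter, so this assumes an HNSW index.
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=10,
    output_fields=["doc_id", "content"]
)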
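
The SQL-style tenant and status filters have a Milvus counterpart: a boolean expression string over scalar fields, passed through the `expr` argument of `collection.search`. Below is a minimal sketch of building such a string. The `milvus_filter` helper and the `tenant_id` and `status` field names are hypothetical, and the sketch assumes those fields exist as scalar fields in your collection schema. It only handles equality on strings and numbers and deliberately refuses values it cannot quote safely.

```python
def milvus_filter(**equals):
    """Build an equality-only filter expression, e.g. 'a == "x" and b == 3'."""
    parts = []
    for field, value in equals.items():
        if isinstance(value, bool):
            # Booleans need different literals; left out of this sketch.
            raise ValueError("boolean values are not handled in this sketch")
        if isinstance(value, str):
            if '"' in value or "\\" in value:
                raise ValueError("quotes and backslashes are not handled in this sketch")
            parts.append(f'{field} == "{value}"')
        elif isinstance(value, (int, float)):
            parts.append(f"{field} == {value}")
        else:
            raise ValueError(f"unsupported value type for {field}")
    return " and ".join(parts)


expr = milvus_filter(tenant_id="t1", status="active")
print(expr)  # -> tenant_id == "t1" and status == "active"

# Assumed usage with the search call shown earlier:
# collection.search(..., expr=expr, ...)
```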

Use Milvus when:

  • You need high QPS similarity search with predictable latency.
  • Your corpus is large enough that Postgres starts becoming expensive or awkward.
  • You expect heavy write + read concurrency on embeddings.
  • You want partitioning and index tuning that are designed around ANN search from day one.

Milvus is also the better choice if your team already runs distributed infrastructure comfortably. At that point the extra operational surface area buys you headroom that pgvector cannot match.

For Real-Time Apps Specifically

For real-time apps, I recommend pgvector first. Most real-time systems are not actually “massive vector platforms”; they are transactional apps that need fast nearest-neighbor lookup alongside normal relational queries.

Use Milvus only when you have hard evidence that Postgres cannot hold your latency target under load, or when your vector volume is pushing past what a single Postgres-backed architecture should carry. Otherwise you’re paying an operational complexity tax for capacity you probably do not need yet.
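
One way to gather that evidence is a small load harness: fire the same query from several concurrent workers, record per-request latency, and compare the 95th percentile to your target. The sketch below uses only the standard library. `run_query` is a hypothetical zero-argument callable that you would wire to your real pgvector query, and the worker count, request count, and 50 ms target are illustrative assumptions, not recommendations.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def measure_latency_ms(run_query, requests=200, concurrency=8):
    """Run run_query `requests` times across worker threads; return latencies in ms."""
    def timed(_):
        start = time.perf_counter()
        run_query()
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed, range(requests)))


def p95(samples):
    """95th percentile: the last of the 19 cut points that split the data into 20 groups."""
    return statistics.quantiles(samples, n=20)[-1]


def meets_target(samples, target_ms):
    """True when the p95 latency is at or below the target."""
    return p95(samples) <= target_ms


# Stand-in query so the sketch runs anywhere; replace with a real database call.
def fake_query():
    time.sleep(0.001)


samples = measure_latency_ms(fake_query, requests=40, concurrency=4)
print(f"p95 = {p95(samples):.2f} ms, within 50 ms target: {meets_target(samples, 50.0)}")
```

Run it against production-sized data with realistic filters and concurrency. If p95 stays inside your budget, that is a strong argument for staying on pgvector.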


By Cyprian Aarons, AI Consultant at Topiax.
