pgvector vs Milvus for RAG: Which Should You Use?
pgvector is a PostgreSQL extension for vector search. Milvus is a purpose-built vector database. If you’re building RAG and your app already lives in Postgres, start with pgvector; if you expect serious scale, high query volume, or multi-tenant retrieval infrastructure, use Milvus.
Quick Comparison
| Area | pgvector | Milvus |
|---|---|---|
| Learning curve | Low if you already know SQL and Postgres | Higher; you need to learn Milvus concepts and SDK patterns |
| Performance | Strong for small to medium workloads, especially with ivfflat and hnsw indexes | Built for high-throughput ANN search at scale |
| Ecosystem | Best fit for existing Postgres apps, migrations, transactions, joins | Strong vector-native ecosystem, built around retrieval workloads |
| Pricing | Cheap to start; one database handles metadata + vectors | More moving parts; higher ops cost unless managed |
| Best use cases | RAG in existing Postgres-backed apps, prototypes that need to ship fast | Large-scale RAG, multi-tenant search, high-QPS retrieval services |
| Documentation | Simple and familiar if you know PostgreSQL docs and SQL syntax like CREATE EXTENSION vector | Good API docs and examples, but more platform-specific concepts to absorb |
When pgvector Wins
Use pgvector when the vector layer should not become a separate system. If your app already uses Postgres for users, documents, permissions, and audit trails, adding the vector extension keeps the whole retrieval stack in one place.
Specific cases where pgvector is the right call:
- You need transactional consistency between embeddings and metadata.
  - Example: insert a document chunk and its embedding in the same transaction.
  - That matters when stale or partially written records are unacceptable.
- Your retrieval workload is moderate.
  - A few hundred thousand chunks, or even low millions, is fine if you index correctly.
  - Use HNSW for a better recall/latency tradeoff, or IVFFlat when you want faster index builds and lower memory use.
- Your team already knows SQL and Postgres operations.
  - You can query with plain SQL: SELECT id, content FROM chunks ORDER BY embedding <=> '[0.12, 0.98, ...]' LIMIT 5;
  - That means less new infrastructure and fewer moving parts in production.
- You want metadata filtering to stay native.
  - RAG almost always needs filters like tenant ID, document type, language, or access control.
  - In pgvector, that’s just normal SQL with WHERE, joins, and indexes.
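The points above (one transaction for chunk plus embedding, plain SQL search, native filtering) can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the table and column names (chunks, tenant_id, content, embedding), the 3-dimensional vectors, and the helper function names are all assumptions, and `conn` is taken to be any DB-API connection to a Postgres database with pgvector installed, using the `%s` placeholder style that psycopg uses.

```python
# Sketch only: schema, names, and vector dimension are hypothetical.
# Run SCHEMA_SQL once (for example from a migration) before using the helpers.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id bigserial PRIMARY KEY,
    tenant_id bigint NOT NULL,
    content text NOT NULL,
    embedding vector(3)  -- replace 3 with your embedding model's dimension
);
-- HNSW index for cosine distance; an IVFFlat alternative would be
-- USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

INSERT_SQL = (
    "INSERT INTO chunks (tenant_id, content, embedding) "
    "VALUES (%s, %s, %s::vector)"
)

# <=> is pgvector's cosine distance operator; the WHERE clause is ordinary SQL.
SEARCH_SQL = """
SELECT id, content
FROM chunks
WHERE tenant_id = %s
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def to_vector_literal(embedding):
    """Format a sequence of numbers as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def insert_chunk(conn, tenant_id, content, embedding):
    """Write the chunk text and its embedding together, or not at all."""
    try:
        with conn.cursor() as cur:
            cur.execute(
                INSERT_SQL, (tenant_id, content, to_vector_literal(embedding))
            )
        conn.commit()
    except Exception:
        conn.rollback()
        raise


def search_chunks(conn, tenant_id, query_embedding, k=5):
    """Return the k chunks closest to the query vector for one tenant."""
    with conn.cursor() as cur:
        cur.execute(
            SEARCH_SQL, (tenant_id, to_vector_literal(query_embedding), k)
        )
        return cur.fetchall()
```

Because the row and its embedding live in the same table, there is no second system to keep in sync: a failed insert rolls back both.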
pgvector also wins when time-to-production matters more than raw scale. If you need a working RAG system next week, not a distributed retrieval platform next quarter, keep it in Postgres.
When Milvus Wins
Use Milvus when retrieval is a product-level concern rather than one feature inside an app. It is designed for vector search first, and that design pays off once your corpus grows and traffic starts hitting the retriever hard.
Specific cases where Milvus is the better choice:
- You expect large-scale ANN search.
  - Millions to billions of vectors is Milvus territory.
  - It is built around vector indexing and distributed query execution rather than bolting vectors onto a relational engine.
- You need high read throughput across many users or tenants.
  - If your RAG service sits behind APIs with heavy concurrent traffic, Milvus gives you room to grow without turning your primary OLTP database into a search engine.
- You want vector-native features without fighting SQL ergonomics.
  - Milvus supports collection-oriented workflows through its SDKs.
  - In Python you work with Collection, schema definitions, insert, search, and index creation explicitly around vectors.
- You are building a dedicated retrieval layer.
  - Example: enterprise search over contracts, policies, tickets, knowledge bases.
  - In that setup, separating operational data from retrieval data is cleaner architecture.
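The collection-oriented workflow mentioned above might look roughly like this with the pymilvus package. Treat it as a sketch under assumptions: the collection name, field names, 3-dimensional vectors, sample rows, index parameters, and the localhost address are all hypothetical, and the COSINE metric assumes a reasonably recent Milvus release. The two small helpers are pure Python; `run_demo` needs `pip install pymilvus` and a reachable Milvus server.

```python
# Sketch only: names, dimension, sample data, and connection details are
# hypothetical placeholders.
DIM = 3  # replace with your embedding model's dimension


def tenant_filter(tenant_id):
    """Boolean expression Milvus applies alongside the vector search."""
    return f"tenant_id == {int(tenant_id)}"


def to_columns(rows):
    """Turn [(tenant_id, content, embedding), ...] into column-based data."""
    tenant_ids, contents, embeddings = zip(*rows)
    return [list(tenant_ids), list(contents), [list(e) for e in embeddings]]


def run_demo(host="localhost", port="19530"):
    # Imported here so the helpers above work without pymilvus installed.
    from pymilvus import (
        Collection, CollectionSchema, DataType, FieldSchema, connections,
    )

    connections.connect(alias="default", host=host, port=port)

    # Explicit schema: a primary key, scalar metadata, and one vector field.
    schema = CollectionSchema([
        FieldSchema(name="id", dtype=DataType.INT64,
                    is_primary=True, auto_id=True),
        FieldSchema(name="tenant_id", dtype=DataType.INT64),
        FieldSchema(name="content", dtype=DataType.VARCHAR, max_length=2048),
        FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=DIM),
    ])
    chunks = Collection(name="chunks", schema=schema)

    # Column order follows the schema, minus the auto-generated id.
    chunks.insert(to_columns([
        (1, "example chunk A", [0.12, 0.98, 0.05]),
        (2, "example chunk B", [0.80, 0.10, 0.40]),
    ]))
    chunks.flush()

    chunks.create_index(
        field_name="embedding",
        index_params={"index_type": "HNSW", "metric_type": "COSINE",
                      "params": {"M": 16, "efConstruction": 200}},
    )
    chunks.load()  # the collection must be loaded before searching

    results = chunks.search(
        data=[[0.10, 0.95, 0.07]],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"ef": 64}},
        limit=5,
        expr=tenant_filter(1),
        output_fields=["content"],
    )
    return [(hit.id, hit.distance, hit.entity.get("content"))
            for hit in results[0]]
```

Compared with the SQL version, schema, index, and load steps are all explicit calls, which is the extra surface area the learning-curve row in the comparison table refers to.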
Milvus also makes sense when your team already accepts extra infrastructure for better isolation. If retrieval failures must not affect your transactional database, splitting the systems is the correct move.
For RAG Specifically
My recommendation: start with pgvector unless you already know your RAG workload will be big enough to hurt Postgres. For most internal assistants, support bots, policy lookup tools, and document Q&A systems, pgvector gives you enough performance with far less operational overhead.
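One rough way to judge whether a workload is "big enough to hurt Postgres" is to estimate the raw size of the vectors alone. The sketch below uses pgvector's documented storage cost of 4 bytes per dimension plus 8 bytes per vector; the 1536-dimension figure and the corpus sizes are hypothetical inputs, and index size, metadata, and query load are deliberately left out, so treat the result as a lower bound rather than a capacity plan.

```python
def raw_vector_bytes(num_vectors, dim):
    """Raw pgvector storage: 4 bytes per dimension plus 8 bytes per vector."""
    return num_vectors * (4 * dim + 8)


def to_gib(num_bytes):
    """Convert bytes to GiB."""
    return num_bytes / 2**30


if __name__ == "__main__":
    dim = 1536  # hypothetical embedding dimension
    for n in (200_000, 2_000_000, 200_000_000):
        gib = to_gib(raw_vector_bytes(n, dim))
        print(f"{n:>11,} vectors x {dim} dims: about {gib:.1f} GiB raw")
```

For example, one million 1536-dimensional vectors come to roughly 5.7 GiB before any index is built, which sits comfortably inside a normal Postgres deployment; the picture changes once the same arithmetic lands in the hundreds of gigabytes.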
Choose Milvus only when RAG becomes a real retrieval platform: large corpus, heavy concurrency, aggressive latency targets. If this is a bank or insurer with strict data boundaries and growing search demand across many teams or tenants, Milvus is the safer long-term architecture.
By Cyprian Aarons, AI Consultant at Topiax.