pgvector vs Cassandra for RAG: Which Should You Use?
pgvector is a vector search extension for PostgreSQL. Cassandra is a distributed wide-column database that can store vectors, but it was built for scale-out writes and availability first, not retrieval quality. For RAG, use pgvector unless you have a hard requirement for multi-region write-heavy operational data at massive scale.
Quick Comparison
| Category | pgvector | Cassandra |
|---|---|---|
| Learning curve | Low if you already know Postgres. You use CREATE EXTENSION vector, CREATE INDEX, and SQL. | Higher. You need to understand data modeling by partition key, clustering columns, and query constraints. |
| Performance | Strong for small to mid-sized RAG workloads, especially with HNSW and IVFFlat indexes. | Strong at high write throughput and horizontal distribution, but vector retrieval is not its native strength. |
| Ecosystem | Excellent. Works with Postgres tools, transactions, joins, backups, and ORM support. | Good for distributed storage use cases, but less natural for ad hoc retrieval workflows. |
| Pricing | Usually cheaper to operate because it rides on existing Postgres infra. | Can get expensive fast when you overprovision nodes for latency and replication needs. |
| Best use cases | Document chunk search, metadata filtering, hybrid SQL + vector retrieval, prototype to production RAG. | Event-heavy systems, large-scale operational data, multi-DC writes, embedding storage tied to application records. |
| Documentation | Clear and practical via PostgreSQL docs and the pgvector project docs. | Solid Apache docs, but vector-specific guidance is thinner and more operationally complex. |
When pgvector Wins
Use pgvector when your RAG system needs tight integration between embeddings and metadata filters.
A common pattern is:
- Store chunks in Postgres
- Add embeddings with `vector(1536)`, or whatever dimension your model uses
- Filter by tenant, document type, language, or ACL in the same SQL query
- Rank by cosine distance using the `<=>` operator
That matters because RAG rarely wants “just vectors.” It wants vectors plus business rules.
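As a sketch, that whole pattern collapses into a single statement. This assumes the `chunks` table from the example schema later in this section; the parameter placeholders are illustrative:

```sql
-- Hypothetical retrieval query: metadata filter plus semantic ranking in one statement.
-- $1 = tenant ID, $2 = query embedding literal (e.g. '[0.12, -0.03, ...]'), supplied by the app.
SELECT id, doc_id, content,
       embedding <=> $2 AS cosine_distance   -- <=> is pgvector's cosine distance operator
FROM chunks
WHERE tenant_id = $1
  AND created_at > now() - interval '90 days'  -- any business-rule filter you need
ORDER BY embedding <=> $2
LIMIT 10;
```

The filter and the ranking run in the same engine, so the planner can combine them rather than forcing you to over-fetch candidates and filter in application code.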
Use pgvector when you want production-grade retrieval without adding another datastore.
Postgres gives you:
- ACID transactions
- Joins against users, documents, permissions, and audit tables
- Mature backup/restore
- Familiar observability
With pgvector, you keep the whole retrieval pipeline in one place instead of splitting metadata in Postgres and vectors somewhere else.
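For instance, access control becomes an ordinary join. A permission-aware retrieval query might look like the sketch below; the `document_acl` table and its columns are hypothetical, introduced here only for illustration:

```sql
-- Sketch: restrict semantic search to chunks the caller is allowed to read.
-- document_acl(user_id, doc_id) is an assumed ACL table, not part of the example schema.
SELECT c.id, c.content,
       c.embedding <=> $2 AS cosine_distance
FROM chunks c
JOIN document_acl a ON a.doc_id = c.doc_id
WHERE c.tenant_id = $1
  AND a.user_id = $3          -- only rows visible to this user
ORDER BY c.embedding <=> $2
LIMIT 10;
```

With a separate vector store, the same check typically means over-fetching candidates and re-filtering against Postgres in application code.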
Use pgvector when your team already runs PostgreSQL.
This is the biggest practical win. Your developers already know migrations, connection pooling, EXPLAIN ANALYZE, indexes, replication slots, and role-based access control. Adding one extension is much cheaper than introducing a new distributed database just to store embeddings.
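That familiarity extends to vector queries. The standard tuning loop works unchanged; this sketch assumes the `chunks` table from the example schema later in this section:

```sql
-- Debugging retrieval latency the usual Postgres way.
-- Replace '[...]' with a real 1536-dimension query vector.
EXPLAIN ANALYZE
SELECT id
FROM chunks
WHERE tenant_id = 42
ORDER BY embedding <=> '[...]'::vector
LIMIT 10;
```

The plan output tells you whether the HNSW index was used or whether the planner fell back to a sequential scan, exactly as it would for any other index.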
Use pgvector when your dataset is in the millions of chunks range, not billions.
That’s where it shines:
- Internal knowledge bases
- Customer support RAG
- Legal or policy search
- Product documentation assistants
For these workloads, an HNSW index on embedding plus a B-tree index on metadata fields is enough.
Example schema:
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
    id         bigserial PRIMARY KEY,
    doc_id     bigint NOT NULL,
    tenant_id  bigint NOT NULL,
    content    text NOT NULL,
    embedding  vector(1536) NOT NULL,
    created_at timestamptz DEFAULT now()
);

-- ANN index for cosine similarity search
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);

-- B-tree index for metadata filtering
CREATE INDEX ON chunks (tenant_id, doc_id);
```
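If the default index settings don't hit your recall targets, pgvector exposes HNSW build and search parameters. The numbers below are illustrative, not recommendations:

```sql
-- Build-time knobs: larger m / ef_construction improve recall at the cost of
-- build time and index size (pgvector defaults are m = 16, ef_construction = 64).
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops)
  WITH (m = 24, ef_construction = 128);

-- Query-time knob: raise ef_search per session to trade latency for recall
-- (default is 40).
SET hnsw.ef_search = 100;
```

Because these are ordinary index options and a session GUC, you can A/B test recall settings without touching application code.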
When Cassandra Wins
Use Cassandra when your primary problem is distributed write throughput across many nodes or regions.
Cassandra is built for:
- High ingest rates
- Linear scale-out
- Multi-datacenter replication
- Always-on availability
If your application is already storing operational records in Cassandra and you want embeddings next to those records, it makes sense to keep them there.
Use Cassandra when you need predictable writes at very large scale.
RAG pipelines that continuously ingest:
- Clickstream events
- IoT telemetry
- Chat transcripts at massive volume
- Product activity streams
can fit Cassandra well if embeddings are attached to those records as part of the write path.
Use Cassandra when your access pattern is simple and pre-modeled.
Cassandra works best when you know the exact query shape up front:
- Partition by tenant or conversation ID
- Cluster by timestamp or document order
- Fetch candidate rows quickly
If your retrieval logic is “get recent items for this partition,” Cassandra is excellent. If your retrieval logic needs flexible filtering plus semantic ranking across arbitrary slices of data, it becomes awkward fast.
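The "recent items for this partition" shape is where CQL is at its best. This sketch assumes the `rag_chunks` table from the example design below, with made-up key values:

```sql
-- Sketch: the pre-modeled query shape Cassandra is built for.
-- Fetches one document's chunks from a single partition, in clustering order.
SELECT chunk_id, content, embedding
FROM rag_chunks
WHERE tenant_id = 'acme' AND doc_id = 'doc-123'
LIMIT 50;
```

Note what is fixed here: the sort order was chosen at table-creation time via the clustering columns. If you later need newest-first instead of oldest-first, or a filter on a non-key column, you are typically looking at a new table or a secondary index, not a query tweak.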
Use Cassandra when operational resilience matters more than rich query flexibility.
Cassandra handles node failures well and keeps serving traffic during topology changes. That makes sense for systems where ingestion never stops and temporary inconsistency is acceptable.
Example table design:
```sql
CREATE TABLE rag_chunks (
    tenant_id text,
    doc_id    text,
    chunk_id  timeuuid,
    content   text,
    embedding list<float>,   -- Cassandra 5.0 also offers a native vector<float, n> type
    PRIMARY KEY ((tenant_id), doc_id, chunk_id)
) WITH CLUSTERING ORDER BY (doc_id ASC, chunk_id ASC);
```
The catch: this is storage-first design. It does not give you the same retrieval ergonomics as SQL + vector search.
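For completeness: Cassandra 5.0 narrows this gap with a native vector type and approximate-nearest-neighbor search via storage-attached indexes (SAI). The sketch below assumes Cassandra 5.0+ and uses a hypothetical table name; the bracketed vector is a placeholder for a real query embedding:

```sql
-- Sketch, assuming Cassandra 5.0+: native vector column plus SAI ANN search.
CREATE TABLE rag_chunks_v5 (
    tenant_id text,
    chunk_id  timeuuid,
    content   text,
    embedding vector<float, 1536>,
    PRIMARY KEY ((tenant_id), chunk_id)
);

CREATE INDEX ON rag_chunks_v5 (embedding) USING 'sai';

SELECT content
FROM rag_chunks_v5
ORDER BY embedding ANN OF [0.12, -0.03, ...]   -- placeholder query vector
LIMIT 10;
```

Even so, this is newer and less battle-tested than pgvector's retrieval path, and it still lacks the joins and ad hoc filtering that make SQL-based retrieval easy to iterate on.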
For RAG Specifically
Pick pgvector.
RAG needs semantic similarity plus metadata filtering plus iterative debugging of retrieval quality. Postgres gives you all three in one engine with familiar SQL semantics; Cassandra gives you scale-out storage but makes retrieval logic harder than it should be.
If you are building a serious RAG system for enterprise data access control, document chunking, reranking experiments, and fast iteration on prompts and filters, pgvector is the correct default.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.