pgvector vs Elasticsearch for Insurance: Which Should You Use?
pgvector and Elasticsearch solve different problems, even though both can store vectors and run similarity search. pgvector is a Postgres extension for teams that want vector search inside their transactional database; Elasticsearch is a search engine built for retrieval, filtering, scoring, and operational search at scale.
For insurance, use pgvector first if your workload is embedded in claims, policy, or customer data already living in Postgres. Use Elasticsearch only when search is a first-class product requirement with heavy text retrieval, faceting, and high-volume indexing.
Quick Comparison
| Area | pgvector | Elasticsearch |
|---|---|---|
| Learning curve | Low if you already know PostgreSQL; install with CREATE EXTENSION vector, then learn the vector/halfvec/sparsevec types and the ivfflat/hnsw index types | Higher; you need to understand indices, mappings, analyzers, shards, replicas, and relevance tuning |
| Performance | Strong for moderate-scale vector search inside Postgres; excellent when combined with SQL filters and joins | Strong for large-scale retrieval and hybrid search; built for distributed indexing and query throughput |
| Ecosystem | Native Postgres ecosystem: transactions, joins, RLS, backups, ORM support | Mature search ecosystem: full-text search, aggregations, ingest pipelines, Kibana |
| Pricing | Usually cheaper if you already run Postgres; one system to operate | More expensive operationally; separate cluster or managed service plus indexing overhead |
| Best use cases | RAG over policy/claims data, deduping documents, semantic lookup with SQL filters | Enterprise search portals, log-like document search, hybrid text + vector retrieval at scale |
| Documentation | Straightforward extension docs and Postgres examples | Broad but more complex; lots of knobs and tuning guidance |
When pgvector Wins
- Your source of truth is already Postgres. If claims, policies, customer records, and document metadata are in PostgreSQL, keep the vector index there. You get one transaction boundary, one backup strategy, one access model.
- You need strict SQL filtering with semantic search. Insurance queries are rarely "just vector similarity." They are usually "find similar claim notes for this line of business in the last 18 months for this state." pgvector lets you combine `ORDER BY embedding <=> $1` with normal SQL filters cleanly.
- You want simpler ops and fewer moving parts. pgvector adds an extension to an existing database. That means no separate cluster to size, no shard planning, no analyzer tuning just to get started.
- You care about transactional consistency. If a claim gets updated and its embedding needs to stay in sync with the row that owns it, Postgres is the right place. You can update the row and embedding together instead of managing eventual consistency across systems.
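The `<=>` used in these queries is pgvector's cosine-distance operator (when paired with `vector_cosine_ops`): it returns 0 for vectors pointing in the same direction and 1 for orthogonal ones. A quick sanity check, assuming the extension is installed:

```sql
-- Cosine distance: 0 = same direction, 1 = orthogonal.
SELECT '[1,0]'::vector <=> '[2,0]'::vector AS same_direction,  -- 0
       '[1,0]'::vector <=> '[0,1]'::vector AS orthogonal;      -- 1
```

Smaller distances rank first, which is why `ORDER BY embedding <=> $1` puts the most similar rows at the top of the result.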
Example pattern
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE claim_notes (
  id bigserial PRIMARY KEY,
  claim_id bigint NOT NULL,
  state text NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now(),
  note text NOT NULL,
  embedding vector(1536)
);

CREATE INDEX ON claim_notes USING hnsw (embedding vector_cosine_ops);

SELECT id, claim_id
FROM claim_notes
WHERE state = 'CA'
  AND created_at >= now() - interval '18 months'
ORDER BY embedding <=> '[...]'::vector
LIMIT 10;
```
That is the right shape for insurance. The semantic ranking stays close to the business filters.
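The transactional-consistency point above can be sketched as a single transaction against the same claim_notes table, assuming the application has already computed the new embedding ($1 through $3 are bind-parameter placeholders):

```sql
BEGIN;

-- Update the note text and its embedding atomically, so no reader
-- ever sees a note paired with an embedding of the old text.
UPDATE claim_notes
SET note = $1,
    embedding = $2
WHERE id = $3;

COMMIT;
```

With a separate search cluster, the same change would require a dual write and some reconciliation strategy for the window where the two stores disagree.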
When Elasticsearch Wins
- You need real enterprise search. If users expect keyword search across policy PDFs, endorsements, emails, notes, and attachments with relevance tuning, Elasticsearch is better. Its inverted index model is still stronger than Postgres for classic text retrieval.
- You need faceting and aggregations. Insurance teams ask questions like "show me claims by carrier, loss type, region, adjuster team, and status." Elasticsearch's aggregations are built for this. pgvector is not a replacement for a proper analytics/search engine.
- You have huge document volumes. Once you are indexing millions of documents with frequent updates and multiple query patterns (keyword search plus semantic ranking plus filters), Elasticsearch's distributed architecture starts earning its keep.
- You want hybrid retrieval as a primary feature. Elasticsearch supports dense vectors via `dense_vector`, approximate kNN search with HNSW-style indexing (depending on version and configuration), plus BM25-style lexical scoring. That makes it strong when exact terms matter alongside semantic similarity.
Example pattern
```
PUT claims
{
  "mappings": {
    "properties": {
      "note": { "type": "text" },
      "state": { "type": "keyword" },
      "created_at": { "type": "date" },
      "embedding": {
        "type": "dense_vector",
        "dims": 1536,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}
```
Then you can combine structured filters with text relevance and vector scoring in one retrieval layer. That matters when the user experience depends on ranked search results rather than database-style lookup.
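As a sketch of that retrieval layer (Elasticsearch 8.4+ syntax, where a top-level knn clause can be combined with a regular query; exact score blending depends on version and boosts): the field names follow the mapping above, the match text is an illustrative user query, and the two-element query_vector stands in for a full 1536-dimension embedding.

```
GET claims/_search
{
  "query": {
    "bool": {
      "must": { "match": { "note": "water damage kitchen" } },
      "filter": [
        { "term": { "state": "CA" } },
        { "range": { "created_at": { "gte": "now-18M" } } }
      ]
    }
  },
  "knn": {
    "field": "embedding",
    "query_vector": [0.1, 0.2],
    "k": 10,
    "num_candidates": 100
  }
}
```

The lexical and vector scores are combined into one ranked result list, which is the behavior you want when exact policy terms matter as much as semantic similarity.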
For Insurance Specifically
My recommendation: start with pgvector unless your product is explicitly a search product. Most insurance AI workloads are RAG over internal records — claims notes, underwriting guidelines, policy language — where SQL filters matter more than fancy search infrastructure.
Use Elasticsearch only if your team needs heavyweight document search across large corpora with faceting, analytics-style aggregation queries, and multi-field relevance tuning. For core insurance systems of record plus AI retrieval on top, pgvector is the cleaner default and the lower-risk choice.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.