pgvector vs Qdrant in 2026: Which Vector Database Should You Pick?
The 2026 default for new projects is pgvector if you already run Postgres, Qdrant if you need scale. Here's how to choose without overthinking it.

Every RAG or semantic-search project needs somewhere to store embeddings, and in 2026 the choice for most teams comes down to two names: pgvector and Qdrant. Pinecone, Weaviate, Milvus, and LanceDB all have real strengths, but for a new project the practical decision is usually between "add vectors to the Postgres I already run" and "stand up a dedicated vector engine built for scale and filtering." Both have matured considerably: pgvector's 0.8 line fixed its biggest filtering weakness, and Qdrant has pushed hard on quantization and sharding. Here is how to decide.
Quick answer
If you already run Postgres and have fewer than roughly 10 million vectors, use pgvector (on version 0.8 or newer): it adds zero new infrastructure, keeps vectors next to your relational data, and the 0.8 iterative index scans closed most of the old filtered-search gap. Reach for Qdrant once you cross about 10M vectors, need horizontal scaling, or do heavy filtered search at scale (legal, finance, multi-tenant), where its Rust engine and payload-aware HNSW keep latency steady while pgvector's degrades. This is a scale-and-operations call, not a "which is better" call.
Key takeaways
- Already on Postgres with under ~10M vectors? Use pgvector. Zero new infrastructure, and your vectors live next to your relational data.
- Past ~10M vectors, heavy filtered search, or horizontal scaling? Use Qdrant. Its Rust engine and payload-aware HNSW hold latency under load.
- In 2026 benchmarks Qdrant is roughly 2-3x faster on p50 latency at 1M vectors, and filtered queries barely move its latency, while pgvector's can degrade sharply.
- pgvector 0.8 added iterative index scans, which largely fixed the old "filter returns too few rows" problem; the gap is narrower than it used to be.
- This is a scale-and-operations decision, not a "which is better" decision. Match the tool to where you are now.
pgvector: the pragmatic choice
pgvector is a PostgreSQL extension that adds a vector type and similarity search to the database you probably already operate. Its appeal is leverage: no new system to learn, no separate service to deploy, no extra backup, monitoring, or on-call story. Your embeddings sit next to your application data, you get transactional consistency for free, and you query with SQL.
That last point is underrated. Combining vector search with ordinary relational filters in a single query is natural in pgvector because it is just SQL:
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id          bigserial PRIMARY KEY,
    tenant_id   int NOT NULL,
    title       text,
    created_at  timestamptz DEFAULT now(),
    embedding   vector(1536)
);

CREATE INDEX ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Similarity search with a relational filter in one query
SELECT id, title
FROM documents
WHERE tenant_id = 42
  AND created_at > '2026-01-01'
ORDER BY embedding <=> '[...]'::vector
LIMIT 10;
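The <=> operator is cosine distance; pgvector also ships <-> (L2 distance) and <#> (negative inner product, negated so that an ascending ORDER BY still returns the most similar rows first). The m = 16 and ef_construction = 64 values in the index are pgvector's defaults and control graph density and build quality. At query time, hnsw.ef_search (default 40) trades recall against speed: raise it for better recall, lower it for faster queries.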
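For intuition about what those operators return, here is a plain-Python sketch of the three distances, assuming pgvector's documented definitions: <-> is Euclidean distance, <#> is the inner product negated, and <=> is 1 minus cosine similarity. The function names are hypothetical and this is illustration only; pgvector computes these in C inside Postgres.

```python
import math

def l2_distance(a, b):
    # pgvector's <-> : Euclidean distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    # pgvector's <#> : inner product negated, so smaller still means "closer"
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # pgvector's <=> : 1 - cosine similarity (0 = same direction, 2 = opposite)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def rank(query, rows, distance):
    # Equivalent of ORDER BY embedding <op> query: ascending distance
    return sorted(range(len(rows)), key=lambda i: distance(query, rows[i]))

if __name__ == "__main__":
    query = [0.8, 0.6]
    rows = [[0.6, 0.8], [1.0, 0.0], [0.0, 1.0]]  # all unit-length vectors
    for fn in (l2_distance, negative_inner_product, cosine_distance):
        print(fn.__name__, rank(query, rows, fn))
```

For unit-length vectors, which many embedding models emit, all three operators produce the same ranking; that is why pgvector's README suggests inner product for normalized embeddings, since it is the cheapest of the three to compute.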

Where pgvector's ceiling appears
pgvector's HNSW index stays fast and accurate up to roughly 10M vectors. Beyond that, into the hundreds of millions, it starts to trail purpose-built engines on both latency and recall. It also carries Postgres overhead: 2026 benchmarks put its memory use around 40% higher than Qdrant's for the same vector count, because of heap tuples and page padding. At 1M vectors (768-d), independent tests measured pgvector at around 11ms p50 versus ~4ms for Qdrant.
Historically, pgvector's biggest weakness was filtered search: with an approximate index, the filter was applied after the index scan, so a selective WHERE clause could return far fewer rows than your LIMIT. pgvector 0.8 fixed the common case with iterative index scans (hnsw.iterative_scan), which keep pulling candidates until the filter is satisfied or a scan limit (hnsw.max_scan_tuples) is reached. AWS reported up to 9x faster queries and dramatically better recall on filtered workloads after the upgrade. If you are on an older pgvector, upgrading and enabling iterative scans (they are off by default) is the single highest-leverage change you can make.
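To see why post-filtering under-returns and what iterative scanning changes, here is a toy model in Python. It is not pgvector's algorithm: the "index" is just a list of ids assumed to be sorted by distance, the helper names are hypothetical, and the parameters only mirror the names of pgvector's hnsw.ef_search (default 40) and hnsw.max_scan_tuples (default 20,000) settings. In pgvector itself you enable the behavior per session with SET hnsw.iterative_scan = relaxed_order (or strict_order).

```python
def ann_batch(ranked_ids, start, size):
    # Hypothetical stand-in for the HNSW index: the next `size` candidates
    # in ascending distance order.
    return ranked_ids[start:start + size]

def post_filter_scan(ranked_ids, matches, limit, ef_search=40):
    # Pre-0.8 behavior: one pass of ef_search candidates, then apply WHERE.
    candidates = ann_batch(ranked_ids, 0, ef_search)
    return [i for i in candidates if matches(i)][:limit]

def iterative_scan(ranked_ids, matches, limit, ef_search=40, max_scan_tuples=20_000):
    # 0.8-style idea: keep pulling batches until LIMIT is satisfied
    # or the scan budget runs out.
    results, scanned = [], 0
    budget = min(max_scan_tuples, len(ranked_ids))
    while len(results) < limit and scanned < budget:
        batch = ann_batch(ranked_ids, scanned, ef_search)
        scanned += len(batch)
        results.extend(i for i in batch if matches(i))
    return results[:limit]

def one_percent_tenant(doc_id):
    return doc_id % 100 == 0      # a selective WHERE clause

def rare_tenant(doc_id):
    return doc_id % 50_000 == 0   # so selective the scan budget runs out first

if __name__ == "__main__":
    ranked = list(range(100_000))  # pretend ids are already sorted by distance
    print("post-filter:", len(post_filter_scan(ranked, one_percent_tenant, limit=10)))
    print("iterative:  ", len(iterative_scan(ranked, one_percent_tenant, limit=10)))
    print("rare tenant:", len(iterative_scan(ranked, rare_tenant, limit=10)))
```

With these toy numbers the single post-filter pass returns 1 row for LIMIT 10, the iterative scan returns all 10, and the very rare tenant still comes back short because the 20,000-candidate budget is exhausted. That last case is the "heavily filtered queries can still cost more" caveat in miniature.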
Tip
Do not migrate away from Postgres prematurely. If you have it and fewer than 10M vectors, pgvector requires zero new infrastructure and SQL is genuinely powerful. Premature adoption of a dedicated vector DB is a common source of needless operational complexity.
Qdrant: built for scale and filtered search
Qdrant is a dedicated vector database written in Rust, with SIMD-accelerated distance math and an HNSW implementation that integrates payload (metadata) indexes directly into graph traversal. You reach for it when you need real production throughput, expect to cross 10M vectors, or want horizontal scaling a single Postgres instance cannot provide.
Its standout strength is filtered search. Because filtering happens inside the index traversal rather than as a post-filter, stacking metadata constraints costs very little: 2026 benchmarks measured roughly 1-2ms of filter overhead on Qdrant, versus filtered pgvector queries degrading toward ~25ms on 500K vectors when heap scans broke index locality. That is why Qdrant is so often recommended for legal, financial, and multi-tenant applications, where you must constrain by permission, tenant, date, or category at scale and still get accurate nearest neighbors.
A minimal Python client looks like this:
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct, PayloadSchemaType,
    Filter, FieldCondition, MatchValue,
)

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Index the field you filter on; Qdrant uses payload indexes to plan
# and speed up filtered queries
client.create_payload_index(
    collection_name="documents",
    field_name="tenant_id",
    field_schema=PayloadSchemaType.INTEGER,
)

client.upsert(
    collection_name="documents",
    points=[PointStruct(id=1, vector=[...], payload={"tenant_id": 42})],  # [...] = your 1536-d embedding
)

# Vector search with a payload filter applied during traversal
hits = client.query_points(
    collection_name="documents",
    query=[...],
    query_filter=Filter(must=[FieldCondition(key="tenant_id", match=MatchValue(value=42))]),
    limit=10,
).points
Qdrant's other 2026 headline is quantization. Scalar quantization (float32 to int8) cuts vector memory about 4x, and binary quantization about 32x, which is where the "up to ~97% less RAM" figure comes from. That lets you keep huge collections in memory and tune the speed-versus-precision trade-off per collection. Combined with sharding and replication for zero-downtime resizing, it is built for deployments where an embedded or single-node approach starts to feel constrained.
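The arithmetic behind those ratios is easy to check. This sketch assumes 1536-dimensional float32 embeddings (matching the examples above), counts raw vector bytes only (no HNSW graph, no payload, no full-precision copies Qdrant can keep on disk for rescoring), and uses simplified quantizers: a sign-bit rule for binary and a linear min/max mapping to one byte for scalar. The function names are hypothetical and Qdrant's own implementation differs in details, such as how it chooses the scalar range.

```python
def float32_bytes(dim):
    return dim * 4          # 4 bytes per float32 component

def int8_bytes(dim):
    return dim              # scalar quantization: 1 byte per component

def binary_bytes(dim):
    return (dim + 7) // 8   # binary quantization: 1 bit per component, rounded up

def scalar_quantize(vec, lo, hi):
    # Simplified scalar quantizer: clamp to [lo, hi], map linearly onto 0..255.
    scale = 255 / (hi - lo)
    return bytes(round((min(max(x, lo), hi) - lo) * scale) for x in vec)

def binary_quantize(vec):
    # Sign-bit quantizer: bit i is 1 if component i is positive.
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a_bits, b_bits):
    # Distance between binary codes = number of differing bits.
    return bin(a_bits ^ b_bits).count("1")

if __name__ == "__main__":
    dim, n = 1536, 10_000_000
    for name, per_vec in (("float32", float32_bytes(dim)),
                          ("int8", int8_bytes(dim)),
                          ("binary", binary_bytes(dim))):
        print(f"{name:8} {per_vec:5d} B/vector  {per_vec * n / 1e9:6.2f} GB for 10M")
    print("binary saving:", 1 - binary_bytes(dim) / float32_bytes(dim))
```

For 10M vectors that works out to about 61.4 GB of raw float32 versus about 1.9 GB binary-quantized: a 32x ratio, or a 96.9% reduction. Because a single bit per dimension throws away a lot of information, binary quantization is usually paired with oversampling and rescoring the top candidates against full-precision vectors.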
The cost is operational: it is another system to deploy, secure, monitor, and back up, with its own learning curve and a second failure mode to reason about.
Warning
"Dedicated vector DB" is not automatically "better." For a 2M-vector app already running on Postgres, adding Qdrant means a second datastore to keep in sync, a second on-call surface, and a second source of staleness between your relational data and your vectors, for performance you did not need yet. Match the tool to the scale you actually have.
pgvector vs Qdrant at a glance
Here is the head-to-head on the dimensions that actually decide the choice:
| Dimension | pgvector 0.8+ | Qdrant |
|---|---|---|
| Best up to | ~10M vectors | Hundreds of millions |
| p50 latency at 1M vectors | ~11ms | ~4ms |
| Filtered search overhead | Improved, can still degrade | ~1-2ms, payload-aware |
| Memory use (same vectors) | ~40% higher | Lower, plus quantization |
| Horizontal scaling | Limited to Postgres | Sharding and replication |
| New infrastructure | None (it is Postgres) | A second datastore to run |
| Best for | Postgres shops, joined data | Scale, filtered search at volume |
A decision framework
Start from your current state, not your aspirations:
- Already run Postgres with under ~10M vectors? pgvector. No migration, SQL filtering, done; just make sure you are on 0.8+.
- Need vectors tightly joined to relational application data? pgvector. One query, one transaction, one source of truth.
- Past ~10M vectors, or scaling horizontally under sustained production load? Qdrant.
- Is fast filtered search at scale a core requirement (legal, finance, multi-tenant)? Qdrant, even below 10M vectors if filtering dominates your latency.
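The decision list above can be written down as a small function, which is handy for a design doc or review checklist. This is a sketch under assumptions: the Workload fields and pick_vector_store are hypothetical names, the 10M cutoff is the article's rough rule of thumb rather than a hard limit, and the order of checks (scale first, then filtering, then existing Postgres) is one reasonable way to resolve conflicts such as a 50M-vector app that also joins heavily against relational data. It returns "either" when none of the bullets apply.

```python
from dataclasses import dataclass

SCALE_THRESHOLD = 10_000_000  # rough ~10M rule of thumb, not a hard limit

@dataclass
class Workload:  # hypothetical description of a project
    runs_postgres: bool
    vector_count: int
    joins_relational_data: bool = False
    needs_horizontal_scaling: bool = False
    filtered_search_critical: bool = False  # e.g. legal, finance, multi-tenant

def pick_vector_store(w: Workload) -> str:
    # 1. Scale or scale-out requirements point to a dedicated engine.
    if w.vector_count > SCALE_THRESHOLD or w.needs_horizontal_scaling:
        return "qdrant"
    # 2. Filtered search at volume can justify Qdrant below the threshold.
    if w.filtered_search_critical:
        return "qdrant"
    # 3. Existing Postgres or tight relational coupling: stay on pgvector 0.8+.
    if w.runs_postgres or w.joins_relational_data:
        return "pgvector"
    # 4. None of the bullets apply: no strong signal either way.
    return "either"

if __name__ == "__main__":
    print(pick_vector_store(Workload(runs_postgres=True, vector_count=2_000_000)))
    print(pick_vector_store(Workload(runs_postgres=True, vector_count=50_000_000)))
```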
How you chunk and structure documents before they ever hit the database matters as much as the engine; see our guide to RAG chunking strategies. And if you are building agents on top of this retrieval layer, agent memory is where storage choices show up most. To keep retrieval quality honest as you tune ef_search or quantization, wire up LLM-as-a-judge evals so you measure recall changes instead of guessing.
The takeaway
For most new projects in 2026 the honest answer is pgvector until it hurts. If you already run Postgres, have fewer than 10M vectors, and are on 0.8+, it is the lowest-friction option, and the iterative-scan fix closed most of the old filtering gap. Move to Qdrant when scale, sustained throughput, or filtered search at volume genuinely demand a dedicated engine; there, its filtering and quantization story is hard to beat. Start with what fits your team's current state, and migrate only when the metrics, not the hype, tell you to.
Frequently asked questions
Is Qdrant always faster than pgvector?
At small to mid scale on simple queries they are close: both return in single-digit to low-tens of milliseconds. Qdrant's edge widens with scale, raw throughput, and especially filtered search, where 2026 benchmarks show it holding ~1-2ms of filter overhead while pgvector can degrade. Below ~1M vectors with light filtering, the difference rarely justifies a second system.
Did pgvector 0.8 really fix filtered search?
It fixed the most painful symptom. Iterative index scans (hnsw.iterative_scan) keep fetching candidates until your filter is satisfied, so a selective WHERE no longer silently returns fewer rows than your LIMIT. Heavily filtered queries can still cost more than Qdrant's payload-aware traversal, but for typical workloads the gap is now small.
Can I start on pgvector and migrate to Qdrant later?
Yes, and it is a common path. Keep your embedding pipeline and metadata schema portable, store the source text alongside vectors, and you can re-embed or bulk-load into Qdrant when you cross the scale threshold. The migration is mostly a data-loading and query-rewrite exercise, not a re-architecture.
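Because the vectors themselves do not change, most of the load step is reshaping rows. A minimal pure-Python sketch, assuming rows exported from the documents table above as (id, tenant_id, title, embedding) with the embedding cast to text (pgvector's text form looks like [0.1,0.2,0.3]); the helper names and the batch size of 500 are hypothetical. Each dict mirrors Qdrant's point shape (id, vector, payload), so in practice you would wrap it in PointStruct and send each batch with client.upsert.

```python
def parse_pgvector_text(text):
    # pgvector's text representation looks like '[0.1,0.2,0.3]'.
    return [float(x) for x in text.strip().strip("[]").split(",")]

def rows_to_point_batches(rows, batch_size=500):
    # rows: (id, tenant_id, title, embedding_text) tuples exported from Postgres.
    # Yields lists of dicts shaped like Qdrant points: id, vector, payload.
    batch = []
    for doc_id, tenant_id, title, embedding_text in rows:
        batch.append({
            "id": doc_id,
            "vector": parse_pgvector_text(embedding_text),
            # Carry the filter fields over so tenant-scoped queries still work.
            "payload": {"tenant_id": tenant_id, "title": title},
        })
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

if __name__ == "__main__":
    exported = [
        (1, 42, "Q3 contract", "[0.1,0.2,0.3]"),
        (2, 42, "NDA template", "[0.0,-0.5,0.9]"),
        (3, 7, "Onboarding guide", "[0.4,0.4,0.1]"),
    ]
    for batch in rows_to_point_batches(exported, batch_size=2):
        print([p["id"] for p in batch])
```

Qdrant point ids must be unsigned integers or UUIDs, so a bigserial primary key carries over directly, which keeps the mapping between the two stores trivial while you run them side by side.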
How do model and runtime choices affect the database decision?
Embedding dimensionality and how aggressively you compress prompts drive memory and index cost; the broader tokenmaxxing shift toward leaner context applies to vectors too. And if you self-host embeddings, your retrieval latency budget depends on your serving stack; see local inference engines for that side of the pipeline.
Sources & further reading
- open-techstack.com/blog/pgvector-vs-qdrant-2026/
- instaclustr.com/education/vector-database/pgvector-vs-qdrant-5-key-differences-and-how-to-choose/
- 4xxi.com/articles/vector-database-comparison/
- callsphere.ai/blog/vector-database-benchmarks-2026-pgvector-qdrant-weaviate-milvus-lancedb
- postgresql.org/about/news/pgvector-080-released-2952/
- aws.amazon.com/blogs/database/supercharging-vector-search-performance-and-relevance-with-pgvector-0-8-0-on-amazon-aurora-postgresql/
- github.com/qdrant/qdrant


