What Is pgvector?
pgvector is a PostgreSQL extension to store and search vector embeddings. Embeddings are high-dimensional floating-point arrays used to represent text, images, or other data types in machine learning.
By integrating vector storage into PostgreSQL, pgvector allows developers to use their existing relational infrastructure for applications like semantic search, recommendation systems, and AI-driven analytics without needing a separate vector database.
pgvector provides custom data types and similarity search operators for high-dimensional vectors. This allows users to store vectors in database columns and execute similarity queries using familiar SQL syntax. It supports various distance metrics such as L2 (Euclidean), inner product, and cosine similarity, making it flexible for a range of use cases.
The extension’s integration into PostgreSQL ensures that developers can manage both structured and unstructured data in a unified environment, simplifying deployment and maintenance.
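The basics are straightforward in plain SQL. The sketch below uses a hypothetical `items` table with tiny 3-dimensional vectors for readability (real embeddings typically run from a few hundred to a few thousand dimensions); the table and column names are illustrative, not from any particular application.

```sql
-- Enable the extension, create a table with a vector column,
-- and run a nearest-neighbor query.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(3)
);

INSERT INTO items (content, embedding) VALUES
    ('first doc',  '[0.1, 0.2, 0.3]'),
    ('second doc', '[0.2, 0.1, 0.9]');

-- Nearest neighbors by L2 distance (<->); swap in <=> for cosine
-- distance or <#> for (negative) inner product.
SELECT id, content
FROM items
ORDER BY embedding <-> '[0.1, 0.2, 0.25]'
LIMIT 5;
```

Without an index, this query performs an exact (sequential) scan; the sections below cover when and how to add approximate indexes.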
The Postgres and pgvector bottleneck
While pgvector allows vector search within PostgreSQL, it inherits several architectural limitations from the underlying database. These limitations become apparent as the scale and complexity of vector workloads grow:
- Performance at scale: pgvector supports only two indexing strategies: HNSW and IVFFlat. HNSW provides good search accuracy but suffers from long index build times and high memory usage. IVFFlat builds indexes faster but loses recall and query performance as the dataset grows. Additionally, pgvector lacks support for more scalable index types like DiskANN or GPU-accelerated search, making it difficult to handle large-scale vector workloads efficiently.
- Handling of high-dimensional embeddings: PostgreSQL stores table and index data in 8KB pages, which caps the size of a vector that can be indexed. Each float takes 4 bytes, and each tuple also carries metadata, further reducing the number of dimensions that fit within a page. As a result, indexing very high-dimensional vectors is often impractical. Workarounds like quantization exist but come at the cost of reduced precision.
- Feature limitations: It supports only a limited set of distance metrics and lacks capabilities like hybrid sparse-dense search, advanced filtering on metadata, and integrated full-text search. Dedicated vector databases offer these features out of the box.
- Scalability: PostgreSQL was not designed for distributed workloads, so scaling pgvector typically involves manual sharding and complex index management across nodes. This adds operational overhead and limits the ability to scale with growing data and query demands.
Benchmarking pgvector performance: Results from AWS test
Amazon Web Services (AWS) recently carried out a large-scale test of pgvector across different releases, running on the Aurora PostgreSQL service.
The tests measured five things: recall, storage size (table + index), index build time, p99 latency, and single-connection throughput (QPS), using ANN Benchmarks with small harness tweaks, fixed index build parameters, and default search settings to keep comparisons fair.
Environment: Two 64 vCPU / 512 GiB instances were used: r7gd.16xlarge (local NVMe) and r7i.16xlarge (gp3). PostgreSQL 16.2 was configured for high parallelism (e.g., 64 parallel workers, large shared_buffers/maintenance_work_mem, JIT off). Versions spanned pgvector 0.4.1–0.7.0 across IVFFlat and HNSW, including new scalar (2-byte float) and binary quantization variants in 0.7.0.
Index settings: Builds were fixed at IVFFlat lists=1000; HNSW m=16 and ef_construction=256. Search used IVFFlat probes ∈ {1…100} and HNSW ef_search ∈ {10…800}. Index build timing excluded data load to focus on fit/build performance.
Final results:
- On dbpedia-openai-1000k-angular at 99% recall, pgvector 0.7.0 with HNSW + binary quantization cut build time by ~150× versus the first HNSW release (0.5.0).
- Scalar quantization (half-precision floats in the index) delivered roughly 50× faster index builds.
- Throughput and p99 latency also improved by ~30× over IVFFlat at the same recall, with serial query execution.
- On r7i.16xlarge instances (with x86-64 SIMD dispatch in 0.7.0), results mirrored r7gd: 100×+ build speedups with binary quantization and ~30× gains in QPS and p99 versus IVFFlat at 99% recall.
Other datasets:
- sift-128-euclidean (99% recall): Consistent speedups; ~50× faster builds with scalar quantization and higher QPS/p99 gains through HNSW.
- gist-960-euclidean: Best gains appeared at 90% recall; parallel HNSW builds were much faster, and scalar quantization shrank index size ~3×. Binary quantization could not meet the recall target.
- glove-25-angular (99% recall): IVFFlat built faster and smaller, but HNSW delivered much higher QPS and lower p99.
- glove-100-angular (95% recall): Clear HNSW build-time improvements and moderate QPS/p99 gains; higher recall would likely need a larger HNSW m.
Best practices for improving pgvector performance
Here are some important practices to help ensure high performance when using pgvector.
1. Pick the right index (and stay current)
Selecting between HNSW and IVFFlat should be based on your workload characteristics. HNSW (Hierarchical Navigable Small World) is well-suited for high-recall, low-latency applications and can handle larger datasets with better query-time performance. However, it requires more memory and time during indexing, especially as m and ef_construction increase. Use HNSW when accuracy and query speed are more important than index build time or memory usage.
IVFFlat (Inverted File with flat, uncompressed vector storage) is lighter and faster to build but has lower recall, especially for high-dimensional vectors or large datasets. It’s a reasonable choice when fast index builds or a small memory footprint matter more than top-tier recall.
Keep pgvector updated. Releases since 0.6.0 have introduced significant performance improvements: parallel HNSW index builds, scalar and binary quantization, and SIMD optimizations. These improvements can offer 10×–150× speedups without requiring query rewrites. Older versions may lack compatibility with modern hardware or efficient query plans.
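Before tuning anything, it is worth confirming which pgvector release you are actually running; the extension can be upgraded in place once a newer version is installed on the server.

```sql
-- Check the installed pgvector release.
SELECT extversion FROM pg_extension WHERE extname = 'vector';

-- Upgrade the extension in place (e.g., to pick up the 0.7.0
-- quantization features) after the new version is installed.
ALTER EXTENSION vector UPDATE;
```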
2. Tune HNSW
To optimize HNSW performance, adjust m and ef_construction during index creation. m controls the number of connections per node in the graph; higher values make the graph denser, improving accuracy but increasing memory use and build time. A typical setting is m=16, but values between 8 and 32 may yield better trade-offs depending on vector dimensionality and hardware capacity.
ef_construction determines how exhaustively neighbors are selected during graph build. Higher values (e.g., 256 or 512) yield better recall but slow down indexing. Benchmark different values against your recall target and index build SLA to find the best fit.
At query time, ef_search controls how many nodes are explored. Increasing it (e.g., from 100 to 800) improves recall but increases latency roughly linearly. For high-QPS applications, start with a moderate value (200–400) and tune based on p99 latency and QPS thresholds. Also consider tuning work_mem and max_parallel_workers_per_gather in PostgreSQL to leverage CPU concurrency during heavy query loads.
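Putting the build-time and query-time knobs together, a minimal sketch looks like this (table and query vector are illustrative; the parameter values match the defaults discussed above, not a universal recommendation):

```sql
-- Build-time parameters: a denser graph (m) and wider candidate list
-- (ef_construction) raise recall at the cost of build time and memory.
CREATE INDEX ON items
    USING hnsw (embedding vector_l2_ops)
    WITH (m = 16, ef_construction = 256);

-- Query-time parameter: number of candidates explored per search
-- (session-scoped; can also be set per transaction with SET LOCAL).
SET hnsw.ef_search = 200;

SELECT id FROM items ORDER BY embedding <-> '[0.1, 0.2, ...]' LIMIT 10;
```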
3. Tune IVFFlat (when you need lighter indexes)
IVFFlat performance is highly sensitive to the number of lists (lists) and the number of probes (probes) used at query time. The lists parameter determines how finely the vector space is partitioned; more lists mean finer partitions, improving recall but increasing index size and build time. For small or low-dimensional datasets, 100–500 lists might be enough. For larger datasets, 1,000–10,000 lists may be needed to maintain recall.
During search, probes defines how many of these partitions are searched. Using too few probes (e.g., probes=1) often results in low recall. Too many (e.g., probes=500+) can negate the performance benefits of indexing. For balanced performance, tune probes between 10 and 100 based on latency and accuracy trade-offs. You can dynamically adjust probes at query time depending on the use case; for example, fewer probes for autocomplete-like features, and more for recommendation systems.
IVFFlat works best when the embedding distribution is uniform. If your vectors are clustered or sparse, results can degrade. Preprocessing embeddings (e.g., whitening, PCA) may improve index utility.
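The equivalent IVFFlat sketch, under the same illustrative schema assumptions:

```sql
-- Partition the vector space into 1,000 clusters. Build the index
-- AFTER loading data, so the centroids reflect the real distribution.
CREATE INDEX ON items
    USING ivfflat (embedding vector_l2_ops)
    WITH (lists = 1000);

-- Search 20 of the 1,000 partitions per query; raise for recall,
-- lower for speed. Session-scoped, so it can vary per use case.
SET ivfflat.probes = 20;

SELECT id FROM items ORDER BY embedding <-> '[0.1, 0.2, ...]' LIMIT 10;
```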
4. Write index-friendly queries
Vector indexes in pgvector are not used if the query planner sees operations that prevent index access. Avoid wrapping vector columns in functions or expressions inside similarity conditions. Instead, make sure the raw vector column and similarity operator appear directly in ORDER BY or WHERE clauses.
For example, prefer:
```sql
SELECT id FROM items ORDER BY embedding <-> '[0.1, 0.2, ...]' LIMIT 10;
```
Over:
```sql
SELECT id FROM items WHERE normalize(embedding) <-> '[0.1, 0.2, ...]' < 0.5;
```
The second query wraps the vector column in a function, which prevents the planner from using the vector index.
Also, PostgreSQL’s planner may underestimate the cost of ANN searches and default to sequential scans. To force index use during tuning or benchmarking, temporarily disable sequential scans:
```sql
SET enable_seqscan = off;
```
Combine vector search with metadata filters carefully. pgvector cannot push down filters into the vector index scan, so post-filtering may hurt performance. For best results, use materialized views or pre-filter candidate rows via subqueries before applying vector similarity ranking.
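One way to pre-filter is to restrict the candidate set in a subquery and rank only those rows by distance. Note that the filtered branch is searched exactly (no vector index on the subquery result), so this pattern fits selective filters; the `category` column is a hypothetical addition to the example schema.

```sql
-- Pre-filter candidates by metadata, then apply similarity ranking
-- to the (much smaller) filtered set.
SELECT id
FROM (
    SELECT id, embedding
    FROM items
    WHERE category = 'electronics'
) AS candidates
ORDER BY embedding <-> '[0.1, 0.2, ...]'
LIMIT 10;
```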
5. Choose the right distance and embedding prep
Choosing the appropriate distance metric is critical. Use cosine distance (<=>) when embeddings are unit-normalized and represent directional similarity (common in transformer models like BERT). Use inner product (<#>) when magnitude carries semantic weight, as with some OpenAI or Cohere embeddings; note that <#> returns the negative inner product, so an ascending ORDER BY still ranks the most similar items first. Use L2 (<->) for traditional Euclidean-based models or image embeddings.
Preprocessing embeddings is often necessary for optimal performance. Normalize vectors before insertion if you intend to use cosine similarity. Apply dimensionality reduction techniques like PCA if vectors are too large; pgvector’s 8KB page limit restricts practical vector length to around 2,000 dimensions (with no metadata). Reducing vectors to 128–512 dimensions can improve both performance and recall.
Quantization is another lever. Scalar quantization converts floats to 2-byte half-precision, reducing index size and build time with minimal recall loss. Binary quantization offers even smaller indexes but can impact accuracy. Evaluate the trade-offs experimentally on your dataset before committing.
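With pgvector 0.7.0 or later, scalar quantization can be applied via an expression index over the half-precision halvec type; the sketch below assumes a hypothetical table with a 1,536-dimension embedding column.

```sql
-- Index half-precision copies of the vectors (pgvector >= 0.7.0).
-- The index is roughly half the size of a full-precision HNSW index.
CREATE INDEX ON items
    USING hnsw ((embedding::halfvec(1536)) halfvec_l2_ops);

-- Queries must apply the same cast for the planner to use the index.
SELECT id FROM items
ORDER BY embedding::halfvec(1536) <-> '[0.1, 0.2, ...]'::halfvec(1536)
LIMIT 10;
```

Because the quantized values live only in the index expression, the original full-precision vectors stay in the table and remain available for exact re-ranking if needed.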
AI potential: Using Instaclustr for PostgreSQL and pgvector
In the dynamic landscape of artificial intelligence and machine learning, the ability to work with high-dimensional vector data is a game-changer. This is where the power of pgvector shines. When combined with the robust, enterprise-ready platform of Instaclustr for PostgreSQL, pgvector transforms your trusted relational database into a powerful engine for advanced AI applications. We make it simple to unlock sophisticated capabilities like vector similarity search directly within your existing PostgreSQL environment.
The pgvector extension is expertly designed to store and query vector embeddings—numerical representations of data like text, images, or audio. This functionality is the bedrock of modern AI, powering everything from semantic search and recommendation systems to facial recognition and anomaly detection. Instead of relying on separate, specialized vector databases, you can now perform these complex operations inside PostgreSQL. This integration streamlines your architecture, reduces operational complexity, and allows you to leverage your team’s existing PostgreSQL expertise. With pgvector, you can find the “closest” or most similar items in your dataset with incredible speed and efficiency using exact and approximate nearest neighbor searches.
Leveraging pgvector becomes even more powerful with Instaclustr’s managed PostgreSQL service. We handle the complexities of database management so you can focus on building innovative applications. Our platform is built for unwavering reliability and seamless scalability, ensuring that as your AI workloads grow, your database performance keeps pace without interruption. We provide expert, 24/7 support and proactive monitoring to guarantee your PostgreSQL instances, supercharged with pgvector, are always optimized for peak performance. With Instaclustr, you get a secure, scalable, and fully managed solution that empowers you to confidently build the next generation of AI-driven features.