The pgvector open source extension for similarity search is now available as an add-on to your Instaclustr for PostgreSQL® services.
The pgvector extension lets you store and search ML-generated vector data. By default it finds a vector's exact nearest neighbors; with a specific index type on the table, it can find approximate nearest neighbors instead.
Using machine learning models, every data item in a set can be mapped into a single n-dimensional vector space, irrespective of the dataset's size. Known as data vectorizing, this process transforms data items into vectors, which are data structures with a magnitude and direction. pgvector is one tool that enables AI operations on this vectorized data.
In machine learning (ML), real-world entities like text, images, video, or audio are represented as continuous numbers in a high-dimensional vector space. These numerical representations, known as vector embeddings, enable ML algorithms to discern relationships, detect patterns, and make predictions. Vector similarity, generally calculated using distance metrics, plays a crucial role in identifying relationships between these vector representations: the smaller the distance between two embeddings, the more similar the underlying items are taken to be.
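To make the idea of distance metrics concrete, here is a minimal Python sketch of three common ones: Euclidean (L2) distance, inner product, and cosine distance. The toy three-dimensional vectors and their names are made up for illustration; real embeddings typically have hundreds or thousands of dimensions.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 minus cosine similarity; 0 means the vectors point the same way."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


# Hypothetical toy embeddings, invented for this example.
shoe = [1.0, 0.0, 0.0]
boot = [0.9, 0.1, 0.0]
lamp = [0.0, 0.0, 1.0]

print(l2_distance(shoe, boot), l2_distance(shoe, lamp))
print(cosine_distance(shoe, boot), cosine_distance(shoe, lamp))
```

With these made-up values, `boot` ends up closer to `shoe` than `lamp` does under both L2 and cosine distance, which is the behavior a similarity search relies on.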
To use pgvector, create a new Instaclustr for PostgreSQL database with the extension installed, and then generate embeddings for your data (such as a product catalog) using tools like the OpenAI API client.
These embeddings are stored in Instaclustr for PostgreSQL using the pgvector extension and can be used for vector similarity searches on the product catalog. By default, pgvector performs exact nearest neighbor search, ensuring perfect recall. However, adding an index for approximate nearest neighbor search can enhance search speed at the cost of some recall.
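Conceptually, exact nearest neighbor search compares the query vector against every stored vector and keeps the closest ones, which is why recall is perfect and why it slows down as the table grows. The Python sketch below illustrates that idea only; it is not pgvector's implementation, and the catalog entries and embedding values are invented.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_nearest_neighbors(query, items, k=2):
    """Return the k (item_id, distance) pairs closest to query.

    Every item is scanned, so no true neighbor can be missed.
    """
    scored = ((item_id, l2_distance(query, vec)) for item_id, vec in items.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Hypothetical product catalog with made-up 3-dimensional embeddings.
catalog = {
    "running shoe": [0.9, 0.1, 0.0],
    "hiking boot": [0.8, 0.3, 0.1],
    "desk lamp": [0.0, 0.1, 0.9],
}

results = exact_nearest_neighbors([1.0, 0.0, 0.0], catalog, k=2)
for name, distance in results:
    print(name, round(distance, 3))
```

An approximate index avoids scanning every row by searching only a promising subset of the data, which is where the speed gain and the possible loss of recall both come from.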
The pgvector extension allows you to conduct vector similarity search and use embedding techniques directly in Instaclustr for PostgreSQL. It manages high-dimensional vector data within the database efficiently for tasks like similarity search, model training, data augmentation, or machine learning. Exact search returns the true nearest neighbors, and an optional approximate index lets you trade a little recall for faster queries.
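As a rough illustration of what this looks like in practice, the sketch below collects the kind of SQL statements pgvector supports, written as Python strings. The `products` table, its columns, the 3-dimensional embedding size, and the `to_vector_literal` helper are assumptions made for this example, and the `%s` placeholders assume a Python PostgreSQL driver that uses that parameter style. Whether you need to run `CREATE EXTENSION` yourself depends on how your cluster was provisioned.

```python
def to_vector_literal(values):
    """Format numbers in pgvector's text form, e.g. '[1.0,2.0,3.0]'.

    Hypothetical helper for this example, not part of pgvector.
    """
    return "[" + ",".join(str(float(v)) for v in values) + "]"


# Enable the extension in the current database (may already be done for you).
ENABLE_EXTENSION = "CREATE EXTENSION IF NOT EXISTS vector;"

# Hypothetical table; vector(3) declares a 3-dimensional vector column.
CREATE_TABLE = (
    "CREATE TABLE products ("
    "id bigserial PRIMARY KEY, name text, embedding vector(3));"
)

# Parameterized insert; the vector is passed as text and cast to vector.
INSERT_ROW = "INSERT INTO products (name, embedding) VALUES (%s, %s::vector);"

# Exact nearest neighbor query; <-> is pgvector's L2 distance operator.
NEAREST = (
    "SELECT id, name FROM products "
    "ORDER BY embedding <-> %s::vector LIMIT 5;"
)

# Optional approximate index (IVFFlat), usually created after loading data.
APPROX_INDEX = (
    "CREATE INDEX ON products "
    "USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);"
)

params = ("running shoe", to_vector_literal([0.9, 0.1, 0]))
print(INSERT_ROW, params)
```

pgvector also provides `<=>` for cosine distance and `<#>` for negative inner product; the operator used in the query should match the operator class chosen for the index (`vector_l2_ops` here).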
Similarity searches over vector embeddings have numerous industry applications, such as:
- E-commerce: Similarity searches can enhance product recommendations, improving the customer’s shopping experience and increasing the chances of additional sales.
- Recommendation systems: Across various digital platforms, vector similarity is used to suggest content that is similar to what users have previously interacted with, enhancing user engagement.
- Fraud detection: In the financial sector, vector similarity can be used for quick detection and prevention of fraud, safeguarding both the financial institution and the user.
pgvector is available on any of our PostgreSQL clusters at no additional cost.
Try pgvector yourself with a free trial today! Or if you’re interested in getting more information on how exactly to use pgvector, have a look at what else we’ve been working on.