Overview: Beyond semantic search
In part 1, we explored how the knn_vector field type enables semantic search by retrieving results based on meaning rather than exact matches. When building real-world AI applications such as e-commerce search, enterprise search, and conversational systems, you will quickly run into limitations that pure dense semantic search (the knn_vector field type) cannot fully address.
Here are some common challenges:
- Poor keyword precision: Dense semantic search understands meaning well, but it can sometimes miss results that contain the exact keyword a user searched for, leading to irrelevant or incomplete results
- Limited explainability: Dense vector scores are opaque, making it difficult to understand or justify why a particular result was ranked highly
- Latency and cost at scale: Dense vector search can become computationally expensive as your dataset grows, which can impact performance in production environments
This is where OpenSearch’s sparse_vector field type becomes essential. It introduces term-level importance into the search process using neural sparse approximate nearest neighbor (ANN) search, which efficiently finds the closest matching documents without scanning every document in the index, helping deliver search results that are more precise, scalable, and production-ready.
Getting started with sparse_vector on Instaclustr
What is sparse_vector?
sparse_vector is OpenSearch’s field type for storing sparse neural embeddings, where each document is represented as a map of token-weight pairs. Each token corresponds to a vocabulary term, and its weight is a positive float indicating how important that term is to the document’s meaning. Because the tokens map back to vocabulary terms, search results are easier to interpret than those produced by dense vectors.
While knn_vector provides broad semantic recall by finding results based on meaning, sparse_vector adds a precision layer by preserving the importance of specific terms in a query. Each field type addresses a different dimension of search quality, making them both valuable components in building more accurate, production-ready AI search applications.
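To make the difference concrete, here is a toy sketch; the tokens and weights below are invented for illustration, not produced by a real model:

```python
# Toy illustration only: all values here are invented, not model output.

# A dense embedding is an opaque, fixed-length list of floats; no single
# dimension corresponds to a readable term.
dense_embedding = [0.12, -0.48, 0.91, 0.03]  # real models use 384+ dimensions

# A sparse embedding keeps only the terms that matter, each with a weight,
# so you can read off exactly why a document matches a query.
sparse_embedding = {
    "headphones": 2.1,
    "wireless": 1.6,
    "battery": 1.2,
    "audio": 0.4,  # an expansion term a sparse encoder might add from context
}
```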
To get the most out of sparse_vector, it helps to understand the key parameters available when defining it in an index mapping. The following are just a few of the available parameters:
- Method name (SEISMIC, short for Spilled Clustering of Inverted Lists with Summaries for Maximum Inner Product Search): The only supported algorithm for sparse_vector, designed to efficiently find the most relevant documents without comparing the query against every document in the index.
- n_postings: Controls the maximum number of documents retained per token in the index, balancing index size against how many relevant results are returned. Defaults to 0.0005 × doc_count of the segment, where a segment is a self-contained chunk of the index and doc_count is the total number of documents within it.
- approximate_threshold: The minimum number of documents in a segment required to activate neural sparse ANN search. Segments below this threshold automatically use standard neural sparse search. The default value is 1,000,000.
For most use cases, the default values work well as a starting point and can be tuned later based on your dataset size and performance requirements. To explore the full list of supported parameters, refer to the sparse_vector documentation.
Note: To use this field type, enable sparse search in the index settings by setting "index.sparse": true, and define the sparse_vector field in the index mapping, as the sketch below illustrates.
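To see these settings together, here is a minimal sketch using the official opensearch-py Python client. The endpoint and credentials are placeholders, the parameter values are arbitrary examples, and the method/parameters nesting is an assumption based on the sparse_vector documentation, so verify the exact placement against your OpenSearch version:

```python
from opensearchpy import OpenSearch

# Placeholder connection details; point these at your own cluster.
client = OpenSearch(
    hosts=["https://localhost:9200"],
    http_auth=("admin", "admin"),
    verify_certs=False,
)

client.indices.create(
    index="sparse-params-demo",
    body={
        # Enable sparse search on the index, as described in the note above.
        "settings": {"index": {"sparse": True}},
        "mappings": {
            "properties": {
                "my_vector": {
                    "type": "sparse_vector",
                    "method": {
                        "name": "seismic",
                        # Assumed placement of the tuning parameters from the
                        # list above; the values are arbitrary examples.
                        "parameters": {
                            "n_postings": 4000,
                            "approximate_threshold": 1000000,
                        },
                    },
                }
            }
        },
    },
)
```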
Practical example
As mentioned in part 1, start by creating an OpenSearch cluster on Instaclustr. When provisioning the cluster, enable the AI Search Plugin to use the neural sparse ANN search features covered in this article.
The following example demonstrates a simple neural sparse ANN search pipeline implemented using the Dev Tools console in OpenSearch Dashboards. The index is created first because it defines how OpenSearch should store and handle the sparse vector data. Once the index is ready, documents are ingested with their token-weight pairs. Then, a neural_sparse query is run against the index, returning results ranked by relevance score. The higher the score, the better the match.
1. Creating a sparse vector index
The code below creates an instaclustr-sparse-demo index with sparse search enabled and a my_vector field configured to use the seismic algorithm. The remaining parameters (n_postings and approximate_threshold) described in the previous section fall back to their default values: n_postings is calculated from the number of documents in each segment, and approximate_threshold is 1,000,000, meaning ANN search activates once a segment reaches that size. For a quick-start demo like this, the defaults work well and can be tuned later as your dataset and performance requirements evolve.
```
PUT instaclustr-sparse-demo
{
  "settings": {
    "index": {
      "sparse": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "sparse_vector",
        "method": {
          "name": "seismic"
        }
      }
    }
  }
}
```
Expected output
```
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "instaclustr-sparse-demo"
}
```
2. Index documents
The next step is to ingest the documents using the bulk API. Each document is stored as a set of token-weight pairs, where each token is a term identifier, and its weight reflects how important that term is to the document’s content.
The token-weight pairs in the code below are manually written placeholders used for demonstration. In practice, these values are produced by a sparse encoding model, a machine learning model that processes raw text and automatically generates the token-weight representation. To index your own data, pass the text through a sparse encoding model to generate the corresponding token-weight pairs. The sentence-transformers Python library supports SParse Lexical AnD Expansion (SPLADE) models and similar sparse encoders, making it a practical starting point for generating sparse vectors from your own data; a sketch of this encoding step appears after the expected output below.
```
POST instaclustr-sparse-demo/_bulk
{ "index": { "_id": "1" } }
{ "my_vector": { "10": 1.0, "20": 0.5 } }
{ "index": { "_id": "2" } }
{ "my_vector": { "20": 0.9, "30": 0.3 } }
{ "index": { "_id": "3" } }
{ "my_vector": { "30": 1.0 } }
```
Expected output
```
{
  "took": 1447,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "instaclustr-sparse-demo",
        "_id": "1",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "instaclustr-sparse-demo",
        "_id": "2",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 1,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "instaclustr-sparse-demo",
        "_id": "3",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 2,
        "_primary_term": 1,
        "status": 201
      }
    }
  ]
}
```
Figure 2. Documents indexed successfully with sparse vector embeddings
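To replace the placeholder token-weight pairs with real values, run your text through a sparse encoding model. The following is a hedged sketch using sentence-transformers (v5 or later, which added the SparseEncoder class); the model checkpoint and top_k value are illustrative choices, and the decode helper is used as described in the library documentation:

```python
from sentence_transformers import SparseEncoder

# Illustrative checkpoint; any SPLADE-style sparse encoder works similarly.
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

docs = [
    "Wireless headphones with long battery life",
    "Noise-cancelling over-ear headphones",
]

# encode returns a sparse tensor with one row per document over the
# model's vocabulary.
embeddings = model.encode(docs)

# decode maps the nonzero dimensions back to (token, weight) pairs;
# keeping only the strongest tokens yields a compact representation.
decoded = model.decode(embeddings, top_k=10)

for doc, token_weights in zip(docs, decoded):
    sparse_doc = {token: round(float(weight), 3) for token, weight in token_weights}
    print(doc, "->", sparse_doc)
```

Each resulting dictionary can then be sent to OpenSearch through the bulk API exactly as in the request above, with the same encoder reused at query time.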
3. Run a neural sparse search query
With documents in the index, run a neural_sparse query using query tokens. OpenSearch scores each document based on the token weights and returns the top matches. Be sure to use the same sparse encoding model for both indexing and querying, so that query tokens and document tokens share the same vocabulary and weighting.
```
GET instaclustr-sparse-demo/_search
{
  "query": {
    "neural_sparse": {
      "my_vector": {
        "query_tokens": {
          "10": 1.0,
          "20": 0.5
        }
      }
    }
  }
}
```
Expected output
```
{
  "took": 665,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.25,
    "hits": [
      {
        "_index": "instaclustr-sparse-demo",
        "_id": "1",
        "_score": 1.25,
        "_source": {
          "my_vector": {
            "10": 1,
            "20": 0.5
          }
        }
      },
      {
        "_index": "instaclustr-sparse-demo",
        "_id": "2",
        "_score": 0.44921875,
        "_source": {
          "my_vector": {
            "20": 0.9,
            "30": 0.3
          }
        }
      }
    ]
  }
}
```
Figure 3. Neural sparse search results returned with similarity scores
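To run the same query from application code instead of Dev Tools, here is a minimal sketch using the official opensearch-py client. The endpoint and credentials are placeholders, and in a real pipeline the query_tokens would come from the same sparse encoder used at indexing time:

```python
from opensearchpy import OpenSearch

# Placeholder connection details; point these at your own cluster.
client = OpenSearch(
    hosts=["https://localhost:9200"],
    http_auth=("admin", "admin"),
    verify_certs=False,
)

# Numeric token IDs mirror the demo data; real queries would use the
# token-weight pairs produced by your sparse encoding model.
response = client.search(
    index="instaclustr-sparse-demo",
    body={
        "query": {
            "neural_sparse": {
                "my_vector": {
                    "query_tokens": {"10": 1.0, "20": 0.5}
                }
            }
        }
    },
)

for hit in response["hits"]["hits"]:
    print(hit["_id"], hit["_score"])
```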
Choosing the right field type: knn_vector vs. sparse_vector
Use knn_vector when your application needs broad semantic recall, retrieving results that are conceptually related to a query even when the exact keywords do not appear in the document. It works well for RAG pipelines, recommendation systems, and conversational AI.
Use sparse_vector when term-level precision matters. For example, e-commerce search with specific product names, enterprise search with technical terminology, or any scenario where missing an exact keyword leads to poor or incomplete results.
Conclusion
Dense and sparse search do not compete; they complement one another. While knn_vector helps your application understand broad meaning, sparse_vector ensures you never lose the signal carried by specific terms. Hybrid search brings these two capabilities together into one unified pipeline designed to deliver an intelligent, precise search experience.
Ready to build your solution? Create your OpenSearch cluster, experiment with the code snippets above, and execute your first neural sparse ANN search query. Once you have mastered both field types, try combining keyword search with dense semantic search to build a truly hybrid AI search pipeline. Check the hybrid search documentation to get started.