What Is knn_vector in OpenSearch?
knn_vector is OpenSearch’s dedicated field type for storing dense vector embeddings and performing semantic similarity search. Unlike keyword-based search, which matches exact terms, knn_vector enables applications to retrieve results based on meaning—making it foundational for AI-powered applications such as chatbots, recommendation engines, and Retrieval-Augmented Generation (RAG) pipelines.
When you store content as dense numerical vectors using knn_vector, OpenSearch can find the most semantically similar results to any query, even when the exact words don’t match.
This is Part 1 of a two-part series. Part 1 covers the knn_vector field type. Part 2 covers the sparse_vector field type for neural sparse search.
OpenSearch Vector Field Types: A Quick Comparison
OpenSearch provides two dedicated vector field types for building intelligent search applications:
| Field type | Search type | Best for |
| --- | --- | --- |
| knn_vector | Dense semantic similarity search | Meaning-based retrieval, RAG pipelines, recommendations |
| sparse_vector | Neural sparse search | Token-weighted precision, term-level relevance |
Together, these two field types form a complete foundation for production-ready AI search systems.
knn_vector Configuration Parameters
When defining a knn_vector field in an index mapping, the following parameters control search behavior:
| Parameter | Description | Common options |
| --- | --- | --- |
| space_type | How similarity is measured | l2 (Euclidean distance), cosinesimil (cosine similarity) |
| engine | Underlying search library | FAISS (Facebook AI Similarity Search; the default), Lucene |
| method (algorithm) | How the vector index is built | HNSW (Hierarchical Navigable Small World; supported by both FAISS and Lucene), IVF (Inverted File Index; supported via FAISS) |
| data_type | Precision of stored vectors | float (default), byte, binary |
| dimension | Number of dimensions in the vector | Must match your embedding model’s output |
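To make the space_type choice more concrete, the following Python sketch compares the two measures on a pair of vectors that point in the same direction but differ in length. The function names and sample vectors are illustrative assumptions and not part of OpenSearch, which computes these measures internally.

```python
import math

def squared_l2_distance(a, b):
    """Sum of squared element-wise differences between a and b."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Illustrative vectors: v2 is v1 scaled by 2, so the direction is identical.
v1 = [1.0, 2.0, 3.0, 4.0]
v2 = [2.0, 4.0, 6.0, 8.0]

l2_sq = squared_l2_distance(v1, v2)   # grows with the difference in magnitude
cos = cosine_similarity(v1, v2)       # stays at (approximately) 1 for parallel vectors
print(l2_sq, cos)
```

The two vectors are far apart under l2 yet almost perfectly similar under cosine similarity, which is why the space_type should follow whatever measure your embedding model was trained for.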
Important notes
- Non-Metric Space Library (NMSLIB) was deprecated in OpenSearch 2.16 and removed in OpenSearch 3.0
- The dimension parameter is required when using the method definition approach
- Real-world embedding models typically produce vectors of 384 to 1536 dimensions
Two Ways to Define a knn_vector Field
- Method definition — Explicitly specify engine, algorithm, and space_type
- Model ID — Inherit configuration from a pre-trained model already registered in OpenSearch
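As a rough illustration of the method definition approach, the Python sketch below assembles an index body in which the engine, algorithm, and space_type are all spelled out. The helper name build_knn_index_body, the field name embedding, and the HNSW values m=16 and ef_construction=128 are assumptions made for this example, not tuned recommendations; check them against your OpenSearch version and workload.

```python
import json

def build_knn_index_body(dimension, space_type="l2", engine="faiss",
                         m=16, ef_construction=128):
    """Return an index-creation body with an explicit method definition.

    Hypothetical helper for illustration; the field name "embedding" and the
    HNSW parameter values are assumptions, not tuned recommendations.
    """
    return {
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": engine,
                        "space_type": space_type,
                        "parameters": {"m": m, "ef_construction": ef_construction},
                    },
                }
            }
        },
    }

# 384 is used only as an example of a common embedding size.
body = build_knn_index_body(dimension=384)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of an index-creation request in Dev Tools.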
Prerequisites: Enabling k-NN Search
Before using knn_vector, you must:
- Enable either the k-NN Plugin (Similarity Search Engine) or the AI Search Plugin when provisioning your cluster (enabling the AI Search Plugin automatically enables the k-NN plugin)
- Enable k-NN search in the index settings: "index.knn": true
- Define the knn_vector field in the index mapping
How to Use knn_vector on Instaclustr: Step-by-Step
The following example uses the Dev Tools console in OpenSearch Dashboards on a cluster provisioned via the NetApp Instaclustr Managed Platform. Running OpenSearch on Instaclustr reduces infrastructure management overhead, allowing teams to focus on building search experiences.
Step 1: Create a Vector Index
Create an index with k-NN enabled and define an embedding field to store vectors.
    PUT instaclustr_knn_demo
    {
      "settings": {
        "index.knn": true
      },
      "mappings": {
        "properties": {
          "embedding": {
            "type": "knn_vector",
            "dimension": 4
          }
        }
      }
    }
Default values applied when parameters are not explicitly defined:
| Parameter | Default value |
| --- | --- |
| space_type | l2 |
| engine | faiss |
| method name | hnsw |
| data_type | float |
Expected output:
    {
      "acknowledged": true,
      "shards_acknowledged": true,
      "index": "instaclustr_knn_demo"
    }
Production note: Always define parameters explicitly to match your embedding model and performance requirements. Update dimension to match your model’s output size.
Step 2: Index a Document with a Vector Embedding
Index a document by providing the text content and its corresponding vector embedding.
    POST instaclustr_knn_demo/_doc
    {
      "text": "vector search example",
      "embedding": [0.12, 0.45, 0.67, 0.89]
    }
Note: The embedding values in this example are manually written placeholders for demonstration. In practice, embeddings are generated by a machine learning model—such as those available via the sentence-transformers Python library—that converts text into a fixed-size list of numbers representing its meaning.
Critical: Always use the same embedding model for both indexing and querying. Using different models produces meaningless similarity scores because the vectors are not comparable.
Expected output:
    {
      "_index": "instaclustr_knn_demo",
      "_id": "XEdATp0BSeQDCK7X2oA6",
      "_version": 1,
      "result": "created",
      ...
    }
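Because the mapping from Step 1 fixes dimension at 4, every embedding sent to this index needs exactly four numeric values. A client-side check along the following lines can catch a mismatch before the request is sent. The helper validate_embedding is hypothetical and shown only as a sketch; it is not part of OpenSearch or any client library, and OpenSearch applies its own validation when indexing.

```python
import math

def validate_embedding(vector, expected_dimension):
    """Raise ValueError if the vector does not fit a field of the given dimension.

    Hypothetical client-side helper for illustration only.
    """
    if len(vector) != expected_dimension:
        raise ValueError(
            f"expected {expected_dimension} values, got {len(vector)}"
        )
    for value in vector:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"non-numeric value: {value!r}")
        if not math.isfinite(value):
            raise ValueError(f"non-finite value: {value!r}")
    return vector

# The document from Step 2 passes the check for dimension 4.
doc = {"text": "vector search example", "embedding": [0.12, 0.45, 0.67, 0.89]}
validate_embedding(doc["embedding"], expected_dimension=4)

# A three-value vector does not match the mapping and is flagged.
try:
    validate_embedding([0.12, 0.45, 0.67], expected_dimension=4)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(mismatch_detected)
```

In a real pipeline, expected_dimension would come from the embedding model's output size, which is also the value the mapping should use.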
Step 3: Run a Semantic Search Query
Execute a k-NN query by passing a query vector to retrieve the most semantically
similar results. A higher similarity score indicates a closer semantic match.
    POST instaclustr_knn_demo/_search
    {
      "query": {
        "knn": {
          "embedding": {
            "vector": [0.10, 0.40, 0.60, 0.80],
            "k": 5
          }
        }
      }
    }
Expected output (truncated):
    {
      "took": 509,
      "timed_out": false,
      "hits": {
        "total": { "value": 1, "relation": "eq" },
        "max_score": 0.98434883,
        "hits": [...]
      }
    }
With the default l2 space type, scores fall between 0 and 1, so a max_score close to 1.0 indicates a highly similar result.
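The score in the output above can be reproduced by hand. For the l2 space type, the OpenSearch k-NN documentation describes the score as 1 / (1 + d), where d is the squared Euclidean distance between the query vector and the stored vector. The short Python sketch below applies that formula to the vectors from Steps 2 and 3; the function name l2_score is illustrative.

```python
def l2_score(query, stored):
    """Score for the l2 space type: 1 / (1 + squared Euclidean distance)."""
    squared_distance = sum((q - s) ** 2 for q, s in zip(query, stored))
    return 1.0 / (1.0 + squared_distance)

stored_vector = [0.12, 0.45, 0.67, 0.89]   # document indexed in Step 2
query_vector = [0.10, 0.40, 0.60, 0.80]    # query vector from Step 3

score = l2_score(query_vector, stored_vector)
print(round(score, 4))  # 0.9843
```

The result agrees with the max_score of 0.98434883 to within floating-point rounding, and an identical vector would score exactly 1.0.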
Once you are comfortable with both OpenSearch vector field types, try combining keyword search
with dense semantic search to build a hybrid AI search pipeline.
Frequently Asked Questions
What is knn_vector in OpenSearch?
knn_vector is a field type in OpenSearch used to store dense vector embeddings and
perform k-nearest neighbor (k-NN) similarity search. It enables semantic search—
finding results based on meaning rather than exact keyword matches.
What is the difference between knn_vector and sparse_vector in OpenSearch?
knn_vector stores dense embeddings for semantic similarity search, while sparse_vector stores token-weighted sparse representations for neural sparse search. They are complementary: knn_vector excels at meaning-based retrieval, while sparse_vector adds term-level precision.
What engines does OpenSearch knn_vector support?
OpenSearch knn_vector supports two engines: FAISS (Facebook AI Similarity Search,
the default) and Lucene. NMSLIB was deprecated in OpenSearch 2.16 and removed in
OpenSearch 3.0.
What dimension should I use for knn_vector?
The dimension value must match the output size of your embedding model. Common
real-world embedding models produce vectors of 384 to 1536 dimensions.
How do I enable k-NN search in OpenSearch?
Set "index.knn": true in your index settings and define a knn_vector field in your index mapping. When provisioning on Instaclustr, enable the k-NN Plugin or the AI Search Plugin.
Can I use knn_vector for RAG pipelines?
Yes. knn_vector is a core component of RAG (Retrieval-Augmented Generation) pipelines, enabling semantic retrieval of relevant documents that are then passed to a language model for generation.
Next Steps
knn_vector provides the foundation for semantic search in OpenSearch. To continue
building:
- [Part 2 of this series]: Explore sparse_vector and neural sparse approximate nearest neighbor (ANN) search for improved search efficiency and precision
- Create a free OpenSearch cluster on Instaclustr and run the code snippets in this guide to execute your first semantic search query
- For production deployments, explicitly configure the space_type, engine, method, and dimension parameters to match your embedding model