What Is knn_vector in OpenSearch?

knn_vector is OpenSearch’s dedicated field type for storing dense vector embeddings and performing semantic similarity search. Unlike keyword-based search, which matches exact terms, knn_vector enables applications to retrieve results based on meaning—making it foundational for AI-powered applications such as chatbots, recommendation engines, and Retrieval-Augmented Generation (RAG) pipelines.

When you store content as dense numerical vectors using knn_vector, OpenSearch can find the most semantically similar results to any query, even when the exact words don’t match.

This is Part 1 of a two-part series. Part 1 covers the knn_vector field type. Part 2 covers the sparse_vector field type for neural sparse search.

OpenSearch Vector Field Types: A Quick Comparison

OpenSearch provides two dedicated vector field types for building intelligent search applications:

| Field type | Search type | Best for |
|---|---|---|
| knn_vector | Dense semantic similarity search | Meaning-based retrieval, RAG pipelines, recommendations |
| sparse_vector | Neural sparse search | Token-weighted precision, term-level relevance |

Together, these two field types form a complete foundation for production-ready AI search systems.

knn_vector Configuration Parameters

When defining a knn_vector field in an index mapping, the following parameters control search behavior:

| Parameter | Description | Common options |
|---|---|---|
| space_type | How similarity is measured | l2 (Euclidean), cosinesimil (cosine similarity) |
| engine | Underlying search library | faiss (Facebook AI Similarity Search, the default), lucene |
| method (algorithm) | How the vector index is built | hnsw (Hierarchical Navigable Small World, supported by both FAISS and Lucene), ivf (Inverted File Index, supported via FAISS) |
| data_type | Precision of stored vectors | float (default), byte, binary |
| dimension | Number of dimensions in the vector | Must match your embedding model’s output |

Important notes

  • Non-Metric Space Library (NMSLIB) was deprecated in OpenSearch 2.16 and removed in OpenSearch 3.0
  • The dimension parameter is required when using the method definition approach
  • Real-world embedding models typically produce vectors of 384 to 1536 dimensions

Two Ways to Define a knn_vector Field

  • Method definition — Explicitly specify engine, algorithm, and space_type
  • Model ID — Inherit configuration from a pre-trained model already registered in OpenSearch
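As a rough illustration, the two approaches differ only in what the field definition contains. The sketch below builds both as Python dictionaries and prints them as JSON mapping fragments. The field name my_vector, the dimension 384, the lucene/cosinesimil combination, and the model ID my-knn-model are placeholders chosen for illustration, not values from this guide. Placing space_type inside method is also an assumption; check where your OpenSearch version accepts it.

```python
import json

# Approach 1: method definition. Engine, algorithm, and space type are
# spelled out explicitly, and dimension is required.
method_definition_field = {
    "type": "knn_vector",
    "dimension": 384,  # illustrative; must match your embedding model
    "method": {
        "name": "hnsw",
        "engine": "lucene",
        "space_type": "cosinesimil",
    },
}

# Approach 2: model ID. The field inherits its configuration from a model
# that already exists in the cluster, so no method or dimension is given.
model_id_field = {
    "type": "knn_vector",
    "model_id": "my-knn-model",  # hypothetical model ID
}


def field_mapping(field_name, field_definition):
    """Wrap a field definition in the 'mappings' section of an index body."""
    return {"mappings": {"properties": {field_name: field_definition}}}


print(json.dumps(field_mapping("my_vector", method_definition_field), indent=2))
print(json.dumps(field_mapping("my_vector", model_id_field), indent=2))
```

Either fragment would sit inside the body of an index creation request alongside the "index.knn": true setting.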

Prerequisites: Enabling k-NN Search

Before using knn_vector, you must:

  • Enable either the k-NN Plugin (Similarity Search Engine) or the AI Search Plugin when provisioning your cluster (enabling the AI Search Plugin automatically enables the k-NN plugin)
  • Enable k-NN search in the index settings: "index.knn": true
  • Define the knn_vector field in the index mapping

How to Use knn_vector on Instaclustr: Step-by-Step

The following example uses the Dev Tools console in OpenSearch Dashboards on a cluster provisioned via the NetApp Instaclustr Managed Platform. Running OpenSearch on Instaclustr reduces infrastructure management overhead, allowing teams to focus on building search experiences.

Step 1: Create a Vector Index

Create an index with k-NN enabled and define an embedding field to store vectors.
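A minimal sketch of such a request, assuming an index named my-vector-index, a vector field named embedding, and a toy dimension of 3 (all placeholders chosen for illustration). The Python below only assembles the request and prints it in the form the Dev Tools console accepts; it does not contact a cluster.

```python
import json

INDEX_NAME = "my-vector-index"  # hypothetical index name

create_index_body = {
    "settings": {"index.knn": True},  # enable k-NN search for this index
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            # Only type and dimension are given; everything else is left
            # to the defaults. A dimension of 3 is a toy value.
            "embedding": {"type": "knn_vector", "dimension": 3},
        }
    },
}

# Print the request as it could be pasted into the Dev Tools console.
print(f"PUT /{INDEX_NAME}")
print(json.dumps(create_index_body, indent=2))
```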

Default values applied when parameters are not explicitly defined:

| Parameter | Default value |
|---|---|
| space_type | l2 |
| engine | faiss |
| method name | hnsw |
| data_type | float |

Expected output: a JSON response containing "acknowledged": true, which confirms that the index was created.

Production note: Always define parameters explicitly to match your embedding model and performance requirements. Update dimension to match your model’s output size.
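One way to follow this advice is to spell out every parameter in the field definition. The sketch below assumes a 384-dimensional embedding model and the l2 space type. The index name and the hnsw parameters m and ef_construction carry illustrative values, not recommendations, and placing space_type inside method is an assumption to verify against your OpenSearch version.

```python
import json

EMBEDDING_DIMENSION = 384  # assumption: replace with your model's output size

explicit_field = {
    "type": "knn_vector",
    "dimension": EMBEDDING_DIMENSION,
    "data_type": "float",
    "method": {
        "name": "hnsw",
        "engine": "faiss",
        "space_type": "l2",  # choose the space type your model is meant for
        "parameters": {"m": 16, "ef_construction": 128},  # illustrative values
    },
}

create_index_body = {
    "settings": {"index.knn": True},
    "mappings": {"properties": {"embedding": explicit_field}},
}

print("PUT /my-vector-index")  # hypothetical index name
print(json.dumps(create_index_body, indent=2))
```

Keeping the dimension in a single named constant makes it harder for the mapping and the embedding code to drift apart.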

Step 2: Index a Document with a Vector Embedding

Index a document by providing the text content and its corresponding vector embedding.
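A minimal sketch of the indexing request, assuming a hypothetical index named my-vector-index with a 3-dimensional embedding field. The document ID, text, and vector values are invented for illustration. The code prints the request for the Dev Tools console and does not contact a cluster.

```python
import json

document = {
    "text": "OpenSearch supports vector search.",
    "embedding": [0.12, 0.48, 0.87],  # placeholder values, not model output
}

# The vector length must equal the dimension declared in the index mapping.
print("PUT /my-vector-index/_doc/1")
print(json.dumps(document))
```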

Note: The embedding values in this example are manually written placeholders for demonstration. In practice, embeddings are generated by a machine learning model—such as those available via the sentence-transformers Python library—that converts text into a fixed-size list of numbers representing its meaning.

Critical: Always use the same embedding model for both indexing and querying. Using different models produces meaningless similarity scores because the vectors are not comparable.
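One possible safeguard, sketched here as an assumption rather than anything OpenSearch provides, is to record which model populated the index and check every query vector against that record before searching. The model name all-MiniLM-L6-v2 (a sentence-transformers model with 384-dimensional output) stands in for whatever model you actually use, and check_query_vector is a hypothetical helper.

```python
# Hypothetical record of how the index was populated.
INDEX_EMBEDDING_MODEL = "all-MiniLM-L6-v2"
INDEX_DIMENSION = 384


def check_query_vector(model_name, vector):
    """Return the vector unchanged, or raise ValueError if it cannot be
    meaningfully compared with the vectors stored in the index."""
    if model_name != INDEX_EMBEDDING_MODEL:
        raise ValueError(
            f"query embedded with {model_name!r}, "
            f"index built with {INDEX_EMBEDDING_MODEL!r}"
        )
    if len(vector) != INDEX_DIMENSION:
        raise ValueError(
            f"query vector has {len(vector)} dimensions, "
            f"index expects {INDEX_DIMENSION}"
        )
    return vector


# A query vector from the same model with the right size passes the check.
checked = check_query_vector("all-MiniLM-L6-v2", [0.0] * 384)
print(len(checked))
```

A matching dimension alone is not proof that the same model was used, which is why the sketch compares the model name as well.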

Expected output: a JSON response containing "result": "created", which confirms that the document was indexed.

Step 3: Run a Semantic Search Query

Execute a k-NN query by passing a query vector to retrieve the most semantically
similar results. A higher similarity score indicates a closer semantic match.
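A minimal sketch of such a query, assuming a hypothetical index named my-vector-index with a 3-dimensional embedding field. The query vector is a hand-written placeholder, and k is the number of nearest neighbors to retrieve. The code prints the request for the Dev Tools console and does not contact a cluster.

```python
import json

query_vector = [0.10, 0.50, 0.90]  # placeholder; use the indexing model in practice

search_body = {
    "size": 2,  # number of hits to return in the response
    "query": {
        "knn": {
            "embedding": {      # name of the knn_vector field to search
                "vector": query_vector,
                "k": 2,         # number of nearest neighbors to retrieve
            }
        }
    },
}

print("GET /my-vector-index/_search")
print(json.dumps(search_body, indent=2))
```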

Expected output (truncated): a JSON response whose hits section lists the nearest documents, each with its _score.

A max_score close to 1.0 indicates a highly similar result.
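To make the score concrete: for the l2 space type, the OpenSearch documentation gives the score as 1 / (1 + d), where d is the squared Euclidean distance between the query and document vectors. The sketch below reproduces that formula in plain Python for made-up 3-dimensional vectors. It is an illustration of the arithmetic, not the engine's implementation, and other space types use different formulas.

```python
def l2_score(query, doc):
    """Score for the l2 space type: 1 / (1 + squared Euclidean distance)."""
    if len(query) != len(doc):
        raise ValueError("vectors must have the same dimension")
    squared_distance = sum((q - d) ** 2 for q, d in zip(query, doc))
    return 1.0 / (1.0 + squared_distance)


query = [0.10, 0.50, 0.90]                    # made-up query vector
identical = l2_score(query, query)            # zero distance gives 1.0
near = l2_score(query, [0.12, 0.48, 0.87])    # small distance, just under 1.0
far = l2_score(query, [0.90, 0.10, 0.20])     # larger distance, lower score

print(identical, near, far)
```

Under this formula the score can never exceed 1.0, and it falls toward 0 as the vectors move apart.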

Once you have mastered both OpenSearch vector field types, try combining keyword search with dense semantic search to build a hybrid AI search pipeline.

Frequently Asked Questions

What is knn_vector in OpenSearch?
knn_vector is a field type in OpenSearch used to store dense vector embeddings and perform k-nearest neighbor (k-NN) similarity search. It enables semantic search: finding results based on meaning rather than exact keyword matches.

What is the difference between knn_vector and sparse_vector in OpenSearch?
knn_vector stores dense embeddings for semantic similarity search, while sparse_vector stores token-weighted sparse representations for neural sparse search. They are complementary: knn_vector excels at meaning-based retrieval, while sparse_vector adds term-level precision.

What engines does OpenSearch knn_vector support?
OpenSearch knn_vector supports two engines: FAISS (Facebook AI Similarity Search,
the default) and Lucene. NMSLIB was deprecated in OpenSearch 2.16 and removed in
OpenSearch 3.0.

What dimension should I use for knn_vector?
The dimension value must match the output size of your embedding model. Common
real-world embedding models produce vectors of 384 to 1536 dimensions.

How do I enable k-NN search in OpenSearch?
Set "index.knn": true in your index settings and define a knn_vector field in your index mapping. When provisioning on Instaclustr, enable the k-NN Plugin or the AI Search Plugin.

Can I use knn_vector for RAG pipelines?
Yes. knn_vector is a core component of RAG (Retrieval-Augmented Generation) pipelines, enabling semantic retrieval of relevant documents that are then passed to a language model for generation.

Next Steps

knn_vector provides the foundation for semantic search in OpenSearch. To continue
building:

  • [Part 2 of this series] — Explore sparse_vector and neural sparse approximate
    nearest neighbor (ANN) search for improved search efficiency and precision
  • Create a free OpenSearch cluster on Instaclustr and run the code snippets in
    this guide to execute your first semantic search query
  • For production deployments, explicitly configure space_type, engine, method,
    and dimension parameters to match your embedding model