Overview
In today’s AI-driven landscape, users expect applications to behave like intelligent assistants—understanding intent, context, and natural language rather than just matching exact terms. This requires combining keyword-based search, which excels at term-level relevance, with semantic search, which captures deeper meaning. Modern applications such as chatbots, recommendation systems, and Retrieval-Augmented Generation (RAG) pipelines all depend on this semantic layer, and this is exactly where OpenSearch’s knn_vector field type becomes essential.
knn_vector enables developers to store machine learning embeddings and perform similarity search based on meaning rather than exact words, laying the groundwork for scalable semantic search.
This is a developer-focused two-part series. Part 1 covers OpenSearch’s knn_vector field type, and Part 2 will explore the sparse_vector field type.
OpenSearch vector field types
OpenSearch provides two dedicated vector field types for building intelligent search:
- knn_vector: Enables semantic similarity search through dense embeddings
- sparse_vector: Enables neural sparse search through token-weighted sparse representations
Together, these two field types complement each other, making them essential for building production-ready AI applications that are designed for accuracy, scalability, and context-awareness.
Running OpenSearch on the NetApp Instaclustr Managed Platform reduces the operational burden of managing infrastructure and scaling, allowing teams to focus entirely on building smarter search experiences. The platform will be used throughout this blog series.
Now, let’s take a closer look at knn_vector, how to configure it, and how to use it in practice.
Getting started with knn_vector on Instaclustr
What is knn_vector?
knn_vector is OpenSearch’s dedicated field type for storing vector embeddings and performing semantic similarity search. It enables developers to represent content as dense numerical vectors and retrieve results based on meaning rather than exact keyword matches, making it ideal for building intelligent, context-aware applications.
When defining a knn_vector field in an index mapping, several parameters can be configured to tune search behavior. These include:
- Distance metric (space_type): Determines how similarity is measured. Common options include Euclidean distance (l2) and cosine similarity (cosinesimil).
- Engine: The underlying library powering the search. Supported engines are Facebook AI Similarity Search (Faiss, the default) and Lucene. Note that the Non-Metric Space Library (NMSLIB) was deprecated as of OpenSearch 2.16 and removed in OpenSearch 3.0.
- Method name (algorithm): Defines how the vector index is built and searched. Common options include Hierarchical Navigable Small World (HNSW), supported by both Faiss and Lucene, and Inverted File Index (IVF), supported via Faiss.
- Data type: Vector elements can be stored as float (default), byte, or binary depending on your precision and performance requirements.
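To build intuition for the two common space_type options, the underlying distance measures can be sketched in a few lines of plain Python. This is a simplified illustration of the raw metrics only, not OpenSearch's internal relevance-score formulas:

```python
import math

def l2_distance(a, b):
    """Euclidean (l2) distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine similarity (cosinesimil): closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# The demo vectors used later in this article
doc = [0.12, 0.45, 0.67, 0.89]
query = [0.10, 0.40, 0.60, 0.80]

print(round(l2_distance(doc, query), 4))       # small distance: close match
print(round(cosine_similarity(doc, query), 4)  # near 1.0: nearly parallel
      )
```

Note that l2 rewards vectors that are close in absolute position, while cosinesimil rewards vectors pointing in the same direction regardless of magnitude; pick the metric your embedding model was trained for.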
A knn_vector field can be built in two ways:
- by providing a method definition, where you explicitly specify the engine, algorithm, and space_type parameters, or
- by specifying a model ID, where the field inherits its configuration (including the dimension) from a pre-trained model already registered in OpenSearch.
When using the method definition approach, the dimension parameter is required and must match the output size of your embedding model. Refer to the documentation for a list of supported dimensions by model.
Note: To use this field type, you must first enable k-NN search in the index settings by setting "index.knn": true and then define the knn_vector field parameters in the index mapping.
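For illustration, a mapping that uses the method definition approach might look like the following sketch. The index name is a placeholder, and the dimension and space_type are assumptions that must be adapted to your embedding model:

```
PUT my_method_definition_demo
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "faiss"
        }
      }
    }
  }
}
```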
Practical example
First, create an OpenSearch cluster on Instaclustr by following the steps demonstrated in the video. When provisioning the cluster, make sure to enable either the k-NN Plugin (Similarity Search Engine) or the AI Search Plugin (enabling the AI Search Plugin will automatically enable the k-NN plugin as well).
The code snippets below are run in the Dev Tools console in OpenSearch Dashboards. The index is always created first, because the knn_vector field mapping defines how OpenSearch stores and indexes the vector embeddings and must be in place before any data is ingested. Once the index is ready, documents are ingested with their corresponding vector embeddings. A k-NN query is then executed to retrieve results ranked by similarity score, where a higher score indicates a closer semantic match. Together, these three steps form the foundation of a working semantic search pipeline.
Step 1: Creating a vector index
The following code creates an instaclustr_knn_demo index with k-NN search enabled and defines an embedding field to store 4-dimensional vectors. For this example, we did not define the optional parameters described in the above section. When these parameters are not explicitly defined, OpenSearch applies the following default values:
| Parameter   | Default value |
| ----------- | ------------- |
| space_type  | l2            |
| engine      | faiss         |
| method name | hnsw          |
| data_type   | float         |
This example is suitable for quick testing and experimentation.
Note: For production use, define these parameters explicitly to match your embedding model and performance requirements. Also, remember to update the dimension value to match your embedding model’s output size. Real-world models typically produce vectors of 384 to 1536 dimensions.
PUT instaclustr_knn_demo
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 4
      }
    }
  }
}
Expected output:
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "instaclustr_knn_demo"
}
Figure 1. Index created successfully with k-NN enabled
Step 2: Indexing a document
The next step is to index a document by providing the text content and its corresponding vector embedding generated by your embedding model.
The embedding vector in the code below is a manually written placeholder used for demonstration. In practice, these values are produced by an embedding model, a machine learning model that converts text into a fixed-size list of numbers representing its meaning. To index different texts, pass each text through an embedding model to generate its corresponding vector. A popular option for generating embeddings is the sentence-transformers Python library.
Note: Use the same model for both indexing and querying. Using different models at each stage can produce meaningless similarity scores because the vectors cannot be meaningfully compared.
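As a sketch of what that looks like in practice, the following snippet uses the sentence-transformers library mentioned above. It assumes the package is installed (pip install sentence-transformers) and uses the all-MiniLM-L6-v2 model, which is downloaded on first use and produces 384-dimensional vectors, so the mapping's dimension would need to be 384 rather than 4:

```python
# Sketch: generating a real embedding with sentence-transformers.
# Assumes: pip install sentence-transformers; the all-MiniLM-L6-v2
# model is fetched from the model hub on first run.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional output

# Encode the document text; use the same model (and call) for query text.
embedding = model.encode("vector search example").tolist()
print(len(embedding))  # 384 for this model
```

The resulting list can be passed directly as the embedding field value in the indexing request below.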
POST instaclustr_knn_demo/_doc
{
  "text": "vector search example",
  "embedding": [0.12, 0.45, 0.67, 0.89]
}
Expected output:
{
  "_index": "instaclustr_knn_demo",
  "_id": "XEdATp0BSeQDCK7X2oA6",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
Figure 2. Document indexed successfully with vector embedding
Step 3: Running a semantic search query
With documents indexed, you can run a k-NN query by passing a query vector to retrieve the most semantically similar results.
POST instaclustr_knn_demo/_search
{
  "query": {
    "knn": {
      "embedding": {
        "vector": [0.10, 0.40, 0.60, 0.80],
        "k": 5
      }
    }
  }
}
Expected output:
{
  "took": 509,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.98434883,
    "hits": [
      {
        "_index": "instaclustr_knn_demo",
        "_id"
Figure 3. Semantically similar results returned with similarity scores.
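For reference, a k-NN query can also be combined with a filter so that only documents matching a condition are considered. The category field below is hypothetical and would need to exist in your index mapping; this form of filtering is supported by the Faiss and Lucene engines:

```
POST instaclustr_knn_demo/_search
{
  "query": {
    "knn": {
      "embedding": {
        "vector": [0.10, 0.40, 0.60, 0.80],
        "k": 5,
        "filter": {
          "term": {
            "category": "demo"
          }
        }
      }
    }
  }
}
```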
Conclusion and next steps
The knn_vector field type provides the foundation for building AI-powered applications by enabling semantic search at scale. However, dense semantic search alone does not tell the whole story. Neural sparse search, powered by the sparse_vector field type, adds a complementary layer of precision that is particularly effective when term-level relevance matters. That is exactly what Part 2 will cover.
If you are building AI-driven applications, now is the time to get started with semantic search—create a free OpenSearch cluster on Instaclustr, follow the code snippets in this article, and run your first semantic search query.
Stay tuned for Part 2 of this series, where we will focus on the sparse_vector field type and explore how it enables neural sparse approximate nearest neighbor (ANN) search for even better search efficiency and precision.