
KNN Plugin

What is it

The KNN Plugin extends OpenSearch by adding support for a vector data type. This enables a new kind of search which utilises the K-Nearest Neighbours algorithm. That is, given a vector as search input, OpenSearch can return a number (K) of the nearest vectors in an index. This opens up many more use cases for performing categorisation with OpenSearch whenever the data can be converted to vectors.

How to provision

The KNN plugin can be selected when provisioning a cluster. In the console, just select the checkbox in the plugins section.

Native Libraries vs Lucene

The KNN plugin has multiple options for the internal representation of the vectors. The 2.x versions support native Lucene indexing utilising Lucene 9 features. This is the recommended choice as it fits most naturally with the OpenSearch architecture. Alternatively, there are two native libraries which build out data structures in memory outside of the JVM. These can provide performance benefits and give more control over the internals, at the cost of additional complexity in regard to memory management.

 

You specify the engine at indexing time. Here is an example of using lucene as the engine.
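A minimal sketch of such an index definition, assuming a hypothetical index named my-knn-index and a small 4-dimensional vector field; the engine is chosen per field in the knn_vector mapping:

```python
import json

# Hypothetical index mapping: a knn_vector field using the lucene engine
# with HNSW graphs. Field name, dimension, and space type are illustrative.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "my_vector": {
                "type": "knn_vector",
                "dimension": 4,
                "method": {
                    "name": "hnsw",
                    "space_type": "l2",
                    "engine": "lucene",
                },
            }
        }
    },
}

# Send with any HTTP client as: PUT /my-knn-index
print(json.dumps(index_body, indent=2))
```

Swapping "lucene" for one of the native library engine names switches the same mapping over to an in-memory native representation.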

 

Pointers and Tips

Size Calculation

If you choose to use the native libraries, additional thought needs to be given to the amount of memory your nodes will need. Your cluster will be configured with 50% of RAM allocated to OpenSearch. The memory cache used by the native libraries takes another 25% of the RAM, so memory is even more important for these clusters. There are formulas for a rough estimate for both the HNSW and IVF approaches (See doco).

HNSW

Approximately 1.1 * (4 * dimension + 8 * M) * num_vectors bytes, where M = the number of bidirectional links (See doco)

IVF

Approximately 1.1 * ((4 * dimension * num_vectors) + (4 * nlist * dimension)) bytes, where nlist = Number of buckets to partition vectors into (See doco).

Stats

If you are using the native library engines, the stats call is your friend for insight into the behaviour of your kNN indices. You can see which native libraries are active and how much memory your indices are using. The memory usage percentage tells you what proportion of the memory cache each index consumes. Combine this with the shard count on the node to work out the per-shard cost.
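For example (the cluster address here is a hypothetical localhost placeholder; substitute your own host and credentials):

```python
# The kNN plugin's stats endpoint; issue with any HTTP client, e.g.
#   GET http://localhost:9200/_plugins/_knn/stats
stats_url = "http://localhost:9200/_plugins/_knn/stats"
print(stats_url)

def per_shard_memory_kb(index_graph_memory_kb: float, shards_on_node: int) -> float:
    """Rough per-shard graph memory cost, as described above: divide an
    index's reported graph memory usage by its shard count on the node."""
    return index_graph_memory_kb / shards_on_node

# Illustrative figures only: a 10 MiB index spread over 4 shards on a node.
print(per_shard_memory_kb(10240, 4))  # 2560.0
```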

Result

 

Warming KNN Indices

If using native libraries, graphs need to be built in memory for a shard before it can return search results. This can cause lag, which can be prevented by warming the relevant indices. It is important to warm an index before calculating how much memory it will use. It is also important to consider that inactive shards can be removed from memory by a breaker condition or the idle setting of the memory cache.
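The warmup API takes a comma-separated list of index names. A small sketch of building the request URL (base URL and index names are hypothetical):

```python
def warmup_url(base: str, indices: list[str]) -> str:
    # The kNN warmup API loads the graphs for the listed indices into the
    # memory cache: GET /_plugins/_knn/warmup/<index1>,<index2>,...
    return f"{base}/_plugins/_knn/warmup/{','.join(indices)}"

# Issue the resulting URL as a GET request with any HTTP client.
print(warmup_url("http://localhost:9200", ["my-knn-index"]))
```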

Result

Refresh Interval

The index refresh interval can have a big impact on both the indexing and search performance of the knn plugin. Queries run per segment, so the small segments created by a low refresh interval can add search latency. Disabling refresh entirely will speed up indexing in general, so it is recommended to turn it off where possible, such as during a bulk load of indices using the vector data type.

Turn Off
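A sketch of the settings update that disables refresh (hypothetical index name; `-1` is the standard value for turning the interval off):

```python
import json

# Disable automatic refresh entirely during a bulk load:
#   PUT /my-knn-index/_settings
settings_body = {"index": {"refresh_interval": "-1"}}
print(json.dumps(settings_body))
```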

Result

Turn back on for 60 second interval
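And the corresponding update to restore a 60 second interval once the load is complete (same hypothetical index):

```python
import json

# Re-enable refresh with a 60 second interval:
#   PUT /my-knn-index/_settings
settings_body = {"index": {"refresh_interval": "60s"}}
print(json.dumps(settings_body))
```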

Result

 

Replicas

Disabling replicas during a bulk load will reduce indexing time. You can then add replicas once the load is complete. Each replica adds to the memory cost, so be careful when adding them while using the native libraries as your indexing engine. You can use the warmup and then stats calls mentioned earlier to see if you have enough memory to add replicas.
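A sketch of the two settings updates involved (index name and replica count are hypothetical):

```python
import json

# During the bulk load: no replicas.
#   PUT /my-knn-index/_settings
disable_replicas = {"index": {"number_of_replicas": 0}}

# After the load completes, and once warmup plus stats confirm there is
# memory headroom, restore the desired replica count.
enable_replicas = {"index": {"number_of_replicas": 2}}

print(json.dumps(disable_replicas))
print(json.dumps(enable_replicas))
```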

Result

 

Further Reading

More documentation on these and other aspects of the KNN plugin can be found in the OpenSearch project's documentation.

 

By Instaclustr Support