How to create a hybrid search pipeline in OpenSearch®

Hybrid search in OpenSearch is a retrieval method that combines multiple search techniques, such as keyword matching and semantic vector search, into a single, unified result set.

When you search strictly with keyword matching or solely with semantic search, you’re losing some of the query’s intent. Relying on a single method also makes things difficult when dealing with a diverse set of queries and documents: for example, a knowledge corpus for a business that contains thousands of pages of technical documentation and the backend for a Stack Overflow-like site where employees can ask and answer questions.

The solution? Hybrid search, where multiple retrieval methods are used and their result sets are combined into one. A common example is pairing BM25 keyword search with neural sparse vector search, merging the results of a keyword match and a semantic search into one set.

In this blog post, we’ll cover the three major pieces of this OpenSearch hybrid pipeline and how they work together to create full-context results: the ingest, the index, and the post-processor. We’ll also walk through a Python implementation of this type of search, using book data as an example.

What is the architecture of an OpenSearch hybrid search?

An OpenSearch hybrid search architecture relies on three core components:

an ingest pipeline to create vector embeddings,
an index to store the data, and
a post-processing pipeline to merge and rank the results.

Turning documents made up of text into searchable entities within OpenSearch takes a bit of work. First, you need to prepare the data. Then you need to store the information in a way that makes it easy and efficient to search. Finally, you want to search the data and get back one result set. These three problems are solved by ingest pipelines working together with indexes and post-processing pipelines.

Figure 1: OpenSearch hybrid search pipeline diagram

What is an ingest pipeline in OpenSearch?

An ingest pipeline in OpenSearch is a set of pre-processing steps, such as tokenization, chunking, and creating vector embeddings, that transform raw JSON text into a searchable record.

Turning a JSON object full of text into a searchable record in OpenSearch takes some preparation. This is done by OpenSearch using an ingest pipeline. This is a piece of code you can attach to your index that does things like tokenization, chunking, and creating vector embeddings. Ingest pipelines result in documents ready to be stored in OpenSearch and searched. In this demo, I create an ingest pipeline that creates vector embeddings for the summaries of the books for semantic search.

How do you design an index for hybrid search?

To design an index for hybrid search in OpenSearch, you must configure mappings that support both lexical fields (for exact keyword matching) and vector fields (for semantic search).

Good index design can make or break any OpenSearch query, and vector search is no exception. The index sets a document standard that makes search easier and faster. It also houses vector data we can use for semantic search. In this demo, I’ll design an index that allows for easy and efficient keyword and semantic search for our book records.

What is a post-processing pipeline in OpenSearch?

A hybrid query runs multiple sub-queries, and the raw scores returned from these are often not directly comparable (different scales), requiring some kind of normalization and combination to create one ranked set. This is where the post-processing pipeline comes in; it calculates a single fused score for ranking. In this pipeline, I’ll use min-max for normalization, and weighted means for combination.
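
To make the arithmetic concrete, here is a minimal pure-Python sketch of min-max normalization followed by a weighted arithmetic mean. The function names and the sample scores are made up for illustration, and OpenSearch's own normalization processor may treat edge cases (such as the lowest-scoring hit) differently.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one sub-query's raw scores to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption for this sketch: identical scores all normalize to 1.0.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(sub_query_scores: List[Dict[str, float]], weights: List[float]) -> Dict[str, float]:
    """Weighted arithmetic mean of normalized scores.

    A document missing from a sub-query contributes 0 for that sub-query.
    """
    normalized = [min_max(s) for s in sub_query_scores]
    doc_ids = set().union(*normalized)
    total_weight = sum(weights)
    return {
        doc_id: sum(w * n.get(doc_id, 0.0) for w, n in zip(weights, normalized)) / total_weight
        for doc_id in doc_ids
    }


# Illustrative raw scores on two very different scales.
bm25_scores = {"a": 12.0, "b": 6.0, "c": 3.0}
sparse_scores = {"b": 0.9, "c": 0.5, "d": 0.1}

fused = fuse([bm25_scores, sparse_scores], weights=[0.3, 0.7])
ranking = sorted(fused, key=fused.get, reverse=True)
```

Document "a" wins on BM25 alone, but once both signals are on the same scale and weighted, "b" moves to the top because it scores well in both sub-queries.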

How does the OpenSearch hybrid search demo work?

This demo uses an OpenSearch cluster running the ML Commons plugin (part of the AI Search plugin in NetApp Instaclustr) and connects to it with a Python client. The data comes from Project Gutenberg; it consists of records of public-domain books. A neural sparse ML model is used to create the vector embeddings that store and search semantic values.

If you follow along with the code, you’ll end up with:

A named index that stores keyword+semantic data about books
An ingest pipeline for that index used to create neural sparse vector embeddings
A search pipeline for hybrid keyword + neural sparse semantic search

The demo also has a cleanup script that will take down what the script creates in your cluster. All ML model registration, deployment, and inference in this demo run within the OpenSearch cluster; Instaclustr manages the platform, while model choice, configuration, and query behavior remain under user control.

A demo: How does OpenSearch hybrid search work?

The code takes several steps to complete a hybrid search:

Step 1: The code starts an OpenSearch client and tests the connection.

HybridSearchPipelineApp.run calls build_client() from client.py, which loads .env via python-dotenv, reads OPENSEARCH_HOST, OPENSEARCH_PORT, the optional TLS setting, and OPENSEARCH_USER / OPENSEARCH_PASSWORD, and returns an OpenSearch client. The “connection test” is a real cluster round trip: the script calls client.info() and prints the cluster_name and the OpenSearch version string.
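
A rough sketch of the environment-to-client translation might look like the following. It is not the repo's build_client(): the helper name, the default port, and the OPENSEARCH_USE_SSL variable are assumptions for illustration, and the actual client call is left in comments so the snippet runs without a cluster.

```python
import os
from typing import Any, Dict, Mapping


def client_kwargs_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Translate OPENSEARCH_* variables into keyword arguments for a client."""
    kwargs: Dict[str, Any] = {
        "hosts": [
            {
                "host": env["OPENSEARCH_HOST"],
                "port": int(env.get("OPENSEARCH_PORT", "9200")),
            }
        ],
        # OPENSEARCH_USE_SSL is a hypothetical name for the optional TLS switch.
        "use_ssl": env.get("OPENSEARCH_USE_SSL", "true").lower() == "true",
    }
    user = env.get("OPENSEARCH_USER")
    password = env.get("OPENSEARCH_PASSWORD")
    if user and password:
        kwargs["http_auth"] = (user, password)
    return kwargs


# With opensearch-py installed, the connection test would look roughly like:
#   from opensearchpy import OpenSearch
#   client = OpenSearch(**client_kwargs_from_env())
#   info = client.info()
#   print(info["cluster_name"], info["version"]["number"])
```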

Step 2: Next, it checks whether the sample data file exists; if not, it downloads the book data and saves it as JSON.

In this step, we either load or create the JSON object containing our book information, as shown in the chart earlier:

SampleDataCollector.load_or_fetch_books uses SAMPLE_DATA_PATH from config.py (by default hybrid_search_pipeline/sample-data.json next to the package).

If that file exists, the script loads it and skips the network. If not, it calls fetch_gutendex_books: page 1 is fetched first to learn total count and page size, then additional Gutendex pages are requested in parallel (bounded by max_pages and fetch_workers).

The merged list is written back to sample-data.json so the next run is fast and reproducible offline.
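
The cache-then-fetch pattern described above can be sketched as follows. This is an illustration rather than the repo's code: the page fetcher is passed in as a callable (so no network is needed here), and it is assumed to return a dict with "count" and "results" keys, as Gutendex responses do.

```python
import json
import math
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, List


def fetch_all_pages(
    fetch_page: Callable[[int], Dict],
    max_pages: int = 5,
    fetch_workers: int = 4,
) -> List[Dict]:
    """Fetch page 1 to learn the totals, then fetch the remaining pages in parallel."""
    first = fetch_page(1)
    books = list(first["results"])
    page_size = len(books)
    if page_size == 0:
        return books
    last_page = min(max_pages, math.ceil(first["count"] / page_size))
    with ThreadPoolExecutor(max_workers=fetch_workers) as pool:
        # pool.map preserves page order even though requests run concurrently.
        for page in pool.map(fetch_page, range(2, last_page + 1)):
            books.extend(page["results"])
    return books


def load_or_fetch_books(path: Path, fetch: Callable[[], List[Dict]]) -> List[Dict]:
    """Return cached books if the file exists; otherwise fetch and cache them."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    books = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(books), encoding="utf-8")
    return books
```

Injecting the fetcher keeps the caching logic easy to test and makes the second run independent of the network.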

Step 3: Cluster settings are configured for ML Commons (including URL-based model registration) and for smaller demo clusters.

HybridSearchPipelineManager.apply_cluster_settings issues cluster.put_settings with persistent ML Commons flags.

In this repo, plugins.ml_commons.allow_registering_model_via_url is set to true, which is the knob that allows models to be registered from a URL when your workflow uses that path (common when pulling model artifacts from the network rather than only from pre-bundled artifacts). The script also sets only_run_on_ml_node to false and raises native_memory_threshold so single-node or small clusters are less likely to trip ML memory circuit breakers during register, deploy, and ingest.

These are demo-oriented relaxations; production clusters should follow your platform’s ML sizing and security guidance instead of copying thresholds blindly.
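
For reference, a settings body along these lines could be built as shown below. The helper name is hypothetical and the threshold value of 99 is only an illustrative choice, since the exact number the script uses is not stated here.

```python
from typing import Any, Dict


def ml_commons_demo_settings(native_memory_threshold: int = 99) -> Dict[str, Any]:
    """Build a persistent cluster-settings body with relaxed ML Commons limits."""
    return {
        "persistent": {
            # Allow registering models from a URL.
            "plugins.ml_commons.allow_registering_model_via_url": True,
            # Let ML tasks run on data nodes when there is no dedicated ML node.
            "plugins.ml_commons.only_run_on_ml_node": False,
            # Illustrative value only; tune to your cluster's sizing guidance.
            "plugins.ml_commons.native_memory_threshold": native_memory_threshold,
        }
    }


# With an opensearch-py client this would be applied roughly as:
#   client.cluster.put_settings(body=ml_commons_demo_settings())
```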

Step 4: ML models in ML Commons need to belong to a group, so a model group is created.

ModelGroupManager.resolve_model_group_id first searches ML Commons for an existing group named MODEL_GROUP_NAME (hybrid-search-models in config.py). If none exists, it registers a new model group. That group ID is what the script passes into model registration and deployment.
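
The find-or-create logic can be sketched like this. It is a simplified stand-in for the repo's method: the request callable is an assumption that wraps whatever HTTP layer you use (with opensearch-py, something like client.transport.perform_request), and the description string is invented.

```python
from typing import Any, Callable, Dict, Optional

# (method, path, body) -> parsed JSON response
Request = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def resolve_model_group_id(request: Request, name: str) -> str:
    """Return the id of the model group called `name`, creating it if needed."""
    search_body = {"query": {"match": {"name": name}}, "size": 10}
    found = request("POST", "/_plugins/_ml/model_groups/_search", search_body)
    for hit in found.get("hits", {}).get("hits", []):
        # A match query can return partial matches, so compare names exactly.
        if hit.get("_source", {}).get("name") == name:
            return hit["_id"]
    created = request(
        "POST",
        "/_plugins/_ml/model_groups/_register",
        {"name": name, "description": "Models for the hybrid search demo"},
    )
    return created["model_group_id"]
```

Searching first keeps repeated runs idempotent: a second run reuses the existing group instead of failing on a duplicate name.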

Step 5: The neural sparse encoding model is registered to that group and deployed.

This step loads and deploys the ML model so it can be used by the ingest pipeline, as shown in the diagram earlier:

MLModelManager.register_and_deploy_sparse_encoding_model posts to /_plugins/_ml/models/_register with deploy=true, the model name and version from config.py, model_group_id, and TORCH_SCRIPT format. It polls the returned ML task until COMPLETED, reads model_id, then polls the deploy task until model_state is DEPLOYED.
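
A simplified version of this register-then-poll flow is sketched below. The helper names, timeouts, and the injected request callable are assumptions for illustration. In this sketch, deployment is confirmed by polling the model document for model_state, which may differ in detail from how the repo tracks the deploy task.

```python
import time
from typing import Any, Callable, Dict, Optional

Request = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def wait_until(
    fetch: Callable[[], Dict[str, Any]],
    done: Callable[[Dict[str, Any]], bool],
    timeout_s: float = 600.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll `fetch` until `done(status)` is true, a failure is reported, or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch()
        if done(status):
            return status
        if status.get("state") == "FAILED" or status.get("model_state") == "DEPLOY_FAILED":
            raise RuntimeError(f"ML operation failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Timed out waiting; last status: {status}")
        sleep(interval_s)


def register_and_deploy(
    request: Request, name: str, version: str, model_group_id: str, **wait_kwargs: Any
) -> str:
    """Register a model with deploy=true and block until it reports DEPLOYED."""
    body = {
        "name": name,
        "version": version,
        "model_group_id": model_group_id,
        "model_format": "TORCH_SCRIPT",
    }
    task_id = request("POST", "/_plugins/_ml/models/_register?deploy=true", body)["task_id"]
    task = wait_until(
        lambda: request("GET", f"/_plugins/_ml/tasks/{task_id}", None),
        lambda t: t.get("state") == "COMPLETED",
        **wait_kwargs,
    )
    model_id = task["model_id"]
    wait_until(
        lambda: request("GET", f"/_plugins/_ml/models/{model_id}", None),
        lambda m: m.get("model_state") == "DEPLOYED",
        **wait_kwargs,
    )
    return model_id
```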

Step 6: The ingest pipeline is created so documents get neural sparse features on insert.

This step creates the ingest pipeline and hooks it up to the ML model we deployed:

HybridSearchPipelineManager.create_ingest_pipeline defines a pipeline with a single sparse_encoding processor: it maps the document’s text field into passage_embedding (a rank_features field in the index mapping) using the deployed model id, with prune_type / prune_ratio to cap sparsity.

This is document-side neural sparse encoding: each indexed document is enriched with sparse features suitable for neural_sparse retrieval, not a separate “call an embedding API from Python” step.
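
As an illustration, the pipeline body could be assembled like this. The helper name, the description, and the default prune_type and prune_ratio values are assumptions; the repo's config.py may use different ones.

```python
from typing import Any, Dict


def sparse_ingest_pipeline(
    model_id: str, prune_type: str = "max_ratio", prune_ratio: float = 0.1
) -> Dict[str, Any]:
    """Build an ingest pipeline body with a single sparse_encoding processor."""
    return {
        "description": "Neural sparse encoding for book text",
        "processors": [
            {
                "sparse_encoding": {
                    "model_id": model_id,
                    # source field -> rank_features field that stores the tokens
                    "field_map": {"text": "passage_embedding"},
                    # Pruning drops low-weight tokens to keep vectors compact.
                    "prune_type": prune_type,
                    "prune_ratio": prune_ratio,
                }
            }
        ],
    }


# With an opensearch-py client, roughly:
#   client.ingest.put_pipeline(id="my-ingest-pipeline", body=sparse_ingest_pipeline(model_id))
```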

Step 7: The book index is created with fields for keyword and semantic search.

create_books_index creates INDEX_NAME with default_pipeline set to the ingest pipeline id so bulk indexing automatically runs sparse encoding. Mappings include lexical text and title (with title.keyword for exact/filter use), a publisher keyword field, and passage_embedding as rank_features. _source excludes the sparse field to keep stored payloads smaller while still allowing it to participate in scoring. number_of_replicas is set to 0 for a simple, single-node-style demo.
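
Put together, an index body matching that description might look like the sketch below. The helper name is hypothetical, and the real create_books_index may set additional options.

```python
from typing import Any, Dict


def books_index_body(ingest_pipeline_id: str) -> Dict[str, Any]:
    """Build index settings and mappings for lexical plus neural sparse search."""
    return {
        "settings": {
            "index": {
                # Every document indexed without an explicit pipeline runs this one.
                "default_pipeline": ingest_pipeline_id,
                "number_of_replicas": 0,
            }
        },
        "mappings": {
            # Keep the sparse tokens out of stored _source; they remain searchable.
            "_source": {"excludes": ["passage_embedding"]},
            "properties": {
                "text": {"type": "text"},
                "title": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword"}},
                },
                "publisher": {"type": "keyword"},
                "passage_embedding": {"type": "rank_features"},
            },
        },
    }


# With an opensearch-py client, roughly:
#   client.indices.create(index=INDEX_NAME, body=books_index_body("my-ingest-pipeline"))
```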

Step 8: The bulk index API inserts book documents from the sample data.

index_books optionally truncates to MAX_BOOKS_TO_INDEX for speed, maps each Gutendex record through book_to_source (Gutenberg id, title, publisher default, combined title plus first summary into text), then streams bulk actions via streaming_bulk with a small chunk_size, request timeout, and a short pause between chunks to reduce pressure on ML and indexing.

After bulk completes, it refreshes the index so the demo search sees fresh segments.
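
The record mapping and action generation can be sketched as follows. The gutenberg_id field name, the default publisher string, and the way title and summary are joined are assumptions for illustration. The bulk call itself is shown in comments so the snippet runs without a cluster.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional


def book_to_source(book: Dict[str, Any], default_publisher: str = "Project Gutenberg") -> Dict[str, Any]:
    """Map one Gutendex record to the document shape stored in the index."""
    title = book.get("title") or ""
    summaries: List[str] = book.get("summaries") or []
    first_summary = summaries[0] if summaries else ""
    # Combine title and first summary into the field the ingest pipeline encodes.
    text = "\n".join(part for part in (title, first_summary) if part)
    return {
        "gutenberg_id": book["id"],
        "title": title,
        "publisher": default_publisher,
        "text": text,
    }


def bulk_actions(
    books: List[Dict[str, Any]], index_name: str, max_books: Optional[int] = None
) -> Iterator[Dict[str, Any]]:
    """Yield one bulk index action per book, optionally truncating the list."""
    for book in books[:max_books]:
        yield {"_index": index_name, "_id": str(book["id"]), "_source": book_to_source(book)}


# With opensearch-py, the actions would be streamed roughly like this:
#   from opensearchpy.helpers import streaming_bulk
#   for ok, item in streaming_bulk(client, bulk_actions(books, INDEX_NAME), chunk_size=10):
#       if not ok:
#           print("failed:", item)
#   client.indices.refresh(index=INDEX_NAME)
```

A small chunk_size trades throughput for gentler load, which matters here because every document triggers model inference in the ingest pipeline.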

Step 9: The search post-processor pipeline is created (normalization + combination).

create_search_pipeline registers a search pipeline with a normalization-processor: min_max normalization per sub-query score, then arithmetic_mean combination with weights [0.3, 0.7], matching the order of the two sub-queries in the hybrid query (lexical first, neural sparse second).

That is what makes incomparable raw scores usable for a single fused ranking.
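
A search pipeline body with those techniques and weights could be built as in this sketch. The helper name and description are invented, and the weight check reflects the requirement that combination weights sum to 1.0.

```python
import math
from typing import Any, Dict, Sequence


def hybrid_search_pipeline(weights: Sequence[float] = (0.3, 0.7)) -> Dict[str, Any]:
    """Build a search pipeline body: min_max normalization + weighted arithmetic mean."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("combination weights should sum to 1.0")
    return {
        "description": "Post-processor for hybrid lexical + neural sparse search",
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": "min_max"},
                    "combination": {
                        "technique": "arithmetic_mean",
                        # Order follows the sub-queries: lexical first, neural sparse second.
                        "parameters": {"weights": list(weights)},
                    },
                }
            }
        ],
    }


# With an opensearch-py client, roughly:
#   client.transport.perform_request(
#       "PUT", "/_search/pipeline/my-search-pipeline", body=hybrid_search_pipeline()
#   )
```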

Step 10: A hybrid search runs and results are printed.

run_hybrid_search sends a search request with search_pipeline set to the pipeline id from step 9. The query is a hybrid wrapper around a match on text and a neural_sparse query against passage_embedding with query_text and the configured analyzer.

Returned hits exclude the sparse field from _source; the script prints score, document id, and a truncated title for the top results.
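
For illustration, the request body and a result formatter might look like the following. The helper names, the default analyzer value (bert-uncased), the result size, and the title truncation length are assumptions rather than the repo's actual settings.

```python
from typing import Any, Dict


def hybrid_query_body(query_text: str, analyzer: str = "bert-uncased", size: int = 5) -> Dict[str, Any]:
    """Build a hybrid query: lexical match first, neural sparse second."""
    return {
        "size": size,
        "_source": {"excludes": ["passage_embedding"]},
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {"text": {"query": query_text}}},
                    {
                        "neural_sparse": {
                            "passage_embedding": {
                                "query_text": query_text,
                                "analyzer": analyzer,
                            }
                        }
                    },
                ]
            }
        },
    }


def format_hit(hit: Dict[str, Any], max_title: int = 60) -> str:
    """Render score, document id, and a truncated title on one line."""
    title = hit.get("_source", {}).get("title", "")
    if len(title) > max_title:
        title = title[: max_title - 3] + "..."
    return f"{hit['_score']:.4f}  {hit['_id']}  {title}"


# With an opensearch-py client, roughly:
#   response = client.search(
#       index=INDEX_NAME,
#       body=hybrid_query_body("gothic horror in a castle"),
#       params={"search_pipeline": "my-search-pipeline"},
#   )
#   for hit in response["hits"]["hits"]:
#       print(format_hit(hit))
```

The order of the two sub-queries matters because the weights in the search pipeline are applied by position.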

Step 11: The script cleans up indexes, pipelines, and ML artifacts.

cleanup deletes the demo index, the search pipeline, and the ingest pipeline. It then undeploys and deletes the registered model by id and deletes the model group by id. Errors such as a missing index are handled, so teardown is best effort. In a real deployment you would typically omit this step or guard it with flags, so you do not delete production indices or shared model groups.
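
A best-effort teardown in that order could be sketched like this. The helper names and the two-argument request callable are assumptions, and the real cleanup may treat specific error types differently.

```python
from functools import partial
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], object]]


def cleanup_steps(
    request: Callable[[str, str], object],
    index: str,
    search_pipeline_id: str,
    ingest_pipeline_id: str,
    model_id: str,
    model_group_id: str,
) -> List[Step]:
    """List teardown calls in dependency order: index, pipelines, model, group."""
    calls = [
        ("DELETE", f"/{index}"),
        ("DELETE", f"/_search/pipeline/{search_pipeline_id}"),
        ("DELETE", f"/_ingest/pipeline/{ingest_pipeline_id}"),
        ("POST", f"/_plugins/_ml/models/{model_id}/_undeploy"),
        ("DELETE", f"/_plugins/_ml/models/{model_id}"),
        ("DELETE", f"/_plugins/_ml/model_groups/{model_group_id}"),
    ]
    # partial binds each method/path pair now, avoiding late-binding surprises.
    return [(f"{method} {path}", partial(request, method, path)) for method, path in calls]


def best_effort(steps: List[Step]) -> List[Tuple[str, Exception]]:
    """Run every step, collecting failures instead of stopping at the first one."""
    failures = []
    for label, action in steps:
        try:
            action()
        except Exception as exc:
            failures.append((label, exc))
    return failures
```

Collecting failures rather than raising means a missing index does not prevent the model and model group from being removed.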

Mapping architecture to functions

Layer	Role in hybrid sparse search	In this script
Ingest	Encode/embed text → sparse features at index time	`sparse_encoding` in `create_ingest_pipeline`
Index	Lexical fields + semantic `rank_features`; auto-run ingest	`create_books_index` + `default_pipeline`
Query	Two retrieval signals	`hybrid` → `match` + `neural_sparse` in `run_hybrid_search`
Post-processor	Comparable scores + weighted blend	`normalization-processor` in `create_search_pipeline`

Running the scripts

First, you’ll need to configure the scripts via the .env.example file. Copy it to .env and fill it out with your cluster information (host, username, password, etc.).

Then, to run the scripts, in the project root, run:

$ python .\hybrid_search_pipeline\__main__.py

Conclusion

There’s a key takeaway here: Hybrid search is not only a hybrid query; it is ingest-time enrichment, index/schema wiring, and query-time normalization all working together.

You can build pipelines like this quickly on NetApp Instaclustr, and you can try out hybrid search and see how it runs using our free trial.

Frequently Asked Questions

Can you run hybrid search in OpenSearch without a post-processing pipeline?

While you can technically run multiple queries, you need a post-processing pipeline to normalize and combine the disparate scoring scales (e.g., BM25 and neural sparse scores) into a single, accurately ranked result list.

Why use BM25 with neural sparse vector search?

BM25 excels at exact keyword matching for specific terminology, while neural sparse vector search captures the semantic meaning of a query. Combining them ensures you capture both exact matches and broader conceptual intent.

What machine learning plugins are required for OpenSearch hybrid search?

To natively generate vector embeddings and handle neural queries, OpenSearch requires the ML Commons plugin. This plugin allows you to register and deploy the machine learning models needed for semantic search.

What plugins do you need to run a hybrid search on NetApp Instaclustr?

You’ll need the AI Search plugin enabled in order to run any semantic or hybrid searches. This plugin includes ML Commons and some useful ML tools for OpenSearch.
