OpenSearch version 3.6 release—smart agents and fast search

OpenSearch 3.6 is a significant release that advances machine learning, vector search, neural search, and application performance monitoring. Key highlights include transparent token tracking for AI agents, 32x vector compression via Lucene BBQ (with work underway to make 32x compression the default), a rebuilt AI chat experience in Dashboards, and a new APM solution built on OpenTelemetry.

Quick Summary: OpenSearch 3.6 makes AI-powered search more cost-transparent, memory-efficient, and production-ready—with significant upgrades to ML Commons, k-NN vector search, neural search, and OpenSearch Dashboards.

What Is OpenSearch 3.6?

OpenSearch 3.6 is the latest release of the open-source search and analytics engine maintained by the OpenSearch community. This release focuses on three primary areas:

AI agent capabilities — token tracking, V2 Chat Agent, and semantic memory APIs
Vector search performance — Lucene BBQ integration delivering 32x compression
Observability and dashboards — a new APM solution and rebuilt AI chat interface

What’s New in OpenSearch 3.6? (Full Feature Breakdown)

ML Commons: AI Agent Improvements

ML Commons is OpenSearch’s framework for integrating large language models (LLMs) and machine learning models directly within OpenSearch clusters. OpenSearch 3.6 delivers four major ML Commons upgrades.

Agent Token Usage Tracking

What it does: OpenSearch 3.6 adds transparent token usage tracking to the ML Commons agent framework—one of the most requested features from the community.

Previously, there was no visibility into how many tokens each agent execution consumed, making cost monitoring, performance debugging, and model comparison effectively impossible.

How it works in 3.6:

Every LLM call during agent execution is instrumented to extract and aggregate token counts
Works across all agent types in both streaming and non-streaming modes
Gracefully handles providers that don’t return usage data

Token data is returned in two views:

View	Description
Per-turn usage	Token counts for each LLM call (turn) within a single agent execution. An execution spans multiple turns when the agent reasons, calls tools, and then generates a final response.
Per-model aggregation	Totals per model ID, including call count
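The per-model view is a roll-up of the per-turn records. The Python sketch below illustrates that roll-up under stated assumptions: the TurnUsage record, its field names, and the model IDs are hypothetical, and the ML Commons response format may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TurnUsage:
    # Hypothetical per-turn record; not the ML Commons response schema.
    model_id: str
    input_tokens: int
    output_tokens: int


def aggregate_by_model(turns):
    """Roll per-turn token records up into per-model totals plus a call count."""
    totals = defaultdict(
        lambda: {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0, "call_count": 0}
    )
    for turn in turns:
        agg = totals[turn.model_id]
        agg["input_tokens"] += turn.input_tokens
        agg["output_tokens"] += turn.output_tokens
        agg["total_tokens"] += turn.input_tokens + turn.output_tokens
        agg["call_count"] += 1
    return dict(totals)


# One agent execution: reason, call a tool, then answer (three LLM turns, two models).
turns = [
    TurnUsage("model-a", input_tokens=1200, output_tokens=85),
    TurnUsage("model-a", input_tokens=1450, output_tokens=60),
    TurnUsage("model-b", input_tokens=900, output_tokens=310),
]
print(aggregate_by_model(turns))
```

For these inputs, model-a totals 2,795 tokens over 2 calls and model-b totals 1,210 tokens over 1 call.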

Supported LLM providers:

Amazon Bedrock Converse
OpenAI v1
Gemini v1beta

Each provider’s fields are normalized into a unified TokenUsage model.
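To make "normalized" concrete, here is a minimal Python sketch that maps each provider's usage fields onto one record. The provider field names are the ones those APIs return (Bedrock Converse usage.inputTokens/outputTokens/totalTokens, OpenAI usage.prompt_tokens/completion_tokens/total_tokens, Gemini usageMetadata.promptTokenCount/candidatesTokenCount/totalTokenCount). The TokenUsage fields, the provider labels, and the function itself are hypothetical and are not the ML Commons Java implementation. Returning None mirrors the graceful handling of providers that omit usage data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TokenUsage:
    # Hypothetical unified record; field names are illustrative.
    input_tokens: int
    output_tokens: int
    total_tokens: int


# (container key, input field, output field, total field) for each provider's response format.
PROVIDER_FIELDS = {
    "bedrock_converse": ("usage", "inputTokens", "outputTokens", "totalTokens"),
    "openai_v1": ("usage", "prompt_tokens", "completion_tokens", "total_tokens"),
    "gemini_v1beta": ("usageMetadata", "promptTokenCount", "candidatesTokenCount", "totalTokenCount"),
}


def normalize_usage(provider: str, response: dict) -> Optional[TokenUsage]:
    """Map a provider-specific usage block onto TokenUsage, or return None if it is absent."""
    fields = PROVIDER_FIELDS.get(provider)
    if fields is None:
        return None  # unknown provider: skip tracking rather than fail the agent run
    container, in_key, out_key, total_key = fields
    usage = response.get(container)
    if not usage:
        return None  # provider returned no usage data
    input_tokens = usage.get(in_key, 0)
    output_tokens = usage.get(out_key, 0)
    total_tokens = usage.get(total_key, input_tokens + output_tokens)
    return TokenUsage(input_tokens, output_tokens, total_tokens)


print(normalize_usage("openai_v1", {"usage": {"prompt_tokens": 12, "completion_tokens": 30, "total_tokens": 42}}))
print(normalize_usage("gemini_v1beta", {"usageMetadata": {"promptTokenCount": 12, "candidatesTokenCount": 30, "totalTokenCount": 42}}))
print(normalize_usage("bedrock_converse", {"output": {}}))  # no usage block -> None
```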

Planned future enhancements: Persistent token data storage, configurable per-agent token budgets, and extended tracking to non-agent model predict responses.

Asynchronous Encryption Handling

The EncryptorImpl class—responsible for encrypting and decrypting text using tenant-specific master keys—has been refactored for scalability.

The problem with the previous implementation:

Used a blocking CountDownLatch with a strict 3-second timeout
Caused thread contention under high concurrency
Triggered unnecessary failures during cluster initialization delays

The new implementation (contributed by Muneer Kolarkunnu, Senior Engineer at NetApp Instaclustr):

Replaces blocking behavior with a fully asynchronous, ActionListener-based approach
Encryption and decryption requests are queued while the master key initializes
Fixes a race condition where concurrent requests for the same tenant could trigger duplicate master key generation
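The queue-while-initializing pattern and the race fix can be sketched in Python with asyncio, assuming that is a fair analogue: EncryptorImpl itself is Java and uses ActionListener callbacks, not Python futures. In the sketch, the first request for a tenant starts key generation and caches the pending future; every concurrent request for that tenant awaits the same future instead of blocking a thread or generating a second key. TenantKeyCache, the 32-byte random key, and the placeholder "encryption" are all hypothetical.

```python
import asyncio
import os


class TenantKeyCache:
    """Per-tenant master keys, initialized asynchronously and at most once per tenant."""

    def __init__(self):
        self._pending = {}  # tenant -> asyncio.Future resolving to that tenant's key
        self.generations = 0  # how many times a key was actually generated

    async def _generate_key(self, tenant):
        self.generations += 1
        await asyncio.sleep(0.05)  # stands in for slow master-key creation or retrieval
        return os.urandom(32)

    def get_key(self, tenant):
        # No await between the lookup and the insert, so on one event loop two
        # concurrent callers cannot both start generating a key for the same tenant.
        future = self._pending.get(tenant)
        if future is None:
            future = asyncio.ensure_future(self._generate_key(tenant))
            self._pending[tenant] = future
        return future


async def encrypt(cache, tenant, plaintext):
    key = await cache.get_key(tenant)  # queued until the key is ready; no thread blocks
    return f"enc[{len(key)}-byte key]:{plaintext}"  # placeholder, not real encryption


async def main():
    cache = TenantKeyCache()
    requests = [encrypt(cache, "tenant-1", f"msg-{i}") for i in range(10)]
    requests.append(encrypt(cache, "tenant-2", "other"))
    results = await asyncio.gather(*requests)
    return cache.generations, results


generations, results = asyncio.run(main())
print(generations, len(results))  # one key per tenant, eleven encrypted messages
```

A production version would also evict a future that completed with an error, so that the next request can retry key generation.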

V2 Chat Agent

OpenSearch 3.6 introduces the V2 Chat Agent, a next-generation agent type designed to simplify conversational AI workflows within ML Commons.

The V2 Chat Agent builds on the existing Conversational (ReAct) and Plan-Execute-Reflect (PER) agent types, providing a more streamlined interface for chat-based interactions while retaining the flexibility to integrate tools and memory.

Semantic and Hybrid Search APIs for Long-Term Memory

A new set of APIs brings semantic and hybrid search capabilities to long-term memory retrieval. Agents can now search stored memory using:

Vector similarity (semantic search)
Keyword matching (lexical search)
Hybrid combinations of both

This enables more accurate and context-aware memory recall during multi-turn conversations—a foundational capability for agents that reason over large conversation histories.
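One common way to combine the two signals is to normalize each score list to [0, 1] and take a weighted average; OpenSearch's normalization processor for hybrid queries offers this min-max plus arithmetic-mean combination. The article does not say how the new memory APIs combine scores, so treat the Python sketch below as an illustration of the hybrid idea only. The toy three-dimensional embeddings, the token-overlap keyword score (a crude stand-in for BM25), and the memory records are hypothetical.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity between two dense vectors (the semantic signal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query, text):
    """Fraction of query terms present in the text (the lexical signal)."""
    q_terms = set(re.findall(r"\w+", query.lower()))
    t_terms = set(re.findall(r"\w+", text.lower()))
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def min_max(scores):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 1.0 for s in scores]


def hybrid_search(query, query_vec, memories, semantic_weight=0.5, k=2):
    semantic = min_max([cosine(query_vec, m["vec"]) for m in memories])
    lexical = min_max([keyword_score(query, m["text"]) for m in memories])
    combined = [semantic_weight * s + (1 - semantic_weight) * l for s, l in zip(semantic, lexical)]
    ranked = sorted(zip(combined, memories), key=lambda pair: pair[0], reverse=True)
    return [m["text"] for _, m in ranked[:k]]


memories = [
    {"text": "User prefers metric units", "vec": [0.9, 0.1, 0.0]},
    {"text": "User's cluster runs OpenSearch 3.6", "vec": [0.1, 0.9, 0.2]},
    {"text": "User asked about BBQ compression", "vec": [0.2, 0.8, 0.5]},
]
print(hybrid_search("which OpenSearch version does the cluster run", [0.1, 0.95, 0.1], memories))
```

Here the cluster memory ranks first because it wins on both signals, while the BBQ memory ranks second on semantic similarity alone.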

k-NN Vector Search: Compression, Speed, and Efficiency

What Is Lucene Better Binary Quantization (BBQ)?

Lucene Better Binary Quantization (BBQ) is a vector compression technique that encodes high-dimensional float vectors into compact binary representations using advanced quantization methods inspired by RaBitQ.

BBQ is now integrated into OpenSearch 3.6’s k-NN plugin and delivers a 32x compression ratio with significantly better recall than existing Faiss Binary Quantization.
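The 32x figure follows from storing 1 bit per dimension instead of a 32-bit float. The Python sketch below shows plain binary quantization (bit i is set when component i is above a centroid value) with Hamming distance, plus the storage arithmetic. It is deliberately simpler than BBQ: Lucene's RaBitQ-inspired scheme also keeps small per-vector correction terms and quantizes queries differently, which this sketch omits, so treat it as an illustration of the compression idea rather than of BBQ itself. The storage arithmetic likewise ignores correction terms and HNSW graph overhead.

```python
def binarize(vec, centroid):
    """Encode a float vector as an int bitmask: bit i is 1 if vec[i] > centroid[i]."""
    code = 0
    for i, (x, c) in enumerate(zip(vec, centroid)):
        if x > c:
            code |= 1 << i
    return code


def hamming(code_a, code_b):
    """Number of differing bits: a cheap distance between two binary codes."""
    return bin(code_a ^ code_b).count("1")


def storage_bytes(dim, n_vectors):
    """Raw vector storage: float32 (4 bytes per dimension) vs 1 bit per dimension."""
    fp32 = dim * 4 * n_vectors
    one_bit = (dim + 7) // 8 * n_vectors
    return fp32, one_bit


centroid = [0.0, 0.0, 0.0, 0.0]
a = binarize([0.5, -0.2, 0.1, -0.9], centroid)  # bits 0 and 2 set -> 0b0101
b = binarize([0.3, 0.4, -0.1, -0.2], centroid)  # bits 0 and 1 set -> 0b0011
print(hamming(a, b))  # 2

fp32, one_bit = storage_bytes(768, 10_000_000)
print(fp32 / 1e9, one_bit / 1e9, fp32 // one_bit)  # 30.72 0.96 32
```

For 10 million 768-dimensional vectors, raw storage drops from about 30.72 GB to about 0.96 GB.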

BBQ vs. Faiss Binary Quantization — Recall Comparison:

Dataset	Lucene BBQ Recall@100	Faiss BQ Recall@100
Sift-128	0.32	0.18
Cohere-768-1M	0.63	0.30

With oversampling (rescoring), BBQ achieves recall above 0.95 at an oversample factor of 3 on the Cohere-10M dataset.
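Oversampling works in two phases: retrieve (oversample factor times k) candidates using the cheap binary codes, then rescore only those candidates with full-precision distances and keep the top k. The Python sketch below measures recall@k (the fraction of the true k nearest neighbors that the search returns) for a few oversample factors. It uses random Gaussian data and sign-bit codes rather than real BBQ codes or the Cohere dataset, so its numbers will not match the ones above; it only illustrates why a larger factor can raise recall.

```python
import random


def l2(a, b):
    """Squared Euclidean distance on full-precision vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def to_bits(vec):
    """Sign-bit code, one bit per dimension (a stand-in for a real BBQ code)."""
    return tuple(1 if x > 0 else 0 for x in vec)


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def two_phase_search(query, vectors, codes, k, oversample=1.0):
    n_candidates = min(len(vectors), max(k, round(k * oversample)))
    q_code = to_bits(query)
    # Phase 1: rank every vector by Hamming distance between binary codes (cheap).
    ranked = sorted(range(len(vectors)), key=lambda i: hamming(q_code, codes[i]))
    candidates = ranked[:n_candidates]
    # Phase 2: rescore only the candidates with exact full-precision distances.
    candidates.sort(key=lambda i: l2(query, vectors[i]))
    return candidates[:k]


def recall_at_k(result, truth):
    """Fraction of the true nearest neighbors that appear in the result."""
    return len(set(result) & set(truth)) / len(truth)


random.seed(7)
dim, n, k = 32, 2000, 10
vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
codes = [to_bits(v) for v in vectors]
query = [random.gauss(0, 1) for _ in range(dim)]
truth = sorted(range(n), key=lambda i: l2(query, vectors[i]))[:k]

for factor in (1, 3, 10):
    hits = two_phase_search(query, vectors, codes, k, oversample=factor)
    print(f"oversample={factor}: recall@{k} = {recall_at_k(hits, truth):.2f}")
```

Because a larger factor keeps a superset of the smaller factor's candidates, and exact rescoring always keeps any true neighbor it sees, recall cannot drop as the factor grows; the cost is more full-precision distance computations.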

How to configure BBQ in OpenSearch:

Specify "encoder": {"name": "binary"} within the HNSW method parameters on the Lucene engine in your field mapping. Lucene’s built-in rescoring mechanism handles the two-phase search automatically:

Fast candidate retrieval via binary quantized vectors
Precise scoring using original FP32 vectors
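As a sketch of what such a mapping might look like, the Python snippet below builds an index-creation body (for a request such as PUT /<index-name>) with the binary encoder set inside the HNSW method parameters on the Lucene engine, as described above. Only the encoder setting comes from the release description; the field name, dimension, and space_type are placeholders, and the exact parameter set accepted in 3.6 should be confirmed against the k-NN plugin documentation.

```python
import json


def bbq_hnsw_index_body(field, dimension, space_type="l2"):
    """Index settings and mapping for an HNSW k-NN field using the Lucene binary (BBQ) encoder."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "lucene",
                        "space_type": space_type,
                        # Per the 3.6 release description, this selects Lucene BBQ.
                        "parameters": {"encoder": {"name": "binary"}},
                    },
                }
            }
        },
    }


# Placeholder field name; 768 dimensions matches the Cohere-768 dataset above.
print(json.dumps(bbq_hnsw_index_body("my_vector", 768), indent=2))
```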

BBQ Flat Index Support

OpenSearch 3.6 also introduces BBQ in flat mode—brute-force BBQ vector search for workloads where ingest efficiency, filtered search with high selectivity, and exact recall are priorities.

Configure flat mode by specifying "method": {"name": "flat"} with "compression_level": "32x" in your mapping.
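A flat-mode mapping could be sketched the same way. The method name and the compression_level value come from the description above; placing compression_level at the field level follows the existing k-NN mapping convention, and setting the engine to lucene is an assumption (BBQ is a Lucene feature), so verify both against the 3.6 documentation before relying on them.

```python
import json


def bbq_flat_index_body(field, dimension):
    """Index body for a brute-force (flat) k-NN field with 32x compression."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "compression_level": "32x",
                    # Assumption: flat BBQ runs on the Lucene engine.
                    "method": {"name": "flat", "engine": "lucene"},
                }
            }
        },
    }


print(json.dumps(bbq_flat_index_body("my_vector", 768), indent=2))
```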

32x Compression as the Default

OpenSearch 3.6 is working toward making 32x vector compression the default for the vector engine. This initiative spans both Lucene and Faiss engines and includes:

BQ-based indexing
SIMD-optimized bulk operations
Backward-compatible migration paths

The goal: dramatically reduce memory footprint out of the box without requiring manual compression tuning.

Neural Search: Flexible Agentic Query Translation

Embedding Model ID in the Agentic Query Translator Processor

OpenSearch 3.6 decouples the embedding model from agent registration in agentic neural search workflows.

The previous limitation: The embedding_model_id was tightly coupled to the agent definition—requiring users to re-register agents whenever they switched embedding models.

What’s changed in 3.6: embedding_model_id can now be specified as an optional parameter directly in the agentic_query_translator search processor within the search pipeline configuration:

{
  "request_processors":[
    {
      "agentic_query_translator": {
        "agent_id":"my-agent-id",
        "embedding_model_id":"my-embedding-model-id"
      }
    }
  ]
}

This enables the same agent to be reused across different use cases with different embedding models—a meaningful improvement for teams managing multiple search pipelines.
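For example, a team could register one agent and create two search pipelines that differ only in embedding_model_id. The Python sketch below builds both pipeline bodies; the pipeline names and IDs are hypothetical placeholders, and each body would be sent with the search pipeline API (PUT /_search/pipeline/<name>).

```python
import json


def agentic_pipeline(agent_id, embedding_model_id=None):
    """Search pipeline body with an agentic_query_translator request processor."""
    processor = {"agent_id": agent_id}
    if embedding_model_id is not None:
        processor["embedding_model_id"] = embedding_model_id  # optional as of 3.6
    return {"request_processors": [{"agentic_query_translator": processor}]}


# Hypothetical IDs: one agent shared by two pipelines with different embedding models.
pipelines = {
    "products-pipeline": agentic_pipeline("my-agent-id", "products-embedding-model-id"),
    "docs-pipeline": agentic_pipeline("my-agent-id", "docs-embedding-model-id"),
}
for name, body in pipelines.items():
    print(f"PUT /_search/pipeline/{name}")
    print(json.dumps(body, indent=2))
```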

OpenSearch APM: Application Performance Monitoring

OpenSearch 3.6 introduces a comprehensive Application Performance Monitoring (APM) solution built on open-source technologies and OpenTelemetry standards.

Capability	Description
RED Metrics	Rate, Errors, and Duration metrics for services and operations
Service Maps	Interactive topology visualization showing service dependencies
Service-Level Monitoring	Detailed performance metrics at service and operation levels
SLO Tracking	Service Level Objective definition and monitoring
Trace Expansion	Deep-dive into distributed traces with log correlation
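To make RED metrics and SLO tracking concrete, the Python sketch below derives rate, error ratio, and p95 duration from a window of spans, then checks the error ratio against an availability target. It is a simplified illustration, not how OpenSearch APM computes these values (in the architecture described here, Data Prepper and Prometheus handle the metrics); the Span fields, the nearest-rank percentile, and the 99% target are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical, minimal span record for one operation call.
    service: str
    operation: str
    duration_ms: float
    error: bool


def percentile(values, p):
    """Nearest-rank percentile (p in 0-100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def red_metrics(spans, window_seconds):
    """Rate (requests/s), Errors (as a ratio), and Duration (p95 in ms)."""
    errors = sum(1 for s in spans if s.error)
    return {
        "rate_per_s": len(spans) / window_seconds,
        "error_ratio": errors / len(spans),
        "p95_ms": percentile([s.duration_ms for s in spans], 95),
    }


def slo_check(error_ratio, availability_target=0.99):
    """Compare the observed error ratio with the error budget implied by the target."""
    budget = 1 - availability_target
    return {"met": error_ratio <= budget, "budget_consumed": error_ratio / budget}


# 20 calls in a 60-second window, durations 10..200 ms, the slowest one failing.
spans = [Span("checkout", "POST /pay", 10.0 * (i + 1), error=(i == 19)) for i in range(20)]
red = red_metrics(spans, window_seconds=60)
print(red)
print(slo_check(red["error_ratio"]))
```

With a 5% error ratio against a 1% error budget, this window consumes about five times its budget and the SLO is not met.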

OpenSearch APM architecture

Component	Role
OpenSearch	Stores service topology, relationships, and trace data
Prometheus	Handles time-series metrics (RED metrics)
Data Prepper	Processes OpenTelemetry data and routes to appropriate backends
OpenSearch Dashboards	Provides visualization and analysis interface

This hybrid architecture routes each data type to the system optimized for it, achieving better performance and lower costs than a single-storage approach.
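Service topology can be derived from trace data: whenever a span's parent belongs to a different service, the parent's service calls the child's. The Python sketch below shows that idea on one hypothetical trace. Data Prepper has its own service-map processing; this is a simplified illustration, not that implementation, and the Span fields are assumptions.

```python
from collections import namedtuple

# Hypothetical minimal span: its ID, its parent's ID (None for the root), and its service.
Span = namedtuple("Span", "span_id parent_id service")


def service_map(spans):
    """Return sorted (caller, callee) service edges implied by parent/child spans."""
    by_id = {s.span_id: s for s in spans}
    edges = set()
    for s in spans:
        parent = by_id.get(s.parent_id)
        if parent is not None and parent.service != s.service:
            edges.add((parent.service, s.service))
    return sorted(edges)


trace = [
    Span("a", None, "frontend"),
    Span("b", "a", "checkout"),
    Span("c", "b", "checkout"),  # internal span within checkout: no new edge
    Span("d", "c", "payments"),
    Span("e", "a", "catalog"),
]
print(service_map(trace))
# [('checkout', 'payments'), ('frontend', 'catalog'), ('frontend', 'checkout')]
```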

OpenSearch Dashboards 3.6: AI Chat, Explore, and GenAI Observability

OpenSearch Dashboards 3.6 is one of the most feature-dense Dashboards releases to date, with three primary themes.

AI Chat: Persistent Conversations and Agentic Memory

The Dashboards chatbot interface has been rebuilt around a single-window architecture. New capabilities include:

Conversation history — Browse, restore, and resume past conversations from a history panel
Agentic memory provider — Chat sessions backed by ML Commons Agent Memory APIs, with long-term memory that persists across sessions
Screenshot capture — Attach a screenshot of the current dashboard page to a chat message, giving the AI visual context alongside text (auto-scaled to stay under the 8K pixel limit)
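The auto-scaling step amounts to shrinking the image while preserving its aspect ratio. The Python sketch below assumes the 8K limit means at most 8,000 pixels on the longest side; the real limit and check (for example 8,192 pixels, or a total pixel count) may differ, so max_side is a parameter.

```python
def fit_within(width, height, max_side=8000):
    """Scale (width, height) down, keeping the aspect ratio, so neither side exceeds max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough: leave it unchanged
    # Integer arithmetic avoids float rounding; flooring keeps both sides within the limit.
    return max(1, width * max_side // longest), max(1, height * max_side // longest)


print(fit_within(1920, 1080))   # (1920, 1080): unchanged
print(fit_within(3840, 12000))  # (2560, 8000)
print(fit_within(20000, 500))   # (8000, 200)
```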

Explore Plugin: ECharts Migration and PPL Enhancements

The Explore plugin—OpenSearch Dashboards’ next-generation data exploration experience—receives major investment in 3.6:

ECharts migration — Histogram charts migrated from elastic-charts to ECharts; all Vega-based visualizations replaced with ECharts
In-context visualization editor — Create and edit visualizations directly within a dashboard without navigating away
PPL search result highlighting — Matching terms highlighted in PPL query results
Backend PPL grammar for autocomplete — Autocomplete now uses the backend PPL grammar bundle as its source of truth
fetch_size API for PPL — Row limits enforced at the OpenSearch level rather than truncated client-side
Performance improvements — Raw hits cached at module level; redundant saved object requests eliminated; large query result rendering bottlenecks addressed
Data table controls — Toggle for cell text wrapping in the Explore data table

GenAI Agent Traces

A new agent_traces plugin brings GenAI agent trace visualization into the observability workspace. The agent traces view includes:

A metrics bar
Discover-style data table
Sorting controls
Workspace support

This makes it practical to debug multi-step agent executions directly from the Dashboards UI.

Security: CSP Strict Mode

OpenSearch Dashboards 3.6 adds support for Content Security Policy (CSP) strict enforcement mode, configurable via the csp.enable flag. The console worker was also updated to load from a URL rather than a blob to comply with strict CSP policies.

OpenSearch 3.6: Feature Summary Table

Category	Feature	Impact
ML Commons	Agent token usage tracking	Cost monitoring and model comparison
ML Commons	Async encryption handling	Scalability under high concurrency
ML Commons	V2 Chat Agent	Simplified conversational AI workflows
ML Commons	Semantic/hybrid memory APIs	Context-aware multi-turn conversations
k-NN	Lucene BBQ integration	32x compression, better recall than Faiss BQ
k-NN	BBQ flat index support	High-selectivity filtered search
k-NN	32x default compression (in progress)	Reduced memory footprint out of the box
Neural Search	Embedding model ID in processor	Reuse agents across multiple embedding models
Observability	OpenSearch APM	End-to-end distributed system visibility
Dashboards	Persistent AI chat + agentic memory	Long-term conversational context
Dashboards	ECharts migration + PPL enhancements	Faster, more consistent data exploration
Dashboards	GenAI agent traces	Debug multi-step agent executions in UI
Dashboards	CSP strict mode	Improved security posture

Frequently Asked Questions About OpenSearch 3.6

What is the biggest new feature in OpenSearch 3.6?

The most impactful features are Lucene Better Binary Quantization (BBQ) for 32x vector compression and transparent agent token usage tracking in ML Commons. Both directly address production-scale cost and performance challenges.

Does OpenSearch 3.6 support token tracking for LLM agents?

Yes. OpenSearch 3.6 introduces zero-configuration token usage tracking for all agent types in ML Commons, supporting Amazon Bedrock Converse, OpenAI v1, and Gemini v1beta out of the box.

What compression ratio does OpenSearch 3.6 achieve for vector search?

OpenSearch 3.6 achieves a 32x compression ratio using Lucene Better Binary Quantization (BBQ), with recall above 0.95 achievable via oversampling on large datasets like Cohere-10M.

Is OpenSearch 3.6 available on the Instaclustr Platform?

Instaclustr will be rolling out access to OpenSearch 3.6 on the Instaclustr Platform soon.

What observability features does OpenSearch 3.6 add?

OpenSearch 3.6 introduces a full Application Performance Monitoring (APM) solution built on OpenTelemetry, including RED metrics, service maps, SLO tracking, and distributed trace exploration.
