OpenSearch 3.6 is a significant release that advances machine learning, vector search, neural search, and application performance monitoring. Key highlights include transparent token tracking for AI agents, 32x vector compression via Lucene Better Binary Quantization (BBQ), a rebuilt AI chat experience in Dashboards, and a new APM solution built on OpenTelemetry.

Quick Summary: OpenSearch 3.6 makes AI-powered search more cost-transparent, memory-efficient, and production-ready—with significant upgrades to ML Commons, k-NN vector search, neural search, and OpenSearch Dashboards.

What Is OpenSearch 3.6?

OpenSearch 3.6 is the latest release of the open-source search and analytics engine maintained by the OpenSearch community. This release focuses on three primary areas:

  • AI agent capabilities — token tracking, V2 Chat Agent, and semantic memory APIs
  • Vector search performance — Lucene BBQ integration delivering 32x compression
  • Observability and dashboards — a new APM solution and rebuilt AI chat interface

What’s New in OpenSearch 3.6? (Full Feature Breakdown)

ML Commons: AI Agent Improvements

ML Commons is OpenSearch’s framework for integrating large language models (LLMs) and machine learning models directly within OpenSearch clusters. OpenSearch 3.6 delivers four major ML Commons upgrades.

Agent Token Usage Tracking

What it does: OpenSearch 3.6 adds transparent token usage tracking to the ML Commons agent framework—one of the most requested features from the community.

Previously, there was no visibility into how many tokens each agent execution consumed, making cost monitoring, performance debugging, and model comparison effectively impossible.

How it works in 3.6:

  • Every LLM call during agent execution is instrumented to extract and aggregate token counts
  • Works across all agent types in both streaming and non-streaming modes
  • Gracefully handles providers that don’t return usage data

Token data is returned in two views:

View | Description
Per-turn usage | One record for each LLM call (turn) within a single agent execution. An execution has multiple turns when the agent reasons, calls tools, and then generates a final response.
Per-model aggregation | Totals per model ID, including call count

Supported LLM providers:

  • Amazon Bedrock Converse
  • OpenAI v1
  • Gemini v1beta

Each provider’s fields are normalized into a unified TokenUsage model.
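
As a rough illustration of that normalization, here is a minimal Python sketch. It is not the ML Commons implementation: the TokenUsage fields, the provider labels, and both helper functions are hypothetical. Only the raw usage keys follow each provider's public response format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenUsage:
    """Hypothetical unified record; field names are illustrative only."""
    model_id: str
    input_tokens: int
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


# Where each provider reports usage in its raw response:
# (container key, input-token key, output-token key)
PROVIDER_FIELDS = {
    "bedrock_converse": ("usage", "inputTokens", "outputTokens"),
    "openai_v1": ("usage", "prompt_tokens", "completion_tokens"),
    "gemini_v1beta": ("usageMetadata", "promptTokenCount", "candidatesTokenCount"),
}


def normalize(provider: str, model_id: str, response: dict) -> Optional[TokenUsage]:
    """Map one raw LLM response to a TokenUsage, or None if usage is absent."""
    container, in_key, out_key = PROVIDER_FIELDS[provider]
    usage = response.get(container)
    if not usage:
        return None  # provider returned no usage data: skip rather than fail
    return TokenUsage(model_id, usage.get(in_key, 0), usage.get(out_key, 0))


def aggregate_per_model(turns: list) -> dict:
    """Roll per-turn records up into per-model totals with a call count."""
    totals = defaultdict(
        lambda: {"call_count": 0, "input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    )
    for turn in turns:
        entry = totals[turn.model_id]
        entry["call_count"] += 1
        entry["input_tokens"] += turn.input_tokens
        entry["output_tokens"] += turn.output_tokens
        entry["total_tokens"] += turn.total_tokens
    return dict(totals)
```

The list of per-turn records corresponds to the first view, and the result of aggregate_per_model corresponds to the second.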

Planned future enhancements: Persistent token data storage, configurable per-agent token budgets, and extended tracking to non-agent model predict responses.

Asynchronous Encryption Handling

The EncryptorImpl class—responsible for encrypting and decrypting text using tenant-specific master keys—has been refactored for scalability.

The problem with the previous implementation:

  • Used a blocking CountDownLatch with a strict 3-second timeout
  • Caused thread contention under high concurrency
  • Triggered unnecessary failures during cluster initialization delays

The new implementation (contributed by Muneer Kolarkunnu, Senior Engineer at NetApp Instaclustr):

  • Replaces blocking behavior with a fully asynchronous, ActionListener-based approach
  • Encryption and decryption requests are queued while the master key initializes
  • Fixes a race condition where concurrent requests for the same tenant could trigger duplicate master key generation
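
The queue-and-share pattern described above can be sketched in Python with asyncio. This is an analogy, not the Java EncryptorImpl code: TenantKeyStore, encrypt, and demo are hypothetical names, and the XOR step is only a placeholder for a real cipher.

```python
import asyncio


class TenantKeyStore:
    """Hypothetical per-tenant master key cache with shared initialization."""

    def __init__(self, generate_key):
        self._generate_key = generate_key  # async callable: tenant_id -> bytes
        self._pending = {}                 # tenant_id -> asyncio.Task

    async def get_key(self, tenant_id: str) -> bytes:
        task = self._pending.get(tenant_id)
        if task is None:
            # The first caller starts key generation; later callers for the
            # same tenant await the same task instead of starting another.
            task = asyncio.create_task(self._generate_key(tenant_id))
            self._pending[tenant_id] = task
        return await task


async def encrypt(store: TenantKeyStore, tenant_id: str, plaintext: str) -> bytes:
    # The request waits here without blocking a thread while the key initializes.
    key = await store.get_key(tenant_id)
    data = plaintext.encode("utf-8")
    # XOR is a placeholder so the sketch stays self-contained; NOT real encryption.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


async def demo():
    calls = []

    async def slow_generate(tenant_id):
        calls.append(tenant_id)
        await asyncio.sleep(0.01)  # simulate a slow master key lookup
        return b"\x01\x02\x03\x04"

    store = TenantKeyStore(slow_generate)
    results = await asyncio.gather(
        *(encrypt(store, "tenant-a", "hello") for _ in range(5))
    )
    return calls, results
```

Running asyncio.run(demo()) issues five concurrent requests for one tenant, and the key generator is invoked only once. A production version would also need to evict a failed task so that a later request can retry.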

V2 Chat Agent

OpenSearch 3.6 introduces the V2 Chat Agent, a next-generation agent type designed to simplify conversational AI workflows within ML Commons.

The V2 Chat Agent builds on the existing Conversational (ReAct) and PER agent types, providing a more streamlined interface for chat-based interactions while retaining the flexibility to integrate tools and memory.

Semantic and Hybrid Search APIs for Long-Term Memory

A new set of APIs brings semantic and hybrid search capabilities to long-term memory retrieval. Agents can now search stored memory using:

  • Vector similarity (semantic search)
  • Keyword matching (lexical search)
  • Hybrid combinations of both

This enables more accurate and context-aware memory recall during multi-turn conversations—a foundational capability for agents that reason over large conversation histories.
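
To make the hybrid idea concrete, the sketch below combines lexical and semantic scores with min-max normalization and a weighted mean. It illustrates the general technique only. It does not call the memory APIs, and the function names, memory IDs, and default weight are assumptions.

```python
def min_max(scores: dict) -> dict:
    """Rescale scores to [0, 1]; a constant score set maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_rank(lexical: dict, semantic: dict, semantic_weight: float = 0.5) -> list:
    """Rank memory IDs by a weighted mean of normalized lexical and semantic scores."""
    lex, sem = min_max(lexical), min_max(semantic)
    combined = {}
    for memory_id in set(lex) | set(sem):
        # A memory missing from one result set contributes 0 for that signal.
        combined[memory_id] = (
            (1 - semantic_weight) * lex.get(memory_id, 0.0)
            + semantic_weight * sem.get(memory_id, 0.0)
        )
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


ranked = hybrid_rank(
    lexical={"m1": 3.0, "m2": 2.0, "m3": 1.0},
    semantic={"m2": 0.9, "m3": 0.8, "m4": 0.1},
)
```

Here m2 ranks first because it scores well on both signals, while m1 matches only on keywords.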

k-NN Vector Search: Compression, Speed, and Efficiency

What Is Lucene Better Binary Quantization (BBQ)?

Lucene Better Binary Quantization (BBQ) is a vector compression technique that encodes high-dimensional float vectors into compact binary representations using advanced quantization methods inspired by RaBitQ.

BBQ is now integrated into OpenSearch 3.6’s k-NN plugin and delivers a 32x compression ratio with significantly better recall than existing Faiss Binary Quantization.
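
The 32x figure follows from replacing a 32-bit float with a single bit per dimension. The back-of-the-envelope Python below shows the arithmetic for a hypothetical corpus of one million 768-dimensional vectors. It counts raw vector storage only and ignores any per-vector correction values and HNSW graph overhead, so real savings will differ somewhat.

```python
def index_bytes(num_vectors: int, dimension: int, bits_per_dim: int) -> int:
    """Raw vector storage in bytes, rounding each vector up to whole bytes."""
    bytes_per_vector = (dimension * bits_per_dim + 7) // 8
    return num_vectors * bytes_per_vector


fp32_bytes = index_bytes(1_000_000, 768, bits_per_dim=32)   # 3,072 bytes per vector
binary_bytes = index_bytes(1_000_000, 768, bits_per_dim=1)  # 96 bytes per vector
ratio = fp32_bytes / binary_bytes
```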

BBQ vs. Faiss Binary Quantization — Recall Comparison:

Dataset | Lucene BBQ Recall@100 | Faiss BQ Recall@100
Sift-128 | 0.32 | 0.18
Cohere-768-1M | 0.63 | 0.30

With oversampling (rescoring), BBQ achieves recall above 0.95 at an oversample factor of 3 on the Cohere-10M dataset.
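
Oversampling with rescoring can be sketched in a few lines of pure Python. The sketch uses plain sign-bit quantization and Hamming distance as a stand-in, which is much cruder than BBQ, and every function name here is hypothetical. It only shows the shape of the technique: fetch k times the oversample factor candidates cheaply, then rescore them at full precision.

```python
import random


def to_bits(vec):
    """Sign-bit quantization: 1 bit per dimension (a simplification, not BBQ)."""
    return [1 if x > 0 else 0 for x in vec]


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def two_phase_search(query, vectors, k, oversample_factor=3):
    q_bits = to_bits(query)
    codes = [to_bits(v) for v in vectors]  # normally built once at index time
    # Phase 1: cheap candidate retrieval on the binary codes.
    n_candidates = min(len(vectors), k * oversample_factor)
    candidates = sorted(
        range(len(vectors)), key=lambda i: hamming(q_bits, codes[i])
    )[:n_candidates]
    # Phase 2: rescore only the candidates with the full-precision vectors.
    candidates.sort(key=lambda i: dot(query, vectors[i]), reverse=True)
    return candidates[:k]


random.seed(0)
vectors = [[random.uniform(-1, 1) for _ in range(16)] for _ in range(50)]
query = [random.uniform(-1, 1) for _ in range(16)]
top5 = two_phase_search(query, vectors, k=5, oversample_factor=3)
```

A larger oversample factor gives phase 2 more candidates to correct, which raises recall at the cost of more full-precision comparisons.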

How to configure BBQ in OpenSearch:

Specify "encoder": {"name": "binary"} within the HNSW method parameters on the Lucene engine in your field mapping. Lucene’s built-in rescoring mechanism handles the two-phase search automatically:

  • Fast candidate retrieval via binary quantized vectors
  • Precise scoring using original FP32 vectors
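
A minimal sketch of such an index body, built as a Python dict, is shown below. The encoder setting comes from the description above. The helper name, the field name embedding, and the dimension 768 are placeholders, and the rest of the body follows the usual knn_vector mapping shape.

```python
import json


def bbq_hnsw_index_body(field: str, dimension: int) -> dict:
    """Index body with a Lucene HNSW field using the binary (BBQ) encoder."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "lucene",
                        "parameters": {"encoder": {"name": "binary"}},
                    },
                }
            }
        },
    }


# Placeholder field name and dimension; adjust to your embedding model.
hnsw_body = bbq_hnsw_index_body("embedding", 768)
hnsw_json = json.dumps(hnsw_body, indent=2)
```

The resulting JSON would be sent as the body of an index creation request.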

BBQ Flat Index Support

OpenSearch 3.6 also introduces BBQ in flat mode—brute-force BBQ vector search for workloads where ingest efficiency, filtered search with high selectivity, and exact recall are priorities.

Configure flat mode by specifying "method": {"name": "flat"} with "compression_level": "32x" in your mapping.
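
A flat-mode mapping might look like the sketch below. The method name and the compression_level value come from the sentence above. Placing compression_level at the field level mirrors how existing knn_vector mappings accept that parameter and is an assumption for flat mode, as are the helper name, field name, and dimension.

```python
import json


def bbq_flat_index_body(field: str, dimension: int) -> dict:
    """Index body for a brute-force (flat) field at 32x compression."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "compression_level": "32x",  # assumed field-level placement
                    "method": {"name": "flat"},
                }
            }
        },
    }


flat_body = bbq_flat_index_body("embedding", 768)
flat_json = json.dumps(flat_body, indent=2)
```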

32x Compression as the Default

OpenSearch 3.6 is working toward making 32x vector compression the default for the vector engine. This initiative spans both Lucene and Faiss engines and includes:

  • BQ-based indexing
  • SIMD-optimized bulk operations
  • Backward-compatible migration paths

The goal: dramatically reduce memory footprint out of the box without requiring manual compression tuning.

Neural Search: Flexible Agentic Query Translation

Embedding Model ID in the Agentic Query Translator Processor

OpenSearch 3.6 decouples the embedding model from agent registration in agentic neural search workflows.

The previous limitation: The embedding_model_id was tightly coupled to the agent definition—requiring users to re-register agents whenever they switched embedding models.

What’s changed in 3.6: embedding_model_id can now be specified as an optional parameter directly in the agentic_query_translator search processor within the search pipeline configuration.

This enables the same agent to be reused across different use cases with different embedding models—a meaningful improvement for teams managing multiple search pipelines.
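
As an illustration, the Python sketch below builds two search pipeline bodies that share one agent but use different embedding models. The processor and parameter names come from the text above. The helper name, the agent ID, and the model IDs are placeholders.

```python
import json
from typing import Optional


def agentic_pipeline(agent_id: str, embedding_model_id: Optional[str] = None) -> dict:
    """Search pipeline body with an agentic_query_translator request processor."""
    processor = {"agent_id": agent_id}
    if embedding_model_id is not None:
        # Optional in 3.6: set per pipeline instead of at agent registration.
        processor["embedding_model_id"] = embedding_model_id
    return {"request_processors": [{"agentic_query_translator": processor}]}


AGENT_ID = "my-agent-id"  # placeholder for a registered agent's ID

products_pipeline = agentic_pipeline(AGENT_ID, "embedding-model-a")
docs_pipeline = agentic_pipeline(AGENT_ID, "embedding-model-b")
products_json = json.dumps(products_pipeline, indent=2)
```

Each body would be registered with a PUT request to /_search/pipeline/ followed by a pipeline name of your choice, so switching embedding models becomes a pipeline change rather than an agent re-registration.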

OpenSearch APM: Application Performance Monitoring

OpenSearch 3.6 introduces a comprehensive Application Performance Monitoring (APM) solution built on open-source technologies and OpenTelemetry standards.

Capability | Description
RED Metrics | Rate, Errors, and Duration metrics for services and operations
Service Maps | Interactive topology visualization showing service dependencies
Service-Level Monitoring | Detailed performance metrics at service and operation levels
SLO Tracking | Service Level Objective definition and monitoring
Trace Expansion | Deep-dive into distributed traces with log correlation
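
For readers new to RED metrics, the Python sketch below derives rate, error ratio, and duration percentiles from a batch of spans. It is a conceptual illustration, not how the APM pipeline computes them: the span keys duration_ms and is_error and the function name are hypothetical.

```python
def red_metrics(spans: list, window_seconds: float) -> dict:
    """Compute Rate, Errors, and Duration for spans observed in one time window."""
    count = len(spans)
    if count == 0:
        return {"rate_per_sec": 0.0, "error_ratio": 0.0, "p50_ms": None, "p95_ms": None}

    durations = sorted(span["duration_ms"] for span in spans)
    errors = sum(1 for span in spans if span["is_error"])

    def percentile(p: int):
        # Nearest-rank percentile using integer arithmetic (ceiling of p% of count).
        rank = max(1, (p * count + 99) // 100)
        return durations[rank - 1]

    return {
        "rate_per_sec": count / window_seconds,  # Rate
        "error_ratio": errors / count,           # Errors
        "p50_ms": percentile(50),                # Duration (median)
        "p95_ms": percentile(95),                # Duration (tail latency)
    }


# Ten spans over a 5-second window; the two slowest ones failed.
sample_spans = [
    {"duration_ms": 10 * i, "is_error": i > 8} for i in range(1, 11)
]
sample_red = red_metrics(sample_spans, window_seconds=5.0)
```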

OpenSearch APM architecture

Component | Role
OpenSearch | Stores service topology, relationships, and trace data
Prometheus | Handles time-series metrics (RED metrics)
Data Prepper | Processes OpenTelemetry data and routes it to the appropriate backends
OpenSearch Dashboards | Provides the visualization and analysis interface

This hybrid architecture routes each data type to the system optimized for it, achieving better performance and lower costs than a single-storage approach.

OpenSearch Dashboards 3.6: AI Chat, Explore, and GenAI Observability

OpenSearch Dashboards 3.6 is one of the most feature-dense Dashboards releases to date, with three primary themes.

AI Chat: Persistent Conversations and Agentic Memory

The Dashboards chatbot interface has been rebuilt around a single-window architecture. New capabilities include:

  • Conversation history — Browse, restore, and resume past conversations from a history panel
  • Agentic memory provider — Chat sessions backed by ML Commons Agent Memory APIs, with long-term memory that persists across sessions
  • Screenshot capture — Attach a screenshot of the current dashboard page to a chat message, giving the AI visual context alongside text (auto-scaled to stay under the 8K pixel limit)

Explore Plugin: ECharts Migration and PPL Enhancements

The Explore plugin—OpenSearch Dashboards’ next-generation data exploration experience—receives major investment in 3.6:

  • ECharts migration — Histogram charts migrated from elastic-charts to ECharts; all Vega-based visualizations replaced with ECharts
  • In-context visualization editor — Create and edit visualizations directly within a dashboard without navigating away
  • PPL search result highlighting — Matching terms highlighted in PPL query results
  • Backend PPL grammar for autocomplete — Autocomplete now uses the backend PPL grammar bundle as its source of truth
  • fetch_size API for PPL — Row limits enforced at the OpenSearch level rather than truncated client-side
  • Performance improvements — Raw hits cached at module level; redundant saved object requests eliminated; large query result rendering bottlenecks addressed
  • Data table controls — Toggle for cell text wrapping in the Explore data table

GenAI Agent Traces

A new agent_traces plugin brings GenAI agent trace visualization into the observability workspace. The agent traces view includes:

  • A metrics bar
  • Discover-style data table
  • Sorting controls
  • Workspace support

This makes it practical to debug multi-step agent executions directly from the Dashboards UI.

Security: CSP Strict Mode

OpenSearch Dashboards 3.6 adds support for Content Security Policy (CSP) strict enforcement mode, configurable via the csp.enable flag. The console worker was also updated to load from a URL rather than a blob to comply with strict CSP policies.

OpenSearch 3.6: Feature Summary Table

Category | Feature | Impact
ML Commons | Agent token usage tracking | Cost monitoring and model comparison
ML Commons | Async encryption handling | Scalability under high concurrency
ML Commons | V2 Chat Agent | Simplified conversational AI workflows
ML Commons | Semantic/hybrid memory APIs | Context-aware multi-turn conversations
k-NN | Lucene BBQ integration | 32x compression, better recall than Faiss BQ
k-NN | BBQ flat index support | High-selectivity filtered search
k-NN | 32x default compression | Reduced memory footprint out of the box
Neural Search | Embedding model ID in processor | Reuse agents across multiple embedding models
Observability | OpenSearch APM | End-to-end distributed system visibility
Dashboards | Persistent AI chat + agentic memory | Long-term conversational context
Dashboards | ECharts migration + PPL enhancements | Faster, more consistent data exploration
Dashboards | GenAI agent traces | Debug multi-step agent executions in UI
Dashboards | CSP strict mode | Improved security posture

Frequently Asked Questions About OpenSearch 3.6

  • What is the biggest new feature in OpenSearch 3.6?

    The most impactful features are Lucene Better Binary Quantization (BBQ) for 32x vector compression and transparent agent token usage tracking in ML Commons, both of which directly address production-scale cost and performance challenges.

  • Does OpenSearch 3.6 support token tracking for LLM agents?

    Yes. OpenSearch 3.6 introduces zero-configuration token usage tracking for all agent types in ML Commons, supporting Amazon Bedrock Converse, OpenAI v1, and Gemini v1beta out of the box.

  • What compression ratio does OpenSearch 3.6 achieve for vector search?

    OpenSearch 3.6 achieves a 32x compression ratio using Lucene Better Binary Quantization (BBQ), with recall above 0.95 achievable via oversampling on large datasets like Cohere-10M.

  • Is OpenSearch 3.6 available on the Instaclustr Platform?

    Instaclustr will be rolling out access to OpenSearch 3.6 on the Instaclustr Platform soon.

  • What observability features does OpenSearch 3.6 add?

    OpenSearch 3.6 introduces a full Application Performance Monitoring (APM) solution built on OpenTelemetry, including RED metrics, service maps, SLO tracking, and distributed trace exploration.