OpenSearch 3.6 is shaping up to be a landmark release, bringing meaningful advances across machine learning, vector search, neural search, and observability. From transparent token tracking in AI agents (a pivotal cost-tracking feature) to broad initiatives like 32x default vector compression, this release reflects the community’s focus on making OpenSearch the backbone for intelligent, high-performance search and analytics workloads. Let’s walk through what’s new.

ML Commons: A more capable agent framework

ML Commons is the primary framework for using LLMs and other ML models within OpenSearch clusters. The improvements in this release are important for OpenSearch’s continued development as an AI search platform.

Agent token usage tracking

One of the most requested features for the ML Commons agent framework has arrived: transparent token usage tracking. Until now, there was no visibility into how many tokens each agent execution consumed—making cost monitoring, performance debugging, and model comparison effectively impossible.

Starting in 3.6, every LLM call made during agent execution is instrumented to extract and aggregate token counts. This applies across all agent types in both streaming and non-streaming modes.

The tracking is zero-configuration and best-effort: it’s always on, never blocks execution, and gracefully handles cases where a provider doesn’t return usage data. Token data is returned as part of the agent response in two views:

  • Per-turn usage—an ordered breakdown of each LLM call with turn number, model metadata, and token counts (input, output, total, cache reads, reasoning tokens, and more).
  • Per-model aggregation—totals per model ID including call count.
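An illustrative sketch of how these two views might appear in an agent response body (the field names below are assumptions based on the description above, not the exact response schema):

```json
{
  "token_usage": {
    "per_turn": [
      {
        "turn": 1,
        "model_id": "my-model-id",
        "input_tokens": 1200,
        "output_tokens": 340,
        "total_tokens": 1540
      }
    ],
    "per_model": {
      "my-model-id": {
        "call_count": 1,
        "input_tokens": 1200,
        "output_tokens": 340,
        "total_tokens": 1540
      }
    }
  }
}
```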

The feature supports multiple LLM providers out of the box, including Amazon Bedrock Converse, OpenAI v1, and Gemini v1beta, each with provider-specific fields normalized into a unified TokenUsage model.

Future work includes persistent storage of token data, configurable per-agent token budgets, and extending tracking to non-agent model predict responses.

Asynchronous encryption handling for scalability

The EncryptorImpl class, responsible for encrypting and decrypting text using tenant-specific master keys, has been refactored for scalability. The previous implementation relied on a blocking CountDownLatch with a strict 3-second timeout—a design that caused thread contention under high concurrency and unnecessary failures during cluster initialization delays.

The new implementation, contributed by Muneer Kolarkunnu, Senior Engineer at NetApp Instaclustr, replaces blocking behavior with a fully asynchronous, ActionListener-based approach. Encryption and decryption requests are now queued while the master key initializes and processed once it’s ready. This also fixes a subtle race condition where concurrent requests for the same tenant could trigger duplicate master key generation.

V2 Chat Agent

ML Commons 3.6 introduces the V2 Chat Agent (#4732), a next-generation agent type designed to simplify conversational AI workflows. The V2 Chat Agent builds on lessons learned from the existing Conversational (ReAct) and PER agent types, providing a more streamlined interface for chat-based interactions while retaining the flexibility to integrate tools and memory.

Semantic and hybrid search APIs for long-term memory

A new set of APIs brings semantic and hybrid search capabilities to long-term memory retrieval (#4658). Agents can now search their stored memory using vector similarity, keyword matching, or a combination of both—enabling more accurate and context-aware memory recall during multi-turn conversations. This is a foundational piece for building agents that can reason over large histories without losing relevance.

k-NN: Pushing the boundaries of vector search

OpenSearch 3.6 delivers a wave of improvements to the k-NN plugin, centered on better compression, faster search, and more efficient memory utilization.

Lucene Better Binary Quantization (BBQ) integration

The headline feature for k-NN is the integration of Lucene’s Better Binary Quantization (BBQ)—a vector compression technique that encodes high-dimensional float vectors into compact binary representations using advanced quantization methods inspired by RaBitQ.

BBQ delivers a 32x compression ratio while maintaining significantly better recall than existing Faiss Binary Quantization:

Dataset         Lucene BBQ Recall@100   Faiss BQ Recall@100
Sift-128        0.32                    0.18
Cohere-768-1M   0.63                    0.30

With oversampling (rescoring), BBQ achieves recall above 0.95 at an oversample factor of 3 on the Cohere-10M dataset—making it a practical choice for large-scale production workloads where memory is constrained but accuracy matters.
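In query terms, oversampling is expressed through the k-NN rescore option. A hedged sketch, assuming the shape of the existing disk-based vector search rescore API (index name, field name, and vector values are placeholders):

```json
GET /my-vector-index/_search
{
  "size": 100,
  "query": {
    "knn": {
      "my_vector": {
        "vector": [0.12, 0.08, 0.45],
        "k": 100,
        "rescore": {
          "oversample_factor": 3.0
        }
      }
    }
  }
}
```

With an oversample factor of 3, the engine retrieves 3x the requested candidates from the quantized index, then rescores them against full-precision vectors before returning the top results.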

Users configure BBQ through the field mapping by specifying "encoder": {"name": "binary"} within the HNSW method parameters on the Lucene engine. Lucene’s built-in rescoring mechanism handles the two-phase search automatically: fast candidate retrieval via binary quantized vectors, followed by precise scoring using original FP32 vectors.
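A minimal mapping sketch, assuming the standard knn_vector mapping shape (index name, field name, dimension, and space type are placeholders):

```json
PUT /my-vector-index
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "space_type": "l2",
          "parameters": {
            "encoder": { "name": "binary" }
          }
        }
      }
    }
  }
}
```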

BBQ flat index support

Beyond HNSW, 3.6 also introduces BBQ in flat mode—brute-force BBQ vector search for workloads where ingest efficiency, filtered search with high selectivity, and exact recall are priorities. Users simply specify "method": {"name": "flat"} with "compression_level": "32x" in their mapping.
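A sketch of a flat BBQ mapping, assuming the same knn_vector mapping conventions as above (the exact placement of compression_level alongside the method block is an assumption; index name, field name, and dimension are placeholders):

```json
PUT /my-flat-index
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 128,
        "compression_level": "32x",
        "method": {
          "name": "flat",
          "engine": "lucene"
        }
      }
    }
  }
}
```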

32x compression as the default

OpenSearch 3.6 is working toward making 32x compression the default for the vector engine. This cross-cutting initiative spans both the Lucene and Faiss engines, incorporating BBQ-based indexing, SIMD-optimized bulk operations, and backward-compatible migration paths. The goal is to dramatically reduce memory footprint out of the box without requiring users to tune compression settings.

Neural Search: Flexible agentic query translation

Embedding Model ID in agentic query translator processor

The agentic search experience gets more flexible in 3.6. Previously, the embedding_model_id was tightly coupled to the agent definition when using agentic search with neural queries—users had to re-register agents whenever they wanted to use a different embedding model.

Now, embedding_model_id can be specified as an optional parameter directly in the agentic query translator search processor within the search pipeline configuration.
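A sketch of what that pipeline configuration might look like (the processor name agentic_query_translator and both IDs are placeholders based on the description above):

```json
PUT /_search/pipeline/my-agentic-pipeline
{
  "request_processors": [
    {
      "agentic_query_translator": {
        "agent_id": "my-agent-id",
        "embedding_model_id": "my-embedding-model-id"
      }
    }
  ]
}
```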

This decouples the embedding model from agent registration, enabling the same agent to be reused across different use cases with different embedding models—a meaningful UX improvement for teams managing multiple search pipelines.

Dashboards observability: Application performance monitoring

OpenSearch APM

OpenSearch 3.6 introduces a comprehensive Application Performance Monitoring (APM) solution built on open source technologies and OpenTelemetry standards. This is a major addition to OpenSearch’s observability capabilities, providing end-to-end visibility into distributed systems.

Key capabilities include:

  • RED metrics—Rate, Errors, and Duration metrics for services and operations
  • Service maps—Interactive topology visualization showing service dependencies
  • Service-level monitoring—Detailed performance metrics at service and operation levels
  • SLO tracking—Service Level Objective definition and monitoring
  • Trace exploration—Deep dive into distributed traces with log correlation

The APM solution employs a hybrid architecture:

  • OpenSearch stores service topology, relationships, and trace data
  • Prometheus handles time-series metrics (RED metrics) for efficient querying
  • Data Prepper processes OpenTelemetry data and routes it to appropriate backends
  • OpenSearch Dashboards provides the visualization and analysis interface

This separation of concerns is intentional—time-series metrics and document-based trace data have fundamentally different query patterns and storage requirements. By routing each data type to the system optimized for it, OpenSearch APM achieves better performance and lower costs than a single-storage approach.

OpenSearch Dashboards: A rebuilt AI and analytics experience

OpenSearch Dashboards 3.6 is one of the most feature-dense Dashboards releases in recent memory. The changes fall into three main themes: a dramatically improved AI chat experience, a matured Explore plugin for data exploration, and new GenAI observability tooling that brings agent trace visibility directly into the UI.

AI Chat: Persistent conversations, agentic memory, and screenshot context

The chatbot interface has been rebuilt from the ground up around a single-window architecture (#11483), consolidating all chat state into a mount service rather than scattering it across component instances. This foundational change enables several features that were previously impossible:

  • Conversation history (#11348)—users can browse, restore, and resume past conversations from a history panel. The chat window automatically loads the most recent conversation on open (#11396), and history loads incrementally when content doesn’t fill the container (#11517).
  • Agentic memory provider (#11380)—chat sessions can now be backed by ML Commons Agent Memory APIs, enabling long-term memory that persists across sessions and supports multi-data-source configurations (#11529).
  • Screenshot capture (#11287)—users can attach a screenshot of the current dashboard page to a chat message, giving the AI visual context alongside text. Screenshots are automatically scaled to stay under the 8K pixel limit and cleared from memory after sending (#11585). The implementation uses html2canvas-pro for CSP nonce compliance (#11329).

A significant number of streaming and state management bugs were also resolved: the “Thinking…” indicator no longer disappears prematurely (#11459), chat input is correctly disabled during tool confirmation (#11588) and tool result sending (#11510), and the chat window no longer gets stuck loading when restoring a conversation that ended mid-tool-call (#11576).

Explore: ECharts migration, PPL enhancements, and in-context editing

The Explore plugin—OpenSearch Dashboards’ next-generation data exploration experience—sees major investment in 3.6:

  • ECharts migration—the histogram chart has been migrated from elastic-charts to ECharts (#11341) and all remaining Vega-based visualizations have been removed in favor of ECharts (#11468). This aligns Explore’s rendering stack with the rest of the Dashboards visualization layer.
  • In-context visualization editor (#11528)—users can now create and edit visualizations directly within a dashboard without navigating away, keeping context intact. A visualizations tab is also available within the agent traces view (#11564).
  • PPL search result highlighting (#11547)—matching terms are now highlighted in PPL query results, bringing Explore’s search UX closer to Discover.
  • Backend PPL grammar for autocomplete (#11428)—the PPL autocomplete engine now uses the backend PPL grammar bundle as its source of truth, with a safe fallback. This keeps autocomplete in sync with the server’s actual parser rather than a separately maintained frontend copy.
  • fetch_size API for PPL (#11359)—row limits are now enforced at the OpenSearch level rather than truncated client-side, reducing unnecessary data transfer.
  • Performance improvements—raw hits are now cached at the module level to reduce Redux freeze overhead (#11478), redundant saved object requests on initial load are eliminated (#11413), and large query result rendering bottlenecks have been addressed (#11390).
  • Data table controls—a new toggle in the Explore data table lets users switch cell text wrapping on or off (#11321).
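The fetch_size enforcement mentioned above might be exercised directly against the PPL endpoint like this (a hedged sketch; the parameter name mirrors the SQL plugin’s existing pagination API, and the index pattern is a placeholder):

```json
POST /_plugins/_ppl
{
  "query": "source=logs-* | fields timestamp, message",
  "fetch_size": 100
}
```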

GenAI agent traces

A new agent_traces plugin (#11387) brings GenAI agent trace visualization into the observability workspace. This is complemented by the @osd/apm-topology package (#11394), which provides the APM service map, trace map, and GenAI agent trace graph rendering using a celestial map layout (#11450).

The agent traces view includes a metrics bar, Discover-style data table, sorting controls, and workspace support (#11513), making it practical for debugging multi-step agent executions directly from the Dashboards UI.

Security: CSP strict mode

OpenSearch Dashboards 3.6 adds support for Content Security Policy strict enforcement mode (#11536), configurable via the csp.enable flag (with csp.strict retained as a deprecated alias for backward compatibility—#11572, #11594). The console worker was also updated to load from a URL rather than a blob to comply with strict CSP policies (#11353).
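In opensearch_dashboards.yml, this might look like the following (a sketch; only the csp.enable flag and the deprecated csp.strict alias are named in the release, and the value shown is an assumption):

```yaml
# opensearch_dashboards.yml
# Enable strict CSP enforcement (csp.strict remains as a deprecated alias)
csp.enable: true
```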

Looking ahead

OpenSearch 3.6 represents a significant step forward across the stack. The ML Commons improvements—spanning the V2 Chat Agent, new pooling modes, semantic memory retrieval, and a substantial round of stability fixes—make the agent framework more capable and production-ready than ever. The k-NN enhancements—particularly BBQ and 32x default compression—aim to tackle the fundamental challenge of running vector search at scale without breaking the bank on infrastructure. OpenSearch Dashboards 3.6 brings the AI assistant experience to a new level of maturity with persistent conversations, agentic memory, screenshot context, and deep Explore performance improvements. All of this points toward a future where OpenSearch serves not just as a search and analytics engine, but as an intelligent platform for orchestrating AI-powered workflows.

We aim to roll out access to OpenSearch 3.6 on the Instaclustr Platform soon. Watch this space for more!