AI Cluster Health Security Overview

Feature Overview

AI Cluster Health is an optional feature in the NetApp Instaclustr console that provides a concise, AI-generated summary of recent cluster health. It translates complex monitoring metrics into a clear health synopsis with a traffic-light status (green/yellow/red) and explanatory key indicators, making cluster monitoring accessible without requiring deep technology expertise.

The feature is available for all supported Instaclustr managed technologies (including Apache Kafka, Apache Cassandra, OpenSearch, and PostgreSQL).

It complements and does not replace existing Instaclustr Technical Operations proactive monitoring and alerting.

Why NIST AI RMF?

Customers regularly ask how we identify and manage AI-related risks. Rather than inventing our own framework, we use the NIST AI Risk Management Framework (AI RMF 1.0) — a widely recognised, voluntary framework released in January 2023 — to organise the questions we ask about each AI feature.

The framework has four core functions. Each one maps to a category of questions that customers consistently ask in security questionnaires:

Each function is listed below with what it covers, followed by the questions this document answers.

GOVERN: Policies, accountability, third-party governance
  • Who is the AI provider?
  • What are the contractual terms?
  • Who owns inputs/outputs?
  • What consent mechanisms exist?
  • Who owns the feature operationally?

MAP: Context, purpose, data scope, intended use
  • What does the feature do?
  • What data is transmitted and what isn’t?
  • Where does data go?
  • How long is it retained?
  • Who are the users?

MEASURE: Accuracy, validation, transparency, robustness
  • How was the AI output validated?
  • What happens when the AI is wrong?
  • Is the output explainable?
  • What transparency exists?

MANAGE: Operational controls, incident response, monitoring
  • Can the feature be disabled?
  • What happens if the AI service is unavailable?
  • What logging and monitoring exists?
  • How are incidents handled?

The rest of this document answers these questions for AI Cluster Health.

Who is the AI provider?

The AI model used is Anthropic Claude Haiku 4.5, hosted on Amazon Bedrock (AWS managed service).

The feature itself — the integration, prompt design, tool surface, and console experience — is built and maintained in-house by NetApp Instaclustr.

What are the contractual terms?

The feature operates under the customer’s existing agreement with NetApp.

NetApp Instaclustr uses the AWS Bedrock service, specifically the Anthropic serverless models.

Under the terms governing these models, Anthropic may not train models on customer content, and use of customer data is limited to carrying out the requested action.

Further details can be found in the Anthropic section of the AWS terms, which govern how the service is provided to NetApp.

Per the AWS Bedrock data protection documentation:

Amazon Bedrock doesn’t store or log your prompts and completions. Amazon Bedrock doesn’t use your prompts and completions to train any AWS models and doesn’t distribute them to third parties.

Model providers (including Anthropic) do not have access to the AWS-owned Model Deployment Accounts where models run, and therefore have no access to customer prompts or completions.

Who owns inputs/outputs?

The data sent for analysis consists of operational infrastructure metrics, not customer application data, stored content, or personal information.

No third party acquires any rights over the data sent to or generated by the feature. Under the AWS Bedrock terms, the AI provider does not retain, use, or claim ownership of inputs or outputs.

What does the feature do?

AI Cluster Health helps customers understand the recent state of their cluster at a glance. When a user triggers the analysis, the feature:

  • Retrieves operational monitoring metrics for the cluster over the last ~3 hours
  • Sends those metrics to an AI model (Claude Haiku 4.5 via AWS Bedrock) for analysis
  • Displays a structured health summary with:
    • A traffic-light health score (green/yellow/red)
    • An overall summary paragraph
    • Classified key points (ok/warning/critical)
    • The full list of metrics the AI examined

The feature is read-only and evaluative.

It does not prescribe or recommend actions, does not execute code or make changes to cluster state, and does not accept freeform prompts; analysis is triggered via a standardised UI action only.

The AI model can only retrieve metrics for the specific cluster being analysed; no write or mutation capabilities are exposed.

The output is clearly labelled: “The content of this page was generated by an AI and has not been reviewed by a human expert.”

What data is transmitted?

When a user triggers the analysis, the following data is transmitted to the AI model via Amazon Bedrock:

  • Per-node operational monitoring metrics from the Instaclustr Monitoring API, covering approximately the last 3 hours
  • Metric names and plain-English descriptions of each metric
  • Operational identifiers including node identifiers, IP addresses, and cluster IDs

The full list of metrics examined is displayed to the user in the console after each analysis. The data transmitted consists exclusively of operational infrastructure metrics. It contains no application-level data.

The specific metrics vary by cluster technology. For a Cassandra cluster, the following metrics are analysed:

Metric | Description
n::cpuUtilization::percentage | CPU utilisation percentage
n::diskUtilization::percentage | Disk space utilisation percentage
n::heapmemoryused::value | Amount of used heap memory
n::reads::total_count_per_second | Reads per second
n::writes::total_count_per_second | Writes per second
n::clientRequestReadV2::latency_per_operation | Average latency per client read request
n::clientRequestReadV2::99thPercentile | 99th percentile read latency
n::clientRequestWrite::latency_per_operation | Average latency per client write request
n::clientRequestWrite::99thPercentile | 99th percentile write latency
n::compactions::pendingtasks | Number of pending compaction tasks
n::nodeStatus::state | Node status as seen by the other nodes in the cluster
n::pausedConnections::value | Requests paused due to node overload
n::requestDiscarded::count | Requests discarded due to node overload
n::droppedmessage::total_count_per_second_max | Dropped messages from SEDA stages
n::hintsFailed::count_per_second_max | Hints that failed delivery
n::nativetransportrequest::pending_tasks_max | Pending native transport (CQL) requests
n::readstage::pending_tasks_max | Pending read stage tasks
n::slalatency::sla_read | SLA synthetic read latency
n::slalatency::sla_write | SLA synthetic write latency
n::load::value | On-disk data size per node
n::osload::last_one_minute | OS load average (1 minute)
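
Every identifier in the list above has three segments separated by "::". For readers who want to work with these names in their own tooling, the following is a minimal Python sketch of splitting one into its parts. The segment labels (scope, name, kind) are illustrative names chosen here, not Instaclustr terminology, and the helper is hypothetical.

```python
from typing import NamedTuple


class MetricId(NamedTuple):
    scope: str  # leading segment; "n" in every identifier listed above
    name: str   # middle segment, e.g. "cpuUtilization"
    kind: str   # trailing segment, e.g. "percentage"


def parse_metric_id(raw: str) -> MetricId:
    """Split a 'scope::name::kind' identifier into its three segments."""
    parts = raw.split("::")
    # Reject anything that does not have exactly three non-empty segments.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected metric identifier: {raw!r}")
    return MetricId(*parts)


metric = parse_metric_id("n::clientRequestReadV2::99thPercentile")
print(metric.name, metric.kind)
```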

What data is not transmitted?

The feature only transmits operational infrastructure metrics. It has no access to the data stored in the customer’s cluster.

Specifically, the following are never transmitted:

  • Application payloads or stored data
  • Credentials, secrets, or authentication tokens
  • Personally identifiable information (beyond operational IP addresses and node identifiers)
  • Query content, table names, keyspace names, or schema information
  • Freeform user prompts
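
This document does not describe how the boundary above is enforced. One common way to enforce a boundary of this kind is an allowlist applied before anything leaves the platform. The Python sketch below illustrates the idea only; the field names and record shape are assumptions, not the real payload format.

```python
# Hypothetical allowlist: these field names are illustrative only.
ALLOWED_FIELDS = ("cluster_id", "node_id", "ip_address", "metric", "description", "values")


def strip_to_allowlist(record: dict) -> dict:
    """Return a copy of `record` containing only allowlisted fields."""
    return {key: value for key, value in record.items() if key in ALLOWED_FIELDS}


record = {
    "cluster_id": "example-cluster",
    "node_id": "example-node",
    "metric": "n::cpuUtilization::percentage",
    "values": [12.5, 14.0],
    "keyspace": "must-not-leave",    # schema information: dropped
    "auth_token": "must-not-leave",  # credential: dropped
}
payload = strip_to_allowlist(record)
print(sorted(payload))
```

An allowlist fails closed: a field added to the record later is excluded until someone deliberately permits it.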

Where does data go and how long is it retained?

All Bedrock API calls are processed in AWS us-east-1 (Virginia, USA), regardless of the customer’s cluster region. Customers with data residency requirements should evaluate this before enabling the feature.

Aspect | Detail
Data destination | Amazon Bedrock, AWS us-east-1
Data retention (Bedrock) | None; Bedrock does not store or log prompts and completions
Data retention (Instaclustr) | In accordance with Instaclustr’s standard log retention practices
Retention configurability | Not customer-configurable
Data in transit | Encrypted, consistent with Instaclustr’s standard security controls

What consent mechanisms exist?

AI Cluster Health is disabled by default. Before any data is transmitted, users are presented with a consent dialog that identifies the specific cluster, the time window (past 3 hours), the destination (AWS Bedrock), and links to the governing third-party model terms.

The user must explicitly select “Agree” before the analysis proceeds. If the user selects “Disagree”, no data is transmitted.

Consent is per-use — each invocation of the feature requires fresh, explicit consent. There is no persistent opt-in, no automatic or scheduled analysis, and no organisation-level toggle. The feature is controlled entirely at the individual user level.
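
The per-use property described above can be pictured as a gate that takes the user's choice as an argument on every call, rather than reading a stored preference. The following Python sketch is illustrative only; the class and function names and the `send` callback are assumptions, not the console's actual implementation.

```python
from enum import Enum
from typing import Callable


class ConsentChoice(Enum):
    AGREE = "Agree"
    DISAGREE = "Disagree"


class ConsentRequired(Exception):
    """Raised when analysis is requested without an explicit 'Agree'."""


def run_analysis(cluster_id: str, choice: ConsentChoice,
                 send: Callable[[str], str]) -> str:
    """Call `send(cluster_id)` only if the user agreed for this invocation.

    The choice is passed in on every call and never stored, so there is
    no persistent opt-in to fall back on.
    """
    if choice is not ConsentChoice.AGREE:
        raise ConsentRequired(f"no data transmitted for cluster {cluster_id}")
    return send(cluster_id)
```

Because `send` is reached only after the check, selecting “Disagree” means the transmission code path never runs.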

Who are the users?

The feature is available to authorised console users across all supported Instaclustr managed technologies (Kafka, Cassandra, OpenSearch, PostgreSQL, etc.). Each use requires the user to provide explicit per-use consent before any data is transmitted.

Customers are responsible for managing user access to the console in accordance with the Instaclustr Service Specific Terms.

How was the AI output validated?

AI Cluster Health outputs were validated in collaboration with Instaclustr’s senior operations engineers:

  • Reference reports were produced for under-provisioned, over-provisioned, and healthy cluster states
  • AI outputs were iteratively tested against these references until they aligned with expert expectations
  • The system uses expert-informed prompts rather than hardcoded global thresholds to accommodate workload diversity

Output is standardised via a structured schema with a controlled classification vocabulary (ok/warning/critical).

Validation was performed prior to release. If you have concerns about an AI-generated summary, contact Instaclustr Support.

What happens when the AI is wrong?

If the AI model returns a malformed or non-conformant response, the feature displays an error message and no summary is presented. The feature degrades gracefully rather than showing potentially unreliable output.
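
As a rough illustration of this behaviour, the Python sketch below rejects any response that does not match an expected structure and shows an error instead of a summary. The JSON field names (health_score, summary, key_points, level, text) and the error text are assumptions; this document confirms only the green/yellow/red score and the ok/warning/critical vocabulary.

```python
import json
from typing import Optional

HEALTH_SCORES = ("green", "yellow", "red")
KEY_POINT_LEVELS = ("ok", "warning", "critical")


def parse_summary(raw: str) -> Optional[dict]:
    """Return the parsed summary, or None if it is malformed or non-conformant."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if data.get("health_score") not in HEALTH_SCORES:
        return None
    if not isinstance(data.get("summary"), str):
        return None
    key_points = data.get("key_points")
    if not isinstance(key_points, list):
        return None
    for point in key_points:
        if not isinstance(point, dict):
            return None
        if point.get("level") not in KEY_POINT_LEVELS:
            return None
        if not isinstance(point.get("text"), str):
            return None
    return data


def render(raw: str) -> str:
    """Show the summary only when the response passed every check."""
    summary = parse_summary(raw)
    if summary is None:
        return "Error: the health summary could not be generated."
    return f"[{summary['health_score']}] {summary['summary']}"
```

A check like this catches structural problems only. It cannot tell whether a well-formed summary is accurate, which is why the support route below still matters.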

If a customer believes an AI-generated summary is inaccurate or misleading, the standard Instaclustr customer support process applies — contact Instaclustr Support.

What transparency exists?

The console displays the full list of metrics the AI examined alongside the summary, allowing customers to cross-reference the AI output with the raw monitoring data.

Can the feature be disabled?

The feature is not enabled by default — it only runs when a user explicitly consents and triggers an analysis. There is nothing to disable; users simply choose not to use it.

There is no organisation-level toggle to prevent users from accessing the feature at this time.

What happens if the AI service is unavailable?

The core Instaclustr platform functions fully without AI Cluster Health.

If the AI service is unavailable, the feature displays an error message. All other platform functionality — including monitoring, alerting, and cluster management — continues unaffected.

What logging and monitoring exists?

AI-generated summaries are logged for operational troubleshooting and customer support purposes, in accordance with Instaclustr’s standard logging, access control, and retention practices.

API error rates for the AI integration are monitored. Bedrock service disruptions would be detected through error rate monitoring, and the feature would degrade gracefully.
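
This document does not specify how error rates are computed. As one possible approach, the Python sketch below tracks the failure rate over the most recent calls in a fixed-size window. The class name, window size, and threshold are all illustrative assumptions.

```python
from collections import deque


class ErrorRateWindow:
    """Track the failure rate over the most recent `size` calls."""

    def __init__(self, size: int = 100) -> None:
        # Each entry is True if the call failed; old entries fall off the left.
        self._outcomes = deque(maxlen=size)

    def record(self, failed: bool) -> None:
        self._outcomes.append(failed)

    def error_rate(self) -> float:
        if not self._outcomes:
            return 0.0
        return sum(self._outcomes) / len(self._outcomes)

    def is_degraded(self, threshold: float = 0.5) -> bool:
        """Report whether the recent error rate has reached the threshold."""
        return self.error_rate() >= threshold
```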

How are incidents handled?

If a customer reports an issue with an AI-generated summary, the standard Instaclustr customer support process applies — contact Instaclustr Support.

Who owns the feature operationally?

The feature is operated by NetApp Instaclustr. Customer escalations are handled through standard support channels — contact Instaclustr Support.

For further questions about Instaclustr’s use of AI, contact [email protected].