AI Cluster Health Security Overview
Feature Overview
AI Cluster Health is an optional feature in the NetApp Instaclustr console that provides a concise, AI-generated summary of recent cluster health. It translates complex monitoring metrics into a clear health synopsis with a traffic-light status (green/yellow/red) and explanatory key indicators, making cluster monitoring accessible without requiring deep technology expertise.
The feature is available for all supported Instaclustr managed technologies (including Apache Kafka, Apache Cassandra, OpenSearch, and PostgreSQL).
It complements and does not replace existing Instaclustr Technical Operations proactive monitoring and alerting.
For more information:
- Introducing AI Cluster Health: Smarter monitoring made simple
- Building the AI Cluster Health Summary: From raw metrics to clear signals
Why NIST AI RMF?
Customers regularly ask how we identify and manage AI-related risks. Rather than inventing our own framework, we use the NIST AI Risk Management Framework (AI RMF 1.0) — a widely recognised, voluntary framework released in January 2023 — to organise the questions we ask about each AI feature.
The framework has four core functions. Each one maps to a category of questions that customers consistently ask in security questionnaires:
| Function | What it covers | What this document answers |
|---|---|---|
| GOVERN | Policies, accountability, third-party governance | |
| MAP | Context, purpose, data scope, intended use | |
| MEASURE | Accuracy, validation, transparency, robustness | |
| MANAGE | Operational controls, incident response, monitoring | |
The rest of this document answers these questions for AI Cluster Health.
Who is the AI provider?
The AI model used is Anthropic Claude Haiku 4.5, hosted on Amazon Bedrock (AWS managed service).
The feature itself — the integration, prompt design, tool surface, and console experience — is built and maintained in-house by NetApp Instaclustr.
What are the contractual terms?
The feature operates under the customer’s existing agreement with NetApp, which includes:
- NetApp General Terms — governing the overall service relationship, confidentiality, and data processing
- NetApp Cloud Services Terms — governing Instaclustr as a Cloud Service
- Instaclustr Service Specific Terms — covering SLAs, data protection, and customer responsibilities
NetApp Instaclustr uses the AWS Bedrock service, specifically the Anthropic serverless models.
Under the terms of these models, Anthropic states that it may not train models on customer content and that its use of customer data is limited to executing the requested action.
Further details can be found in the Anthropic section of the AWS terms, which govern how the service is provided to NetApp.
Per the AWS Bedrock data protection documentation:
“Amazon Bedrock doesn’t store or log your prompts and completions. Amazon Bedrock doesn’t use your prompts and completions to train any AWS models and doesn’t distribute them to third parties.”
Model providers (including Anthropic) do not have access to the AWS-owned Model Deployment Accounts where models run, and therefore have no access to customer prompts or completions.
Who owns inputs/outputs?
The data sent for analysis consists of operational infrastructure metrics, not customer application data, stored content, or personal information.
No third party acquires any rights over the data sent to or generated by the feature. Under the AWS Bedrock terms, the AI provider does not retain, use, or claim ownership of inputs or outputs.
What does the feature do?
AI Cluster Health helps customers understand the recent state of their cluster at a glance. When a user triggers the analysis, the feature:
- Retrieves operational monitoring metrics for the cluster over the last ~3 hours
- Sends those metrics to an AI model (Claude Haiku 4.5 via AWS Bedrock) for analysis
- Displays a structured health summary with:
  - A traffic-light health score (green/yellow/red)
  - An overall summary paragraph
  - Classified key points (ok/warning/critical)
  - The full list of metrics the AI examined
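As an illustration only, the summary described above could be modelled with a structure like the following. The class and field names (`HealthSummary`, `KeyPoint`, `health_score`, `key_points`, `metrics_examined`) are hypothetical and are not taken from the Instaclustr console or API; only the two vocabularies (green/yellow/red and ok/warning/critical) come from this document.

```python
from dataclasses import dataclass
from enum import Enum


class HealthScore(Enum):
    """Traffic-light status shown at the top of the summary."""
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


class Severity(Enum):
    """Classification applied to each key point."""
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


@dataclass(frozen=True)
class KeyPoint:
    severity: Severity
    text: str


@dataclass(frozen=True)
class HealthSummary:
    # All field names here are assumptions made for this sketch.
    health_score: HealthScore
    summary: str
    key_points: tuple[KeyPoint, ...]
    metrics_examined: tuple[str, ...]


# Example instance with made-up values.
example = HealthSummary(
    health_score=HealthScore.YELLOW,
    summary="Disk utilisation is trending upward on one node.",
    key_points=(
        KeyPoint(Severity.OK, "CPU utilisation is stable."),
        KeyPoint(Severity.WARNING, "Disk utilisation is elevated."),
    ),
    metrics_examined=(
        "n::cpuUtilization::percentage",
        "n::diskUtilization::percentage",
    ),
)
```

Using enumerations rather than free strings means a value outside the controlled vocabulary cannot be represented at all, which matches the read-only, evaluative nature of the output.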
The feature is read-only and evaluative.
It does not prescribe or recommend actions, does not execute code or make changes to cluster state, and does not accept freeform prompts; analysis is triggered only via a standardised UI action.
The AI model can only retrieve metrics for the specific cluster being analysed; no write or mutation capabilities are exposed.
The output is clearly labelled: “The content of this page was generated by an AI and has not been reviewed by a human expert.”
What data is transmitted?
When a user triggers the analysis, the following data is transmitted to the AI model via Amazon Bedrock:
- Per-node operational monitoring metrics from the Instaclustr Monitoring API, covering approximately the last 3 hours
- Metric names and plain-English descriptions of each metric
- Operational identifiers including node identifiers, IP addresses, and cluster IDs
The full list of metrics examined is displayed to the user in the console after each analysis. The data transmitted consists exclusively of operational infrastructure metrics. It contains no application-level data.
The specific metrics vary by cluster technology. For a Cassandra cluster, the following metrics are analysed:
| Metric | Description |
|---|---|
| n::cpuUtilization::percentage | CPU utilisation percentage |
| n::diskUtilization::percentage | Disk space utilisation percentage |
| n::heapmemoryused::value | Amount of used heap memory |
| n::reads::total_count_per_second | Reads per second |
| n::writes::total_count_per_second | Writes per second |
| n::clientRequestReadV2::latency_per_operation | Average latency per client read request |
| n::clientRequestReadV2::99thPercentile | 99th percentile read latency |
| n::clientRequestWrite::latency_per_operation | Average latency per client write request |
| n::clientRequestWrite::99thPercentile | 99th percentile write latency |
| n::compactions::pendingtasks | Number of pending compaction tasks |
| n::nodeStatus::state | Node status as seen by the other nodes in the cluster |
| n::pausedConnections::value | Requests paused due to node overload |
| n::requestDiscarded::count | Requests discarded due to node overload |
| n::droppedmessage::total_count_per_second_max | Dropped messages from SEDA stages |
| n::hintsFailed::count_per_second_max | Hints that failed delivery |
| n::nativetransportrequest::pending_tasks_max | Pending native transport (CQL) requests |
| n::readstage::pending_tasks_max | Pending read stage tasks |
| n::slalatency::sla_read | SLA synthetic read latency |
| n::slalatency::sla_write | SLA synthetic write latency |
| n::load::value | On-disk data size per node |
| n::osload::last_one_minute | OS load average (1 minute) |
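A fixed metric set like the one above lends itself to an allowlist. The following is a minimal sketch of that idea, not a description of how Instaclustr builds its requests: the function `build_payload`, the sample layout (`node_id`, `metric`, `value`), and the payload keys are all assumptions made for illustration. The three metric names and descriptions are a subset copied from the table.

```python
# Subset of the Cassandra metrics listed above (name -> description).
ALLOWED_METRICS = {
    "n::cpuUtilization::percentage": "CPU utilisation percentage",
    "n::diskUtilization::percentage": "Disk space utilisation percentage",
    "n::compactions::pendingtasks": "Number of pending compaction tasks",
}


def build_payload(cluster_id: str, samples: list[dict]) -> dict:
    """Keep only allowlisted metrics and only the three expected fields."""
    kept = [
        # Copy known fields explicitly so unexpected extras are dropped.
        {"node_id": s["node_id"], "metric": s["metric"], "value": s["value"]}
        for s in samples
        if s["metric"] in ALLOWED_METRICS
    ]
    used = sorted({s["metric"] for s in kept})
    return {
        "cluster_id": cluster_id,
        "metric_descriptions": {name: ALLOWED_METRICS[name] for name in used},
        "samples": kept,
    }


# Hypothetical input: the second sample is not on the allowlist.
samples = [
    {"node_id": "node-1", "metric": "n::cpuUtilization::percentage", "value": 41.5},
    {"node_id": "node-1", "metric": "example::notAllowed", "value": "ignored"},
]
payload = build_payload("cluster-123", samples)
```

Filtering by an explicit allowlist, rather than excluding known-sensitive names, means a newly introduced metric is not sent unless someone deliberately adds it.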
What data is not transmitted?
The feature only transmits operational infrastructure metrics. It has no access to the data stored in the customer’s cluster.
Specifically, the following are never transmitted:
- Application payloads or stored data
- Credentials, secrets, or authentication tokens
- Personally identifiable information (beyond operational IP addresses and node identifiers)
- Query content, table names, keyspace names, or schema information
- Freeform user prompts
Where does data go and how long is it retained?
All Bedrock API calls are processed in AWS us-east-1 (Virginia, USA), regardless of the customer’s cluster region. Customers with data residency requirements should evaluate this before enabling the feature.
| Aspect | Detail |
|---|---|
| Data destination | Amazon Bedrock, AWS us-east-1 |
| Data retention (Bedrock) | None — Bedrock does not store or log prompts and completions |
| Data retention (Instaclustr) | In accordance with Instaclustr’s standard log retention practices |
| Retention configurability | Not customer-configurable |
| Data in transit | Encrypted, consistent with Instaclustr’s standard security controls |
What consent mechanisms exist?
AI Cluster Health is disabled by default. Before any data is transmitted, users are presented with a consent dialog that identifies the specific cluster, the time window (past 3 hours), the destination (AWS Bedrock), and links to the governing third-party model terms.
The user must explicitly select “Agree” before the analysis proceeds. If the user selects “Disagree”, no data is transmitted.
Consent is per-use — each invocation of the feature requires fresh, explicit consent. There is no persistent opt-in, no automatic or scheduled analysis, and no organisation-level toggle. The feature is controlled entirely at the individual user level.
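The per-use consent rule can be sketched as a gate that asks every time and never remembers the answer. This is an illustrative sketch only; `analyse_once`, `ask_consent`, and `run_analysis` are hypothetical names, and the real console flow is a UI dialog rather than a function call.

```python
from typing import Callable, Optional


def analyse_once(
    cluster_id: str,
    ask_consent: Callable[[str], bool],
    run_analysis: Callable[[str], str],
) -> Optional[str]:
    """Run one analysis, but only if consent is given for this invocation."""
    # Consent is requested on every call; the answer is never cached.
    if not ask_consent(cluster_id):
        return None  # "Disagree": the analysis callback is never invoked
    return run_analysis(cluster_id)


# Usage with stand-in callbacks that record what happened.
prompts_shown = []
analyses_run = []
answers = iter([True, False])  # user agrees once, then disagrees


def fake_consent(cluster_id: str) -> bool:
    prompts_shown.append(cluster_id)
    return next(answers)


def fake_analysis(cluster_id: str) -> str:
    analyses_run.append(cluster_id)
    return "summary for " + cluster_id


first = analyse_once("cluster-123", fake_consent, fake_analysis)
second = analyse_once("cluster-123", fake_consent, fake_analysis)
```

In this sketch the second call prompts again even though the first was accepted, and a refusal stops the flow before the analysis callback runs.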
Who are the users?
The feature is available to authorised console users across all supported Instaclustr managed technologies (Kafka, Cassandra, OpenSearch, PostgreSQL, etc.). Every use requires explicit consent from the user before any data is transmitted.
Customers are responsible for managing user access to the console in accordance with the Instaclustr Service Specific Terms.
How was the AI output validated?
AI Cluster Health outputs were validated in collaboration with Instaclustr’s senior operations engineers:
- Reference reports were produced for under-provisioned, over-provisioned, and healthy cluster states
- AI outputs were iteratively tested against these references until they aligned with expert expectations
- The system uses expert-informed prompts rather than hardcoded global thresholds to accommodate workload diversity
Output is standardised via a structured schema with a controlled classification vocabulary (ok/warning/critical).
Validation was performed prior to release. If you have concerns about an AI-generated summary, contact Instaclustr Support.
What happens when the AI is wrong?
If the AI model returns a malformed or non-conformant response, the feature displays an error message and no summary is presented. The feature degrades gracefully rather than showing potentially unreliable output.
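A strict validator is one way to implement "show an error rather than an unreliable summary". The sketch below assumes the model returns JSON with `health_score`, `summary`, and `key_points` fields; those field names and the function `parse_summary` are hypothetical, and only the allowed values are taken from this document.

```python
import json
from typing import Optional

# Tuples (not sets) so that unhashable JSON values can be compared safely.
ALLOWED_SCORES = ("green", "yellow", "red")
ALLOWED_SEVERITIES = ("ok", "warning", "critical")


def parse_summary(raw: str) -> Optional[dict]:
    """Return the parsed summary, or None if it does not conform."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed: not valid JSON
    if not isinstance(data, dict):
        return None
    if data.get("health_score") not in ALLOWED_SCORES:
        return None
    if not isinstance(data.get("summary"), str):
        return None
    key_points = data.get("key_points")
    if not isinstance(key_points, list):
        return None
    for point in key_points:
        if not isinstance(point, dict):
            return None
        if point.get("severity") not in ALLOWED_SEVERITIES:
            return None
        if not isinstance(point.get("text"), str):
            return None
    return data


valid = parse_summary(
    '{"health_score": "green", "summary": "All nodes healthy.",'
    ' "key_points": [{"severity": "ok", "text": "CPU is stable."}]}'
)
invalid = parse_summary('{"health_score": "purple", "summary": "?", "key_points": []}')
```

A caller would render the summary only when the result is not `None` and fall back to an error message otherwise, so a partially valid response is never shown.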
If a customer believes an AI-generated summary is inaccurate or misleading, the standard Instaclustr customer support process applies — contact Instaclustr Support.
What transparency exists?
The console displays the full list of metrics the AI examined alongside the summary, allowing customers to cross-reference the AI output with the raw monitoring data.
Can the feature be disabled?
The feature is not enabled by default — it only runs when a user explicitly consents and triggers an analysis. There is nothing to disable; users simply choose not to use it.
There is no organisation-level toggle to prevent users from accessing the feature at this time.
What happens if the AI service is unavailable?
The core Instaclustr platform functions fully without AI Cluster Health.
If the AI service is unavailable, the feature displays an error message. All other platform functionality — including monitoring, alerting, and cluster management — continues unaffected.
What logging and monitoring exists?
AI-generated summaries are logged for operational troubleshooting and customer support purposes, in accordance with Instaclustr’s standard logging, access control, and retention practices.
API error rates for the AI integration are monitored. Bedrock service disruptions would be detected through error rate monitoring, and the feature would degrade gracefully.
How are incidents handled?
If a customer reports an issue with an AI-generated summary, the standard Instaclustr customer support process applies — contact Instaclustr Support.
Who owns the feature operationally?
The feature is operated by NetApp Instaclustr. Customer escalations are handled through standard support channels — contact Instaclustr Support.
For further questions about Instaclustr’s use of AI, contact [email protected].