Cluster Health Check

Instaclustr’s Cluster Health page exposes a number of indicators to help you understand your cluster’s long-term performance.

To access the Cluster Health page, navigate to the Monitoring page of your cluster and click on the Health tab. 

There are three potential states for each indicator:

  • Green represents a healthy state
  • Amber represents a warning state
  • Red represents a failed state

For Warning and Failed states, you can click on Problem Information to get specific information about which keyspaces and tables are affected, what the issue is, and how you can potentially fix the issue.

Disk Usage Indicator

 

The Disk Usage indicator checks the percentage of disk space used on each node. If disk usage has been above 75%-80% during the last hour, the node is filling up and very likely cannot provide enough working space for normal Cassandra operations. Please refer to Disk Usage for more details.

Suggested fix for non-healthy states:

  • Remove excess data from the cluster
  • Add more nodes to the cluster
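
As a rough illustration of the check described above, the following Python sketch maps a node's disk usage to one of the three indicator states. The split of the 75%-80% range into an amber threshold (75%) and a red threshold (80%) is an assumption made for this example, and the function name is hypothetical; it is not the Console's actual implementation.

```python
def disk_usage_state(used_bytes, total_bytes, warn_pct=75.0, fail_pct=80.0):
    """Classify disk usage as 'green', 'amber' or 'red'.

    warn_pct and fail_pct are assumed thresholds taken from the
    75%-80% range mentioned in the text.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if used_bytes < 0:
        raise ValueError("used_bytes must not be negative")

    used_pct = 100.0 * used_bytes / total_bytes
    if used_pct > fail_pct:
        return "red"
    if used_pct > warn_pct:
        return "amber"
    return "green"
```

On a node you manage yourself, the `used` and `total` values returned by Python's `shutil.disk_usage` for the data directory could be passed to this function.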

Partition Size Indicator

 

The Partition Size indicator checks the size of the largest partition in each table. We recommend limiting the maximum partition size to 10MB for optimal performance, with 100MB as an upper limit for ongoing stability. Large partitions may significantly impact the performance of Cassandra operations. Please refer to Partition Size for more details.

Suggested fix for non-healthy states:

  • Remove the problem partition
  • Re-assess the data model, as data may be unevenly distributed or bunched into too few partitions
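
The size guidance above can be sketched as a simple classification. In this Python example the mapping of the 10MB recommendation to a warning and the 100MB upper limit to a failure is an assumption, as are the function names and the use of 1MB = 1,000,000 bytes. The input is the largest partition size per table in bytes, such as the "Compacted partition maximum bytes" value reported by `nodetool tablestats`.

```python
MB = 1000 * 1000  # assumption: decimal megabytes

def partition_size_state(max_partition_bytes, recommended=10 * MB, upper=100 * MB):
    """Classify the largest partition of a table as 'green', 'amber' or 'red'."""
    if max_partition_bytes < 0:
        raise ValueError("max_partition_bytes must not be negative")
    if max_partition_bytes > upper:
        return "red"
    if max_partition_bytes > recommended:
        return "amber"
    return "green"

def problem_tables(largest_partition_by_table):
    """Return {table: state} for every table that is not 'green'."""
    problems = {}
    for table, size in largest_partition_by_table.items():
        state = partition_size_state(size)
        if state != "green":
            problems[table] = state
    return problems
```

For example, passing `{"ks.events": 250 * MB, "ks.users": 2 * MB}` (hypothetical table names) would report only `ks.events`, in the red state.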

Replication Factor Indicator

 

The Replication Factor indicator checks the number of replicas set for each datacenter. A replication factor of at least 3 is required for Instaclustr SLAs to apply and is highly recommended for data protection and high availability.

Suggested fix for non-healthy states:

  • Set the replication factor to three or larger for the problem datacenters. Note: after increasing the replication factor, repairs must be run to ensure data is correctly distributed. Contact Instaclustr Support for assistance with this operation.
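
To make the rule concrete, here is a minimal Python sketch that finds datacenters with a replication factor below three. It assumes the replication settings are available as a map of text to text, in the shape Cassandra stores them in the `replication` column of `system_schema.keyspaces`, with plain integer values. The function name and the datacenter names in the usage note are hypothetical.

```python
def under_replicated_datacenters(replication, minimum=3):
    """Return {name: replication_factor} for every entry below `minimum`.

    `replication` is a keyspace replication map, for example
    {"class": "...NetworkTopologyStrategy", "dc1": "3", "dc2": "2"}.
    The "class" entry names the strategy and is skipped.
    """
    low = {}
    for name, value in replication.items():
        if name == "class":
            continue
        factor = int(value)  # assumption: values are plain integers
        if factor < minimum:
            low[name] = factor
    return low
```

With the map `{"class": "org.apache.cassandra.locator.NetworkTopologyStrategy", "dc1": "3", "dc2": "2"}`, only `dc2` would be reported. For a keyspace using SimpleStrategy, the single entry is named `replication_factor` rather than a datacenter name.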

Replication Strategy Indicator

 

The Replication Strategy indicator checks the replication class used for each keyspace. NetworkTopologyStrategy is highly recommended: it ensures data is replicated in a way that minimises the impact of likely failures in your infrastructure (e.g. by replicating across AWS availability zones), and it enables additional datacenters to be added to the cluster without table rebuilds.

Suggested fix for non-healthy states:

  • Change the replication class to NetworkTopologyStrategy for the problem keyspaces
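
The change suggested above is made with a CQL `ALTER KEYSPACE` statement. The Python sketch below checks whether a keyspace already uses NetworkTopologyStrategy and, if not, builds the statement text. The helper names, the keyspace name and the datacenter name are all hypothetical; the datacenter names must match those of your own cluster, and the statement should be reviewed before it is run.

```python
import re

def uses_network_topology(replication):
    """True if the replication map's class is NetworkTopologyStrategy."""
    class_name = replication.get("class", "")
    return class_name.rsplit(".", 1)[-1] == "NetworkTopologyStrategy"

def alter_keyspace_cql(keyspace, dc_replicas):
    """Build an ALTER KEYSPACE statement for the given datacenter replica counts."""
    # Accept only simple unquoted identifiers to keep the sketch safe.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", keyspace):
        raise ValueError("unsupported keyspace name")
    if not dc_replicas:
        raise ValueError("at least one datacenter is required")

    parts = ["'class': 'NetworkTopologyStrategy'"]
    for dc, factor in sorted(dc_replicas.items()):
        if "'" in dc or int(factor) < 1:
            raise ValueError("invalid datacenter entry")
        parts.append("'%s': %d" % (dc, int(factor)))
    return "ALTER KEYSPACE %s WITH replication = {%s};" % (keyspace, ", ".join(parts))
```

For example, `alter_keyspace_cql("my_ks", {"dc1": 3})` produces `ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};`. As with any replication change, repairs are typically needed afterwards so that data is placed according to the new settings.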

Tombstones to Live Cells Indicator

 

The Tombstones to Live Cells indicator checks the average ratio of tombstones to live cells per read in each table. A high ratio of tombstones to live cells (greater than 5x as a starting guide) can substantially reduce read performance for a table. Please refer to Tombstones and Live Cells for more details.

Suggested fix for non-healthy states:

  • Tune the compaction strategy to more aggressively remove tombstones
  • Re-assess the data model
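
The ratio described above can be worked out from two per-table averages: tombstones per read and live cells per read (`nodetool tablestats` reports comparable per-slice averages). The Python sketch below flags tables whose ratio exceeds the 5x starting guide. The function names and table names are hypothetical, and treating a table with tombstones but no live cells as an infinite ratio is a choice made for this example.

```python
def tombstone_ratio(avg_tombstones_per_read, avg_live_cells_per_read):
    """Ratio of tombstones to live cells per read."""
    if avg_tombstones_per_read < 0 or avg_live_cells_per_read < 0:
        raise ValueError("averages must not be negative")
    if avg_live_cells_per_read == 0:
        # Assumption: only tombstones were read, so treat as unbounded.
        return float("inf") if avg_tombstones_per_read > 0 else 0.0
    return avg_tombstones_per_read / avg_live_cells_per_read

def tables_over_guide(stats, guide=5.0):
    """Return {table: ratio} for tables whose ratio exceeds `guide`.

    `stats` maps a table name to (avg tombstones, avg live cells) per read.
    """
    flagged = {}
    for table, (tombstones, live_cells) in stats.items():
        ratio = tombstone_ratio(tombstones, live_cells)
        if ratio > guide:
            flagged[table] = ratio
    return flagged
```

Given `{"ks.events": (60.0, 10.0), "ks.users": (1.0, 10.0)}`, only `ks.events` is flagged, with a ratio of 6.0.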
By Instaclustr Support