We’ve been having some conversations lately about Cassandra disk usage. Specifically, how full is too full on a Cassandra cluster?
Our recommendation at Instaclustr is that you do not run any node more than 70% full during normal operations. This is to allow for:
- bursts of disk usage through Cassandra operations such as compaction;
- unexpected growth of actual data volumes; and
- sufficient free capacity for add-node operations to run smoothly when you need to expand the cluster.
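As a rough illustration of the 70% guideline, the arithmetic below checks a node's disk usage against that ceiling and estimates how much more data the node can take before crossing it. This is a minimal sketch: the function names and the example figures are hypothetical, and it does not try to model how much temporary space any particular compaction will actually need.

```python
# Hypothetical helpers illustrating the 70% guideline; names are illustrative only.
MAX_NORMAL_USAGE_PERCENT = 70  # recommended ceiling for normal operations

def is_too_full(used_bytes: int, total_bytes: int,
                ceiling_percent: int = MAX_NORMAL_USAGE_PERCENT) -> bool:
    """True if the node is using more than `ceiling_percent` of its disk."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return used_bytes * 100 > total_bytes * ceiling_percent

def headroom_bytes(used_bytes: int, total_bytes: int,
                   ceiling_percent: int = MAX_NORMAL_USAGE_PERCENT) -> int:
    """Bytes the node can still absorb before crossing the ceiling (0 if already over)."""
    limit = total_bytes * ceiling_percent // 100
    return max(0, limit - used_bytes)

GB = 10**9
# Example: a 1000 GB disk holding 600 GB is at 60%, with 100 GB left before 70%.
print(is_too_full(600 * GB, 1000 * GB), headroom_bytes(600 * GB, 1000 * GB) // GB)
```

The remaining 30% is not "spare" in the usual sense: it is the buffer that absorbs compaction bursts, unexpected growth and the streaming that happens when nodes are added.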
As part of our Cassandra managed service, we monitor disk usage on our customers’ clusters and advise them when they are reaching capacity thresholds. At that point they can either remove data from the cluster or we can easily provision new nodes to add capacity (with no downtime, although with some performance impact while the new nodes are syncing with the cluster).
Of course, capacity can be used up quickly when loading data or when an application’s behaviour changes, so it is worth monitoring disk usage on your own cluster as well, as explained here.
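For a quick local check, a script like the one below reports how full the filesystem holding Cassandra's data is and flags it when usage goes above 70%. This is a hedged sketch, not an Instaclustr tool: it assumes the common package-install data directory `/var/lib/cassandra/data` (yours is whatever `data_file_directories` in `cassandra.yaml` points to), and the function names are hypothetical.

```python
import shutil
import sys

# Assumption: default data directory for package installs; adjust to match
# the data_file_directories setting in your cassandra.yaml.
DEFAULT_DATA_DIR = "/var/lib/cassandra/data"
THRESHOLD_PERCENT = 70

def disk_usage_percent(path: str) -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (bytes)
    return 100.0 * usage.used / usage.total

def check(path: str = DEFAULT_DATA_DIR, threshold: float = THRESHOLD_PERCENT) -> bool:
    """Print the usage for `path`; return True if it is at or below `threshold`."""
    pct = disk_usage_percent(path)
    ok = pct <= threshold
    status = "OK" if ok else "WARNING: above threshold"
    print(f"{path}: {pct:.1f}% used ({status})")
    return ok

if __name__ == "__main__":
    # Optional first argument overrides the data directory.
    target = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_DATA_DIR
    sys.exit(0 if check(target) else 1)
```

Because it exits non-zero above the threshold, it can be run on each node from cron or an existing monitoring agent. Note that the figure is for the whole filesystem, which is what matters for compaction headroom, rather than for Cassandra's data alone.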