Cassandra Metrics

Non-table metrics follow the format n::{metricName}.

Each metric type will contain the latest available measurement.
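
As a rough illustration of the n::{metricName} format, the sketch below shows how a handful of node metrics might be requested in a single call. The endpoint URL, node UUID, credentials, and query-parameter name are assumptions for illustration only, not definitions from this page.

```python
# Hedged sketch: fetch a few non-table (node) metrics in one request.
# The endpoint, node UUID, credentials and "metrics" parameter name are
# assumed placeholders - substitute the values from your own account.
import requests

MONITORING_URL = "https://api.instaclustr.com/monitoring/v1/nodes/{node_id}"  # assumed endpoint
NODE_ID = "00000000-0000-0000-0000-000000000000"                              # placeholder node UUID

response = requests.get(
    MONITORING_URL.format(node_id=NODE_ID),
    params={"metrics": "n::reads,n::writes,n::compactions"},  # n::{metricName} format
    auth=("<username>", "<monitoring-api-key>"),               # assumed basic auth
    timeout=30,
)
response.raise_for_status()

# Each metric type contains only the latest available measurement.
print(response.json())
```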

  • n::reads – Reads per second by Cassandra.
    • Expected range: 5 ms → 200 ms
    • Impacting factors: Hardware capacity and configuration, client request load, compaction strategy, overall cluster health
    • Troubleshooting: Focus on a problematic area – e.g. unusual load, cluster operations, high compaction, GC activity. 
  • n::writes – Writes per second by Cassandra.
    • Expected range: 5 ms → 200 ms
    • Impacting factors: Hardware capacity and configuration, client request load, compaction strategy, overall cluster health
    • Troubleshooting: Focus on a problematic area – e.g. unusual load, cluster operations, high compaction, GC activity.
  • n::cassandraReads: Reads per second by Cassandra. (Deprecated, please use n::reads)
  • n::cassandraWrites: Writes per second by Cassandra. (Deprecated, please use n::writes)
  • n::compactions: Number of pending compactions.
    • Expected range: 10 → 100 (depends on the node size)
    • Impacting factors: Cluster migrations or a high-volume data write.
    • Troubleshooting: Adjust compaction throughput with nodetool setcompactionthroughput (a value of 0 removes the throttle).
  • n::repairs (deprecated): Number of active and pending repair tasks.
  • n::clientRequestRead – Offers the percentile distribution and average latency per client read request (i.e. the period from when a node receives a client request, gathers the records, and responds to the client). Available sub-types:
    • 95thPercentile – 95th percentile distribution of clientRequestRead
    • 99thPercentile – 99th percentile distribution of clientRequestRead
    • Expected range: 5 ms → 200 ms
    • Impacting factors: Hardware capacity and configuration, client request load, compaction strategy, overall cluster health
    • Troubleshooting: Focus on a problematic area – e.g. unusual load, cluster operations, high compaction, GC activity.
  • n::clientRequestWrite – Offers the percentile distribution and average latency per client write request (i.e. the period from when a node receives a client request, gathers the records, and responds to the client). Available sub-types:
    • 95thPercentile – 95th percentile distribution of clientRequestWrite
    • 99thPercentile – 99th percentile distribution of clientRequestWrite
    • Expected range: 5 ms → 200 ms 
    • Impacting factors: Hardware capacity and configuration, client request load, compaction strategy, overall cluster health 
    • Troubleshooting: Focus on a problematic area – e.g. unusual load, cluster operations, high compaction, GC activity. 
  • n::rangeSlices – Range Slice reads by Cassandra
  • n::casReads – Compare and Set reads by Cassandra
  • n::casWrites – Compare and Set writes by Cassandra
  • n::clientRequestRangeSlice – Offers the percentile distribution and average latency per client range slice read request (i.e. the period from when a node receives a client request, gathers the records, and responds to the client). Available sub-types:
    • latency_per_operation – Latency per clientRequestRangeSlice read
    • 95thPercentile – 95th percentile distribution of clientRequestRangeSlice
    • 99thPercentile – 99th percentile distribution of clientRequestRangeSlice
    • Expected range: 10 ms → 300 ms
    • Impacting factors: Data modelling, overall cluster health, configuration.
    • Troubleshooting: Evaluate range query usage.
  • n::clientRequestCasRead – Offers the percentile distribution and average latency per client CAS read request (i.e. the period from when a node receives a client request, gathers the records, and responds to the client). Available sub-types:
    • 95thPercentile – 95th percentile distribution of clientRequestCasRead
    • 99thPercentile – 99th percentile distribution of clientRequestCasRead
    • Expected range: 10 ms → 300 ms
    • Impacting factors: CAS query, data modelling, overall cluster health, configuration.
  • n::clientRequestCasWrite – Offers the percentile distribution and average latency per client CAS write request (i.e. the period from when a node receives a client request, gathers the records, and responds to the client). Available sub-types:
    • 95thPercentile – 95th percentile distribution of clientRequestCasWrite
    • 99thPercentile – 99th percentile distribution of clientRequestCasWrite
    • Expected range: 10 ms → 300 ms
    • Impacting factors: CAS query, data modelling, overall cluster health, configuration.
  • n::slalatency – Monitors our SLA latency and alerts when it is above a threshold level. Available sub-types:
    • sla_read – Synthetic read queries run against an Instaclustr canary table.
    • sla_write – Synthetic write queries run against an Instaclustr canary table.
  • n::elassandra – Monitoring metrics for an Elassandra cluster (only available if you have an Elassandra cluster). Available sub-types:
    • document_count – The total number of documents for the node.
    • query_per_second – Number of queries on a node, calculated over the last 20 seconds.
    • index_per_second – Number of writes to the indexes, calculated over the last 20 seconds.
  • n::readstage – The Read Stage metric represents Cassandra conducting reads from the local disk or cache.
  • n::mutationstage – The Mutation Stage metric represents local writes (mutations) being applied on the node.
  • n::nativetransportrequest – The Native Transport Request metric represents client CQL requests. If requests are blocked by other Cassandra operations, this metric will display abnormal values.
  • n::rpcthread – The maximum number of concurrent requests from clients.
  • n::countermutationstage – The Counter Mutation Stage metric is responsible for counter writes.
  • n::droppedmessage – The Dropped Messages metric represents the total number of dropped messages from all stages in Cassandra's SEDA (staged event-driven architecture).
    • Expected range: 0
    • Impacting factors: Load on the cluster or a particular node, configuration settings, data model
    • Troubleshooting: Identify the root cause from the ‘Impacting factors’ above. Possible solutions:
      • Increase hardware capacity for a node or the number of nodes in the cluster.
      • Tune buffers and caches.
      • Revisit the data model if the issue originates from the data model.

Note: All deprecated metrics and endpoints will be removed in the future.

Table Metrics

Table metric names follow the format cf::{keyspace}::{table}::{metricType}. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric. For example, requesting the readLatencyDistribution metric type will return the various distributions of the read latency metric, while specifying the 50thPercentile sub-type will only return the 50th percentile distribution of the read latency metric.

Each metric type will contain the latest available measurement.
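
For a concrete picture of how these names are composed, here is a small hypothetical helper. The keyspace and table names are placeholders, and the idea that a sub-type is appended with the same ‘::’ separator is an illustrative assumption rather than a definition from this page.

```python
# Hypothetical helper for composing table-metric names.
# Assumption: a sub-type is appended with the same "::" separator used by the
# rest of the name - illustrative only.
def table_metric(keyspace: str, table: str, metric_type: str, sub_type: str = "") -> str:
    name = f"cf::{keyspace}::{table}::{metric_type}"
    return f"{name}::{sub_type}" if sub_type else name

# All distributions of the read latency metric for a placeholder table:
print(table_metric("myKeyspace", "myTable", "readLatencyDistribution"))
# Only the 50th percentile distribution:
print(table_metric("myKeyspace", "myTable", "readLatencyDistribution", "50thPercentile"))
```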

  • cf::{keyspace}::{table}::readLatencyDistribution: Measurement of local read latency for the table, on the individual node. Available sub-types:
    • 50thPercentile: 50th percentile distribution of read latency
    • 75thPercentile: 75th percentile distribution of read latency
    • 95thPercentile: 95th percentile distribution of read latency
    • 99thPercentile: 99th percentile distribution of read latency
  • cf::{keyspace}::{table}::reads: General measurements of local read latency for the table, on the individual node. Available sub-types:
    • latency_per_operation: Average local read latency per second
    • count_per_second: Reads of the table performed on the individual node
  • cf::{keyspace}::{table}::writeLatencyDistribution: Metrics for local write latency for the table, on the individual node. Available sub-types:
    • 50thPercentile: 50th percentile distribution of write latency
    • 75thPercentile: 75th percentile distribution of write latency
    • 95thPercentile: 95th percentile distribution of write latency
    • 99thPercentile: 99th percentile distribution of write latency
  • cf::{keyspace}::{table}::writes: General measurements of local write latency for the table, on the individual node. Available sub-types:
    • latency_per_operation: Average local write latency per second
    • count_per_second: Writes to the table performed on the individual node
  • cf::{keyspace}::{table}::sstablesPerRead: SSTables accessed per read of the table on the individual node. Available sub-types:
    • average: Average SSTables accessed per read
    • max: Maximum SSTables accessed per read
    • Expected range: Less than 10.
    • Impacting factors: Data model, compaction strategy, write volume, repair operation.
    • Troubleshooting: Configure an optimal compaction strategy for the table and use compaction-specific tools. Repair the cluster regularly. Revisit the data model if this is a frequent issue and other solutions do not rectify it.
  • cf::{keyspace}::{table}::tombstonesPerRead: Tombstoned cells accessed per read of the table on the individual node. Available sub-types:
    • average: Average tombstones accessed per read
    • max: Maximum tombstones accessed per read
  • cf::{keyspace}::{table}::liveCellsPerRead: Live cells accessed per read of the table on the individual node. Available sub-types:
    • average: Average live cells accessed per read
    • max: Maximum live cells accessed per read
  • cf::{keyspace}::{table}::partitionSize: The size of partitions in the specified table, in KB. Available sub-types:
    • average: Average partition size
    • max: Maximum partition size
    • Expected range: 1 KB → 10 MB ideal range (100 MB at maximum)
    • Impacting factors: Data model, query pattern.
    • Troubleshooting: Revisit the query pattern for the amount of data included in a single partition of the table. If no quick fix is applicable, revisit the data model.
  • cf::{keyspace}::{table}::diskUsed: Live and total disk used by the table. Available sub-types:
    • livediskspaceused: Disk used by live cells
    • totaldiskspaceused: Disk used by both live cells and tombstones

Listing Monitored Tables

A list of monitored tables, grouped by keyspace, can be generated by making a GET request to:

The API will respond with the following packet:

Example: Response packet listing monitored tables
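
The sketch below only assumes that the packet maps each keyspace to its monitored tables; the data and parsing are hypothetical and are shown purely to connect the listing to the cf:: metric format above.

```python
# Hypothetical sketch: the response packet is assumed to group monitored tables
# by keyspace (e.g. {keyspace: [table, ...]}); the real layout may differ.
monitored_tables = {
    "myKeyspace": ["myTable", "otherTable"],  # placeholder example data
}

# Expand the listing into table-metric names that could then be requested.
metric_names = [
    f"cf::{keyspace}::{table}::reads"
    for keyspace, tables in monitored_tables.items()
    for table in tables
]
print(metric_names)
# ['cf::myKeyspace::myTable::reads', 'cf::myKeyspace::otherTable::reads']
```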

Clusters

Requesting ‘cluster’ metrics returns the requested measurements for each provisioned node in the cluster and follows the same format as the ‘nodes’ endpoint. All node metrics are available for use.

For example, this request:

would return the following response packet:
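
As with the earlier node example, the following is only a hedged sketch of a cluster-level request; the clusters endpoint path, cluster UUID, credentials, and the per-node layout of the response are assumptions rather than details given on this page.

```python
# Hedged sketch: request node metrics for every provisioned node in a cluster.
# The endpoint path, cluster UUID and credentials are assumed placeholders.
import requests

CLUSTER_URL = "https://api.instaclustr.com/monitoring/v1/clusters/{cluster_id}"  # assumed endpoint
CLUSTER_ID = "00000000-0000-0000-0000-000000000000"                               # placeholder UUID

resp = requests.get(
    CLUSTER_URL.format(cluster_id=CLUSTER_ID),
    params={"metrics": "n::reads,n::writes"},      # same node-metric names as above
    auth=("<username>", "<monitoring-api-key>"),   # assumed basic auth
    timeout=30,
)
resp.raise_for_status()

# The response is expected to repeat the node-level measurements once per
# provisioned node, in the same format as the 'nodes' endpoint (assumed).
for node_entry in resp.json():
    print(node_entry)
```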
