Monitoring API

The monitoring API currently provides the following monitoring information:

  • Long-term cluster health indicators
  • Metrics for:
    • Cassandra status
    • read and write operations per second
    • CPU utilization
    • disk utilization
    • pending compactions

Metrics are provided either for an individual node or for all nodes in a cluster or in one of the cluster's data centres.

The API also provides key statistics for each table in the cluster (similar to what is available through “nodetool tablehistograms”):

  • read & write counts (mean, distribution)
  • read & write latency (mean, distribution)
  • live cells & tombstones per read (mean, max)
  • number of sstables read for each read operation (mean, max)

The set of available metrics will expand as we build out this API. Descriptions of each metric can be found in the monitoring section of this support site.

All requests to the API must use Basic Authentication and contain a valid username and the monitoring API key. API keys are created per user account and can be retrieved via the Instaclustr Console from the Account > API Key tab.
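
As an illustration of the Basic Authentication requirement, the following minimal sketch builds the Authorization header using only the Python standard library. The function name and the placeholder credentials are assumptions made for this example; the API key takes the place of the password.

```python
import base64


def basic_auth_header(username: str, api_key: str) -> dict:
    """Build an HTTP Basic Authentication header from a username and API key."""
    # Basic authentication is "username:password" encoded as Base64.
    # Here the monitoring API key is used as the password.
    credentials = f"{username}:{api_key}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials for illustration only.
headers = basic_auth_header("example-user", "example-api-key")
```

The resulting dictionary can be passed as the headers of whichever HTTP client you use.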

All available metrics are updated every 20 seconds (i.e. requesting the same metric twice in 20 seconds will always return the same response).
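
Because a metric does not change within its 20 second refresh window, a client can avoid redundant requests by caching responses locally. The sketch below is one possible approach, not part of the API: the class name, the `fetch` callable and the injectable `clock` are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple

REFRESH_SECONDS = 20  # metrics are updated every 20 seconds


class MetricCache:
    """Reuse a fetched response until the API could have newer data."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # hypothetical function that performs the GET
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, request_path: str) -> Any:
        now = self._clock()
        cached = self._entries.get(request_path)
        # Serve the cached response while it is younger than one refresh window.
        if cached is not None and now - cached[0] < REFRESH_SECONDS:
            return cached[1]
        value = self._fetch(request_path)
        self._entries[request_path] = (now, value)
        return value
```

The cache is keyed on the full request path, so requests for different metrics or periods are cached separately.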


Our APIs are engineered and operated for high levels of availability. However, you should not expect our APIs to have the same level of availability as a managed cluster, and you should not make the availability of your service depend on the APIs. While we do not provide formal SLAs for the APIs, we aim for 99.95% availability (i.e. up to ~20 minutes of downtime per month). Longer maintenance outages may be scheduled with appropriate notice. Consequently, any service dependent on the APIs should be able to cope gracefully with periods of unavailability of up to a few minutes. Planned and unplanned outages are communicated via
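
One way to cope gracefully with short periods of unavailability is to retry with exponential backoff. The following is a minimal sketch under stated assumptions: `request` is a hypothetical callable that performs the API call and raises `ConnectionError` or `TimeoutError` when the API cannot be reached, and the default delays are illustrative rather than recommended values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    request: Callable[[], T],
    attempts: int = 6,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying with exponential backoff when it is unreachable."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return request()
        except (ConnectionError, TimeoutError):
            # Give up after the final attempt and let the caller handle it.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before trying again.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

With the defaults shown, the waits are 5, 10, 20, 40 and 80 seconds, which covers a little over two and a half minutes of unavailability before the error is propagated.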

Cluster Health Indicator

The Cluster Health Indicator API provides a summary of indicators of the long-term health of your cluster. It is retrieved by making a GET request to <clusterId>/indicators

The API will respond with status 200 OK and a JSON packet containing the following information:

Example: Response packet showing cluster health

The output JSON consists of:

  • type: The name of the indicator being returned. The API returns five indicator types: REPLICATION_STRATEGY and REPLICATION_FACTOR for each keyspace, DISK_USAGE for each node, and PARTITION_SIZE and TOMBSTONE_LIVECELL for every table.
  • stateDetails: The state of the indicator type. stateDetails can be PASS, UNKNOWN, FAIL, WARN with further details provided in the form of a message.
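
The sketch below shows how a client might pick out the indicators that are not in the PASS state. The exact JSON layout is an assumption: it treats the response as a list of objects whose `stateDetails` is an object with hypothetical `state` and `message` keys, so adjust the key names to match the actual response packet.

```python
from typing import Any, Dict, List

# In this sketch every state other than PASS is treated as needing attention.
ATTENTION_STATES = {"WARN", "FAIL", "UNKNOWN"}


def indicators_needing_attention(indicators: List[Dict[str, Any]]) -> List[str]:
    """Return one summary line per indicator that is not in the PASS state."""
    findings = []
    for item in indicators:
        # Assumed layout: {"type": ..., "stateDetails": {"state": ..., "message": ...}}
        details = item.get("stateDetails") or {}
        state = details.get("state")
        if state in ATTENTION_STATES:
            message = details.get("message", "")
            findings.append(f"{item.get('type')}: {state} ({message})")
    return findings
```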

A detailed description of cluster health indicators can be found in this support article:

API Parameters

Metrics are requested by constructing a GET request, consisting of the following attributes:

class: Either ‘clusters’, ‘datacentres’ or ‘nodes’.

  • ‘clusters’: Returns the metrics for each node in the cluster/s.
  • ‘datacentres’: Returns the metrics for each node belonging to the specified data centre/s.
  • ‘nodes’: Returns the metrics for the specific node/s.

UUID or public IP: If the class is set to ‘clusters’ or ‘datacentres’, then the UUID of the cluster or data centre must be specified.

Alternatively, if the class is set to ‘nodes’, then either the node’s UUID or its public IP may be specified.

metrics: The metrics to return are specified as a comma-delimited querystring parameter. Up to 20 metrics may be specified.

For a complete list of available metrics, refer to the Reference section.

Formatted as: “metrics=<metric_1>,<metric_2>,…”

period: The period of time from which monitoring information is returned. It is also assigned a period type.

Formatted as: “period=<period>&type=<period type>”

period | period type
‘1m’ | Returns the most recent monitoring value. | NA
‘15m’ | Returns the most recent monitoring value. | Returns the average of all monitoring results from 15 minutes ago to now.
‘1h’ | Returns the most recent monitoring value. | Returns the average of all monitoring results from 1 hr ago to now.
‘3h’ | Returns the most recent monitoring value. | Returns the average of all monitoring results from 3 hrs ago to now.
‘1d’ | Returns the most recent monitoring value. | Returns the average of all monitoring results from 1 day ago to now.
‘7d’ | Returns the most recent monitoring value. | Returns the average of all monitoring results from 7 days ago to now.
‘30d’ | Returns the most recent monitoring value. | Returns the average of all monitoring results from 30 days ago to now.

reportNaN: Either ‘true’ or ‘false’.

If a metric value is NaN or null, reportNaN determines whether the API should report it as NaN. The default is false, in which case NaN and null values are reported as 0. Setting ‘reportNaN=true’ returns NaN values in the API response.

Formatted as: “reportNaN=<true or false>”

Request format: {{class}}/{{UUID or Public IP}}?{{metrics}}&{{period}}&{{reportNaN}}
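
The request format can be assembled programmatically. The following sketch is illustrative only: the function name and the `base_url` parameter are assumptions (supply the monitoring API base address for your account), and the checks simply mirror the parameter rules described in this section.

```python
from typing import Optional, Sequence
from urllib.parse import quote

VALID_CLASSES = {"clusters", "datacentres", "nodes"}
VALID_PERIODS = {"1m", "15m", "1h", "3h", "1d", "7d", "30d"}
MAX_METRICS = 20  # up to 20 metrics may be specified per request


def build_metrics_request(
    base_url: str,
    resource_class: str,
    identifier: str,
    metrics: Sequence[str],
    period: Optional[str] = None,
    period_type: Optional[str] = None,
    report_nan: Optional[bool] = None,
) -> str:
    """Build a metrics request URL from the documented attributes."""
    if resource_class not in VALID_CLASSES:
        raise ValueError(f"class must be one of {sorted(VALID_CLASSES)}")
    if not 1 <= len(metrics) <= MAX_METRICS:
        raise ValueError(f"between 1 and {MAX_METRICS} metrics are required")
    if period is not None and period not in VALID_PERIODS:
        raise ValueError(f"period must be one of {sorted(VALID_PERIODS)}")

    # Metric names are joined with commas and left unencoded, matching the
    # "metrics=<metric_1>,<metric_2>" format used in this document.
    query = ["metrics=" + ",".join(metrics)]
    if period is not None:
        query.append("period=" + period)
    if period_type is not None:
        query.append("type=" + period_type)
    if report_nan is not None:
        query.append("reportNaN=" + ("true" if report_nan else "false"))

    path = f"{resource_class}/{quote(identifier, safe='')}"
    return f"{base_url.rstrip('/')}/{path}?" + "&".join(query)
```

Optional parameters are omitted from the URL when they are not supplied, so the API defaults apply.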


Scenario | Relevant Request Format
Return the CPU and disk utilization for each node in the cluster with a UUID of e7342f08-d32f-41af-95be-cfaa0a433a26. | 3a26?metrics=n::cpuUtilization,n::diskUtilization
Return the latest results of disk utilization for each node in the cluster with a UUID of 10e837bd-47a1-4e39-b7d4-5137e145491d. |
Return the average reads and writes per second by Cassandra for each node belonging to the datacentre with a UUID of 001224dc-989c-4ad0-8b37-1ce345065b8f, from 15 minutes ago to now. | 5065b8f?metrics=n::cassandraReads,n::cassandraWrites&period=15m&type=aggregate
Return the list of all read latency distribution values for the ‘tcf1’ table in the ‘tk1’ keyspace, for just the node, from 7 days ago to now, reporting NaN values as well. | :readlatencydistribution&period=7d&type=range&reportNaN=true

Successfully processed metric API requests will return a 200 status code and accompanying JSON packet. JSON packets follow the same basic structure as listed in the following example:

e.g. Response with CPU Utilization for a single node

Each payload item represents an individual metric and will consist of:

metric: The name of the metric being returned.

type: The sub-type of the metric that is being measured (e.g. for the diskUsed metric, the available ‘types’ are livediskspaceused and totaldiskspaceused).

unit: The unit of measurement. The following unit abbreviations are used:

  • GB: Gigabyte
  • MB: Megabyte
  • B: Byte
  • s: Second
  • ms: Millisecond
  • us: Microsecond
  • 1: Non-standard unit (e.g. percentage)
  • us/1: Microseconds per non-standard unit (e.g. latency per read operation)
  • 1/s: Non-standard unit per second (e.g. write operations per second)

values: An array of time/value maps containing the measurement as recorded by Instaclustr.
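
To show how the payload fields fit together, the sketch below extracts the most recent value of each metric from a list of payload items. It relies on assumptions that should be checked against a real response: each entry in `values` is taken to have `time` and `value` keys, timestamps are assumed to be ISO 8601 strings in a single format (so they sort chronologically as text), and values are converted with `float()`.

```python
from typing import Any, Dict, List, Tuple


def latest_values(payload: List[Dict[str, Any]]) -> Dict[str, Tuple[float, str]]:
    """Map "metric/type" to the (latest value, unit) of each payload item."""
    result: Dict[str, Tuple[float, str]] = {}
    for item in payload:
        values = item.get("values") or []
        if not values:
            continue  # nothing recorded for this metric
        # Assumed keys: "time" and "value" in each time/value map.
        latest = max(values, key=lambda entry: entry["time"])
        raw = latest["value"]
        # float() also accepts the string "NaN", which may appear when
        # reportNaN=true is requested; a null value is treated as NaN here.
        number = float("nan") if raw is None else float(raw)
        name = item["metric"]
        if item.get("type"):
            name = f"{name}/{item['type']}"
        result[name] = (number, item.get("unit", ""))
    return result
```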

If multiple metrics are requested, the response will include multiple payload entries:

e.g. Get CPU Utilization and Disk Utilization for a single node

Unsuccessful calls will return the following responses, depending upon the issue:

  • 400 Bad Request: Returned when the expected node or cluster ID is not a valid UUID or an incorrect metric name has been supplied.
  • 401 Unauthorized: Returned when no or incorrect username and/or API key details are provided.
  • 404 Not Found: Returned when accessing an incorrect URL or trying to access a cluster/node not owned by the authenticated user.
  • 415 Unsupported Media Type: Returned when the payload is in an unsupported format. Possibly resolved by specifying content-type as application/json.
  • 429 Too Many Requests: Returned when more than 70 requests per second are being received by your user.
  • 500 Server Error: All other errors
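
A client can use these status codes to decide what to do next. The mapping below is a sketch of one reasonable policy, not behaviour defined by the API: treating 429 and 500 as retryable and the other listed codes as problems with the request is an assumption of this example, as are the function name and the returned labels.

```python
# Codes that indicate the request itself must be corrected before resending.
REQUEST_ERRORS = {
    400: "invalid UUID or metric name",
    401: "missing or incorrect username/API key",
    404: "incorrect URL, or cluster/node not owned by this user",
    415: "unsupported payload format",
}

# Codes where waiting and retrying may succeed (an assumption of this sketch).
RETRYABLE = {
    429: "more than 70 requests per second; slow down",
    500: "server error",
}


def classify_status(status: int) -> str:
    """Return "ok", "retry", "fix_request" or "unexpected" for a status code."""
    if status == 200:
        return "ok"
    if status in RETRYABLE:
        return "retry"
    if status in REQUEST_ERRORS:
        return "fix_request"
    return "unexpected"
```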

e.g. Error response


General Metrics

  • n::nodeStatus (deprecated): Whether Cassandra is available on the node. Please note that this metric has been deprecated and will soon be removed from associated response bodies.
  • n::cpuUtilization: Current CPU utilization as a percentage of total available. Maximum value is 100%, regardless of the number of cores on the node.
  • n::osload: Current OS load. Generally, a node is overloaded if OS load >= the number of cores on the node.
  • n::diskUtilization: Total disk space utilization, by Cassandra, as a percentage of total available.
  • n::cpuguestpercent: Time spent running a virtual CPU for guest operating systems under control of the kernel.
  • n::cpuguestnicepercent: Niced processes executing in user mode in a virtual OS.
  • n::cpusystempercent: Percentage of processes executing in kernel mode.
  • n::cpuidlepercent: Percentage of time when one or more kernel threads are executing with the run queue empty and/or no I/O operations are currently cycling.
  • n::cpuiowaitpercent: CPU time the I/O thread spent waiting for a socket ready for reads or writes, as a percentage.
  • n::cpuirqpercent: Number of hardware interrupts the kernel is servicing.
  • n::cpunicepercent: Percentage of processes executing in user mode which have a positive nice value.
  • n::cpusoftirqpercent: Number of software interrupts the kernel is servicing.
  • n::cpustealpercent: Percentage of time the hypervisor allocated to other tasks external to the one run on the current virtual CPU.
  • n::cpuuserpercent: Processes executing in user mode, including application processes.
  • n::memavailable: Estimate of how much memory is available to start new applications without swap, taking into account page cache and re-claimability of slab.
  • n::networkoutdelta: Delta count of bytes transmitted.
  • n::networkindelta: Delta count of bytes received.
  • n::networkouterrorsdelta: Delta count of transmit errors detected.
  • n::networkinerrorsdelta: Delta count of receive errors detected.
  • n::networkoutdroppeddelta: Delta count of transmit packets dropped.
  • n::networkindroppeddelta: Delta count of receive packets dropped.
  • n::tcpall: Total number of TCP connections in all states.
  • n::tcpestablished: Number of open TCP connections.
  • n::tcplistening: Number of TCP sockets waiting for a connection request from any remote TCP and port.
  • n::tcptimewait: Number of TCP sockets waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.
  • n::tcpclosewait: Number of TCP sockets whose connection is in the process of being closed.
  • n::filedescriptorlimit: Maximum number of open files allowed by the node OS.
  • n::filedescriptoropencount: Current number of open files in the node OS.
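
Some of these metrics are most useful in combination. As a small sketch, the helpers below apply the overload rule of thumb given for n::osload and compute how much of the file descriptor limit is in use. The function names are hypothetical, and the core count must come from your own knowledge of the node size because it is not one of the metrics listed here.

```python
def is_overloaded(os_load: float, core_count: int) -> bool:
    """Apply the rule of thumb: overloaded if OS load >= number of cores."""
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    return os_load >= core_count


def file_descriptor_usage(open_count: float, limit: float) -> float:
    """Return the fraction of the open-file limit currently in use."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # open_count comes from n::filedescriptoropencount and
    # limit from n::filedescriptorlimit.
    return open_count / limit
```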
