Monitoring API

The monitoring API currently provides the following monitoring information:

  • Long-term cluster health indicators
  • Metrics for:
    • Cassandra status
    • read and write operations per second
    • CPU utilization
    • disk utilization
    • pending compactions

Metrics are provided either for an individual node or for all nodes in a cluster or cluster data centre.

The API also provides key statistics for each table in the cluster (similar to what is available through “nodetool tablehistograms”):

  • read & write counts (mean, distribution)
  • read & write latency (mean, distribution)
  • live cells & tombstones per read (mean, max)
  • number of sstables read for each read operation (mean, max)

The set of available metrics will expand as we build out this API. Descriptions of each of the metrics can be found in the monitoring section of this support site:
https://www.instaclustr.com/support/documentation/monitoring-information/

Authentication

All requests to the API must use Basic Authentication and contain a valid username and the monitoring API key. API keys are created per user account and can be retrieved via the Instaclustr Console from the Account > API Key tab.
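As a minimal sketch of the Basic Authentication requirement above, the following Python builds the Authorization header from a username and monitoring API key. The function names and the placeholder credentials are illustrative assumptions, not part of the API.

```python
import base64
import urllib.request


def basic_auth_header(username: str, api_key: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{api_key}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_request(url: str, username: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET request."""
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", basic_auth_header(username, api_key))
    return request


# Hypothetical credentials, for illustration only.
example = build_request(
    "https://api.instaclustr.com/monitoring/v1/clusters/<clusterId>/indicators",
    "user",
    "key",
)
```

Passing the prepared request to urllib.request.urlopen would send it; that step is left out here so the sketch runs without network access.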

All available metrics are updated every 20 seconds (i.e. requesting the same metric twice in 20 seconds will always return the same response).
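Because a repeated request within the 20-second window returns the same response, a client may choose to cache responses locally rather than re-request them. The sketch below is one possible approach; the MetricCache class and its fetch callback are hypothetical names, and the cache window starts at the client's fetch time, which is an assumption and is not necessarily aligned with the server's refresh cycle.

```python
import time
from typing import Any, Callable, Dict, Tuple

REFRESH_INTERVAL_SECONDS = 20  # update interval stated in the text above


class MetricCache:
    """Cache responses per URL for one refresh interval (hypothetical helper)."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl: float = REFRESH_INTERVAL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, url: str) -> Any:
        now = self._clock()
        cached = self._entries.get(url)
        # Reuse the stored response while it is younger than the TTL.
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        value = self._fetch(url)
        self._entries[url] = (now, value)
        return value
```

The clock parameter exists only so the behaviour can be exercised without waiting; in normal use the default monotonic clock is sufficient.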

Availability

Our APIs are engineered and operated for high levels of availability. However, you should not expect our APIs to have the same level of availability as a managed cluster, and you should not make the availability of your service depend on the APIs. While we do not provide formal SLAs for the APIs, we aim for 99.95% availability (i.e. up to ~20 mins downtime per month). Longer maintenance outages may be scheduled with appropriate notice. Consequently, any service dependent on the APIs should be able to cope gracefully with periods of unavailability of up to a few minutes. Planned and unplanned outages are communicated via https://status.instaclustr.com/.
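One way to cope gracefully with short periods of unavailability is to retry with exponential backoff and then fall back to a degraded mode. The following is a sketch under assumptions: call_with_retry is a hypothetical helper, the attempt count and delays are arbitrary illustrative values, and connection failures are assumed to surface as OSError (which covers urllib.error.URLError and socket timeouts in Python).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Run operation, retrying on connection errors with exponential backoff.

    Returns None when every attempt fails, so the caller can degrade
    gracefully instead of crashing.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                break  # no point sleeping after the final attempt
            sleep(min(max_delay, base_delay * 2 ** attempt))
    return None
```

With the default values the waits are 1, 2, 4 and 8 seconds, which covers roughly 15 seconds of unavailability; tuning them to span a few minutes would be a reasonable adjustment given the text above.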

Cluster Health Indicator

The Cluster Health Indicator API provides a summary of indicators on the long-term health of your cluster. It is retrieved by making a GET request to https://api.instaclustr.com/monitoring/v1/clusters/<clusterId>/indicators

The API will respond with status 200 OK and a JSON packet containing the following information:

Example: Response packet showing cluster health

The output JSON consists of:

  • type: The name of the indicator being returned. The API returns five indicator types: REPLICATION_STRATEGY and REPLICATION_FACTOR for each keyspace, DISK_USAGE for each node, and PARTITION_SIZE and TOMBSTONE_LIVECELL for every table.
  • stateDetails: The state of the indicator type. stateDetails can be PASS, UNKNOWN, FAIL, WARN with further details provided in the form of a message.

A detailed description of cluster health indicators can be found in this support article:

https://www.instaclustr.com/support/documentation/monitoring-information/cluster-health-check/
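The indicators endpoint described in this section can be addressed with a small helper. This is a sketch only: indicators_url is a hypothetical name, and validating the cluster ID as a UUID on the client side is an assumption made to catch malformed IDs before a request is sent.

```python
import uuid

INDICATORS_URL = (
    "https://api.instaclustr.com/monitoring/v1/clusters/{cluster_id}/indicators"
)


def indicators_url(cluster_id: str) -> str:
    """Build the Cluster Health Indicator URL for a cluster UUID."""
    # uuid.UUID raises ValueError for malformed input.
    parsed = uuid.UUID(cluster_id)
    return INDICATORS_URL.format(cluster_id=str(parsed))
```

The response itself is not parsed here because its exact JSON layout is not reproduced in this section.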

API Parameters

Metrics are requested by constructing a GET request, consisting of the following attributes:

class: Either ‘clusters’, ‘datacentres’ or ‘nodes’.

  • ‘clusters’: Returns the metrics for each node in the cluster/s.
  • ‘datacentres’: Returns the metrics for each node belonging to the specified data centre/s.
  • ‘nodes’: Returns the metrics for the specific node/s.

UUID or public IP: If the class is set to ‘clusters’ or ‘datacentres’, then the UUID of the cluster or datacentre must be specified.

Alternatively, if the class is set to ‘nodes’, then either the node’s UUID or its public IP may be specified.

metrics: The metrics to return are specified as a comma-delimited querystring parameter. Up to 20 metrics may be specified.

For a complete list of available metrics, refer to the Reference section.

Formatted as: “metrics=<metric_1>,<metric_2>,…”

period: The period of time over which monitoring information is returned. Each period is paired with a period type.

Formatted as: “period=<period>&type=<period type>”

The period type (‘latest’ or ‘aggregate’) determines what is returned for each period:

  • ‘1m’: ‘latest’ returns the most recent monitoring value. ‘aggregate’ is not available (NA).
  • ‘15m’: ‘latest’ returns the most recent monitoring value. ‘aggregate’ returns the average of all monitoring results from 15 minutes ago to now.
  • ‘1h’: ‘latest’ returns the most recent monitoring value. ‘aggregate’ returns the average of all monitoring results from 1 hr ago to now.
  • ‘3h’: ‘latest’ returns the most recent monitoring value. ‘aggregate’ returns the average of all monitoring results from 3 hrs ago to now.
  • ‘1d’: ‘latest’ returns the most recent monitoring value. ‘aggregate’ returns the average of all monitoring results from 1 day ago to now.
  • ‘7d’: ‘latest’ returns the most recent monitoring value. ‘aggregate’ returns the average of all monitoring results from 7 days ago to now.
  • ‘30d’: ‘latest’ returns the most recent monitoring value. ‘aggregate’ returns the average of all monitoring results from 30 days ago to now.

reportNaN: Either ‘true’ or ‘false’.

If a metric value is NaN or null, reportNaN determines whether the API should report it as NaN. The default is false, in which case NaN and null are reported as 0. Setting ‘reportNaN=true’ will return NaN values in the API response.

Formatted as: “reportNaN=<true or false>”

Request format:

https://api.instaclustr.com/monitoring/v1/{{class}}/{{UUID or Public IP}}?{{metrics}}&{{period}}&{{reportNaN}}
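The request format above can be assembled programmatically. The sketch below is an illustration under assumptions: metrics_url and its parameter names are hypothetical, the client-side checks simply mirror the limits described in this section (three classes, up to 20 metrics, ‘aggregate’ not available for ‘1m’), and the period type is not restricted to a fixed set.

```python
from typing import Optional, Sequence

BASE_URL = "https://api.instaclustr.com/monitoring/v1"
CLASSES = {"clusters", "datacentres", "nodes"}
PERIODS = {"1m", "15m", "1h", "3h", "1d", "7d", "30d"}
MAX_METRICS = 20


def metrics_url(
    resource_class: str,
    resource_id: str,
    metrics: Sequence[str],
    period: Optional[str] = None,
    period_type: Optional[str] = None,
    report_nan: Optional[bool] = None,
) -> str:
    """Build a monitoring request URL from its documented parts."""
    if resource_class not in CLASSES:
        raise ValueError(f"unknown class: {resource_class!r}")
    if isinstance(metrics, str) or not 1 <= len(metrics) <= MAX_METRICS:
        raise ValueError("metrics must be a list of 1 to 20 metric names")
    # Metric names are joined literally, matching the documented examples.
    query = ["metrics=" + ",".join(metrics)]
    if (period is None) != (period_type is None):
        raise ValueError("period and period type must be given together")
    if period is not None:
        if period not in PERIODS:
            raise ValueError(f"unknown period: {period!r}")
        if period == "1m" and period_type == "aggregate":
            raise ValueError("'aggregate' is not available for '1m'")
        query.append(f"period={period}")
        query.append(f"type={period_type}")
    if report_nan is not None:
        query.append("reportNaN=" + ("true" if report_nan else "false"))
    return f"{BASE_URL}/{resource_class}/{resource_id}?" + "&".join(query)
```

Omitting period and report_nan leaves those parameters out of the URL entirely, so the API's own defaults apply.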

Examples:

Scenario: Return the CPU and disk utilization for each node in the cluster with a UUID of e7342f08-d32f-41af-95be-cfaa0a433a26.
Request: https://api.instaclustr.com/monitoring/v1/clusters/e7342f08-d32f-41af-95be-cfaa0a433a26?metrics=n::cpuUtilization,n::diskUtilization

Scenario: Return the latest results of disk utilization for each node in the cluster with a UUID of 10e837bd-47a1-4e39-b7d4-5137e145491d.
Request: https://api.instaclustr.com/monitoring/v1/clusters/10e837bd-47a1-4e39-b7d4-5137e145491d?metrics=n::diskUtilization&period=1m&type=latest

Scenario: Return the average reads and writes per second by Cassandra for each node belonging to the datacentre with a UUID of 001224dc-989c-4ad0-8b37-1ce345065b8f, from 15 minutes ago to now.
Request: https://api.instaclustr.com/monitoring/v1/datacentres/001224dc-989c-4ad0-8b37-1ce345065b8f?metrics=n::cassandraReads,n::cassandraWrites&period=15m&type=aggregate

Scenario: Return the list of all read latency distribution values for the ‘tcf1’ table in the ‘tk1’ keyspace, for just the 52.70.191.97 node, from 7 days ago to now, reporting NaN values as well.
Request: https://api.instaclustr.com/monitoring/v1/nodes/52.70.191.97?metrics=cf::tk1::tcf1::readlatencydistribution&period=7d&type=range&reportNaN=true
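The metric names in these examples appear to follow two patterns: n::<metric> for node-level metrics and cf::<keyspace>::<table>::<metric> for table-level metrics. That reading is inferred from the examples rather than stated as a rule, and the helper names below are hypothetical. A minimal sketch:

```python
def node_metric(name: str) -> str:
    """Format a node-level metric name, e.g. n::cpuUtilization."""
    return f"n::{name}"


def table_metric(keyspace: str, table: str, name: str) -> str:
    """Format a table-level metric name, e.g. cf::tk1::tcf1::readlatencydistribution."""
    for part in (keyspace, table, name):
        # "::" is the separator, so it cannot appear inside a component.
        if not part or "::" in part:
            raise ValueError(f"invalid metric component: {part!r}")
    return f"cf::{keyspace}::{table}::{name}"


# Comma-delimited value for the metrics querystring parameter.
metrics_param = ",".join(
    [node_metric("cpuUtilization"), table_metric("tk1", "tcf1", "readlatencydistribution")]
)
```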

Successfully processed metric API requests will return a 200 status code and accompanying JSON packet. JSON packets follow the same basic structure as listed in the following example:

e.g. Response with CPU Utilization for a single node

Each payload item represents an individual metric and will consist of:

metric: The name of the metric being returned.
type: The sub-type of the metric that is being measured (e.g. for the diskUsed metric, the available ‘types’ are livediskspaceused and totaldiskspaceused).
unit: The unit of measurement. The following unit abbreviations are used:

  • GB: Gigabyte
  • MB: Megabyte
  • B: Byte
  • s: Second
  • ms: Millisecond
  • us: Microsecond
  • 1: Non-standard unit (e.g. percentage)
  • us/1: Microseconds per non-standard unit (e.g. latency per read operation)
  • 1/s: Non-standard unit per second (e.g. write operations per second)

values: An array of time/value maps containing the measurement as recorded by Instaclustr.
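When comparing latencies reported in different time units, it can help to normalise them first. The sketch below converts the three time abbreviations listed above into seconds; to_seconds is a hypothetical helper, and it deliberately leaves the size units and compound units such as us/1 out of scope.

```python
# Divisors for the time unit abbreviations listed above.
UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000}


def to_seconds(value: float, unit: str) -> float:
    """Convert a value in s, ms or us to seconds."""
    try:
        divisor = UNITS_PER_SECOND[unit]
    except KeyError:
        raise ValueError(f"not a plain time unit: {unit!r}") from None
    return value / divisor
```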

If multiple metrics are requested, the response will include multiple payload entries:

e.g. Get CPU Utilization and Disk Utilization for a single node

Unsuccessful calls will return the following responses, depending upon the issue:

  • 400 Bad Request: Returned when the expected node or cluster ID is not a valid UUID or an incorrect metric name has been supplied.
  • 401 Unauthorized: Returned when no or incorrect username and/or API key details are provided.
  • 404 Not Found: Returned when accessing an incorrect URL or trying to access a cluster/node not owned by the authenticated user.
  • 415 Unsupported Media Type: Returned when the payload is in an unsupported format. Possibly resolved by specifying content-type as application/json.
  • 429 Too Many Requests: Returned when more than 70 requests per second are being received by your user.
  • 500 Server Error: All other errors
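A client can map these status codes to a next step. The sketch below is one possible policy, not a prescribed one: the function names are hypothetical, and treating 429 and 5xx responses as worth retrying later (and the other errors as needing a corrected request) is an assumption.

```python
# Status codes and meanings as listed above.
STATUS_MEANINGS = {
    200: "OK",
    400: "Bad Request: invalid UUID or incorrect metric name",
    401: "Unauthorized: missing or incorrect username and/or API key",
    404: "Not Found: incorrect URL, or cluster/node not owned by the user",
    415: "Unsupported Media Type: try content-type application/json",
    429: "Too Many Requests: more than 70 requests per second",
    500: "Server Error: all other errors",
}


def explain(status: int) -> str:
    """Return the documented meaning of a status code."""
    return STATUS_MEANINGS.get(status, f"Unexpected status {status}")


def next_action(status: int) -> str:
    """Suggest what a client should do with a response (assumed policy)."""
    if status == 200:
        return "use_response"
    if status == 429 or status >= 500:
        return "retry_later"  # slow down or wait before trying again
    return "fix_request"  # retrying the same request will not help
```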

e.g. Error response

 

General Metrics

  • n::nodeStatus: Whether Cassandra is available on the node. Returns a “warn” value if no check-in has been logged in the last 30 seconds.
  • n::cpuUtilization: Current CPU utilisation as a percentage of total available. Maximum value is 100%, regardless of the number of cores on the node.
  • n::osload: Current OS load. Generally, a node is overloaded if the OS load is greater than or equal to the number of cores on the node.
  • n::diskUtilization: Total disk space utilisation, by Cassandra, as a percentage of total available.
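The rule of thumb for n::osload above can be written as a one-line check. This is a sketch: is_overloaded is a hypothetical helper, and the number of cores must come from your own knowledge of the node size, since it is not part of the metrics listed here.

```python
def is_overloaded(os_load: float, cores: int) -> bool:
    """Apply the rule of thumb: overloaded when OS load >= number of cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return os_load >= cores
```

For example, an OS load of 3.2 on a 4-core node is below the threshold, while a load of 4.0 on the same node meets it.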
