We are pleased to announce support for Consumer Lag monitoring for your Kafka clusters running on the Instaclustr managed platform.
Consumer Lag has been one of the most requested metrics from our customers since we launched our Managed Kafka service just over a year ago. It is one of the key metrics in Kafka and shows how far behind your consumers are in reading messages from Kafka brokers. Consumer Lag is a key performance indicator for applications that use Kafka to stream real-time data, where consumers are expected to read messages at the rate producers write them to Kafka brokers. In essence, it indicates how close to real time your application is and whether you have allocated sufficient processing capacity to your consumers to keep up with incoming data.
However, because the number of consumers and topics to be monitored for each cluster is variable (and potentially large), useful monitoring of consumer lag is not a trivial problem. It requires additional processing and logic to make the raw metrics exposed by Kafka usable.
Consumer Group Metrics
We have introduced three new metrics under the monitoring topic of Consumer Group Metrics. Each of these metrics is available at the client level. A client is a logical grouping of a set of consumers having the same Client ID. A consumer group may have one or more clients reading messages from Kafka brokers.
- Consumer Lag Metrics
Consumer Lag is calculated as the difference between the log end offset of a partition and the last offset committed by a consumer for that partition. It helps investigate and analyze consumer latency issues and isolate clients that are lagging. Additionally, high consumer group lag indicates that consumers are unable to keep up with producer throughput; monitoring it helps identify when a cluster needs scaling up or a consumer group needs more consumers.
- Consumer Count Metrics
This captures the number of consumers per client in a consumer group reading from a specific topic. This data can be correlated with the Consumer Lag metric during analysis. It also indicates the health of a consumer group, that is, whether all of its consumers are alive and reading messages as expected.
- Partition Count Metrics
This captures the number of partitions that consumers from a client are reading from a specific topic. A highly imbalanced partition count across clients may indicate a skewed distribution and an underperforming consumer group.
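To make the definitions above concrete, here is a minimal Python sketch of how lag and partition count could be derived per client from offset data. It is an illustration only, not the Instaclustr implementation: the function names, the tuple layout, and the sample offsets are all hypothetical, and summing partition lag per client and clamping negative values to zero are assumptions made for the example.

```python
from collections import defaultdict


def partition_lag(log_end_offset: int, committed_offset: int) -> int:
    """Lag for one partition: log end offset minus last committed offset.

    Clamped at zero (an assumption) in case the two offsets were
    sampled at slightly different times.
    """
    return max(log_end_offset - committed_offset, 0)


def summarize_by_client(assignments):
    """Aggregate lag and partition count per client ID.

    `assignments` is a hypothetical iterable of tuples:
    (client_id, partition, log_end_offset, committed_offset)
    """
    lag = defaultdict(int)
    partitions = defaultdict(int)
    for client_id, _partition, end_offset, committed in assignments:
        lag[client_id] += partition_lag(end_offset, committed)
        partitions[client_id] += 1
    return dict(lag), dict(partitions)


# Made-up sample data for one topic in one consumer group.
sample = [
    ("client-a", 0, 1500, 1450),
    ("client-a", 1, 1200, 1200),
    ("client-b", 2, 900, 400),
]

lag_by_client, partitions_by_client = summarize_by_client(sample)
print(lag_by_client)         # {'client-a': 50, 'client-b': 500}
print(partitions_by_client)  # {'client-a': 2, 'client-b': 1}
```

In this made-up data, client-b reads a single partition yet carries most of the lag, which is the kind of lagging client the metrics above are meant to isolate.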
These metrics are available on the Monitoring page of the Instaclustr Management Console, as shown in the image above, or through our Monitoring APIs. On the console, you can select a specific topic for a consumer group and one or more clients. You can view the data grouped by metric, which shows all clients in a single graph for each consumer group metric, or grouped by client, which shows each client's data in separate graphs. For more information, refer to our support documentation: Consumer Group Metrics Documentation. For details on using APIs to get Consumer Group Metrics or other Kafka metrics that Instaclustr monitors and exposes, see Monitoring – Kafka Metrics.