The last major release of Apache Cassandra, 3.11.0, shipped more than two years ago in 2017. So what has the Cassandra developer community been doing since? Well, let me tell you: it's good, real good. It's Apache Cassandra 4.0! It's also close, and with the release of the first alpha version, we now have a pretty solid idea of the features and capabilities that will be included in the final release.
In this series of blog posts, we'll take a meandering tour of some of the important changes, cool features, and nice ergonomics that are coming with Apache Cassandra 4.0. In part 1 we focused on Cassandra 4.0 stability and testing; in this part we will learn about Virtual Tables.
Implementation of Virtual Tables
Among the many exciting new features Cassandra 4.0 boasts is the implementation of Virtual Tables. Up until now, JMX access has been required to reveal Cassandra details such as running compactions, metrics, clients, and various configuration settings. With Virtual Tables, users can easily query this data as CQL rows from read-only system tables. Let's briefly discuss the changes associated with these Virtual Tables below.
Previously, if a user wanted to look up the compaction status of a given node in a cluster, they first had to establish a JMX connection in order to run
nodetool compactionstats on the node. This alone presents a number of considerations: configuring your client for JMX access, configuring your nodes and firewall to allow JMX access, and ensuring the necessary security and auditing measures are in place, just to name a few.
Virtual Tables eliminate this overhead by allowing the user to query this information via the driver they already have configured. There are two new keyspaces created for this purpose:
The system_virtual_schema keyspace is just what it sounds like: it contains the schema information for the Virtual Tables themselves. All of the pertinent information we actually want is housed in the
system_views keyspace, which contains a number of useful tables.
cqlsh> select * from system_virtual_schema.tables;
 keyspace_name         | table_name                | comment
-----------------------+---------------------------+------------------------------
          system_views |                    caches |                system caches
          system_views |                   clients |  currently connected clients
          system_views |  coordinator_read_latency |
          system_views |  coordinator_scan_latency |
          system_views | coordinator_write_latency |
          system_views |                disk_usage |
          system_views |         internode_inbound |
          system_views |        internode_outbound |
          system_views |        local_read_latency |
          system_views |        local_scan_latency |
          system_views |       local_write_latency |
          system_views |        max_partition_size |
          system_views |             rows_per_read |
          system_views |                  settings |             current settings
          system_views |             sstable_tasks |        current sstable tasks
          system_views |              thread_pools |
          system_views |       tombstones_per_read |
 system_virtual_schema |                   columns |   virtual column definitions
 system_virtual_schema |                 keyspaces | virtual keyspace definitions
 system_virtual_schema |                    tables |    virtual table definitions

(20 rows)
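Since the schema of every Virtual Table is itself exposed through system_virtual_schema, we can also inspect a table's columns without leaving cqlsh. As a quick sketch (the exact column metadata may vary between 4.0 builds):

cqlsh> select column_name, kind, type from system_virtual_schema.columns where keyspace_name = 'system_views' and table_name = 'sstable_tasks';

This returns each column of sstable_tasks along with whether it is part of the primary key and its CQL type.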
Before looking at an example, it's important to touch upon the scope of these Virtual Tables. All Virtual Tables are restricted in scope to their own node, and therefore all queries on these tables return data valid only for the node acting as coordinator, regardless of consistency level. As a result, support for specifying the coordinator node for such queries has been added to several drivers, including the Python and DataStax Java drivers.
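With cqlsh, the simplest way to read a particular node's Virtual Tables is to connect to that node directly; the address below is just a placeholder for one of your own nodes:

$ cqlsh 10.0.1.12
cqlsh> select * from system_views.clients;

Because the data is node-local, running the same query against a different node can return entirely different results.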
Let’s take a look at a Virtual Table, in this case
sstable_tasks. This table shows all operations on SSTables, such as compactions, cleanups, and upgrades.
cqlsh> select * from system_views.sstable_tasks;
 keyspace_name | table_name | task_id                              | kind       | progress | total     | unit
---------------+------------+--------------------------------------+------------+----------+-----------+-------
     keyspace1 |  standard1 | 09e00960-064c-11ea-a48a-87683fec5884 | compaction | 15383452 | 216385920 | bytes

(1 rows)
This is the same information we would expect from running
nodetool compactionstats. We can see that there is currently one active compaction on the node, what its progress is, as well as its keyspace and table. Being able to quickly and efficiently view this information is often key to understanding and diagnosing cluster health.
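Virtual Tables also support filtering on their keys, which makes spot checks convenient. For instance, the settings table is keyed by setting name, so a single configuration value can be pulled directly (the setting name here is just an illustration):

cqlsh> select * from system_views.settings where name = 'rpc_address';

This replaces grepping cassandra.yaml, or querying JMX, just to confirm what a running node is actually using.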
While some metrics can still only be queried via JMX, having the ability to use CQL to pull important metrics from a cluster is a very nice feature. With Virtual Tables offering a convenient means of querying metrics, less focus needs to be placed on building JMX tools, such as Reaper, and more time can be spent working within Cassandra itself. We may also start to see a rise in client-side tooling that takes advantage of Virtual Tables.