The last major release of Apache Cassandra, 3.11.0, shipped more than two years ago in 2017. So what has the Cassandra developer community been doing since? Well, let me tell you: it's good, real good. It's Apache Cassandra 4.0! It's also close, and with the release of the first alpha version, we now have a pretty solid idea of the features and capabilities that will be included in the final release.
In this series of blog posts, we'll take a meandering tour of some of the important changes, cool features, and nice ergonomics that are coming with Apache Cassandra 4.0. In part 1 we focused on Cassandra 4.0 stability and testing; in this part we will learn about Virtual Tables.
Implementation of Virtual Tables
Among the many exciting new features Cassandra 4.0 boasts is the implementation of Virtual Tables. Up until now, JMX access has been required to reveal Cassandra details such as running compactions, metrics, clients, and various configuration settings. With Virtual Tables, users can easily query this data as CQL rows from read-only system tables. Let's briefly discuss the changes associated with these Virtual Tables below.
Previously, if a user wanted to look up the compaction status of a given node in a cluster, they first had to establish a JMX connection in order to run
nodetool compactionstats on the node. This alone presents a number of considerations: configuring your client for JMX access, configuring your nodes and firewall to allow JMX access, and ensuring the necessary security and auditing measures are in place, just to name a few.
Virtual Tables eliminate this overhead by allowing the user to query this information via the driver they already have configured. There are two new keyspaces created for this purpose:
The system_virtual_schema keyspace is just what it sounds like: it contains the schema information for the Virtual Tables themselves. All of the pertinent information we actually want is housed in the
system_views keyspace, which contains a number of useful tables.
cqlsh> select * from system_virtual_schema.tables;
 keyspace_name         | table_name                | comment
-----------------------+---------------------------+------------------------------
          system_views |                    caches |                system caches
          system_views |                   clients |  currently connected clients
          system_views |  coordinator_read_latency |
          system_views |  coordinator_scan_latency |
          system_views | coordinator_write_latency |
          system_views |                disk_usage |
          system_views |         internode_inbound |
          system_views |        internode_outbound |
          system_views |        local_read_latency |
          system_views |        local_scan_latency |
          system_views |       local_write_latency |
          system_views |        max_partition_size |
          system_views |             rows_per_read |
          system_views |                  settings |             current settings
          system_views |             sstable_tasks |        current sstable tasks
          system_views |              thread_pools |
          system_views |       tombstones_per_read |
 system_virtual_schema |                   columns |   virtual column definitions
 system_virtual_schema |                 keyspaces | virtual keyspace definitions
 system_virtual_schema |                    tables |    virtual table definitions

(20 rows)
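Since the schema of every Virtual Table is itself exposed through system_virtual_schema, we can also inspect a table's columns without leaving cqlsh. As a quick sketch (the exact column metadata may vary between 4.0 builds):

cqlsh> select column_name, kind, type from system_virtual_schema.columns where keyspace_name = 'system_views' and table_name = 'sstable_tasks';

This returns each column of sstable_tasks along with whether it is part of the primary key and its CQL type.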
Before looking at an example, it's important to touch upon the scope of these Virtual Tables. All Virtual Tables are restricted in scope to their own node, and therefore all queries on these tables return data valid only for the node acting as coordinator, regardless of consistency level. As a result, support for specifying the coordinator node for such queries has been added to several drivers, including the Python and DataStax Java drivers.
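With cqlsh, the simplest way to read a particular node's Virtual Tables is to connect to that node directly; the address below is just a placeholder for one of your own nodes:

$ cqlsh 10.0.1.12
cqlsh> select * from system_views.clients;

Because the data is node-local, running the same query against a different node can return entirely different results.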
Let’s take a look at a Virtual Table, in this case
sstable_tasks. This table shows all operations on SSTables, such as compactions, cleanups, and upgrades.
cqlsh> select * from system_views.sstable_tasks;
 keyspace_name | table_name | task_id                              | kind       | progress | total     | unit
---------------+------------+--------------------------------------+------------+----------+-----------+-------
     keyspace1 |  standard1 | 09e00960-064c-11ea-a48a-87683fec5884 | compaction | 15383452 | 216385920 | bytes

(1 rows)
This is the same information we would expect from running
nodetool compactionstats. We can see that there is currently one active compaction on the node, what its progress is, as well as its keyspace and table. Being able to quickly and efficiently view this information is often key to understanding and diagnosing cluster health.
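Virtual Tables also support filtering on their keys, which makes spot checks convenient. For instance, the settings table is keyed by setting name, so a single configuration value can be pulled directly (the setting name here is just an illustration):

cqlsh> select * from system_views.settings where name = 'rpc_address';

This replaces grepping cassandra.yaml, or querying JMX, just to confirm what a running node is actually using.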
While some metrics can still only be queried via JMX, having the ability to use CQL to pull important metrics from a cluster is a very nice feature. With Virtual Tables offering a convenient means of querying metrics, less focus needs to be placed on building JMX tools, such as Reaper, and more time can be spent working within Cassandra itself. We may also start to see a rise in client-side tooling that takes advantage of Virtual Tables.