By Instaclustr Thursday 3rd January 2019

Advanced Node Replace

Technical — Cassandra

Instaclustr has a number of internal tools and procedures that help us keep Cassandra clusters healthy. One of those tools allows us to replace the instance backing a Cassandra node while keeping its IPs and data. Dynamic Resizing, released in June 2017, uses technology developed for that tool to allow customers to scale Cassandra clusters vertically based on demand.

Initially, the replace tool operated by detaching the volumes from the instance being replaced and then re-attaching them to the new instance. This limited the tool to EBS-backed instances. Another often-requested extension was the ability to resize a Data Centre to a different node size, for example to upgrade to a newly added node size or to switch over to the resizable class of nodes.

One option for changing instance size where we could not simply detach and reattach data volumes was to use Cassandra’s native node replace functionality to replace each instance in the cluster in a rolling fashion. At first, this approach seems attractive, as it can be conducted with zero downtime. However, quite some time ago we realised that, unless you run a repair between each replacement, this approach almost certainly loses a small amount of data if any replace operation exceeds the hinted handoff window. As a result, we relied for quite a while on fairly tedious and complex methods of rolling upgrades involving detaching and re-attaching EBS volumes.
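The hinted handoff constraint can be illustrated with a minimal sketch. It assumes the stock cassandra.yaml default of three hours for `max_hint_window_in_ms` (a cluster may override it), and the function names are hypothetical, not part of any Instaclustr tool. It only flags which replacements in a rolling sequence would outlast the window and so would need a repair before the next one starts.

```python
from datetime import timedelta
from typing import List, Sequence

# Assumption: the stock cassandra.yaml default (max_hint_window_in_ms = 3 hours).
DEFAULT_HINT_WINDOW = timedelta(hours=3)


def hints_cover_outage(downtime: timedelta,
                       hint_window: timedelta = DEFAULT_HINT_WINDOW) -> bool:
    """Return True if the outage fits inside the hint window.

    Once a node has been down longer than the window, other nodes stop
    storing hints for it, so later writes can only be recovered by repair.
    """
    return downtime <= hint_window


def replacements_needing_repair(downtimes: Sequence[timedelta],
                                hint_window: timedelta = DEFAULT_HINT_WINDOW) -> List[int]:
    """Return the indexes of replacements whose downtime exceeded the window."""
    return [i for i, d in enumerate(downtimes)
            if not hints_cover_outage(d, hint_window)]
```

For example, `replacements_needing_repair([timedelta(minutes=10), timedelta(hours=4)])` flags only the second replacement.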

To address this problem, we have recently extended the replace tool to remove these limitations and support the advanced use case. The new “copy data” replace mode replaces a node in the following stages:

    1. Provision the new node of the desired size
    2. Copy most of the data from the old node to the new node
    3. Stop the old node, ensuring that no data is lost
    4. Join the replacement node to the cluster
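The ordering of these stages can be sketched as a simple pipeline. This is an illustration only: `ReplaceJob` and the four stage functions are hypothetical placeholders that merely record which stage ran, and they do not reflect the actual internals of the replace tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReplaceJob:
    """Hypothetical record of a single node replacement."""
    old_node: str
    new_node_size: str
    completed: List[str] = field(default_factory=list)


# Placeholder stages: a real tool would call provisioning, backup,
# and Cassandra management systems here.
def provision_new_node(job: ReplaceJob) -> None:
    job.completed.append("provision")


def copy_bulk_data(job: ReplaceJob) -> None:
    job.completed.append("copy")


def stop_old_node(job: ReplaceJob) -> None:
    job.completed.append("stop")


def join_new_node(job: ReplaceJob) -> None:
    job.completed.append("join")


STAGES: List[Callable[[ReplaceJob], None]] = [
    provision_new_node,
    copy_bulk_data,
    stop_old_node,
    join_new_node,
]


def run_replace(job: ReplaceJob) -> List[str]:
    """Run the stages strictly in order; an exception aborts the sequence."""
    for stage in STAGES:
        stage(job)
    return job.completed
```

Keeping the stages in one ordered list makes the key property explicit: the old node is only stopped after the bulk copy has finished, so the downtime covers just the final catch-up and the join.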

Provisioning is trivial with our powerful provisioning system, but copying large amounts of data from a live node presents some specific challenges. We had to develop a solution that could copy this data without creating too much additional load on a cluster which might already be under stress. We also had to work carefully within the constraints created by Cassandra’s hinted handoff system.

We explored a number of solutions to the problem of copying data to the new node while minimising the impact on the running nodes. After discarding several alternatives, we settled on a solution which builds on Instaclustr’s existing, proven backup/restore system. This ensures minimal resource strain on the node being replaced, as most of the data is already stored in cloud storage and we only need to copy the data added since the last backup was taken.
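The idea of copying only what has been added since the last backup can be sketched as a set difference. The sketch relies on the fact that Cassandra SSTable files are immutable once written, so a file name already present in the backup does not need to be uploaded again. The function name and the file names in the usage example are illustrative assumptions, not the actual backup system's interface.

```python
from typing import Iterable, List, Set


def files_to_upload(local_files: Iterable[str],
                    already_backed_up: Set[str]) -> List[str]:
    """Return the local SSTable files that are missing from the backup.

    SSTables are immutable, so any file already in the backup can be
    skipped; only files written since the last backup remain.
    """
    return sorted(f for f in local_files if f not in already_backed_up)
```

For instance, with `md-1-big-Data.db` already backed up, `files_to_upload(["md-1-big-Data.db", "md-2-big-Data.db"], {"md-1-big-Data.db"})` leaves only `md-2-big-Data.db` to transfer from the live node.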

Stopping the old node without losing data requires stopping Cassandra and uploading the last remaining data added since the previous step. This process usually completes within 10 minutes, which keeps the degradation of cluster performance to a minimum.

After all of the data is on the new node, the old node is terminated, its public and private IPs are transferred to the new node, and Cassandra is started on the new node. As the replacement node joins, it receives the data it missed during the short downtime as hinted handoffs.

The new solution has allowed us to standardise our approach to node replacement for all instance types, using the proven technology of our Cassandra backup system to improve the overall performance of the process. At the moment, this resize functionality is controlled by our administrators and can be requested by customers via our support channel. We will likely make the functionality available directly to users in the future.
