What are virtual racks?
In Apache Cassandra, a virtual rack is an identifier that sorts nodes into logical groups. Cassandra then distributes data and its replicas across these groups.
Best practice when assigning racks in Cassandra is to align them with physically separate locations, such as Amazon Web Services (AWS) availability zones or Google Cloud Platform (GCP) zones (from here on, we will refer to these as physical racks).
Combined with the replication factor, Cassandra uses racks to ensure that if one rack becomes unresponsive (for example, because of a network or power outage), the data is still available in other racks.
This is a core tenet of how Cassandra remains fault tolerant.
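To illustrate the idea, the following Python sketch picks replicas for one piece of data by walking a ring of nodes and preferring racks that do not yet hold a replica. It is a deliberately simplified model of rack-aware placement, not Cassandra's actual implementation; the node names, rack names and the `pick_replicas` function are hypothetical.

```python
from typing import List, Set, Tuple

Node = Tuple[str, str]  # (node name, rack name); both are made-up labels


def pick_replicas(ring: List[Node], start: int, rf: int) -> List[Node]:
    """Walk the ring from `start`, preferring nodes in racks not used yet."""
    chosen: List[Node] = []
    skipped: List[Node] = []
    used_racks: Set[str] = set()
    for step in range(len(ring)):
        if len(chosen) == rf:
            break
        node = ring[(start + step) % len(ring)]
        if node[1] in used_racks:
            skipped.append(node)  # this rack already holds a replica
        else:
            chosen.append(node)
            used_racks.add(node[1])
    # Adds nodes only when rf exceeds the number of racks (racks get reused).
    chosen.extend(skipped[: rf - len(chosen)])
    return chosen


ring = [("n1", "rack-a"), ("n2", "rack-a"), ("n3", "rack-b"),
        ("n4", "rack-b"), ("n5", "rack-c"), ("n6", "rack-c")]
replicas = pick_replicas(ring, start=1, rf=3)
print(replicas)
```

With three racks and three replicas, each rack ends up holding exactly one copy, so losing any single rack removes only one replica.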
In the following examples, we will assume customers are reading and writing data at the quorum consensus level.
How Instaclustr uses racks
Previously, when provisioning a Cassandra cluster, Instaclustr would distribute the nodes across the physical racks available for the Service Provider and region you selected.
As an example, AWS region US WEST 2 has three availability zones: us-west-2a, us-west-2b & us-west-2c.
Cassandra would operate using three racks which mirror the physical availability zones (or physical racks).
Customer keyspaces that operate under a replication factor of 3 could be assured that if us-west-2a lost connectivity, a majority of replicas for their data would still be available and data consistency could be maintained.
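The arithmetic behind this guarantee can be sketched in a few lines of Python. The sketch assumes one replica per rack (replication factor equal to the number of racks) and reads and writes at the quorum level; the function names are illustrative only.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond at quorum: a majority of rf."""
    return rf // 2 + 1


def quorum_survives_rack_loss(rf: int, racks_lost: int = 1) -> bool:
    """Assumes one replica per rack, so each lost rack removes one replica."""
    return rf - racks_lost >= quorum(rf)


for rf in (2, 3, 5):
    print(rf, quorum(rf), quorum_survives_rack_loss(rf))
```

With a replication factor of 3, quorum is 2, so losing one rack still leaves enough replicas. With a replication factor of 2, quorum is also 2, so a single rack outage is enough to break it.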
Limitations of physical racks
Using physical racks is a great way to ensure fault tolerance in Cassandra.
However, in regions with only two physical racks, regular maintenance operations are difficult. If we operate on one rack at a time, only one rack remains, and data consistency can’t be assured during that period.
Instaclustr has introduced the concept of virtual racks for all supported GCP and AWS regions to better support customers in smaller regions or with higher replication factor requirements.
Now, when creating a cluster, a customer can nominate their target replication factor. Supported replication factors are 2, 3 & 5, depending on the number of physical racks available in the selected region.
Instaclustr will provision a cluster with as many racks as the selected replication factor requires. If there are not enough physical racks available, Instaclustr will use virtual racks to make up the difference.
A virtual rack is still assigned to a physical rack in the selected region, but Cassandra is configured to treat the nodes allocated to it as a separate rack.
For example, let’s take GCP region US West (Oregon).
This region currently has two physical racks: us-west1-a & us-west1-b.
A customer may now select a target replication factor of 3, and a cluster will be provisioned using the following three racks as seen by Cassandra:
us-west1-a, us-west1-b & us-west1-a-ic.
Nodes in the us-west1-a-ic rack still physically reside in the us-west1-a rack, but treating them as a separate rack allows our support engineers to service, upgrade or troubleshoot the cluster without downtime.
In the event that us-west1-a fails, the cluster would still have the data available in us-west1-b, but two of the three racks (us-west1-a and us-west1-a-ic) would be unavailable at once, so data consistency may not be assured for that period.
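The difference between servicing one Cassandra rack and losing a whole physical zone can be sketched as follows. This Python snippet assumes a replication factor of 3 with one replica per Cassandra rack and quorum consistency; the `RACK_TO_ZONE` mapping and helper names are illustrative, not part of any Instaclustr or Cassandra API.

```python
# Cassandra rack -> physical zone it actually runs in (from the example above).
RACK_TO_ZONE = {
    "us-west1-a": "us-west1-a",
    "us-west1-b": "us-west1-b",
    "us-west1-a-ic": "us-west1-a",  # virtual rack hosted in us-west1-a
}
RF = len(RACK_TO_ZONE)  # assumption: one replica per Cassandra rack
QUORUM = RF // 2 + 1    # 2 when RF is 3


def replicas_up_during_rack_maintenance(rack: str) -> int:
    """Replicas still reachable while one Cassandra rack is being serviced."""
    return sum(1 for r in RACK_TO_ZONE if r != rack)


def replicas_up_after_zone_failure(zone: str) -> int:
    """Replicas still reachable after an entire physical zone goes down."""
    return sum(1 for z in RACK_TO_ZONE.values() if z != zone)


for rack in RACK_TO_ZONE:
    print("maintenance", rack, replicas_up_during_rack_maintenance(rack) >= QUORUM)
for zone in sorted(set(RACK_TO_ZONE.values())):
    print("zone failure", zone, replicas_up_after_zone_failure(zone) >= QUORUM)
```

Under these assumptions, maintenance on any single rack keeps quorum, a failure of us-west1-b keeps quorum, and a failure of us-west1-a does not.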
All existing clusters will keep their current configuration; there is no change to currently operating clusters.
If you wish to take advantage of virtual racks on an existing cluster, our support team can provision a new data centre with the desired configuration, perform a data migration and, optionally, decommission the old data centre.
Contact our support team for more information about the migration process.
With the introduction of virtual racks, Instaclustr continues to give customers increased control over the configuration and operation of their Apache Cassandra managed service.