Pragmatic Availability with Apache Cassandra

Cassandra provides innovative and pragmatic approaches to balancing your availability needs, application functionality and infrastructure costs. As a result, you can achieve higher levels of availability at a lower cost.

One of the first things you’ll read when learning about Cassandra is that it is designed to provide high availability. This is true in the traditional sense: multiple servers are capable of fulfilling a function, and the service continues to operate seamlessly if one or more of those servers fail.

However, one of the interesting and powerful things about Cassandra is that it allows you to tune its availability rules in a way that is pragmatic for your application. With a traditional RDBMS, availability is generally an all-or-nothing scenario: either the service is in a state where every operation will succeed or it is in a state where every operation will fail. With Cassandra, you can configure what ‘available’ means for your use case and how protected you want to be in meeting that definition. In a partial failure scenario, Cassandra will continue to operate for the subset of operations that meet your definition of available.

Cassandra achieves this tuneable availability through two key concepts: consistency levels and replication factors. The replication factor of a Cassandra keyspace (a keyspace is roughly equivalent to a database in an RDBMS) tells Cassandra how many copies of each piece of data to keep. You can also configure how those copies are distributed (for example, across availability zones or data centers).
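To make the replication factor concrete, here is a minimal Python sketch that builds the CQL statement for creating a keyspace with a per-data-center replication factor. The helper name `create_keyspace_cql`, the keyspace name `app_data` and the data center names `dc_east` and `dc_west` are made up for illustration; your data center names need to match what your cluster actually reports.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    replicas_per_dc maps a data center name to the number of copies of each
    row to keep in that data center (its replication factor).
    """
    options = {"class": "NetworkTopologyStrategy", **replicas_per_dc}
    # Strings are rendered with single quotes, integers as bare numbers.
    body = ", ".join(f"'{key}': {value!r}" for key, value in options.items())
    return f"CREATE KEYSPACE {keyspace} WITH replication = {{{body}}};"


# Hypothetical keyspace and data center names: three copies in each of two DCs.
print(create_keyspace_cql("app_data", {"dc_east": 3, "dc_west": 3}))
```

The generated statement can be run through `cqlsh` or any driver. The replication factor is a property of the keyspace, so different keyspaces in the same cluster can make different trade-offs.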

The consistency level can be specified for each connection or query and defines how many copies of the data must be successfully read or written before the operation can succeed. In many cases, applications use a consistency level of ‘quorum’, which means that a majority of replicas must succeed for the operation to succeed. When quorum is used for both reads and writes, every read is guaranteed to include the results of all previous successful writes. In cases where you want lower-cost high availability and your application can live with reads occasionally missing a write, you can use consistency levels lower than quorum.
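The quorum arithmetic is simple enough to sketch in a few lines of Python. The function names here are my own, not part of any Cassandra API. The rule of thumb is that a read is certain to overlap with the latest write when the number of replicas written plus the number of replicas read is greater than the replication factor.

```python
def quorum(replication_factor):
    """Number of replicas that make up a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must overlap every write set in at least one replica."""
    return write_replicas + read_replicas > replication_factor


rf = 3
print(quorum(rf))                                          # 2
print(reads_see_latest_write(rf, quorum(rf), quorum(rf)))  # True: quorum writes, quorum reads
print(reads_see_latest_write(rf, 1, 1))                    # False: a read may miss a write
```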

This means, for example, that if you want to ensure 100% availability of your Cassandra service in the event that two servers fail and to guarantee that all writes are included in all reads, you would need at least a five-node cluster (with a replication factor of 5 and quorum consistency, meaning three replicas must succeed for the operation to succeed). If you are prepared to sacrifice read consistency, 100% availability in the event of two servers failing can also be achieved with a three-node cluster (with a replication factor of 3 and a consistency level of 1, meaning only one replica has to be available for the operation to succeed).
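The sizing in this example can be checked with a small Python sketch. It assumes, as the example does, that the cluster size equals the replication factor, so every node holds a copy of every row. The function names are illustrative only.

```python
def quorum(replication_factor):
    """Number of replicas that make up a majority."""
    return replication_factor // 2 + 1


def failures_tolerated(replication_factor, replicas_required):
    """Replicas that can be down while the operation still succeeds."""
    return replication_factor - replicas_required


def smallest_replication_factor(failures, use_quorum):
    """Smallest replication factor that survives the given number of node failures."""
    rf = 1
    while True:
        required = quorum(rf) if use_quorum else 1  # 1 models consistency level ONE
        if failures_tolerated(rf, required) >= failures:
            return rf
        rf += 1


print(smallest_replication_factor(2, use_quorum=True))   # 5
print(smallest_replication_factor(2, use_quorum=False))  # 3
```

The two results match the five-node and three-node clusters described above: the extra two nodes are the price of consistent reads while two servers are down.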

Where Cassandra gets really pragmatic in availability is in situations where the replication factor is less than the number of nodes. For example, consider a case with six nodes in a cluster, a replication factor of 3 and quorum consistency. In this case, when two nodes fail, up to one third of rows will have replicas on both of those nodes. (For Cassandra techies, I’m assuming a simple replication strategy with one token per node here, where one third is the worst case of two neighbouring nodes failing. Things are a bit more complex with a network topology strategy, and the impact depends on which two nodes fail.) Cassandra will allow the single-row operations that still have at least two replicas available, at least two-thirds of them, to continue to succeed. Only the operations on rows that are unlucky enough to have two replicas on the failed nodes will fail (and the client has the option of retrying at a lower consistency level). For many application scenarios, this is a big improvement over the all-or-nothing approach of a traditional RDBMS.
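The six-node example can be modelled with a short Python sketch. It is a simplification under stated assumptions: a simple replication strategy, one token per node, equally sized token ranges, and each range stored on the node that owns it plus the next nodes clockwise around the ring. The function names are mine, not Cassandra’s.

```python
from fractions import Fraction


def replica_sets(nodes, replication_factor):
    """One replica set per token range: the owning node plus the next nodes in the ring."""
    return [
        {(start + offset) % nodes for offset in range(replication_factor)}
        for start in range(nodes)
    ]


def failing_fraction(nodes, replication_factor, failed, replicas_required):
    """Fraction of token ranges with too few live replicas to satisfy the consistency level."""
    sets = replica_sets(nodes, replication_factor)
    failing = sum(1 for replicas in sets if len(replicas - failed) < replicas_required)
    return Fraction(failing, len(sets))


# Six nodes, replication factor 3, quorum (2 of 3 replicas) required.
print(failing_fraction(6, 3, {0, 1}, 2))  # 1/3: neighbouring nodes fail
print(failing_fraction(6, 3, {0, 2}, 2))  # 1/6: failed nodes are one node apart
print(failing_fraction(6, 3, {0, 3}, 2))  # 0: failed nodes share no replica set
print(failing_fraction(6, 3, {0, 1}, 1))  # 0: retrying at consistency level ONE
```

In this model the share of affected rows ranges from none to one third depending on which two nodes fail, and dropping to a consistency level of one makes every row available again.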

Put together, all this means that Cassandra lets you pragmatically balance your application functionality, availability requirements and infrastructure spend, and meet your real availability requirements at the lowest possible cost. At Instaclustr, we’re always happy to work with our customers to help them understand these trade-offs in detail and come to a solution that meets their unique requirements.