Apache Cassandra Costs Running in the Cloud

Instaclustr's managed Cassandra services include the underlying cloud provider charges in our standard monthly fee. Customers and potential customers often ask us what these charges look like. This information is also useful to people planning their own cloud-based Cassandra implementation. So, whichever category you’re in, read on for more information.

For simplicity, this post focuses on our AWS deployment. However, similar cost concepts apply to the other clouds we support (Azure, IBM SoftLayer) and to other providers we’ve looked at closely.

Background on our deployment

To start off, a quick bit of background about how we deploy Cassandra (which is basically best practice for deploying Cassandra in the cloud):

  • we have some offerings that use local SSD storage and some that use gp2 EBS volumes (General Purpose SSD volumes on Elastic Block Store, a class of AWS network-attached storage);
  • we deploy each rack into the cluster in a separate Availability Zone (AZ) (most clusters have three racks);
  • we deploy each cluster into a separate Virtual Private Cloud (VPC) environment and allow access either via a peered VPC and private IP or via public Elastic IPs assigned to each instance;
  • we back up each cluster daily to S3.
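
Placing each rack in its own AZ means that replica traffic crosses AZ boundaries, which feeds into the interzone network cost discussed below. The following is a minimal sketch of that effect for writes only, not a description of our tooling. It assumes a replication factor that divides evenly across the racks (for example, 3 replicas over 3 racks) and ignores reads, repairs, and protocol overhead. The function name, its parameters, and the workload figures are all hypothetical.

```python
def cross_az_write_gb(writes_per_month: int,
                      avg_write_bytes: int,
                      replication_factor: int = 3,
                      racks: int = 3) -> float:
    """Rough estimate of GB per month sent between AZs by write replication.

    Assumes one rack per AZ and replicas spread evenly over the racks.
    """
    if replication_factor % racks != 0:
        raise ValueError("sketch assumes replicas divide evenly across racks")
    # Replicas that sit in the same AZ as the coordinator node.
    local_replicas = replication_factor // racks
    # Every other replica receives its copy of the write over a cross-AZ link.
    remote_replicas = replication_factor - local_replicas
    return writes_per_month * avg_write_bytes * remote_replicas / 1e9


if __name__ == "__main__":
    # Hypothetical workload: 500 million writes a month, 1,000 bytes each.
    gb = cross_az_write_gb(500_000_000, 1_000)
    print(f"{gb:.0f} GB/month of write traffic crosses AZ boundaries")
```

With three racks and three replicas, two of every three copies of a write leave the coordinator's AZ, so write volume and payload size translate directly into interzone traffic.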

 

AWS Costs

With this style of architecture, we generate the following categories of AWS costs:

Cost | Description | Driver
--- | --- | ---
Instances | Cost of the base compute instances (e.g. m4.xlarge). | Number and size of nodes in the cluster.
EBS Volume | Cost of attached EBS volumes (where applicable). | Size of the EBS volume (e.g. 400 GB).
Network – Public IP In/Out | Loading/retrieving data via public IP. | Only applicable if accessing via public IP: dependent on the number of Cassandra reads/writes in a month and transaction size.
Network – Interzone In/Out | Cross-availability zone communication within the cluster. | Transaction volume and size, consistency level used for reads.
Network – VPC In/Out | Loading/retrieving data via a peered VPC. | Only applicable if accessing via a peered VPC: dependent on the number of Cassandra reads/writes in a month and transaction size.
S3 Storage | S3 space for storing backups. | Volume of data, length of backup retention, deduplication of backup files/data.
S3 Operations | S3 calls for storing backups. | Number of SSTables (volume of data + compaction strategy), backup strategy.
S3 Data Transfer Out | S3 retrieval data transfer cost. | Only applicable if you need to copy data from S3 to a region other than US East to restore a backup.

In most cases, the instance (compute capacity) cost will be the largest cost component. However, for large EBS-based nodes, EBS cost can come close to compute capacity cost. And, for some instance types and usage scenarios, we have seen network costs equal and even exceed the base compute costs. S3 costs are typically not a major component of the overall picture.
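
To make the relative weight of these components concrete, here is a minimal sketch of a per-category estimate: each category's monthly usage is multiplied by a unit price, and the result shows which component dominates. The category keys, the function names, and every price and usage figure are hypothetical placeholders, not real AWS rates or Instaclustr data. Substitute current prices for your region before relying on any output.

```python
# Hypothetical unit prices in USD (placeholders only, NOT real AWS rates).
UNIT_PRICES = {
    "instances": 0.25,           # per node-hour
    "ebs_volume": 0.10,          # per GB-month
    "network_public_ip": 0.09,   # per GB
    "network_interzone": 0.01,   # per GB
    "network_vpc": 0.01,         # per GB
    "s3_storage": 0.03,          # per GB-month
    "s3_operations": 0.000005,   # per request
    "s3_transfer_out": 0.02,     # per GB
}


def estimate_monthly_costs(usage, unit_prices=UNIT_PRICES):
    """Return {category: cost}; categories missing from usage cost zero."""
    unknown = set(usage) - set(unit_prices)
    if unknown:
        raise KeyError(f"no unit price for: {sorted(unknown)}")
    return {cat: usage.get(cat, 0) * price for cat, price in unit_prices.items()}


def largest_component(costs):
    """Name of the category with the highest estimated cost."""
    return max(costs, key=costs.get)


if __name__ == "__main__":
    # Hypothetical cluster: 6 nodes for 730 hours, 400 GB of EBS per node.
    usage = {
        "instances": 6 * 730,
        "ebs_volume": 6 * 400,
        "network_interzone": 5_000,
        "s3_storage": 1_500,
        "s3_operations": 2_000_000,
    }
    costs = estimate_monthly_costs(usage)
    total = sum(costs.values())
    for category, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        print(f"{category:20s} {cost:10.2f}  {cost / total:6.1%}")
    print("largest component:", largest_component(costs))
```

Keeping usage and prices in separate dictionaries makes it straightforward to rerun the same workload against another provider's price list.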

Historical data

At Instaclustr, we use more than 18 months of historical data from running Cassandra in the cloud to estimate the expected charges included in the monthly fee for our managed service. If you are not interested in a managed service, our consultants can also draw on this data to help you plan your own deployment. Contact us for more information.
