A common question we hear from potential customers is “How does Cassandra compare to DynamoDB?” We’re big fans of AWS (in fact, most of our business runs on EC2) and certainly believe DynamoDB is a great solution for some use cases. Of course, we also believe that Cassandra is an outstanding solution in a lot of cases.
Different use cases obviously require different solutions, but we see a lot of use cases for which Apache Cassandra is a much more cost-effective implementation. We have chosen a real use case, our own Instametrics capability, to contrast the costs, strengths and weaknesses of the two solutions.
To summarize, Instametrics allows us to store and analyze monitoring data from the almost 1000 Apache Cassandra nodes that we manage. The key stats are:
- 12 x m4.xl-balanced (800GB) nodes
- Replication Factor 3
- > 40,000 writes/sec (24x7 consistent load)
- ~ 500 reads/sec consistent load (peaks at 1-2k reads/sec) at consistency level ONE
- Small data per read/write
This is our current running load – the cluster is pretty well utilised but we believe we can still push it a bit harder and in particular we plan to increase the levels of read operations.
First of all, let’s get to the bottom line – costs. The Instaclustr price is very straightforward to calculate: 12 m4.xl-balanced nodes at $727 each per month on demand gives a total, all-inclusive cost of $8,724 per month.
DynamoDB costs would consist of a number of components:
- Storage at $0.25 / GB: A Cassandra node can generally be run to about 70% full, so 70% x (12 nodes / replication factor 3) x 800GB = 2,240 GB. Subtract the 25 free GB and multiply by $0.25/GB to get roughly $553/month in storage cost.
- Network at $0.09 / GB network out (free in): 500 small reads/sec adds up to something like 12GB per month and you get one GB free. So, for our use case this comes to a whopping $0.99 / month. However, our use case currently has an extremely low level of reads and a higher level of reads could quickly make this a significant cost.
- Read Throughput at $0.0065 per hour for every 50 units of Read Capacity. According to AWS, this corresponds to 360,000 reads per hour (eventual consistency), so that’s 100 reads/sec. Our 500 reads/sec therefore needs 5 such blocks, and the monthly read capacity cost (ignoring bursts) would be 5 x 0.0065 x 720 (hours in the month) = $23.40.
- Write Throughput at $0.0065 per hour for every 10 units of Write Capacity. According to AWS, this corresponds to 36,000 writes per hour, so that’s 10 writes/sec. Our 40,000 writes/sec therefore needs 4,000 such blocks, and the monthly write capacity cost (ignoring bursts) would be 4,000 x 0.0065 x 720 = $18,720.
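The four components above can be reproduced with a short calculation. This is a minimal sketch that simply encodes the prices and workload figures quoted in this post; the function and parameter names are our own illustration and not part of any AWS tooling.

```python
import math

HOURS_PER_MONTH = 720  # the post uses a 720-hour month


def storage_cost(nodes=12, replication_factor=3, node_gb=800,
                 fill=0.70, free_gb=25, price_per_gb=0.25):
    """Monthly storage cost for one copy of the data held by the cluster."""
    unique_gb = fill * (nodes / replication_factor) * node_gb
    return max(unique_gb - free_gb, 0) * price_per_gb


def network_cost(gb_out=12, free_gb=1, price_per_gb=0.09):
    """Monthly cost of outbound traffic after the free allowance."""
    return max(gb_out - free_gb, 0) * price_per_gb


def throughput_cost(ops_per_sec, ops_per_block, price_per_block_hour=0.0065):
    """Monthly cost of provisioned capacity, bought in whole pricing blocks."""
    blocks = math.ceil(ops_per_sec / ops_per_block)
    return blocks * price_per_block_hour * HOURS_PER_MONTH


costs = {
    "storage": storage_cost(),
    "network": network_cost(),
    "reads": throughput_cost(500, ops_per_block=100),     # 100 reads/sec per block
    "writes": throughput_cost(40_000, ops_per_block=10),  # 10 writes/sec per block
}

for name, value in costs.items():
    print(f"{name:>8}: ${value:,.2f}")
print(f"{'total':>8}: ${sum(costs.values()):,.2f}")
```

Changing the arguments (for example a higher read rate or a larger `gb_out`) gives a quick feel for how sensitive the DynamoDB bill is to each component; in our case the write throughput line dominates everything else.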
The total on-demand DynamoDB cost for our use case would therefore be $19,298 – quite a difference from the $8,724 for using Instaclustr. Of course, this reflects the specific use case we have with Instametrics – different balances of reads, writes and storage space can give quite different results. However, for most examples we find that Managed Cassandra will be cheaper than DynamoDB for any significant, consistent workload level.
The other cost to consider with DynamoDB is support. The support included in the Instaclustr price for a cluster like Instametrics is fairly equivalent to AWS Enterprise level support. The minimum charge for this level of support from AWS is $15,000 for less than $150k total monthly usage, then 7% of monthly usage costs from $150k to $500k, with the rate going down from there. Let’s assume you’re a big but not massive customer and call this 7%, taking our total AWS bill up to $20,649.
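As a quick check on the comparison, the sketch below applies the assumed 7% support rate to the DynamoDB total and compares the result with the Instaclustr price. The constant names are illustrative only, and the 7% figure is the assumption made in the text rather than a quote for any particular account.

```python
INSTACLUSTR_MONTHLY = 12 * 727   # 12 nodes at $727 each, support included
DYNAMODB_MONTHLY = 19_298        # on-demand DynamoDB total from the text
SUPPORT_RATE = 0.07              # assumed AWS support band, as in the text

# Add support on top of the DynamoDB usage charge.
dynamodb_with_support = DYNAMODB_MONTHLY * (1 + SUPPORT_RATE)

# How many times more expensive the DynamoDB option is for this workload.
ratio = dynamodb_with_support / INSTACLUSTR_MONTHLY

print(f"Instaclustr:            ${INSTACLUSTR_MONTHLY:,.0f}/month")
print(f"DynamoDB incl. support: ${dynamodb_with_support:,.0f}/month")
print(f"DynamoDB / Instaclustr: {ratio:.2f}x")
```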
So, we’ve shown that for this use case, there are significant savings to be had running Cassandra vs DynamoDB. Are there other advantages to using Managed Cassandra?
Glad you asked. There are a range of functional differences between the two; the main ones we’d highlight are:
- Using an open, portable technology – with DynamoDB, you’re 100% locked into AWS. Apache Cassandra is fully open source so you know you can always bring management in house or move to a different managed service provider.
- Querying your data with a SQL-like language rather than a proprietary API – lower learning curve and more readable code in many circumstances.
- Cassandra provides more sophisticated data modelling options such as user defined types, JSON support, and (in the latest versions) materialized views.
- Cassandra provides native ability to isolate analytic workloads (e.g. Spark) from OLTP while transparently maintaining data replication.
Of course, we’re a bit biased toward Apache Cassandra, so what are some of the reasons you’d want to use DynamoDB? For us, the clearest use case is where you need to rapidly scale capacity up and down. DynamoDB clearly has some sophisticated magic behind the scenes to allow this, and it can change the economics dramatically in DynamoDB’s favour if you have a very variable capacity requirement that you can predict in advance (for example, running large analytics batch jobs).
The other clear reason you’d choose DynamoDB is if you’ve made a decision to go all-in with AWS services. Clearly, you can expect a sophisticated level of integration between AWS services and should also get single-vendor support, with less chance of issues falling between the cracks.
As we said, this example follows a particular use case. If you are interested in exploring your use case in more detail then please get in touch via firstname.lastname@example.org.
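To illustrate how much a variable load can shift the picture, here is a rough sketch using the write throughput price quoted earlier. The workload shape (a daily 4-hour peak with a low off-peak rate) is entirely hypothetical, and the sketch assumes provisioned capacity can be raised and lowered on that schedule with no restrictions or extra charges; it also ignores storage, reads and network.

```python
import math

PRICE_PER_BLOCK_HOUR = 0.0065  # price per block of 10 writes/sec, as quoted earlier
WRITES_PER_BLOCK = 10
DAYS_PER_MONTH = 30            # matches the 720-hour month used earlier


def monthly_write_cost(peak_wps, peak_hours_per_day, offpeak_wps):
    """Write capacity cost when capacity follows a fixed daily schedule."""
    def hourly(wps):
        return math.ceil(wps / WRITES_PER_BLOCK) * PRICE_PER_BLOCK_HOUR

    peak = hourly(peak_wps) * peak_hours_per_day
    offpeak = hourly(offpeak_wps) * (24 - peak_hours_per_day)
    return (peak + offpeak) * DAYS_PER_MONTH


# Constant load, as in the Instametrics example.
constant = monthly_write_cost(40_000, 24, 0)

# Hypothetical bursty load: the same peak for 4 hours a day, 2,000 writes/sec otherwise.
bursty = monthly_write_cost(40_000, 4, 2_000)

print(f"constant load: ${constant:,.2f}/month")
print(f"bursty load:   ${bursty:,.2f}/month")
```

A cluster with fixed capacity would typically still need to be sized for the peak, which is why a predictable but spiky workload is the case where DynamoDB’s pricing model is most attractive.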