Instaclustr has updated its Cassandra on AWS EBS infrastructure offerings to include m4.xlarge class instances. Over 100 hours of testing and tuning have demonstrated that these nodes, using EBS, provide substantial price/performance benefits for many use cases.
It’s traditional wisdom that Cassandra and AWS EBS don’t mix. However, with the release of the latest generation EBS-optimized instances we started to hear that people were having success using these nodes to run Cassandra. In particular, we’d like to acknowledge a presentation from CrowdStrike at Cassandra Summit and Al Tobey’s Cassandra 2.1 Tuning Guide.
We first started investigating these instance types as a potential offering for customers with large amounts of data but relatively low throughput requirements. However, once we started testing we quickly realised that these new instance types offer better price/performance for many use cases.
We spent over 100 hours of engineering effort benchmarking and tuning Cassandra on these instance types, particularly in I/O intensive scenarios. We have also been running EBS-based nodes on our internal monitoring cluster for over a month and have conducted trials with several customers (one of which caused us to re-examine some approaches).
The FAQ that follows provides some more detail on this new offering. Should you have follow-up questions, or if you’re an existing customer interested in investigating this offering, then contact email@example.com.
- Provide improved price/performance for many use cases against most of our current node sizes.
- Allow customers a better range of storage to processing capacity ratios to fit their use case (resulting in large savings where the current offerings were not a good fit).
|Node size||Disk||Description|
|m4.l – tiny||250GB||Smallest available production node. Use this when getting started. We recommend scaling up to m4.xl rather than scaling out with more m4.large instances.|
|m4.xl – small||400GB||Step up from m3.xlarge when more disk is required. Starting point for smaller users not ready for the m4 balanced offering (lower performance, as smaller disks provide fewer IOPS). Upgrade as you grow.|
|m4.xl – balanced||800GB||Best balance of space and performance. Suggested standard building block for most clusters.|
|m4.xl – bulk||1600GB||Lowest cost bulk storage for use cases with low read ops.|
|Proven performer – good balance of space and performance. Basis of most of our largest production clusters. Will provide better performance than m4 based nodes for very read-heavy use cases.|
|Lowest cost read performance with relatively small data volumes. Build a cluster with these for extremely high performance to data ratios.|
|May provide a low-cost entry point for some use cases. Has higher throughput than an m4.l-tiny, lower cost (but much smaller disk) than an m4.xl-small.|
Note: we have discontinued the i2.xlarge offering as the m4 based nodes offer a better value for money solution.
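To make the storage-to-processing trade-off concrete, here is a rough sizing sketch. The disk sizes come from the m4 node descriptions above; the replication factor of 3 and the 50% maximum disk fill are assumptions chosen purely for illustration, not Instaclustr recommendations, and `nodes_needed` is a hypothetical helper.

```python
import math

# Disk per node in GB, taken from the m4 node sizes listed above.
DISK_GB = {
    "m4.l-tiny": 250,
    "m4.xl-small": 400,
    "m4.xl-balanced": 800,
    "m4.xl-bulk": 1600,
}


def nodes_needed(raw_data_gb, node_size, replication_factor=3, max_disk_fill=0.5):
    """Estimate the node count needed to store raw_data_gb of unreplicated data.

    replication_factor and max_disk_fill are illustrative assumptions.
    The result is never lower than replication_factor, so that every
    replica can live on a different node.
    """
    stored_gb = raw_data_gb * replication_factor          # data after replication
    usable_gb = DISK_GB[node_size] * max_disk_fill        # headroom left for compaction
    return max(replication_factor, math.ceil(stored_gb / usable_gb))


if __name__ == "__main__":
    for size in DISK_GB:
        print(f"{size:15} {nodes_needed(2000, size):3d} nodes for 2000 GB of raw data")
```

This only sizes for disk capacity. A read-heavy workload may need more nodes than the estimate suggests, which is why benchmarking your own data model still matters.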
I/O Heavy Read and Write – Maximum Throughput
In these tests we aimed to achieve maximum throughput for reads and writes with sufficient data to require significant reading from disk.
|Scenario||Operation||3 nodes i2.2xl||6 nodes m4.xlarge|
|Medium||Write||1,363 C* op/sec||1,331 C* op/sec|
|Medium||Read||1,818 C* op/sec||1,802 C* op/sec|
|Tiny||Write||36,343 C* op/sec||43,640 C* op/sec|
|Tiny||Read||24,001 C* op/sec||8,654 C* op/sec|
- Testing was done with a standard C* configuration
- Medium read/write is based on a table with 32 columns of 2 KB each (~64 KB per row)
- Tiny read/write is the default cassandra-stress schema (< 0.5 KB per row)
- Latency results were similar for both instance types (and quite high at these throughput levels)
- Better write performance can be achieved by increasing memtable_flush_writers and concurrent_writes, especially on the i2.2xlarges, and better read throughput by increasing concurrent_reads and setting HEAP_NEWSIZE to at least a quarter of total heap.
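Because the two clusters have different node counts, it can help to normalise the table per node and to convert the Medium scenario into an approximate payload rate. The sketch below does that arithmetic using only the figures quoted above; the even split across nodes and the payload calculation (which ignores replication, compaction and protocol overhead) are simplifying assumptions, and the function names are hypothetical.

```python
# Cluster-wide results copied from the maximum-throughput table (C* ops/sec).
RESULTS = {
    ("Medium", "Write"): {"i2.2xl": 1363, "m4.xlarge": 1331},
    ("Medium", "Read"): {"i2.2xl": 1818, "m4.xlarge": 1802},
    ("Tiny", "Write"): {"i2.2xl": 36343, "m4.xlarge": 43640},
    ("Tiny", "Read"): {"i2.2xl": 24001, "m4.xlarge": 8654},
}

# Node counts used for each cluster in the test.
NODES = {"i2.2xl": 3, "m4.xlarge": 6}

# Medium rows: 32 columns of 2 KB each.
MEDIUM_ROW_KB = 32 * 2


def per_node_ops(scenario, operation, instance):
    """Cluster ops/sec divided evenly across the nodes of that cluster."""
    return RESULTS[(scenario, operation)][instance] / NODES[instance]


def medium_payload_mb_per_sec(operation, instance):
    """Approximate cluster-wide payload rate for Medium rows (1 MB = 1024 KB)."""
    return RESULTS[("Medium", operation)][instance] * MEDIUM_ROW_KB / 1024


if __name__ == "__main__":
    for (scenario, operation), by_instance in RESULTS.items():
        for instance in by_instance:
            rate = per_node_ops(scenario, operation, instance)
            print(f"{scenario:6} {operation:5} {instance:10} {rate:9.1f} ops/sec/node")
```

Per-node figures say nothing about cost on their own. The comparison in the table is between whole clusters, so the per-node view is best read alongside the price of each node type.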
I/O Heavy Read and Write – Latency
In these tests we aimed to read and write at throughput levels well within processing capacity, in order to compare relative latency.
|Scenario||Operation||3 nodes i2.2xl||6 nodes m4.xlarge|
As the variance in these scenarios shows, actual performance will depend significantly on your data model and application. We highly recommend benchmarking your particular scenario to determine performance characteristics for capacity planning.