Here at Instaclustr we have spent a lot of time running load tests on Apache Cassandra and dealing with the many different types of performance issues that crop up when running Cassandra in production. Through these exercises, we’ve learnt a lot (often by trial and error) about how to usefully load test Cassandra applications.
Although many of the principles of load testing Cassandra applications are similar to testing relational databases and other NoSQL systems, there are three specific factors which we’ve found to be particularly important when load testing Cassandra.
1. Consider background load
Compaction is a necessary, ongoing background task in Cassandra. Hitting a target write throughput for 15 minutes and then leaving the cluster with 8 hours of compaction to catch up on is not a useful test, as that write throughput is clearly not sustainable over a long period. Similarly, hitting your read target with no concurrent write workload (and thus no compactions) tells you little about real-world performance.
Repairs and backups can (to some degree) be scheduled outside peak times, but many Cassandra applications (such as IoT) have very constant workloads, with no lull in which to schedule these types of activities.
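To put a rough number on the sustainability point, here is a back-of-the-envelope sketch in Python. It makes a deliberately simple assumption: the compaction backlog has to be fully cleared before the next burst, and no writes happen while it clears. The 50,000 writes/s burst rate is a made-up figure for illustration, not a measurement.

```python
def sustainable_rate(burst_rate, burst_minutes, catchup_minutes):
    """Average write rate over one burst-plus-catch-up cycle.

    Simplifying assumption: no writes are accepted while the
    compaction backlog is being cleared.
    """
    total_minutes = burst_minutes + catchup_minutes
    return burst_rate * burst_minutes / total_minutes


# Hypothetical burst: 50,000 writes/s for 15 minutes, then 8 hours of compaction.
rate = sustainable_rate(50_000, 15, 8 * 60)
print(f"{rate:.0f} writes/s sustained")
```

Under these assumptions the sustained rate works out to about 3% of the burst rate, which is why a short burst test on its own says very little.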
2. Data profile matters
As we covered in an earlier blog post on the top 3 data modelling traps, well-distributed partition keys, careful treatment of updates and deletes to avoid tombstone issues, and correct use of secondary indexes are all crucial to Cassandra’s performance. In testing, each of these issues will only be uncovered if your test data profile is well-matched to real production data. For example:
- a data set with unique, randomly distributed partition key values will perform massively differently from one where 99% of rows share a single partition key (we’ve seen 99%, but even 5% is likely to be very significant);
- the volume of tombstones generated by a 1-2 hour test of updates and deletes may be very different from the volume that accumulates over 10 days of running (the default gc_grace_seconds period for which tombstones are retained). Growing tombstone volumes can have a major impact on performance.
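One way to check a test data set for partition key skew before loading it is to measure how much of the data lands in the largest partition. The Python sketch below is illustrative only: `generate_keys`, `hottest_share`, and the "hot-partition" key are hypothetical names we made up, and the generator simply stands in for whatever produces your test data.

```python
import random
from collections import Counter


def generate_keys(n, hot_fraction, key_space=1_000_000, seed=42):
    """Generate n partition keys; hot_fraction of them share a single key."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    keys = []
    for _ in range(n):
        if rng.random() < hot_fraction:
            keys.append("hot-partition")  # every hot row lands in one partition
        else:
            keys.append(f"key-{rng.randrange(key_space)}")
    return keys


def hottest_share(keys):
    """Fraction of all rows that fall into the single largest partition."""
    _, count = Counter(keys).most_common(1)[0]
    return count / len(keys)


for hot in (0.0, 0.05, 0.99):
    share = hottest_share(generate_keys(100_000, hot))
    print(f"hot_fraction={hot:.2f}: largest partition holds {share:.1%} of rows")
```

Running the same check against a sample of production keys and against your test keys gives a quick indication of whether the two profiles are comparable.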
3. Show cassandra-stress some love
cassandra-stress, as enhanced in Cassandra 2.1, is a fantastically powerful tool for load testing a Cassandra cluster and schema. However, like any powerful tool, it can be used poorly to generate not-very-useful results or used well to generate very accurate simulations. For example, when run with a duration setting, cassandra-stress will use a default limit of 1,000,000 for generating partition keys. However, when run with a ‘number of operations’ setting, the default limit is equal to the number of operations, potentially yielding very different results.
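To see why the key limit matters, the Python sketch below estimates how many distinct partitions a run touches for a given limit. It assumes keys are drawn uniformly at random from 1..limit, which is our simplification and not necessarily the distribution cassandra-stress uses in your version or with your settings; the 10,000,000 operation count is also just an illustrative figure.

```python
def expected_distinct(key_limit, operations):
    """Expected number of distinct keys when `operations` keys are drawn
    uniformly at random (with replacement) from 1..key_limit."""
    return key_limit * (1 - (1 - 1 / key_limit) ** operations)


ops = 10_000_000  # hypothetical number of operations in both runs
scenarios = (
    ("duration run (limit 1,000,000)", 1_000_000),
    ("operation-count run (limit = operations)", ops),
)
for label, limit in scenarios:
    distinct = expected_distinct(limit, ops)
    print(f"{label}: ~{distinct:,.0f} distinct partitions, "
          f"~{ops / distinct:.1f} operations per partition")
```

Under this assumption the same number of operations hits each partition roughly ten times in the first case and fewer than two times in the second, so partition sizes, overwrite rates, and cache behaviour can all differ between the two runs.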
We’ve covered some of the key Cassandra-specific load testing factors here. However, don’t forget the usual load testing considerations such as peak loads and bottlenecks. Also, keep an eye on Cassandra tuning techniques, and make sure settings such as driver configuration and consistency level mirror your planned application configuration.