Instaclustr delivers improved Apache Cassandra® price/performance with AWS I3 Instance Types.
Instaclustr has released support for a new AWS I3 instance type in our Apache Cassandra, Apache Spark and Elassandra Managed Service.
I3 is the latest generation of AWS's "Storage Optimised" instance family. It is designed to support I/O-intensive workloads and is backed by low-latency SSD storage. Instaclustr offers support for the i3.2xlarge instance type, which provides 8 vCPUs, 61 GiB of memory, and 1 x 1900 GB of locally attached SSD. For comparison, the previous-generation i2.2xlarge offered 8 vCPUs, 61 GiB of memory, and 2 x 800 GB SSDs.
Here’s What We Did
We conducted Cassandra benchmarking of the i3.2xlarge instance type and compared the results with those of the previous-generation i2.2xlarge. The testing showed an improvement in performance between generations, delivered at a lower price.
Our testing procedure was:
- Insert data to fill disks to ~30% full.
- Wait for compactions to complete and EBS burst credits to regenerate.
- Run a 2 hour test with 500 threads with a mix of 10 inserts : 10 simple queries : 1 range query, using quorum consistency for all operations. The stress spec we used for these tests is shown below:
```yaml
#
# Instaclustr standard YAML profile for cassandra-stress
# adapted from Apache Cassandra example file
#
# Insert data:
# cassandra-stress user profile=stress-spec.yaml n=25000000 cl=QUORUM ops(insert=1) -node file=node_list.txt -rate threads=100
# Note: n=25,000,000 will produce ~280G of data
# ensure all compactions are complete before moving to mixed load test
#
# Mixed load test
# cassandra-stress user profile=stress-spec.yaml duration=4h cl=QUORUM ops(insert=1,simple1=10,range1=1) -node file=node_list.txt -rate threads=30 -log file=mixed_run2_cms.log
#

#
# Keyspace info
#
keyspace: stresscql2

#
# The CQL for creating a keyspace (optional if it already exists)
#
keyspace_definition: |
  CREATE KEYSPACE stresscql2 WITH replication = {'class': 'NetworkTopologyStrategy', 'AWS_VPC_US_WEST_2': 3};
#keyspace_definition: |
#  CREATE KEYSPACE stresscql2 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

#
# Table info
#
table: typestest

#
# The CQL for creating a table you wish to stress (optional if it already exists)
#
table_definition: |
  CREATE TABLE typestest (
        name text,
        choice boolean,
        date timestamp,
        address inet,
        dbl double,
        lval bigint,
        ival int,
        uid timeuuid,
        value blob,
        PRIMARY KEY((name, choice), date, address, dbl, lval, ival, uid)
  ) WITH COMPACT STORAGE
    AND compaction = { 'class': 'LeveledCompactionStrategy' }
    AND comment = 'A table of many types to test wide rows'

#
# Optional meta information on the generated columns in the above table
# The min and max only apply to text and blob types
# The distribution field represents the total unique population
# distribution of that column across rows. Supported types are
#
#      EXP(min..max)                   An exponential distribution over the range [min..max]
#      EXTREME(min..max,shape)         An extreme value (Weibull) distribution over the range [min..max]
#      GAUSSIAN(min..max,stdvrng)      A gaussian/normal distribution, where mean=(min+max)/2, and stdev is (mean-min)/stdvrng
#      GAUSSIAN(min..max,mean,stdev)   A gaussian/normal distribution, with explicitly defined mean and stdev
#      UNIFORM(min..max)               A uniform distribution over the range [min, max]
#      FIXED(val)                      A fixed distribution, always returning the same value
#      Aliases: extr, gauss, normal, norm, weibull
#
#      If preceded by ~, the distribution is inverted
#
# Defaults for all columns are size: uniform(4..8), population: uniform(1..100B), cluster: fixed(1)
#
columnspec:
  - name: name
    size: uniform(1..1000)
    population: uniform(1..500M)   # the range of unique values to select for the field (default is 100Billion)
  - name: date
    cluster: uniform(20..1000)
  - name: lval
    population: gaussian(1..1000)
    cluster: uniform(1..4)
  - name: value
    size: uniform(100..500)

insert:
  partitions: fixed(1)        # number of unique partitions to update in a single operation
                              # if batchcount > 1, multiple batches will be used but all partitions will
                              # occur in all batches (unless they finish early); only the row counts will vary
  batchtype: UNLOGGED         # type of batch to use
  select: uniform(1..10)/10   # uniform chance any single generated CQL row will be visited in a partition;
                              # generated for each partition independently, each time we visit it

#
# List of queries to run against the schema
#
queries:
  simple1:
    cql: select * from typestest where name = ? and choice = ? LIMIT 1
    fields: samerow    # samerow or multirow (select arguments from the same row, or randomly from all rows in the partition)
  range1:
    cql: select name, choice, uid from typestest where name = ? and choice = ? and date >= ? LIMIT 10
    fields: multirow   # samerow or multirow (select arguments from the same row, or randomly from all rows in the partition)
  simple2:
    cql: select name, choice, uid from typestest where name = ? and choice = ? LIMIT 1
    fields: samerow    # samerow or multirow (select arguments from the same row, or randomly from all rows in the partition)
```
As with any generic benchmark, results for different data models or applications may vary significantly from those reported here. However, we have found this to be a good test for comparing relative performance, and one that holds well across many use cases.
The Results
Our most recent benchmarking, using Cassandra 3.11, compared a 3-node i2.2xlarge cluster with a 3-node i3.2xlarge cluster. Driving operations to the point where latency on each node was similar, the 3-node i3.2xlarge cluster yielded 13,424 operations/sec, a 31% improvement over the i2.2xlarge, while delivering lower latency.
The i3.2xlarge is also significantly cheaper than the i2.2xlarge: for example, i3.2xlarge pricing is 22% less than i2.2xlarge in US East (Northern Virginia). This price reduction between the i2 and i3 generations compounds the throughput gain, resulting in a significant improvement in price/performance between generations.
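As a rough illustration of the combined effect, the throughput and price figures can be turned into an ops-per-dollar comparison. This is a sketch only: it assumes the ~22% price reduction quoted above and the measured ops/sec from our benchmark, and ignores any regional or pricing-model differences.

```python
# Sketch: combined price/performance gain from the i2 -> i3 generation change.
# Assumptions: ops/sec figures from our benchmark; i3.2xlarge assumed 22% cheaper.
i2_ops = 10_234          # measured ops/sec, 3-node i2.2xlarge cluster
i3_ops = 13_424          # measured ops/sec, 3-node i3.2xlarge cluster
price_ratio = 0.78       # i3.2xlarge cost relative to i2.2xlarge (22% cheaper)

throughput_gain = i3_ops / i2_ops - 1                      # raw throughput gain
ops_per_dollar_gain = (i3_ops / price_ratio) / i2_ops - 1  # normalised by price

print(f"Throughput gain:     {throughput_gain:.0%}")       # ~31%
print(f"Ops-per-dollar gain: {ops_per_dollar_gain:.0%}")   # ~68%
```

In other words, under these assumptions the i3.2xlarge delivers roughly two-thirds more operations per dollar than the i2.2xlarge.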
| AWS instance type | ops/sec | Median simple read latency (ms) | Median range read latency (ms) | Median write latency (ms) |
|---|---|---|---|---|
| i2.2xlarge | 10,234 | 55.1 | 57.1 | 31.8 |
| i3.2xlarge | 13,424 | 39.1 | 40.0 | 24.3 |
Table 1: Results summary
Note that the latencies in Table 1 are high because the clusters were pushed roughly to maximum throughput (ops/sec). As always, benchmark results may not translate to meaningful results for real-world applications, and we strongly recommend that you do performance testing for your particular use case.
For full pricing, sign up or log in to our console, or contact [email protected].