Hello Cassandra! A Java Client Example

This is the third (and final) part of my blog series on creating a demonstration Cassandra cluster, connecting, and communicating. We landed on the moon and made Second Contact with the Monolith (CQL shell) in our last blog, but what can we do to understand the Monolith better? Let’s explore a Cassandra Java client program.

Java Client Driver

Apart from the CQL shell, another way of connecting to Cassandra is via a programming language driver. I’ll use Java. Instaclustr has a good introduction to Cassandra and Drivers, including best practices for configurations. We recommend the DataStax driver for Java, which is available under the Apache license as a binary tarball. Unpack it and add all the JAR files to your Java build path (I use Eclipse, so I just had to import them). The driver documentation is here, and this is a good summary.

Connection

In the Instaclustr console, open the Connection Info tab for your trial cluster. At the bottom of the tab are code samples for connecting to the trial cluster with pre-populated data.

Cassandra Java Client example

This is a simplistic code example of connecting to the trial Cassandra cluster, creating a time series data table, filling it with realistic looking data, querying it and saving the results into a CSV file for graphing (Code below). To customise the code for your cluster, change the public IP addresses, and provide the data centre name and user/password details (it’s safest to use a non-super user). The Cluster.builder() call uses a fluent API to configure the client with the IP addresses, load balancing policy, port number and user/password information. I’ve obviously been under a rock for a while as I haven’t come across fluent programming before. It’s all about cascading (chaining) method invocations: each call returns an object on which the next call is made. The same style is used by Java 8 Streams, where it is often combined with lambda functions. This is a very simple configuration which I’ll revisit in the future with the Instaclustr recommended settings for production clusters.
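To make the fluent style concrete, here is a minimal sketch of a builder written from scratch. ClientConfig and its methods are hypothetical names invented for this illustration; they are not the DataStax driver's API, they only mimic the shape of a Cluster.builder() style call chain. The IP addresses and user name are placeholders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class for illustration only; NOT part of the DataStax driver.
final class ClientConfig {
    private final List<String> contactPoints;
    private final int port;
    private final String username;

    private ClientConfig(Builder b) {
        this.contactPoints = Collections.unmodifiableList(new ArrayList<>(b.contactPoints));
        this.port = b.port;
        this.username = b.username;
    }

    static Builder builder() { return new Builder(); }

    List<String> contactPoints() { return contactPoints; }
    int port() { return port; }
    String username() { return username; }

    static final class Builder {
        private final List<String> contactPoints = new ArrayList<>();
        private int port = 9042; // Cassandra's default CQL native transport port
        private String username = "";

        // Every configuration method returns this, which is what lets calls be chained.
        Builder addContactPoint(String address) { contactPoints.add(address); return this; }
        Builder withPort(int port) { this.port = port; return this; }
        Builder withUsername(String username) { this.username = username; return this; }

        // The final call in the chain produces the immutable result.
        ClientConfig build() { return new ClientConfig(this); }
    }

    public static void main(String[] args) {
        ClientConfig config = ClientConfig.builder()
                .addContactPoint("192.0.2.10") // placeholder addresses
                .addContactPoint("192.0.2.11")
                .withPort(9042)
                .withUsername("demo_user")     // placeholder non-super user
                .build();
        System.out.println(config.contactPoints() + " port " + config.port());
    }
}
```

Note that no lambdas are needed: chaining works simply because each method returns the builder itself.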

The program then builds the cluster object, gets the metadata and prints out the host and cluster information, and then creates a session by connecting. You have the option of dropping the test table if it already exists or adding data to the existing table.

Next, we fill the table with some realistic time series sensor data. You can change how many host names (100 by default) are used, and how many timestamps are generated. For each timestamp, 3 metrics and their values are inserted. There are several types of statements in the Java client, including simple and prepared statements. In theory prepared statements are faster, so there’s an option to use either in the code. In practice it seems that prepared statements may not improve response time significantly, but may be designed to improve throughput. Realistic looking data is generated by a simple random walk.
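As a rough idea of what a simple random walk can look like, here is a small sketch. It is not the code from this blog's program: the class name, the step size, the clamping range and the fixed seed are all assumptions chosen for illustration.

```java
import java.util.Random;

// Illustrative sketch only; names and parameters are assumptions, not the blog's code.
final class RandomWalk {
    // Returns `steps` values. Each value is the previous one (initially `start`)
    // plus a uniform random step in [-maxStep, +maxStep], clamped to [min, max].
    static double[] generate(int steps, double start, double maxStep,
                             double min, double max, long seed) {
        Random random = new Random(seed); // fixed seed makes the series repeatable
        double[] values = new double[steps];
        double current = start;
        for (int i = 0; i < steps; i++) {
            double delta = (random.nextDouble() * 2.0 - 1.0) * maxStep;
            current = Math.max(min, Math.min(max, current + delta));
            values[i] = current;
        }
        return values;
    }

    public static void main(String[] args) {
        // e.g. a CPU-like metric that starts at 50 and drifts by at most 5 per step
        double[] cpu = generate(10, 50.0, 5.0, 0.0, 100.0, 42L);
        for (double v : cpu) {
            System.out.printf("%.2f%n", v);
        }
    }
}
```

Because each value depends on the previous one, the series drifts smoothly instead of jumping around, which is why it graphs like plausible sensor data.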

The code illustrates some possible queries (SELECTs). These include a simple aggregate function (max), and retrieving all the values for one host/metric combination. Another query finds all the host/metric combinations, to assist with subsequent queries: we made the primary key a compound key of host and metric, so both are needed to select on. The final query retrieves the whole table and reports the number of rows and total bytes returned.

What does the data look like?

The simplest possible way of taking a better look at the data is to use cqlsh again and run this command to produce a CSV file:

COPY hals.sensordata TO '../test1.csv' WITH header=true;

You can then read the CSV file into Excel (or similar) and graph (for example) all the metric values over time for a selected host:

[Figure: sample sensor data values over time for 1 host and 3 metrics]
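If you would rather pick out one host's values in code before graphing, something like the following sketch could work. It assumes the CSV header contains columns named host, metric and value (the table's real column names are not shown here, so treat these as placeholders), and that no field contains a comma.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch; the column names "host", "metric" and "value" are assumptions.
final class SensorCsv {
    // Groups the values for one host by metric name, keeping rows in file order.
    static Map<String, List<Double>> valuesForHost(List<String> lines, String host) {
        Map<String, List<Double>> byMetric = new LinkedHashMap<>();
        if (lines.isEmpty()) {
            return byMetric;
        }
        // Locate columns by name from the header row written by header=true.
        List<String> header = Arrays.asList(lines.get(0).trim().split(","));
        int hostCol = header.indexOf("host");
        int metricCol = header.indexOf("metric");
        int valueCol = header.indexOf("value");
        if (hostCol < 0 || metricCol < 0 || valueCol < 0) {
            throw new IllegalArgumentException("Expected host, metric and value columns");
        }
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.trim().split(",");
            if (fields[hostCol].equals(host)) {
                byMetric.computeIfAbsent(fields[metricCol], k -> new ArrayList<>())
                        .add(Double.parseDouble(fields[valueCol]));
            }
        }
        return byMetric;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: SensorCsv <csv-file> <host>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        valuesForHost(lines, args[1])
                .forEach((metric, values) -> System.out.println(metric + ": " + values));
    }
}
```

Each entry in the resulting map is one series (one metric for the chosen host), ready to paste into a spreadsheet or feed to a charting library.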

Next blog: A voyage to Jupiter: Third Contact with a Monolith—exploring real Instametrics data.
