Instaclustr Spark with SSL configured Cassandra Cluster

A common setup for a Cassandra cluster is to enable client encryption. In order to utilize Spark with these clusters, additional steps must be taken when submitting jobs to configure the Spark Cassandra connector to use SSL. In this guide, we will go through these steps and attempt to clarify the configuration properties used.

As a prerequisite, you should have provisioned and configured a cluster with both Cassandra and Spark. Sections 1, 2 and 3 of the article Getting Started with Instaclustr Spark & Cassandra describe how to do this. Remember to select Client ⇄ Node Encryption to enable client encryption when creating the cluster. This option is not available for the Developer node size, so you must select a Production node size.

Download Truststore File

Download the certificates for your cluster from its Connection info page.

In the downloaded zip, you will find a Java Key Store file called truststore.jks. This file needs to be included as a resource in the assembled jar in a later step.
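Before bundling the file, it can be worth confirming that the trust store opens with the password you have. The following is a minimal sketch that uses only the JDK's KeyStore API. The class name TrustStoreInspector and the method listAliases are hypothetical and not part of this guide's project, and the sketch assumes the file is in JKS format, as its extension suggests.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.KeyStore;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the guide's project.
class TrustStoreInspector {

    // Loads a JKS trust store from the stream and returns its entry aliases, sorted.
    // KeyStore.load throws an IOException if the password is wrong or the data is malformed.
    public static List<String> listAliases(InputStream in, char[] password) throws Exception {
        KeyStore store = KeyStore.getInstance("JKS");
        store.load(in, password);
        List<String> aliases = new ArrayList<>(Collections.list(store.aliases()));
        Collections.sort(aliases);
        return aliases;
    }

    // Usage: java TrustStoreInspector.java <path-to-truststore.jks> <password>
    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (String alias : listAliases(in, args[1].toCharArray())) {
                System.out.println(alias);
            }
        }
    }
}
```

If the file loads and prints at least one alias, the same password should work for the spark.cassandra.connection.ssl.trustStore.password property used later.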

Creating and Submitting a Scala Job with SSL Cassandra Connection

In this step of the tutorial, we will demonstrate how to build and submit a Scala job. This approach is useful when you want to create a job once and submit it multiple times.

  1. Log in to your Spark client machine
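  2. Create the required directories for your project (the later steps use cassandra-count/project, cassandra-count/src/main/scala, cassandra-count/src/main/java and cassandra-count/src/main/resources):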
  3. Create a file called build.sbt in the cassandra-count directory with the following contents (note: the blank lines are important):
  4. Create a file called assembly.sbt in the cassandra-count/project directory with the following contents (this will include required dependencies in the output jars):
  5. Create a file called cassandra-count.scala in the cassandra-count/src/main/scala directory with the following contents:
  6. For Spark to connect to Cassandra using SSL, an appropriate SSL context needs to be created on the Spark driver and on all the executors. This is achieved by providing SSL-specific properties to the Spark Cassandra connector. With the default connection factory, the path to the trust store file must be valid on the driver and on every executor, which can be restrictive. An alternative is a custom connection factory that treats the trust store path property as a resource path rather than a file path, so the trust store can be read from a resource inside the assembled jar. Create a file called CustomCassandraConnectionFactory.java in the cassandra-count/src/main/java directory with the following contents:
  7. Copy the trust store file downloaded in the earlier step to the cassandra-count/src/main/resources directory.
  8. Additional properties are needed to set up the SSL connection to Cassandra:
    Property Name | Description
    spark.cassandra.connection.ssl.enabled | A boolean switch to indicate whether the connection to Cassandra should use SSL
    spark.cassandra.connection.ssl.trustStore.password | The password matching the trust store
    spark.cassandra.connection.ssl.trustStore.path | The path to the trust store file. With the custom factory in this example, this is a path to a resource instead
    spark.cassandra.connection.factory | Overrides the behaviour of the default Spark Cassandra Connector. When used, it should be the name of the class that implements CassandraConnectionFactory. Details of this class can be found on the DataStax Spark Cassandra Connector page on GitHub

    Create a file called cassandra-count.conf in the cassandra-count directory (this file contains the configuration that will be used when we submit the job):

  9. Build the job (from cassandra-count directory):
  10. Submit the job (from cassandra-count directory):
  11. You should see many log messages; the row count message appears about 15 messages from the end of the output.
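As a side note on step 6, the core idea of reading the trust store from a classpath resource instead of a file path can be sketched with JDK classes alone. This is not the guide's CustomCassandraConnectionFactory and it does not implement the connector's CassandraConnectionFactory interface. The class name ResourceSslContext and its methods are hypothetical, and the sketch assumes a JKS trust store.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper illustrating resource-based trust store loading.
class ResourceSslContext {

    // Builds an SSLContext that trusts the certificates in the given JKS stream.
    public static SSLContext fromStream(InputStream in, char[] password) throws Exception {
        KeyStore trustStore = KeyStore.getInstance("JKS");
        trustStore.load(in, password);

        TrustManagerFactory tmf =
            TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        SSLContext context = SSLContext.getInstance("TLS");
        // No client key managers; default SecureRandom.
        context.init(null, tmf.getTrustManagers(), null);
        return context;
    }

    // Resolves the path against the classpath (for example, inside the assembled jar)
    // rather than the local file system, so it works on the driver and on executors alike.
    public static SSLContext fromResource(String resourcePath, char[] password) throws Exception {
        try (InputStream in = ResourceSslContext.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IOException("Trust store resource not found: " + resourcePath);
            }
            return fromStream(in, password);
        }
    }
}
```

Because the trust store travels inside the jar, every JVM that loads the jar can resolve the same resource path, which is the restriction the default file-based path runs into. A leading slash in the resource path, such as /truststore.jks, resolves from the root of the classpath when used with Class.getResourceAsStream.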

Using Spark Shell

Connecting to Cassandra via SSL when using Spark Shell is achieved in the same fashion as Spark Submit. The jar containing the custom connection factory and trust store resource must be added to the list of jar files. The same configuration properties used to set up the context for the SSL connection must also be specified. Below is an example Spark Shell Command:
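The original command is not reproduced here. As a rough illustration only, the sketch below assembles a spark-shell argument list from the four properties in the table above, using the standard --jars and --conf options. The class SparkShellSslArgs is hypothetical, and every value passed to it (jar path, factory class name, resource path, password) is a placeholder assumption to be replaced with your own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that renders a spark-shell command line; all values are placeholders.
class SparkShellSslArgs {

    public static List<String> build(String jarPath, String factoryClass,
                                     String trustStoreResource, String password) {
        // The four SSL-related connector properties named in this guide.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("spark.cassandra.connection.ssl.enabled", "true");
        props.put("spark.cassandra.connection.ssl.trustStore.password", password);
        props.put("spark.cassandra.connection.ssl.trustStore.path", trustStoreResource);
        props.put("spark.cassandra.connection.factory", factoryClass);

        List<String> args = new ArrayList<>();
        args.add("spark-shell");
        args.add("--jars");      // jar containing the custom factory and the trust store resource
        args.add(jarPath);
        for (Map.Entry<String, String> e : props.entrySet()) {
            args.add("--conf");  // one --conf key=value pair per property
            args.add(e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] ignored) {
        List<String> args = build("<path-to-assembled-jar>", "<connection-factory-class>",
                                  "<trust-store-resource-path>", "<trust-store-password>");
        System.out.println(String.join(" ", args));
    }
}
```

Other settings your cluster needs, such as the Cassandra contact points and credentials, are outside the scope of this sketch and would be added in the same way.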

Further Resources

You can find the source code used in this guide at this GitHub page.
