Getting started with Spark Jobserver and Instaclustr


Spark Jobserver is an open source project available on GitHub. It provides a simple, secure way to submit jobs to Spark without much of the complex setup needed to connect to the Spark master directly. You can submit jobs, contexts and JARs to the Jobserver through a RESTful interface. More information is available on the Spark Jobserver GitHub page.

Connecting to the Jobserver web UI

– In the Instaclustr console, go to the cluster’s ‘Settings’ page and add your IP address to the Spark Jobserver allowed addresses list.
– Visit the cluster’s ‘Connection Info’ page to find the login details (under ‘Credentials for API key authentication’).
– Go to the cluster’s ‘Details’ page to find the IP address of the node hosting the jobserver (in the ‘Status’ column, one of the nodes will list ‘Spark Jobserver’).
– In your browser, visit port 8090 of the jobserver node (https://<NODE_PUBLIC_IP_ADDRESS>:8090/). When asked to authenticate, provide the login details from the Connection Info page.
– The Web UI will appear. This provides information on jobs, contexts, and binaries.

Making REST requests (to submit jobs, view the status of jobs, etc.)

Background information

The following examples are based on the WordCountExample walkthrough from the Spark Jobserver GitHub page. The steps below assume that you have already used sbt to package the test jar. In the examples below the test jar is called job-server-tests_2.11-0.9.0-SNAPSHOT.jar.

You can make REST requests to the jobserver via the public endpoint, or via the private IP. Unless you have specific requirements, the public endpoint is an easier option. Steps for both methods are provided below.

When you connect, the jobserver presents a certificate issued by Let’s Encrypt, and SSL is used to identify the server and encrypt traffic. This works automatically when you connect to the public endpoint, as long as your system trusts Let’s Encrypt as a CA (which most systems do by default).

If you try to use SSL when connecting via the private IP, domain verification will fail: the private IP is not known when the certificate is generated, so the certificate does not contain the private IP of the jobserver node. You can still connect via the private IP by adding an entry to your hosts file so that the public endpoint resolves to the private IP (as described in the steps below). You could also skip domain verification (e.g. with curl, by adding the --insecure flag), but this is recommended only for development purposes.
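
To see which names the certificate actually covers, you can inspect it from the client machine. The following is a minimal sketch, assuming openssl 1.1.1 or later is installed (needed for the -ext option). The function names endpoint_host and show_cert_names are illustrative only and are not part of Instaclustr or Spark Jobserver tooling.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not part of any official tooling).

# Strip the scheme, port and path from an endpoint URL, leaving only the host name.
# Example input: https://spark.example.com:8090/binaries
endpoint_host() {
    host=${1#*://}      # drop a leading "https://" if present
    host=${host%%/*}    # drop any path
    host=${host%%:*}    # drop a trailing ":8090" if present
    printf '%s\n' "$host"
}

# Print the subject, issuer and subject alternative names of the certificate
# presented on port 8090. Requires network access to the jobserver.
show_cert_names() {
    host=$(endpoint_host "$1")
    openssl s_client -connect "$host:8090" -servername "$host" </dev/null 2>/dev/null \
        | openssl x509 -noout -subject -issuer -ext subjectAltName
}
```

Running show_cert_names with the public endpoint URL should list the endpoint's host name among the subject alternative names and no private IP, which is why verification fails when the request is addressed to the private IP directly.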

Connect to jobserver via the public endpoint

– On the cluster’s ‘Settings’ page, ensure your IP address is in the Spark Jobserver allowed addresses list.
– On the cluster’s ‘Connection Info’ page, find the URL for the public endpoint (a URL which looks like https://spark.73b08c5e892c4c8fae0815c1fb50.cu.dev.instaclustr.com:8090, for example).
– On the ‘Connection Info’ page, find the credentials needed to connect (under ‘Credentials for API key authentication’).
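– Curl the public endpoint as follows, where <PUBLIC_ENDPOINT> is the host name from that URL (without the https:// prefix or the :8090 port):

curl --user <USERNAME>:<PASSWORD> -vX POST https://<PUBLIC_ENDPOINT>:8090/binaries/test -H "Content-Type: application/java-archive" --data-binary @job-server-tests_2.11-0.9.0-SNAPSHOT.jar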
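
Once the jar has been uploaded under the name test, the next steps in the upstream WordCountExample walkthrough are to submit a job and then query its status. The sketch below wraps those two requests in shell functions. It assumes the host argument is the public endpoint's host name (no scheme or port), and that the class path spark.jobserver.WordCountExample from the upstream walkthrough matches your test jar. The function names are illustrative, and the exact response fields may vary between Jobserver versions.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical. The appName "test" matches the
# name the jar was uploaded under (/binaries/test); the class path is taken
# from the upstream WordCountExample walkthrough.

# Build the job submission URL: jobs_url <host> <appName> <classPath>
jobs_url() {
    printf 'https://%s:8090/jobs?appName=%s&classPath=%s\n' "$1" "$2" "$3"
}

# Submit the word count job: submit_wordcount <host> <username> <password>
submit_wordcount() {
    curl --user "$2:$3" -d "input.string = a b c a b see" \
        "$(jobs_url "$1" test spark.jobserver.WordCountExample)"
}

# Fetch the status or result of a job: job_status <host> <username> <password> <jobId>
job_status() {
    curl --user "$2:$3" "https://$1:8090/jobs/$4"
}
```

For example, submit_wordcount <PUBLIC_ENDPOINT> <USERNAME> <PASSWORD> should return a JSON document that includes a jobId, which can then be passed as the last argument to job_status.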

Connect to jobserver on a node connected via VPC peering

  • On the cluster’s ‘Settings’ page, go to the ‘VPC peering’ section. Ensure your VPC peering connection is active.
  • On the cluster’s ‘Settings’ page, ensure the private IP address of the machine you are connecting from is in the Spark Jobserver allowed addresses list. (You can specify a CIDR range.)
  • On the cluster’s ‘Details’ page, find the private IP address of the node hosting the jobserver (in the ‘Status’ column, one of the nodes will list ‘Spark Jobserver’).
  • On the ‘Connection Info’ page, find the credentials needed to connect (under ‘Credentials for API key authentication’).
  • Confirm that you can reach the Jobserver port. For example, on most Linux systems, netcat -w 2 -z <JOBSERVER_PRIVATE_IP> 8090; echo $? should print 0.
  • On the client machine you will be connecting from, edit your hosts file (on most Linux systems, this is /etc/hosts). In this file, add the private IP and public endpoint of your jobserver:

10.224.126.213 spark.73b08c5e892c4c8fae0815c1fb50.cu.dev.instaclustr.com # Example only

  • Connect by passing in the public endpoint (due to the entry in the hosts file, this will actually resolve to the private IP):

curl --user <USERNAME>:<PASSWORD> -vX POST https://<PUBLIC_ENDPOINT>:8090/binaries/test -H "Content-Type: application/java-archive" --data-binary @job-server-tests_2.11-0.9.0-SNAPSHOT.jar
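
If you would rather not edit the hosts file, curl's --resolve option can pin the public endpoint's host name to the private IP for a single request. This alternative is not part of the steps above. It is a minimal sketch that assumes a curl build with --resolve support, and the function names are illustrative.

```shell
#!/bin/sh
# Sketch only: hypothetical helper names; an alternative to the hosts file entry.

# Build the value for curl's --resolve option: resolve_arg <host> <private_ip>
resolve_arg() {
    printf '%s:8090:%s\n' "$1" "$2"
}

# Upload a jar via the private IP while still verifying the certificate against
# the public host name:
# upload_jar_via_private_ip <host> <private_ip> <username> <password> <jar_path>
upload_jar_via_private_ip() {
    curl --user "$3:$4" --resolve "$(resolve_arg "$1" "$2")" \
        -X POST "https://$1:8090/binaries/test" \
        -H "Content-Type: application/java-archive" \
        --data-binary "@$5"
}
```

Because the request is still addressed to the public host name, domain verification works in the same way as with the hosts file entry, but the mapping applies only to that one curl invocation.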

If you have any queries, please contact support@instaclustr.com.
