Enable Logging for Completed Applications
This article explains how to view historical logs of completed applications. The Instaclustr Spark Console provides the Spark Application UI for running applications. However, once an application completes, its logs are deleted and only the Spark worker logs remain accessible.
To retain the Application UI for completed applications, configure the Spark History Server on the machine used to submit Spark jobs to the cluster. You will need:
- An Instaclustr Spark cluster. See step 1 here for a guide on how to create one.
- A Spark Client machine used to submit Spark jobs to the cluster. See steps 2, 3, and 4 here for a guide on how to create and use one.
Spark History Server Configuration
On the Spark Client machine, copy the default configuration template spark-defaults.conf.template to a new file called spark-defaults.conf. This file will be automatically loaded by Spark applications when they are next started.
cp /home/ubuntu/spark-2.1.1-bin-hadoop2.6/conf/spark-defaults.conf.template /home/ubuntu/spark-2.1.1-bin-hadoop2.6/conf/spark-defaults.conf
Next, add the following lines to the spark-defaults.conf file:
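A minimal sketch of such lines, assuming event logs are kept on the Spark Client's local filesystem. The property names are standard Spark settings. The directory /tmp/spark-events is an assumption, so substitute your own location and use the same value for both directory properties:

```properties
# Write an event log for each application so its UI can be rebuilt after it completes.
spark.eventLog.enabled           true
# Assumed location: where applications write their event logs.
spark.eventLog.dir               file:///tmp/spark-events
# Assumed location: where the History Server reads event logs from (same directory as above).
spark.history.fs.logDirectory    file:///tmp/spark-events
```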
See here for more information on Spark History Server configuration.
Now create the event log folder:
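As a sketch, assuming the event log directory configured in spark-defaults.conf is /tmp/spark-events (an assumed path; EVENT_LOG_DIR is a hypothetical variable used only for illustration), the folder could be created like this:

```shell
# Assumption: this path must match spark.eventLog.dir and
# spark.history.fs.logDirectory in spark-defaults.conf.
EVENT_LOG_DIR="${EVENT_LOG_DIR:-/tmp/spark-events}"

# -p creates missing parent directories and does not fail if the folder already exists.
mkdir -p "$EVENT_LOG_DIR"
```

The user that submits Spark jobs needs write access to this folder, and the user that runs the History Server needs read access.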
Start the History Server
Now that configuration is complete, start the Spark History Server from within your Spark installation folder, like so:
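A minimal sketch of the start command, assuming the installation path used earlier in this article. The sbin/start-history-server.sh script ships with Spark, while the SPARK_HOME default and the existence check are illustrative additions:

```shell
# Assumption: Spark is installed at the path used in the cp command above.
SPARK_HOME="${SPARK_HOME:-/home/ubuntu/spark-2.1.1-bin-hadoop2.6}"
START_SCRIPT="$SPARK_HOME/sbin/start-history-server.sh"

# Start the History Server if the launcher script is present.
if [ -x "$START_SCRIPT" ]; then
    "$START_SCRIPT"
else
    echo "start-history-server.sh not found under $SPARK_HOME/sbin" >&2
fi
```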
When the History Server starts, it will print the location of the log file it will write to in case you want to tail or review it, for example:
Starting org.apache.spark.deploy.history.HistoryServer, logging to /home/ubuntu/spark-2.1.1-bin-hadoop2.6/logs/spark-admin-org.apache.spark.deploy.history.HistoryServer-1-ip-172.19.34.19.out
View the History Server
With the History Server running, browse to http://<Spark client IP>:18080/ and you should see the Spark History Server page displayed:
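The server can also be checked from the command line. The sketch below assumes curl is available and queries the History Server's REST endpoint for the list of applications. SPARK_CLIENT_HOST is a hypothetical variable standing in for the Spark client IP:

```shell
# Assumption: SPARK_CLIENT_HOST holds the Spark client IP or hostname.
SPARK_CLIENT_HOST="${SPARK_CLIENT_HOST:-localhost}"
HISTORY_URL="http://${SPARK_CLIENT_HOST}:18080/api/v1/applications"

# Request the application list as JSON; report if the server cannot be reached.
if command -v curl >/dev/null 2>&1; then
    curl --silent --fail --max-time 5 "$HISTORY_URL" \
        || echo "History Server not reachable at $HISTORY_URL" >&2
else
    echo "curl is not installed; open $HISTORY_URL in a browser instead" >&2
fi
```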
To see running (incomplete) applications, click Show incomplete applications.
Note: The Download button does not work for applications run in client mode. To download logs for an application run in client mode, remove the attempt ID from the URL, for example: http://<host>:18080/api/v1/applications/app-20180709045720-0050/1/logs becomes http://<host>:18080/api/v1/applications/app-20180709045720-0050/logs.
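The URL rewrite described in the note can be sketched as a small shell helper. The function name strip_attempt_id is hypothetical, and the sketch assumes the attempt ID is the numeric path segment immediately before /logs:

```shell
# Remove a numeric attempt ID segment that sits directly before the trailing /logs.
strip_attempt_id() {
    printf '%s\n' "$1" | sed -E 's#/[0-9]+/logs$#/logs#'
}

# Example using the URL from the note above.
strip_attempt_id "http://<host>:18080/api/v1/applications/app-20180709045720-0050/1/logs"
```

A URL that has no attempt ID segment is returned unchanged.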