Unify Spark Configuration for Jobs service and Jupyter


Both the Jupyter and the Jobs service produce a Spark configuration. To run a Jupyter notebook in the Jobs service with the same configuration it previously used in Jupyter, several sizeable code changes are needed:

  1. The current SparkConfiguration format does not conform to the Spark configuration properties (fixed by HOPSWORKS-733)

  2. Notebooks should be runnable in the Jobs service (fixed by HOPSWORKS-752)

  3. Unify the SparkConfiguration creation process (UI + backend), including these steps (HOPSWORKS-716):

    1. Jupyter saves a notebook's configuration (including the SparkConfig) across several columns of a table called jupyter_settings, whereas Jobs saves the SparkConfig as JSON in a single column of the jobs table. Proposed solution: remove the jupyter_settings table and store JSON instead, ideally the SparkConfiguration (POJO) already used by the Jobs service; it can then be added as a column to the jupyter_project table.

    2. Move Jupyter-specific configuration, such as base_dir, shutdown level, etc., from the jupyter_settings table to the jupyter_project table.

    3. Refactor the Jupyter UI to look similar to the Jobs service, minus the fields Jobname, Jobtype, Appfile, and Jobdetails. "Configure and create" will be expanded in both Jupyter and the Jobs service, so the same SparkConfig is configurable in the same "recognizable" UI.

    4. A uniform SparkConfiguration creation module in the backend (input: the SparkConfig JSON from the frontend; output: a JobConfiguration POJO).
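The shared creation module in step 4 can be sketched as follows. This is a minimal, illustrative sketch only: the field names and defaults are assumptions, not the actual Hopsworks SparkConfiguration schema.

```python
import json

# Assumed defaults; the real SparkConfiguration POJO defines its own
# fields and default values.
DEFAULTS = {
    "type": "sparkJobConfiguration",
    "amMemory": 2048,               # driver/AM memory in MB (assumed)
    "amVCores": 1,
    "spark.executor.memory": 4096,  # executor memory in MB (assumed)
    "spark.executor.cores": 1,
}

def parse_spark_config(raw_json: str) -> dict:
    """Parse the SparkConfig JSON sent by the frontend and fill in
    defaults, mimicking one backend module that both Jupyter and the
    Jobs service could call to produce the final configuration."""
    user_cfg = json.loads(raw_json)
    cfg = dict(DEFAULTS)
    cfg.update(user_cfg)  # frontend values override defaults
    return cfg

# A notebook-level override coming from the (unified) UI:
cfg = parse_spark_config('{"spark.executor.memory": 8192}')
print(cfg["spark.executor.memory"])  # 8192 (overridden)
print(cfg["amVCores"])               # 1 (default kept)
```

The point of routing both services through one such function is that any validation or default logic changes in exactly one place.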

Once this is done, the SparkConfiguration used to run a notebook needs to be associated with the notebook file itself in HDFS. The proposed solution is presented in HOPSWORKS-869.
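One conceivable way to tie a configuration to the notebook file is to embed it in the notebook's own JSON metadata, since .ipynb files are JSON documents. This is purely illustrative and not necessarily the HOPSWORKS-869 design (which might instead use, e.g., HDFS extended attributes); the `sparkConfiguration` metadata key is an assumption.

```python
import json

def attach_spark_config(notebook_json: str, spark_config: dict) -> str:
    """Embed the Spark configuration under the notebook's "metadata"
    field, so the file carries its own run configuration.
    Illustrative sketch; the actual HOPSWORKS-869 design may differ."""
    nb = json.loads(notebook_json)
    nb.setdefault("metadata", {})["sparkConfiguration"] = spark_config
    return json.dumps(nb)

# A minimal empty notebook in nbformat 4 shape:
nb = '{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}'
out = attach_spark_config(nb, {"spark.executor.memory": 4096})
print(json.loads(out)["metadata"]["sparkConfiguration"])
# {'spark.executor.memory': 4096}
```

A metadata-based approach keeps the configuration with the file through copies and exports, whereas a filesystem-attribute approach keeps the notebook content untouched; either way the association survives independently of the jobs or jupyter_project tables.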
