Commit

Merge pull request #279 from microsoft/master
merge master
SparkSnail authored Dec 11, 2020
2 parents 4232fea + 5bf5e46 commit cb9efcc
Showing 275 changed files with 19,399 additions and 269 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -81,6 +81,7 @@ typings/
__pycache__
build
*.egg-info
.eggs/
setup.pye
**/__init__.pye
**/.ipynb_checkpoints
14 changes: 8 additions & 6 deletions README.md
@@ -16,7 +16,7 @@

**NNI (Neural Network Intelligence)** is a lightweight but powerful toolkit to help users **automate** <a href="docs/en_US/FeatureEngineering/Overview.md">Feature Engineering</a>, <a href="docs/en_US/NAS/Overview.md">Neural Architecture Search</a>, <a href="docs/en_US/Tuner/BuiltinTuner.md">Hyperparameter Tuning</a> and <a href="docs/en_US/Compression/Overview.md">Model Compression</a>.

The tool manages automated machine learning (AutoML) experiments, **dispatches and runs** experiments' trial jobs generated by tuning algorithms to search the best neural architecture and/or hyper-parameters in **different training environments** like <a href="docs/en_US/TrainingService/LocalMode.md">Local Machine</a>, <a href="docs/en_US/TrainingService/RemoteMachineMode.md">Remote Servers</a>, <a href="docs/en_US/TrainingService/PaiMode.md">OpenPAI</a>, <a href="docs/en_US/TrainingService/KubeflowMode.md">Kubeflow</a>, <a href="docs/en_US/TrainingService/FrameworkControllerMode.md">FrameworkController on K8S (AKS etc.)</a>, <a href="docs/en_US/TrainingService/DLTSMode.md">DLWorkspace (aka. DLTS)</a>, <a href="docs/en_US/TrainingService/AMLMode.md">AML (Azure Machine Learning)</a>, <a href="docs/en_US/TrainingService/AdaptDLMode.md">AdaptDL (aka. ADL)</a> and other cloud options.

## **Who should consider using NNI**

@@ -173,11 +173,13 @@ Within the following table, we summarized the current NNI capabilities, we are g
<li><a href="docs/en_US/TrainingService/RemoteMachineMode.md">Remote Servers</a></li>
<li><a href="docs/en_US/TrainingService/AMLMode.md">AML(Azure Machine Learning)</a></li>
<li><b>Kubernetes based services</b></li>
<ul>
<li><a href="docs/en_US/TrainingService/PaiMode.md">OpenPAI</a></li>
<li><a href="docs/en_US/TrainingService/KubeflowMode.md">Kubeflow</a></li>
<li><a href="docs/en_US/TrainingService/FrameworkControllerMode.md">FrameworkController on K8S (AKS etc.)</a></li>
<li><a href="docs/en_US/TrainingService/DLTSMode.md">DLWorkspace (aka. DLTS)</a></li>
<li><a href="docs/en_US/TrainingService/AdaptDLMode.md">AdaptDL (aka. ADL)</a></li>
</ul>
</ul>
</td>
</tr>
File renamed without changes.
@@ -9,7 +9,7 @@ Use Grid search to find the best combination of alpha, beta and gamma for Effici
[Example code](https://github.com/microsoft/nni/tree/v1.9/examples/trials/efficientnet)

1. Set your working directory to the example code directory.
2. Run `git clone https://github.com/ultmaster/EfficientNet-PyTorch` to clone the [ultmaster modified version](https://github.com/ultmaster/EfficientNet-PyTorch) of the original [EfficientNet-PyTorch](https://github.com/lukemelas/EfficientNet-PyTorch). The modifications adhere to the original [Tensorflow version](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet) as closely as possible (including EMA, label smoothing, etc.); they also add the part that gets parameters from the tuner and reports intermediate/final results. Clone it into `EfficientNet-PyTorch`; files such as `main.py` and `train_imagenet.sh` will appear inside, as specified in the configuration files.
3. Run `nnictl create --config config_local.yml` (use `config_pai.yml` for OpenPAI) to find the best EfficientNet-B1. Adjust the training service (PAI/local/remote) and batch size in the config files according to your environment.

For training on ImageNet, read `EfficientNet-PyTorch/train_imagenet.sh`. Download ImageNet beforehand and extract it adhering to the [PyTorch format](https://pytorch.org/docs/stable/torchvision/datasets.html#imagenet), then replace `/mnt/data/imagenet` with the location of your ImageNet storage. This file is also a good example to follow for mounting ImageNet into the container on OpenPAI.
File renamed without changes.
101 changes: 101 additions & 0 deletions docs/en_US/Assessor/BuiltinAssessor.rst
@@ -0,0 +1,101 @@
.. role:: raw-html(raw)
   :format: html


Built-in Assessors
==================

NNI provides state-of-the-art early stopping algorithms within its built-in assessors and makes them easy to use. Below is a brief overview of NNI's current built-in assessors.

Note: Click the **Assessor's name** to get each Assessor's installation requirements, suggested usage scenario, and a config example. A link to a detailed description of each algorithm is provided at the end of the suggested scenario for each Assessor.

Currently, we support the following Assessors:

.. list-table::
   :header-rows: 1
   :widths: auto

   * - Assessor
     - Brief Introduction of Algorithm
   * - `Medianstop <#MedianStop>`__
     - Medianstop is a simple early stopping rule. It stops a pending trial X at step S if the trial’s best objective value by step S is strictly worse than the median value of the running averages of all completed trials’ objectives reported up to step S. `Reference Paper <https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf>`__
   * - `Curvefitting <#Curvefitting>`__
     - The Curve Fitting Assessor is an LPA (learning, predicting, assessing) algorithm. It stops a pending trial X at step S if the prediction of the final epoch's performance is worse than the best final performance in the trial history. In this algorithm, we use 12 curves to fit the accuracy curve. `Reference Paper <http://aad.informatik.uni-freiburg.de/papers/15-IJCAI-Extrapolation_of_Learning_Curves.pdf>`__


Usage of Builtin Assessors
--------------------------

Usage of builtin assessors provided by the NNI SDK requires one to declare the **builtinAssessorName** and **classArgs** in the ``config.yml`` file. In this part, we will introduce the details of usage and the suggested scenarios, classArg requirements, and an example for each assessor.

Note: Please follow the provided format when writing your ``config.yml`` file.

:raw-html:`<a name="MedianStop"></a>`

Median Stop Assessor
^^^^^^^^^^^^^^^^^^^^

Builtin Assessor Name: **Medianstop**


**Suggested scenario**

It's applicable in a wide range of performance curves, thus, it can be used in various scenarios to speed up the tuning progress. `Detailed Description <./MedianstopAssessor.rst>`__

**classArgs requirements:**


* **optimize_mode** (*maximize or minimize, optional, default = maximize*\ ) - If 'maximize', the assessor will **stop** trials with a smaller expected final result. If 'minimize', it will **stop** trials with a larger expected final result.
* **start_step** (*int, optional, default = 0*\ ) - A trial is judged for early stopping only after it has reported at least start_step intermediate results.

**Usage example:**

.. code-block:: yaml

   # config.yml
   assessor:
     builtinAssessorName: Medianstop
     classArgs:
       optimize_mode: maximize
       start_step: 5

:raw-html:`<br>`

:raw-html:`<a name="Curvefitting"></a>`

Curve Fitting Assessor
^^^^^^^^^^^^^^^^^^^^^^

Builtin Assessor Name: **Curvefitting**


**Suggested scenario**

It's applicable in a wide range of performance curves, thus, it can be used in various scenarios to speed up the tuning progress. Even better, it's able to handle and assess curves with similar performance. `Detailed Description <./CurvefittingAssessor.rst>`__

**Note**\ , according to the original paper, only incremental functions are supported. Therefore this assessor can only be used to maximize optimization metrics. For example, it can be used for accuracy, but not for loss.

**classArgs requirements:**


* **epoch_num** (*int, required*\ ) - The total number of epochs. We need to know the number of epochs to determine which point to predict.
* **start_step** (*int, optional, default = 6*\ ) - A trial is judged for early stopping only after it has reported at least start_step intermediate results.
* **threshold** (*float, optional, default = 0.95*\ ) - The threshold used to decide whether to early stop the worst performance curves. For example: if threshold = 0.95 and the best performance in the history is 0.9, we stop any trial whose predicted value is lower than 0.95 * 0.9 = 0.855.
* **gap** (*int, optional, default = 1*\ ) - The interval between assessor judgements. For example: if gap = 2 and start_step = 6, we assess the result when we receive 6, 8, 10, 12... intermediate results.
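
To make the ``gap``/``start_step`` interaction concrete, the judgement steps can be listed with a tiny helper (an illustration only; ``judgement_steps`` is not part of NNI):

```python
def judgement_steps(start_step, gap, max_steps):
    # Intermediate-result counts at which the assessor judges a trial.
    return list(range(start_step, max_steps + 1, gap))

print(judgement_steps(6, 2, 12))  # [6, 8, 10, 12]
```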

**Usage example:**

.. code-block:: yaml

   # config.yml
   assessor:
     builtinAssessorName: Curvefitting
     classArgs:
       epoch_num: 20
       start_step: 6
       threshold: 0.95
       gap: 1

101 changes: 101 additions & 0 deletions docs/en_US/Assessor/CurvefittingAssessor.rst
@@ -0,0 +1,101 @@
Curve Fitting Assessor on NNI
=============================

Introduction
------------

The Curve Fitting Assessor is an LPA (learning, predicting, assessing) algorithm. It stops a pending trial X at step S if the prediction of the final epoch's performance is worse than the best final performance in the trial history.

In this algorithm, we use 12 curves to fit the learning curve. The set of parametric curve models are chosen from this `reference paper <http://aad.informatik.uni-freiburg.de/papers/15-IJCAI-Extrapolation_of_Learning_Curves.pdf>`__. The learning curves' shape coincides with our prior knowledge about the form of learning curves: They are typically increasing, saturating functions.


.. image:: ../../img/curvefitting_learning_curve.PNG
   :target: ../../img/curvefitting_learning_curve.PNG
   :alt: learning_curve


We combine all learning curve models into a single, more powerful model. This combined model is given by a weighted linear combination:


.. image:: ../../img/curvefitting_f_comb.gif
   :target: ../../img/curvefitting_f_comb.gif
   :alt: f_comb


with the new combined parameter vector


.. image:: ../../img/curvefitting_expression_xi.gif
   :target: ../../img/curvefitting_expression_xi.gif
   :alt: expression_xi
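
Spelled out in plain notation (a sketch of the reference paper's formulation, with :math:`K = 12` base models here), the two formulas shown in the images above amount to:

.. math::

   f_{\mathrm{comb}}(x \mid \xi) = \sum_{k=1}^{K} w_k f_k(x \mid \theta_k),
   \qquad
   \xi = (w_1, \ldots, w_K, \theta_1, \ldots, \theta_K, \sigma^2)

where :math:`w_k` are the combination weights, :math:`\theta_k` the per-curve parameters, and :math:`\sigma^2` the variance of the additive Gaussian noise.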


We assume additive Gaussian noise, with the noise parameter initialized to its maximum likelihood estimate.

We determine the maximum-probability value of the combined parameter vector by learning from the historical data, and use that value to predict future trial performance and stop inadequate trials early to save computing resources.

Concretely, this algorithm goes through three stages of learning, predicting, and assessing.


* Step 1: Learning. We learn from the history of the current trial and determine the parameter vector \xi from a Bayesian perspective. First, we fit each curve using the least-squares method, implemented by ``fit_theta``. After obtaining the parameters, we filter the curves and remove outliers, implemented by ``filter_curve``. Finally, we use the MCMC sampling method, implemented by ``mcmc_sampling``, to adjust the weight of each curve. At this point, all the parameters in \xi have been determined.

* Step 2: Predicting. We calculate the expected final accuracy, implemented by ``f_comb``, at the target position (i.e., the total number of epochs) using \xi and the formula of the combined model.

* Step 3: Assessing. If the fit does not converge, the predicted value is ``None``; in this case, we return ``AssessResult.Good`` to wait for more accuracy information and predict again. Otherwise, ``predict()`` returns a positive value: if it is strictly greater than the best final performance in history * ``THRESHOLD`` (default value = 0.95), we return ``AssessResult.Good``; otherwise, we return ``AssessResult.Bad``.

The figure below shows the result of our algorithm on MNIST trial history data, where the green points represent the data seen by the assessor, the blue points represent future but unknown data, and the red line is the curve predicted by the Curve Fitting Assessor.


.. image:: ../../img/curvefitting_example.PNG
   :target: ../../img/curvefitting_example.PNG
   :alt: examples


Usage
-----

To use Curve Fitting Assessor, you should add the following spec in your experiment's YAML config file:

.. code-block:: yaml

   assessor:
     builtinAssessorName: Curvefitting
     classArgs:
       # (required) The total number of epochs.
       # We need to know the number of epochs to determine which point to predict.
       epoch_num: 20
       # (optional) To save computing resources, we start predicting only after
       # receiving start_step intermediate results.
       # The default value of start_step is 6.
       start_step: 6
       # (optional) The threshold used to decide whether to early stop the worst
       # performance curves. For example: if threshold = 0.95 and the best
       # performance in the history is 0.9, we stop any trial whose predicted
       # value is lower than 0.95 * 0.9 = 0.855.
       # The default value of threshold is 0.95.
       threshold: 0.95
       # (optional) The interval between assessor judgements. For example: if
       # gap = 2 and start_step = 6, we assess the result when we receive
       # 6, 8, 10, 12... intermediate results.
       # The default value of gap is 1.
       gap: 1

Limitation
----------

According to the original paper, only incremental functions are supported. Therefore this assessor can only be used to maximize optimization metrics. For example, it can be used for accuracy, but not for loss.

File Structure
--------------

The assessor has a lot of different files, functions, and classes. Here we briefly describe a few of them.


* ``curvefunctions.py`` includes all the function expressions and default parameters.
* ``modelfactory.py`` includes learning and predicting; the corresponding calculation part is also implemented here.
* ``curvefitting_assessor.py`` is the assessor that receives the trial history and assesses whether to early stop the trial.

TODO
----


* Further improve the accuracy of the prediction and test it on more models.
67 changes: 67 additions & 0 deletions docs/en_US/Assessor/CustomizeAssessor.rst
@@ -0,0 +1,67 @@
Customize Assessor
==================

NNI supports building your own assessor to meet your tuning needs.

If you want to implement a customized Assessor, there are three things to do:


#. Inherit the base Assessor class
#. Implement assess_trial function
#. Configure your customized Assessor in experiment YAML config file

**1. Inherit the base Assessor class**

.. code-block:: python

   from nni.assessor import Assessor

   class CustomizedAssessor(Assessor):
       def __init__(self, ...):
           ...

**2. Implement the assess_trial function**

.. code-block:: python

   from nni.assessor import Assessor, AssessResult

   class CustomizedAssessor(Assessor):
       def __init__(self, ...):
           ...

       def assess_trial(self, trial_history):
           """
           Determines whether a trial should be killed. Must override.
           trial_history: a list of intermediate result objects.
           Returns AssessResult.Good or AssessResult.Bad.
           """
           # your code goes here
           ...

**3. Configure your customized Assessor in experiment YAML config file**

NNI needs to locate your customized Assessor class and instantiate the class, so you need to specify the location of the customized Assessor class and pass literal values as parameters to the __init__ constructor.

.. code-block:: yaml

   assessor:
     codeDir: /home/abc/myassessor
     classFileName: my_customized_assessor.py
     className: CustomizedAssessor
     # Any parameters that need to be passed to your Assessor's __init__
     # constructor can be specified in this optional classArgs field, for example
     classArgs:
       arg1: value1

Please note, in **2**, the object ``trial_history`` is exactly the object that the trial sends to the assessor via the SDK function ``report_intermediate_result``.

The working directory of your assessor is ``<home>/nni-experiments/<experiment_id>/log``, which can be retrieved through the environment variable ``NNI_LOG_DIRECTORY``.
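
Putting the pieces together, a minimal end-to-end sketch might look like the following. The stopping rule, class name, and the stand-in base classes are all illustrative; in a real assessor you would import ``Assessor`` and ``AssessResult`` from ``nni.assessor`` as shown above:

```python
from enum import Enum

# Stand-ins so this sketch runs without NNI installed; in a real assessor you
# would use `from nni.assessor import Assessor, AssessResult` instead.
class AssessResult(Enum):
    Good = True
    Bad = False

class Assessor:
    pass

class LastResultAssessor(Assessor):
    """Hypothetical assessor: stop a trial whose latest result falls below a floor."""

    def __init__(self, floor=0.1, start_step=3):
        self.floor = floor
        self.start_step = start_step

    def assess_trial(self, trial_history):
        if len(trial_history) < self.start_step:
            return AssessResult.Good  # too few results to judge yet
        if trial_history[-1] >= self.floor:
            return AssessResult.Good
        return AssessResult.Bad
```

The ``floor`` and ``start_step`` arguments would be supplied through the ``classArgs`` field of the YAML config shown above.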

For more detailed examples, see:

* :githublink:`medianstop-assessor <src/sdk/pynni/nni/medianstop_assessor>`
* :githublink:`curvefitting-assessor <src/sdk/pynni/nni/curvefitting_assessor>`

7 changes: 7 additions & 0 deletions docs/en_US/Assessor/MedianstopAssessor.rst
@@ -0,0 +1,7 @@
Medianstop Assessor on NNI
==========================

Median Stop
-----------

Medianstop is a simple early stopping rule mentioned in this `paper <https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf>`__. It stops a pending trial X after step S if the trial’s best objective value by step S is strictly worse than the median value of the running averages of all completed trials’ objectives reported up to step S.
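
The rule can be sketched in a few lines of Python (a standalone illustration assuming higher objective values are better; not NNI's actual implementation, and the function name is made up):

```python
from statistics import median

def median_stop(trial_history, completed_histories, step):
    """Sketch of the median stopping rule."""
    # Best objective value of the running trial by step S.
    best_so_far = max(trial_history[:step])
    # Running average at step S of each completed trial's objectives.
    running_averages = [
        sum(h[:step]) / step for h in completed_histories if len(h) >= step
    ]
    if not running_averages:
        return False  # nothing to compare against yet
    # Stop if strictly worse than the median of those running averages.
    return best_so_far < median(running_averages)
```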
