Commit

Merge pull request #235 from microsoft/master
merge master
SparkSnail authored Mar 17, 2020
2 parents 1d74ae5 + 2e42d1d commit 75028bd
Showing 94 changed files with 14,581 additions and 970 deletions.
2 changes: 2 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,8 @@ build:
cp -rf src/nni_manager/config src/nni_manager/dist/
#$(_INFO) Building WebUI $(_END)
cd src/webui && $(NNI_YARN) && $(NNI_YARN) build
#$(_INFO) Building NAS UI $(_END)
cd src/nasui && $(NNI_YARN) && $(NNI_YARN) build

# All-in-one target for non-expert users
# Installs NNI as well as its dependencies, and update bashrc to set PATH
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,13 +22,13 @@ The tool manages automated machine learning (AutoML) experiments, **dispatches a

* Those who want to **try different AutoML algorithms** in their training code/model.
* Those who want to run AutoML trial jobs **in different environments** to speed up search.
* Researchers and data scientists who want to easily **implement and experiement new AutoML algorithms**, may it be: hyperparameter tuning algorithm, neural architect search algorithm or model compression algorithm.
* Researchers and data scientists who want to easily **implement and experiment new AutoML algorithms**, may it be: hyperparameter tuning algorithm, neural architect search algorithm or model compression algorithm.
* ML Platform owners who want to **support AutoML in their platform**.

### **NNI v1.4 has been released! &nbsp;<a href="#nni-released-reminder"><img width="48" src="docs/img/release_icon.png"></a>**

## **NNI capabilities in a glance**
NNI provides CommandLine Tool as well as an user friendly WebUI to manage training experiements. With the extensible API, you can customize your own AutoML algorithms and training services. To make it easy for new users, NNI also provides a set of build-in stat-of-the-art AutoML algorithms and out of box support for popular training platforms.
NNI provides CommandLine Tool as well as an user friendly WebUI to manage training experiments. With the extensible API, you can customize your own AutoML algorithms and training services. To make it easy for new users, NNI also provides a set of build-in stat-of-the-art AutoML algorithms and out of box support for popular training platforms.

Within the following table, we summarized the current NNI capabilities, we are gradually adding new capabilities and we'd love to have your contribution.

4 changes: 2 additions & 2 deletions azure-pipelines.yml
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,7 @@ jobs:
- job: 'basic_test_pr_macOS'
pool:
vmImage: 'macOS 10.13'
vmImage: 'macOS-10.15'
strategy:
matrix:
Python36:
Expand All @@ -94,8 +94,8 @@ jobs:
python3 -m pip install torch==1.2.0 --user
python3 -m pip install torchvision==0.4.0 --user
python3 -m pip install tensorflow==1.13.1 --user
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" < /dev/null 2> /dev/null
brew install swig@3
rm /usr/local/bin/swig
ln -s /usr/local/opt/swig\@3/bin/swig /usr/local/bin/swig
nnictl package install --name=SMAC
displayName: 'Install dependencies'
2 changes: 1 addition & 1 deletion deployment/deployment-pipelines.yml
Original file line number Diff line number Diff line change
Expand Up @@ -88,7 +88,7 @@ jobs:
export IMG_NAME=$(dev_docker_img)
export IMG_TAG=`git describe --tags --abbrev=0`.`date -u +%y%m%d%H%M`
echo 'updating docker file for testpyi...'
sed -ie 's/RUN python3 -m pip --no-cache-dir install nni/RUN python3 -m pip install --user --no-cache-dir --index-url https:\/\/test.pypi.org\/simple --extra-index-url https:\/\/pypi.org\/simple nni/' Dockerfile
sed -ie 's/RUN python3 -m pip --no-cache-dir install nni/RUN python3 -m pip install --no-cache-dir --index-url https:\/\/test.pypi.org\/simple --extra-index-url https:\/\/pypi.org\/simple nni/' Dockerfile
else
docker login -u $(docker_hub_user) -p $(docker_hub_pwd)
export IMG_NAME=msranni/nni
2 changes: 2 additions & 0 deletions docs/en_US/AdvancedFeature/MultiPhase.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
# Multi-phase

## What is multi-phase experiment

Typically each trial job gets a single configuration (e.g., hyperparameters) from tuner, tries this configuration and reports result, then exits. But sometimes a trial job may wants to request multiple configurations from tuner. We find this is a very compelling feature. For example:
104 changes: 104 additions & 0 deletions docs/en_US/Compressor/Framework.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
# Design Doc

## Overview
The model compression framework has two main components: `pruner` and `module wrapper`.

### pruner
A `pruner` is responsible for:
1. providing a `calc_mask` method that calculates masks for weight and bias.
2. replacing the module with a `module wrapper` based on config.
3. modifying the optimizer so that the `calc_mask` method is called every time the `step` method is called.

### module wrapper
A `module wrapper` is a module containing:
1. the original module
2. some buffers used by `calc_mask`
3. a new forward method that applies masks before running the original forward method.

The reasons to use a `module wrapper`:
1. some buffers are needed by `calc_mask` to calculate masks, and these buffers should be registered in the `module wrapper` so that the original modules are not contaminated.
2. a new `forward` method is needed to apply masks to the weight before calling the real `forward` method.
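
The wrapper idea described above can be sketched without any deep learning framework. The snippet below is only an illustration under stated assumptions: `TinyLinear` and `ModuleWrapper` are hypothetical stand-ins written for this example, not NNI or PyTorch classes, and restoring the dense weights after the forward pass is a choice made for the sketch rather than something the design doc specifies.

```python
class TinyLinear:
    """Hypothetical stand-in for a framework layer: y = sum(w_i * x_i)."""

    def __init__(self, weight):
        self.weight = list(weight)

    def forward(self, x):
        return sum(w * v for w, v in zip(self.weight, x))


class ModuleWrapper:
    """Holds the original module plus mask state, and masks weights before forward."""

    def __init__(self, module):
        self.module = module                           # the original module
        self.weight_mask = [1.0] * len(module.weight)  # mask state lives on the wrapper

    def forward(self, x):
        dense = self.module.weight
        # Apply the mask, run the original forward, then restore the dense weights
        # (restoring is an assumption of this sketch).
        self.module.weight = [w * m for w, m in zip(dense, self.weight_mask)]
        try:
            return self.module.forward(x)
        finally:
            self.module.weight = dense


layer = TinyLinear([1.0, 2.0, 3.0])
wrapped = ModuleWrapper(layer)
wrapped.weight_mask = [1.0, 0.0, 1.0]   # prune the second weight
masked_out = wrapped.forward([1.0, 1.0, 1.0])
dense_out = layer.forward([1.0, 1.0, 1.0])
```

Because the mask is stored on the wrapper, the wrapped layer keeps no mask-related attributes, which is the "not contaminated" property the list above asks for.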

## How it works
A basic pruner usage:
```python
configure_list = [{
'sparsity': 0.7,
'op_types': ['BatchNorm2d'],
}]

optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
pruner = SlimPruner(model, configure_list, optimizer)
model = pruner.compress()
```

A pruner receives the model, config and optimizer as arguments. In the `__init__` method, the optimizer's `step` method is replaced with a new `step` method that also calls `calc_mask`. Each module is then checked against the config to decide whether it needs to be pruned; a module that does is replaced by a `module wrapper`. Afterward, the new model and new optimizer are returned and can be trained as before. The `compress` method calculates the default masks.
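
The optimizer patching described above can be illustrated with plain Python. This is a minimal sketch, assuming nothing about NNI's actual implementation: `TinyOptimizer` and `patch_step` are hypothetical names introduced here, and the callback merely records that a mask update would have happened.

```python
import types


class TinyOptimizer:
    """Hypothetical stand-in for an optimizer; counts how often step() runs."""

    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1


def patch_step(optimizer, after_step):
    """Replace optimizer.step so that after_step() runs after every original step."""
    original_step = optimizer.step  # bound method captured before patching

    def new_step(self, *args, **kwargs):
        result = original_step(*args, **kwargs)
        after_step()  # this is where a pruner would recalculate masks
        return result

    optimizer.step = types.MethodType(new_step, optimizer)
    return optimizer


mask_updates = []
opt = patch_step(TinyOptimizer(), lambda: mask_updates.append("calc_mask"))
opt.step()
opt.step()
```

The training loop keeps calling `opt.step()` as before; the mask update rides along with each call, so user code does not need to change.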

## Implement a new pruning algorithm
Implementing a new pruning algorithm requires a new `pruner` class that subclasses `Pruner` and overrides the `calc_mask` method. `calc_mask` is called by the `optimizer.step` method.
The `Pruner` base class provides the basic functionality listed above, for example, replacing modules and patching the optimizer.

A basic pruner looks like this:
```python
class NewPruner(Pruner):
    def __init__(self, model, config_list, optimizer):
        super().__init__(model, config_list, optimizer)
        # do some initialization

    def calc_mask(self, wrapper, **kwargs):
        # do something to calculate weight_mask
        wrapper.weight_mask = weight_mask
```
### Set wrapper attribute
Sometimes `calc_mask` must save some state data. Users can call the `set_wrappers_attribute` API to register an attribute, much like registering buffers in PyTorch modules. These buffers are registered on the `module wrapper`, and users can access them through the `module wrapper`.

```python
class NewPruner(Pruner):
    def __init__(self, model, config_list, optimizer):
        super().__init__(model, config_list, optimizer)
        self.set_wrappers_attribute("if_calculated", False)

    def calc_mask(self, wrapper):
        # do something to calculate weight_mask
        if wrapper.if_calculated:
            pass
        else:
            wrapper.if_calculated = True
            # update masks
```

### Collect data during forward
Sometimes users want to collect some data during the modules' forward method, for example, the mean value of the activation. To do so, users can add a customized collector to the module.

```python
class ActivationRankFilterPruner(Pruner):
    def __init__(self, model, config_list, optimizer, activation='relu', statistics_batch_num=1):
        super().__init__(model, config_list, optimizer)
        self.set_wrappers_attribute("if_calculated", False)
        self.set_wrappers_attribute("collected_activation", [])
        self.statistics_batch_num = statistics_batch_num

        def collector(module_, input_, output):
            if len(module_.collected_activation) < self.statistics_batch_num:
                module_.collected_activation.append(self.activation(output.detach().cpu()))
        self.add_activation_collector(collector)
        assert activation in ['relu', 'relu6']
        if activation == 'relu':
            self.activation = torch.nn.functional.relu
        elif activation == 'relu6':
            self.activation = torch.nn.functional.relu6
        else:
            self.activation = None
```
The collector function will be called each time the forward method runs.

Users can also remove this collector like this:
```python
collector_id = self.add_activation_collector(collector)
# ...
self.remove_activation_collector(collector_id)
```

### Multi-GPU support
In multi-GPU training, buffers and parameters are copied to each GPU every time the `forward` method runs on multiple GPUs. If buffers and parameters are updated in the `forward` method, an in-place update is needed to ensure the update is effective.
Since `calc_mask` is called in the `optimizer.step` method, which happens after the `forward` method and happens only on one GPU, it supports multi-GPU naturally.
2 changes: 1 addition & 1 deletion docs/en_US/Compressor/Overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ As larger neural networks with more layers and nodes are considered, reducing th

We are glad to introduce model compression toolkit on top of NNI, it's still in the experiment phase which might evolve based on usage feedback. We'd like to invite you to use, feedback and even contribute.

NNI provides an easy-to-use toolkit to help user design and use compression algorithms. It currently supports PyTorch with unified interface. For users to compress their models, they only need to add several lines in their code. There are some popular model compression algorithms built-in in NNI. Users could further use NNI's auto tuning power to find the best compressed model, which is detailed in [Auto Model Compression](./AutoCompression.md). On the other hand, users could easily customize their new compression algorithms using NNI's interface, refer to the tutorial [here](#customize-new-compression-algorithms).
NNI provides an easy-to-use toolkit to help user design and use compression algorithms. It currently supports PyTorch with unified interface. For users to compress their models, they only need to add several lines in their code. There are some popular model compression algorithms built-in in NNI. Users could further use NNI's auto tuning power to find the best compressed model, which is detailed in [Auto Model Compression](./AutoCompression.md). On the other hand, users could easily customize their new compression algorithms using NNI's interface, refer to the tutorial [here](#customize-new-compression-algorithms). Details about how model compression framework works can be found in [here](./Framework.md).

For a survey of model compression, you can refer to this paper: [Recent Advances in Efficient Computation of Deep Convolutional Neural Networks](https://arxiv.org/pdf/1802.00939.pdf).

4 changes: 3 additions & 1 deletion docs/en_US/Compressor/QuickStart.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Quick Start to Compress a Model

NNI provides very simple APIs for compressing a model. The compression includes pruning algorithms and quantization algorithms. The usage of them are the same, thus, here we use slim pruner as an example to show the usage. The complete code of this example can be found [here](https://github.com/microsoft/nni/blob/master/examples/model_compress/slim_torch_cifar10.py).
NNI provides very simple APIs for compressing a model. The compression includes pruning algorithms and quantization algorithms. The usage of them are the same, thus, here we use slim pruner as an example to show the usage.

## Write configuration

Expand Down Expand Up @@ -34,6 +34,8 @@ After training, you get accuracy of the pruned model. You can export model weigh
pruner.export_model(model_path='pruned_vgg19_cifar10.pth', mask_path='mask_vgg19_cifar10.pth')
```

The complete code of model compression examples can be found [here](https://github.com/microsoft/nni/blob/master/examples/model_compress/model_prune_torch.py).

## Speed up the model

Masks do not provide real speedup of your model. The model should be speeded up based on the exported masks, thus, we provide an API to speed up your model as shown below. After invoking `apply_compression_results` on your model, your model becomes a smaller one with shorter inference latency.
2 changes: 1 addition & 1 deletion docs/en_US/NAS/Advanced.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ To demonstrate what mutators are for, we need to know how one-shot NAS normally

Finally, mutators provide a method called `mutator.export()` that export a dict with architectures to the model. Note that currently this dict this a mapping from keys of mutables to tensors of selection. So in order to dump to json, users need to convert the tensors explicitly into python list.

Meanwhile, NNI provides some useful tools so that users can implement trainers more easily. See [Trainers](./NasReference.md#trainers) for details.
Meanwhile, NNI provides some useful tools so that users can implement trainers more easily. See [Trainers](./NasReference.md) for details.

## Implement New Mutators

4 changes: 2 additions & 2 deletions docs/en_US/NAS/NasGuide.md
Original file line number Diff line number Diff line change
Expand Up @@ -71,7 +71,7 @@ Input choice can be thought of as a callable module that receives a list of tens

`LayerChoice` and `InputChoice` are both **mutables**. Mutable means "changeable". As opposed to traditional deep learning layers/modules which have fixed operation type once defined, models with mutables are essentially a series of possible models.

Users can specify a **key** for each mutable. By default NNI will assign one for you that is globally unique, but in case users want to share choices (for example, there are two `LayerChoice` with the same candidate operations, and you want them to have the same choice, i.e., if first one chooses the i-th op, the second one also chooses the i-th op), they can give them the same key. The key marks the identity for this choice, and will be used in dumped checkpoint. So if you want to increase the readability of your exported architecture, manually assigning keys to each mutable would be a good idea. For advanced usage on mutables, see [Mutables](./NasReference.md#mutables).
Users can specify a **key** for each mutable. By default NNI will assign one for you that is globally unique, but in case users want to share choices (for example, there are two `LayerChoice` with the same candidate operations, and you want them to have the same choice, i.e., if first one chooses the i-th op, the second one also chooses the i-th op), they can give them the same key. The key marks the identity for this choice, and will be used in dumped checkpoint. So if you want to increase the readability of your exported architecture, manually assigning keys to each mutable would be a good idea. For advanced usage on mutables, see [Mutables](./NasReference.md).

## Use a Search Algorithm

Expand Down Expand Up @@ -163,7 +163,7 @@ The JSON is simply a mapping from mutable keys to one-hot or multi-hot represent
}
```

After applying, the model is then fixed and ready for a final training. The model works as a single model, although it might contain more parameters than expected. This comes with pros and cons. The good side is, you can directly load the checkpoint dumped from supernet during search phase and start retrain from there. However, this is also a model with redundant parameters, which may cause problems when trying to count the number of parameters in model. For deeper reasons and possible workaround, see [Trainers](./NasReference.md#retrain).
After applying, the model is then fixed and ready for a final training. The model works as a single model, although it might contain more parameters than expected. This comes with pros and cons. The good side is, you can directly load the checkpoint dumped from supernet during search phase and start retrain from there. However, this is also a model with redundant parameters, which may cause problems when trying to count the number of parameters in model. For deeper reasons and possible workaround, see [Trainers](./NasReference.md).

Also refer to [DARTS](./DARTS.md) for example code of retraining.

8 changes: 4 additions & 4 deletions docs/en_US/NAS/Overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,10 +34,10 @@ Here are some common dependencies to run the examples. PyTorch needs to be above

|Name|Brief Introduction of Algorithm|
|---|---|
| [SPOS](SPOS.md) | [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) constructs a simplified supernet trained with an uniform path sampling method, and applies an evolutionary algorithm to efficiently search for the best-performing architectures. |
| [SPOS's 2nd stage](SPOS.md) | [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) constructs a simplified supernet trained with an uniform path sampling method, and applies an evolutionary algorithm to efficiently search for the best-performing architectures. _Note:: SPOS is a two-stage algorithm, whose first stage is one-shot and second stage is distributed, leveraging result of first stage as a checkpoint._|

```eval_rst
.. Note:: SPOS is a two-stage algorithm, whose first stage is one-shot and second stage is distributed, leveraging result of first stage as a checkpoint.
```eval_rst
.. Note:: SPOS is a two-stage algorithm, whose first stage is one-shot and second stage is distributed, leveraging result of first stage as a checkpoint.
```

## Use NNI API
Expand All @@ -58,4 +58,4 @@ The programming interface of designing and searching a model is often demanded i
[5]: https://arxiv.org/abs/1703.01041

* To [report a bug](https://github.com/microsoft/nni/issues/new?template=bug-report.md) for this feature in GitHub;
* To [file a feature or improvement request](https://github.com/microsoft/nni/issues/new?template=enhancement.md) for this feature in GitHub.
* To [file a feature or improvement request](https://github.com/microsoft/nni/issues/new?template=enhancement.md) for this feature in GitHub.
49 changes: 49 additions & 0 deletions docs/en_US/TrainingService/DLTSMode.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
**Run an Experiment on DLTS**
===
NNI supports running an experiment on [DLTS](https://github.com/microsoft/DLWorkspace.git), called dlts mode. Before starting to use NNI dlts mode, you should have an account to access DLTS dashboard.

## Setup Environment

Step 1. Choose a cluster from DLTS dashboard, ask administrator for the cluster dashboard URL.

![Choose Cluster](../../img/dlts-step1.png)

Step 2. Prepare an NNI config YAML like the following:

```yaml
# Set this field to "dlts"
trainingServicePlatform: dlts
authorName: your_name
experimentName: auto_mnist
trialConcurrency: 2
maxExecDuration: 3h
maxTrialNum: 100
searchSpacePath: search_space.json
useAnnotation: false
tuner:
  builtinTunerName: TPE
  classArgs:
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 1
  image: msranni/nni
# Configuration to access DLTS
dltsConfig:
  dashboard: # Ask administrator for the cluster dashboard URL
```
Remember to fill in the cluster dashboard URL on the last line.

Step 3. Open your working directory on the cluster, and paste the NNI config as well as the related code into a directory.

![Copy Config](../../img/dlts-step3.png)

Step 4. Submit an NNI manager job to the specified cluster.

![Submit Job](../../img/dlts-step4.png)

Step 5. Go to the Endpoints tab of the newly created job, and click the Port 40000 link to check the trial's information.

![View NNI WebUI](../../img/dlts-step5.png)
2 changes: 2 additions & 0 deletions docs/en_US/hpo_advanced.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,8 @@ Advanced Features
=================

.. toctree::
:maxdepth: 2

Enable Multi-phase <AdvancedFeature/MultiPhase>
Write a New Tuner <Tuner/CustomizeTuner>
Write a New Assessor <Assessor/CustomizeAssessor>
1 change: 1 addition & 0 deletions docs/en_US/model_compression.rst
Original file line number Diff line number Diff line change
Expand Up @@ -21,3 +21,4 @@ For details, please refer to the following tutorials:
Quantizers <quantizers>
Model Speedup <Compressor/ModelSpeedup>
Automatic Model Compression <Compressor/AutoCompression>
Implementation <Compressor/Framework>
1 change: 1 addition & 0 deletions docs/en_US/training_services.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,3 +9,4 @@ Introduction to NNI Training Services
OpenPAI Yarn Mode<./TrainingService/PaiYarnMode>
Kubeflow<./TrainingService/KubeflowMode>
FrameworkController<./TrainingService/FrameworkControllerMode>
DLTS<./TrainingService/DLTSMode>
Binary file added docs/img/dlts-step1.png
Binary file added docs/img/dlts-step3.png
Binary file added docs/img/dlts-step4.png
Binary file added docs/img/dlts-step5.png