
Merge pull request #2081 from microsoft/v1.4
merge V1.4 back to master
QuanluZhang authored Feb 19, 2020
2 parents aaaa275 + 8ff039c commit 24fa461
Showing 40 changed files with 328 additions and 187 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -25,7 +25,7 @@ The tool manages automated machine learning (AutoML) experiments, **dispatches a
* Researchers and data scientists who want to easily **implement and experiment with new AutoML algorithms**, be it a hyperparameter tuning algorithm, a neural architecture search algorithm, or a model compression algorithm.
* ML Platform owners who want to **support AutoML in their platform**.

### **NNI v1.3 has been released! &nbsp;<a href="#nni-released-reminder"><img width="48" src="docs/img/release_icon.png"></a>**
### **NNI v1.4 has been released! &nbsp;<a href="#nni-released-reminder"><img width="48" src="docs/img/release_icon.png"></a>**

## **NNI capabilities in a glance**
NNI provides a command line tool as well as a user-friendly WebUI to manage training experiments. With the extensible API, you can customize your own AutoML algorithms and training services. To make it easy for new users, NNI also provides a set of built-in state-of-the-art AutoML algorithms and out-of-the-box support for popular training platforms.
@@ -233,7 +233,7 @@ The following example is built on TensorFlow 1.x. Make sure **TensorFlow 1.x is
* Download the examples by cloning the source code.

```bash
git clone -b v1.3 https://github.com/Microsoft/nni.git
git clone -b v1.4 https://github.com/Microsoft/nni.git
```

* Run the MNIST example.
2 changes: 1 addition & 1 deletion docs/en_US/NAS/Proxylessnas.md
@@ -60,4 +60,4 @@ ProxylessNasMutator also implements the forward logic of the mutables (i.e., Lay

## Reproduce Results

Ongoing...
To reproduce the result, we first ran the search. We found that although the search runs for many epochs, the chosen architecture converges within the first several epochs. This is probably caused by the hyper-parameters or the implementation; we are investigating it. The test accuracy of the found architecture is top-1: 72.31, top-5: 90.26.
40 changes: 40 additions & 0 deletions docs/en_US/Release.md
@@ -1,5 +1,45 @@
# ChangeLog

## Release 1.4 - 2/19/2020

### Major Features

#### Neural Architecture Search
* Support [C-DARTS](https://github.com/microsoft/nni/blob/v1.4/docs/en_US/NAS/CDARTS.md) algorithm and add [the example](https://github.com/microsoft/nni/tree/v1.4/examples/nas/cdarts) using it
* Support a preliminary version of [ProxylessNAS](https://github.com/microsoft/nni/blob/v1.4/docs/en_US/NAS/Proxylessnas.md) and the corresponding [example](https://github.com/microsoft/nni/tree/v1.4/examples/nas/proxylessnas)
* Add unit tests for the NAS framework

#### Model Compression
* Support `DataParallel` for compressing models, and provide [an example](https://github.com/microsoft/nni/blob/v1.4/examples/model_compress/multi_gpu.py) of using DataParallel (see the sketch after this list)
* Support [model speedup](https://github.com/microsoft/nni/blob/v1.4/docs/en_US/Compressor/ModelSpeedup.md) for compressed models (alpha version)
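
A minimal sketch of the multi-GPU compression flow these items describe, assuming the `AGP_Pruner` API that appears elsewhere in this commit (`AGP_Pruner(model, config_list)` / `pruner.compress()`). The import path and config keys follow the v1.x docs and should be treated as assumptions, as should wrapping in `DataParallel` after `compress()`; the linked `multi_gpu.py` example is authoritative.

```python
import torch
import torch.nn as nn
# Import path assumed for NNI v1.4; the AGP_Pruner name itself appears in this commit.
from nni.compression.torch import AGP_Pruner

# A small stand-in model; any torch.nn.Module would do.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3), nn.ReLU(), nn.Flatten(), nn.Linear(8 * 26 * 26, 10))

# AGP config keys follow the v1.x pruner docs; treat the exact keys as assumptions.
config_list = [{
    'initial_sparsity': 0.0, 'final_sparsity': 0.8,
    'start_epoch': 0, 'end_epoch': 10, 'frequency': 1,
    'op_types': ['default'],
}]

pruner = AGP_Pruner(model, config_list)
model = pruner.compress()        # wrap layers with pruning masks
model = nn.DataParallel(model)   # replicate the masked model across GPUs
model = model.to('cuda')         # then train / fine-tune as usual
```

Wrapping after `compress()` presumably ensures the masking hooks installed by the pruner are replicated to every GPU along with the model.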

#### Training Service
* Support complete PAI configurations by allowing users to specify a PAI config file path
* Add example config YAML files for the new PAI mode (i.e., paiK8S)
* Support deleting experiments using an SSH key in remote mode (thanks to external contributor @tyusr)

#### WebUI
* WebUI refactor: adopt fabric framework

#### Others
* Support running an [NNI experiment in the foreground](https://github.com/microsoft/nni/blob/v1.4/docs/en_US/Tutorial/Nnictl.md#manage-an-experiment), i.e., the `--foreground` argument in `nnictl create/resume/view` (see the sketch after this list)
* Support canceling trials in the UNKNOWN state
* Support large search spaces of up to 50 MB (thanks to external contributor @Sundrops)
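
A short usage sketch for the foreground mode mentioned above; only the `--foreground` flag itself comes from this changelog, and `config.yml` / `<experiment_id>` are placeholders.

```bash
# Keep nnictl attached instead of daemonizing, so the NNI manager log
# streams to the current terminal (stop the experiment with Ctrl-C).
nnictl create --config config.yml --foreground

# The changelog states the flag is also accepted by resume and view.
nnictl resume <experiment_id> --foreground
```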

### Documentation
* Improve [the index structure](https://nni.readthedocs.io/en/latest/) of NNI readthedocs
* Improve [documentation for NAS](https://github.com/microsoft/nni/blob/v1.4/docs/en_US/NAS/NasGuide.md)
* Improve documentation for [the new PAI mode](https://github.com/microsoft/nni/blob/v1.4/docs/en_US/TrainingService/PaiMode.md)
* Add QuickStart guidance for [NAS](https://github.com/microsoft/nni/blob/v1.4/docs/en_US/NAS/QuickStart.md) and [model compression](https://github.com/microsoft/nni/blob/v1.4/docs/en_US/Compressor/QuickStart.md)
* Improve documentation for [the supported EfficientNet](https://github.com/microsoft/nni/blob/v1.4/docs/en_US/TrialExample/EfficientNet.md)

### Bug Fixes
* Correctly support NaN in metric data in a JSON-compliant way
* Fix the out-of-range bug of the `randint` type in the search space
* Fix the wrong tensor device when exporting an ONNX model in model compression
* Fix incorrect handling of nnimanagerIP in the new PAI mode (i.e., paiK8S)


## Release 1.3 - 12/30/2019

### Major Features
4 changes: 2 additions & 2 deletions docs/en_US/Tutorial/InstallationLinux.md
@@ -19,7 +19,7 @@ Installation on Linux and macOS follows the same instructions below.
Prerequisites: `python 64-bit >=3.5`, `git`, `wget`

```bash
git clone -b v1.3 https://github.com/Microsoft/nni.git
git clone -b v1.4 https://github.com/Microsoft/nni.git
cd nni
./install.sh
```
@@ -35,7 +35,7 @@ The following example is built on TensorFlow 1.x. Make sure **TensorFlow 1.x is
* Download the examples by cloning the source code.

```bash
git clone -b v1.3 https://github.com/Microsoft/nni.git
git clone -b v1.4 https://github.com/Microsoft/nni.git
```

* Run the MNIST example.
4 changes: 2 additions & 2 deletions docs/en_US/Tutorial/InstallationWin.md
@@ -19,7 +19,7 @@ Anaconda or Miniconda is highly recommended to manage multiple Python environments.
Prerequisites: `python 64-bit >=3.5`, `git`, `PowerShell`.

```bash
git clone -b v1.3 https://github.com/Microsoft/nni.git
git clone -b v1.4 https://github.com/Microsoft/nni.git
cd nni
powershell -ExecutionPolicy Bypass -file install.ps1
```
@@ -31,7 +31,7 @@ The following example is built on TensorFlow 1.x. Make sure **TensorFlow 1.x is
* Download the examples by cloning the source code.

```bash
git clone -b v1.3 https://github.com/Microsoft/nni.git
git clone -b v1.4 https://github.com/Microsoft/nni.git
```

* Run the MNIST example.
2 changes: 1 addition & 1 deletion docs/en_US/conf.py
@@ -28,7 +28,7 @@
# The short X.Y version
version = ''
# The full version, including alpha/beta/rc tags
release = 'v1.3'
release = 'v1.4'

# -- General configuration ---------------------------------------------------

8 changes: 4 additions & 4 deletions examples/model_compress/main_torch_pruner.py
@@ -55,7 +55,7 @@ def test(model, device, test_loader):

def main():
torch.manual_seed(0)
device = torch.device('cpu')
device = torch.device('cuda')

trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = torch.utils.data.DataLoader(
@@ -66,7 +66,7 @@ def main():
batch_size=1000, shuffle=True)

model = Mnist()
model.to(device)
model = model.to(device)

'''you can change this to LevelPruner to implement it
pruner = LevelPruner(configure_list)
@@ -82,14 +82,14 @@

pruner = AGP_Pruner(model, configure_list)
model = pruner.compress()

model = model.to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
for epoch in range(10):
pruner.update_epoch(epoch)
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
pruner.export_model('model.pth', 'mask.pth', 'model.onnx', [1, 1, 28, 28])
pruner.export_model('model.pth', 'mask.pth', 'model.onnx', [1, 1, 28, 28], device)


if __name__ == '__main__':
7 changes: 4 additions & 3 deletions examples/nas/proxylessnas/main.py
@@ -34,6 +34,7 @@
# configurations for search
parser.add_argument("--checkpoint_path", default='./search_mobile_net.pt', type=str)
parser.add_argument("--arch_path", default='./arch_path.pt', type=str)
parser.add_argument("--no-warmup", dest='warmup', action='store_false')
# configurations for retrain
parser.add_argument("--exported_arch_path", default=None, type=str)

@@ -54,7 +55,7 @@

# move network to GPU if available
if torch.cuda.is_available():
device = torch.device('cuda:0')
device = torch.device('cuda')
else:
device = torch.device('cpu')

@@ -86,7 +87,7 @@
train_loader=data_provider.train,
valid_loader=data_provider.valid,
device=device,
warmup=True,
warmup=args.warmup,
ckpt_path=args.checkpoint_path,
arch_path=args.arch_path)

Expand All @@ -102,4 +103,4 @@
"exported_arch_path {} should be a file.".format(args.exported_arch_path)
apply_fixed_architecture(model, args.exported_arch_path, device=device)
trainer = Retrain(model, optimizer, device, data_provider, n_epochs=300)
trainer.run()
trainer.run()
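
For orientation, a hedged sketch of how this example might be invoked. The flags come from the argparse definitions visible in this diff; the two-step search-then-retrain flow and any arguments not shown here are assumptions, not the example's documented interface.

```bash
# Architecture search; --no-warmup (added in this PR) skips the warm-up stage.
python main.py --no-warmup \
    --checkpoint_path ./search_mobile_net.pt \
    --arch_path ./arch_path.pt

# Retrain, pointing at the architecture exported by the search step.
python main.py --exported_arch_path ./arch_path.pt
```
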
5 changes: 2 additions & 3 deletions src/nni_manager/core/nniDataStore.ts
@@ -4,7 +4,6 @@
'use strict';

import * as assert from 'assert';
import * as JSON5 from 'json5';
import { Deferred } from 'ts-deferred';

import * as component from '../common/component';
@@ -132,7 +131,7 @@ class NNIDataStore implements DataStore {
}

public async storeMetricData(trialJobId: string, data: string): Promise<void> {
const metrics: MetricData = JSON5.parse(data);
const metrics: MetricData = JSON.parse(data);
// REQUEST_PARAMETER is used to request new parameters for multiphase trial job,
// it is not metrics, so it is skipped here.
if (metrics.type === 'REQUEST_PARAMETER') {
Expand All @@ -141,7 +140,7 @@ class NNIDataStore implements DataStore {
}
assert(trialJobId === metrics.trial_job_id);
try {
await this.db.storeMetricData(trialJobId, JSON5.stringify({
await this.db.storeMetricData(trialJobId, JSON.stringify({
trialJobId: metrics.trial_job_id,
parameterId: metrics.parameter_id,
type: metrics.type,
7 changes: 3 additions & 4 deletions src/nni_manager/core/sqlDatabase.ts
@@ -5,7 +5,6 @@

import * as assert from 'assert';
import * as fs from 'fs';
import * as JSON5 from 'json5';
import * as path from 'path';
import * as sqlite3 from 'sqlite3';
import { Deferred } from 'ts-deferred';
@@ -203,10 +202,10 @@ class SqlDB implements Database {

public storeMetricData(trialJobId: string, data: string): Promise<void> {
const sql: string = 'insert into MetricData values (?,?,?,?,?,?)';
const json: MetricDataRecord = JSON5.parse(data);
const args: any[] = [Date.now(), json.trialJobId, json.parameterId, json.type, json.sequence, JSON5.stringify(json.data)];
const json: MetricDataRecord = JSON.parse(data);
const args: any[] = [Date.now(), json.trialJobId, json.parameterId, json.type, json.sequence, JSON.stringify(json.data)];

this.log.trace(`storeMetricData: SQL: ${sql}, args: ${JSON5.stringify(args)}`);
this.log.trace(`storeMetricData: SQL: ${sql}, args: ${JSON.stringify(args)}`);
const deferred: Deferred<void> = new Deferred<void>();
this.db.run(sql, args, (err: Error | null) => { this.resolve(deferred, err); });

2 changes: 0 additions & 2 deletions src/nni_manager/package.json
@@ -17,7 +17,6 @@
"express": "^4.16.3",
"express-joi-validator": "^2.0.0",
"js-base64": "^2.4.9",
"json5": "^2.1.1",
"kubernetes-client": "^6.5.0",
"rx": "^4.1.0",
"sqlite3": "^4.0.2",
@@ -36,7 +35,6 @@
"@types/express": "^4.16.0",
"@types/glob": "^7.1.1",
"@types/js-base64": "^2.3.1",
"@types/json5": "^0.0.30",
"@types/mocha": "^5.2.5",
"@types/node": "10.12.18",
"@types/request": "^2.47.1",
2 changes: 1 addition & 1 deletion src/nni_manager/rest_server/nniRestServer.ts
@@ -34,7 +34,7 @@ export class NNIRestServer extends RestServer {
*/
protected registerRestHandler(): void {
this.app.use(express.static('static'));
this.app.use(bodyParser.json());
this.app.use(bodyParser.json({limit: '50mb'}));
this.app.use(this.API_ROOT_URL, createRestHandler(this));
this.app.use(this.LOGS_ROOT_URL, express.static(getLogDir()));
this.app.get('*', (req: express.Request, res: express.Response) => {
10 changes: 0 additions & 10 deletions src/nni_manager/yarn.lock
@@ -157,10 +157,6 @@
version "7.0.3"
resolved "https://registry.yarnpkg.com/@types/json-schema/-/json-schema-7.0.3.tgz#bdfd69d61e464dcc81b25159c270d75a73c1a636"

"@types/json5@^0.0.30":
version "0.0.30"
resolved "https://registry.yarnpkg.com/@types/json5/-/json5-0.0.30.tgz#44cb52f32a809734ca562e685c6473b5754a7818"

"@types/mime@*":
version "2.0.0"
resolved "https://registry.yarnpkg.com/@types/mime/-/mime-2.0.0.tgz#5a7306e367c539b9f6543499de8dd519fac37a8b"
@@ -2380,12 +2376,6 @@ json-stringify-safe@~5.0.1:
version "5.0.1"
resolved "https://registry.yarnpkg.com/json-stringify-safe/-/json-stringify-safe-5.0.1.tgz#1296a2d58fd45f19a0f6ce01d65701e2c735b6eb"

json5@^2.1.1:
version "2.1.1"
resolved "https://registry.yarnpkg.com/json5/-/json5-2.1.1.tgz#81b6cb04e9ba496f1c7005d07b4368a2638f90b6"
dependencies:
minimist "^1.2.0"

jsonparse@^1.2.0:
version "1.3.1"
resolved "https://registry.yarnpkg.com/jsonparse/-/jsonparse-1.3.1.tgz#3f4dae4a91fac315f71062f8521cc239f1366280"
5 changes: 4 additions & 1 deletion src/sdk/pynni/nni/bohb_advisor/bohb_advisor.py
@@ -557,7 +557,8 @@ def handle_report_metric_data(self, data):
Data type not supported
"""
logger.debug('handle report metric data = %s', data)

if 'value' in data:
data['value'] = json_tricks.loads(data['value'])
if data['type'] == MetricType.REQUEST_PARAMETER:
assert multi_phase_enabled()
assert data['trial_job_id'] is not None
@@ -627,6 +628,8 @@ def handle_import_data(self, data):
AssertionError
data doesn't have required key 'parameter' and 'value'
"""
for entry in data:
entry['value'] = json_tricks.loads(entry['value'])
_completed_num = 0
for trial_info in data:
logger.info("Importing data, current processing progress %s / %s", _completed_num, len(data))
Empty file.
14 changes: 9 additions & 5 deletions src/sdk/pynni/nni/compression/speedup/torch/compress_modules.py
@@ -1,13 +1,17 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import logging
import torch
from .infer_shape import CoarseMask, ModuleMasks
from .infer_shape import ModuleMasks

_logger = logging.getLogger(__name__)

replace_module = {
'BatchNorm2d': lambda module, mask: replace_batchnorm2d(module, mask),
'Conv2d': lambda module, mask: replace_conv2d(module, mask),
'MaxPool2d': lambda module, mask: no_replace(module, mask),
'AvgPool2d': lambda module, mask: no_replace(module, mask),
'ReLU': lambda module, mask: no_replace(module, mask),
'Linear': lambda module, mask: replace_linear(module, mask)
}
@@ -16,6 +20,7 @@ def no_replace(module, mask):
"""
No need to replace
"""
_logger.debug("no need to replace")
return module

def replace_linear(linear, mask):
@@ -37,9 +42,8 @@ def replace_linear(linear, mask):
assert mask.output_mask is None
assert not mask.param_masks
index = mask.input_mask.mask_index[-1]
print(mask.input_mask.mask_index)
in_features = index.size()[0]
print('linear: ', in_features)
_logger.debug("replace linear with new in_features: %d", in_features)
new_linear = torch.nn.Linear(in_features=in_features,
out_features=linear.out_features,
bias=linear.bias is not None)
@@ -67,7 +71,7 @@ def replace_batchnorm2d(norm, mask):
assert 'weight' in mask.param_masks and 'bias' in mask.param_masks
index = mask.param_masks['weight'].mask_index[0]
num_features = index.size()[0]
print("replace batchnorm2d: ", num_features, index)
_logger.debug("replace batchnorm2d with num_features: %d", num_features)
new_norm = torch.nn.BatchNorm2d(num_features=num_features,
eps=norm.eps,
momentum=norm.momentum,
@@ -106,6 +110,7 @@ def replace_conv2d(conv, mask):
else:
out_channels_index = mask.output_mask.mask_index[1]
out_channels = out_channels_index.size()[0]
_logger.debug("replace conv2d with in_channels: %d, out_channels: %d", in_channels, out_channels)
new_conv = torch.nn.Conv2d(in_channels=in_channels,
out_channels=out_channels,
kernel_size=conv.kernel_size,
@@ -128,6 +133,5 @@ def replace_conv2d(conv, mask):
assert tmp_weight_data is not None, "Conv2d weight should be updated based on masks"
new_conv.weight.data.copy_(tmp_weight_data)
if conv.bias is not None:
print('final conv.bias is not None')
new_conv.bias.data.copy_(conv.bias.data if tmp_bias_data is None else tmp_bias_data)
return new_conv
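
To illustrate how the `replace_module` table added at the top of this file can be driven, here is a minimal dispatch sketch; the function name and the class-name lookup are illustrative assumptions, not the actual `ModelSpeedup` code.

```python
def replace_submodule(module, mask):
    """Illustrative only: pick a replacement builder by module class name."""
    module_type = type(module).__name__   # e.g. 'Conv2d', 'BatchNorm2d', 'Linear'
    if module_type not in replace_module:
        raise RuntimeError('no replacement rule for module type {}'.format(module_type))
    # Each entry is a lambda taking (module, ModuleMasks) and returning a new module.
    return replace_module[module_type](module, mask)
```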