Merge 0.3 into master (#313)

* Quick fix nnictl config logic (#289) * fix nnictl bug * fix install.sh * add desc for Dockerfile.build.base * update document for Dockerfile * update * refactor port detect * update * refactor NNICTLDOC.md * add document for pai and nnictl * add default value for port * add exception handling in trial_keeper.py * fix port bug * fix resume * fix nnictl resume and fix nnictl stop * fix document * update * refactor nnictl * update * update doc * update * update nnictl * fix comment * revert dockerfile * update * update * update * fix nnictl error hit * fix comments * fix bash-completion * fix paramiko install * quick fix resume logic * update * quick fix nnictl * PR merge to 0.3 (#297) * refactor doc * update with Mao's suggestions * Set theme jekyll-theme-dinky * update doc * fix links * fix links * fix links * merge * fix links and doc errors * merge * merge * merge * merge * Update README.md (#288) added License badge * merge * updated the "Contribute" part (merged Gems' wiki in, updated ReadMe) * fix link * fix doc mistakes and broken links. 
(#271) * refactor doc * update with Mao's suggestions * Set theme jekyll-theme-dinky * updated the "Contribute" part (merged Gems' wiki in, updated ReadMe) * fix link * Update README.md * Fix misspelling in examples/trials/ga_squad/README.md * revise the installation cmd to v0.2 * revise to install v0.2 * remove enas readme (#292) * Fix datastore performance issue (#301) * Fix nnictl in v0.3 (#299) Fix old version of config file fix sklearn requirements Fix resume log logic * remove paramiko in V0.3 (#306) remove paramiko in V0.3 * Release note 0.3 (#303) * v0.3 release notes * updates * updates * updates * updates * updates * updates * Inform users to set experiment id when id is empty (#310) * fix nnictl bug * fix install.sh * add desc for Dockerfile.build.base * update document for Dockerfile * update * refactor port detect * update * refactor NNICTLDOC.md * add document for pai and nnictl * add default value for port * add exception handling in trial_keeper.py * fix port bug * fix resume * fix nnictl resume and fix nnictl stop * fix document * update * refactor nnictl * update * update doc * update * update nnictl * fix comment * revert dockerfile * update * update * update * fix nnictl error hit * fix comments * fix bash-completion * fix paramiko install * quick fix resume logic * update * quick fix nnictl * fix nnictl crash bug * add requirement.txt for sklearn example * fix nnictl configuration bug * update * update * update * update * remove paramiko * refactor nnictl lfor log stdout * update * updaate * fix endtime when resume (#307) * fix endtime when resume * update * update * update * updates
microsoft · Nov 2, 2018 · 06710ab
1 parent bbf4760
commit 06710ab
Showing 8 changed files with 84 additions and 23 deletions.
diff --git a/docs/RELEASE.md b/docs/RELEASE.md
@@ -1,3 +1,37 @@
+# Release 0.3.0 - 11/2/2018
+## Major Features
+* Support running multiple experiments simultaneously. You can run multiple experiments by specifying a unique port for each experiment:
+
+    ```nnictl create --port 8081 --config <config file path>```
+
+    You can still run the first experiment without the '--port' parameter:
+
+    ```nnictl create --config <config file path>```
+* A built-in Batch Tuner that iterates over all parameter combinations and can be used to submit batch trial jobs.
+* The nni.report_final_result(result) API supports more data types for the result parameter. It can be one of the following types:
+    * int
+    * float
+    * A Python dict containing a 'default' key whose value is of type int or float. The dict can contain any other key-value pairs.
+* Continuous Integration
+    * Switched to Azure pipelines
+* Others
+    * New nni.get_sequence_id() API. Each trial job is allocated a unique sequence number, which can be retrieved by nni.get_sequence_id() API.
+    * Download experiment result from WebUI
+    * Add trial examples using sklearn and NNI together
+    * Support updating max trial number
+    * Kaggle competition TGS Salt code as an example
+    * NNI Docker image:
+
+      ```docker pull msranni/nni:latest```
+
+## Breaking changes
+*   <span style="color:red">API nni.get_parameters() is renamed to nni.get_next_parameter(). This is a breaking change: examples from prior releases cannot run on v0.3, so please clone the nni repo to get the new examples.</span>
+
+    ```git clone -b v0.3 https://github.com/Microsoft/nni.git```
+
+## Known issues
+[Known Issues in release 0.3.0](https://github.com/Microsoft/nni/labels/nni030knownissues).
+
 # Release 0.2.0 - 9/29/2018
 ## Major Features
    * Support [OpenPAI](https://github.com/Microsoft/pai) (aka pai) Training Service (See [here](./PAIMode.md) for instructions about how to submit NNI job in pai mode)
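
Note on the 0.3.0 entry for `nni.report_final_result(result)` above: the accepted shapes of `result` can be sketched as a small check. This is only an illustration of the wording in the release note. `is_valid_final_result` is a hypothetical helper, not part of NNI, and NNI's own validation may differ (for example, rejecting `bool` is an assumption made here).

```python
def is_valid_final_result(result):
    """Hypothetical check: int, float, or a dict with a numeric 'default' key."""
    def is_number(value):
        # bool is a subclass of int in Python; excluding it is an assumption.
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    if is_number(result):
        return True
    if isinstance(result, dict):
        # Extra keys are allowed; only 'default' is inspected.
        return 'default' in result and is_number(result['default'])
    return False
```

Under these assumptions, `0.93`, `7`, and `{'default': 0.93, 'loss': 0.21}` pass, while `{'loss': 0.21}` and `'0.93'` do not.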

diff --git a/examples/trials/enas/README.md b/examples/trials/enas/README.md
diff --git a/examples/trials/sklearn/requirements.txt b/examples/trials/sklearn/requirements.txt
@@ -1,4 +1,4 @@
 python3 -m pip install numpy
 sudo apt-get install libblas-dev liblapack-dev libatlas-base-dev gfortran
 sudo python3 -m pip install scipy
-sudo python3 -m pip install sklearn
+sudo python3 -m pip install sklearn
diff --git a/src/nni_manager/common/datastore.ts b/src/nni_manager/common/datastore.ts
@@ -66,7 +66,7 @@ interface TrialJobInfo {
     endTime?: number;
     hyperParameters?: string[];
     logPath?: string;
-    finalMetricData?: string;
+    finalMetricData?: MetricDataRecord;
     stderrPath?: string;
 }
 

diff --git a/src/nni_manager/core/nniDataStore.ts b/src/nni_manager/core/nniDataStore.ts
@@ -156,21 +156,23 @@ class NNIDataStore implements DataStore {
     }
 
     private async queryTrialJobs(status?: TrialJobStatus, trialJobId?: string): Promise<TrialJobInfo[]> {
-        const result: TrialJobInfo[]= [];
+        const result: TrialJobInfo[] = [];
         const trialJobEvents: TrialJobEventRecord[] = await this.db.queryTrialJobEvent(trialJobId);
         if (trialJobEvents === undefined) {
             return result;
         }
         const map: Map<string, TrialJobInfo> = this.getTrialJobsByReplayEvents(trialJobEvents);
 
-        for (let key of map.keys()) {
-            const jobInfo = map.get(key);
+        const finalMetricsMap: Map<string, MetricDataRecord> = await this.getFinalMetricData(trialJobId);
+
+        for (const key of map.keys()) {
+            const jobInfo: TrialJobInfo | undefined = map.get(key);
             if (jobInfo === undefined) {
                 continue;
             }
             if (!(status !== undefined && jobInfo.status !== status)) {
                 if (jobInfo.status === 'SUCCEEDED') {
-                    jobInfo.finalMetricData = await this.getFinalMetricData(jobInfo.id);
+                    jobInfo.finalMetricData = finalMetricsMap.get(jobInfo.id);
                 }
                 result.push(jobInfo);
             }
@@ -179,16 +181,20 @@ class NNIDataStore implements DataStore {
         return result;
     }
 
-    private async getFinalMetricData(trialJobId: string): Promise<any> {
+    private async getFinalMetricData(trialJobId?: string): Promise<Map<string, MetricDataRecord>> {
+        const map: Map<string, MetricDataRecord> = new Map();
         const metrics: MetricDataRecord[] = await this.getMetricData(trialJobId, 'FINAL');
 
         const multiPhase: boolean = await this.isMultiPhase();
 
-        if (metrics.length > 1 && !multiPhase) {
-            this.log.error(`Found multiple FINAL results for trial job ${trialJobId}`);
+        for (const metric of metrics) {
+            if (map.has(metric.trialJobId) && !multiPhase) {
+                this.log.error(`Found multiple FINAL results for trial job ${metric.trialJobId}`);
+            }
+            map.set(metric.trialJobId, metric);
         }
 
-        return metrics[metrics.length - 1];
+        return map;
     }
 
     private async isMultiPhase(): Promise<boolean> {
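
Note on the `getFinalMetricData` change above: it appears to replace one metric query per succeeded trial with a single query whose records are then indexed by trial job id. The indexing step can be sketched in Python as follows. The function name and the use of plain dicts for metric records are assumptions for illustration, not the NNI implementation.

```python
def index_final_metrics(metrics, multi_phase=False):
    """Index FINAL metric records by trial job id, keeping the last one per job.

    Returns the index plus the job ids that had more than one record
    (only reported when the experiment is not multi-phase).
    """
    by_job = {}
    duplicates = []
    for metric in metrics:
        job_id = metric['trialJobId']
        if job_id in by_job and not multi_phase:
            duplicates.append(job_id)
        # Later records overwrite earlier ones, as Map.set does in the diff.
        by_job[job_id] = metric
    return by_job, duplicates
```

A caller would then look up each succeeded job in `by_job` instead of issuing a query per job.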

diff --git a/src/nni_manager/core/nnimanager.ts b/src/nni_manager/core/nnimanager.ts
@@ -175,6 +175,11 @@ class NNIManager implements Manager {
             .filter((job: TrialJobInfo) => job.status === 'WAITING' || job.status === 'RUNNING')
             .map((job: TrialJobInfo) => this.dataStore.storeTrialJobEvent('FAILED', job.id)));
 
+        if (this.experimentProfile.execDuration < this.experimentProfile.params.maxExecDuration &&
+            this.currSubmittedTrialNum < this.experimentProfile.params.maxTrialNum &&
+            this.experimentProfile.endTime) {
+            delete this.experimentProfile.endTime;
+        }
         this.status.status = 'EXPERIMENT_RUNNING';
 
         // TO DO: update database record for resume event
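
Note on the resume logic above: `endTime` is dropped only when the experiment still has both time budget and trial budget left and an end time was actually recorded. A minimal Python sketch of that condition, assuming the profile is a plain dict with the same field names as in the diff (both function names are hypothetical):

```python
def should_clear_end_time(exec_duration, max_exec_duration,
                          submitted_trials, max_trial_num, end_time):
    """True when a resumed experiment can keep running and has a stale end time."""
    return (exec_duration < max_exec_duration
            and submitted_trials < max_trial_num
            and bool(end_time))


def clear_end_time_on_resume(profile, submitted_trials):
    """Remove 'endTime' from the profile dict in place when the check passes."""
    params = profile['params']
    if should_clear_end_time(profile['execDuration'], params['maxExecDuration'],
                             submitted_trials, params['maxTrialNum'],
                             profile.get('endTime')):
        del profile['endTime']
    return profile
```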

diff --git a/src/nni_manager/rest_server/test/mockedNNIManager.ts b/src/nni_manager/rest_server/test/mockedNNIManager.ts
@@ -158,14 +158,28 @@ export class MockedNNIManager extends Manager {
             status: 'SUCCEEDED',
             startTime: Date.now(),
             endTime: Date.now(),
-            finalMetricData: 'lr: 0.01, val accuracy: 0.89, batch size: 256'
+            finalMetricData: {
+                timestamp: 0,
+                trialJobId: '3456',
+                parameterId: '123',
+                type: 'FINAL',
+                sequence: 0,
+                data: '0.2'
+            }
         };
         const job2: TrialJobInfo = {
             id: '3456',
             status: 'FAILED',
             startTime: Date.now(),
             endTime: Date.now(),
-            finalMetricData: ''
+            finalMetricData: {
+                timestamp: 0,
+                trialJobId: '3456',
+                parameterId: '123',
+                type: 'FINAL',
+                sequence: 0,
+                data: '0.2'
+            }
         };
 
         return Promise.resolve([job1, job2]);

diff --git a/tools/nnicmd/nnictl_utils.py b/tools/nnicmd/nnictl_utils.py
@@ -38,7 +38,7 @@ def check_experiment_id(args):
     experiment_dict = experiment_config.get_all_experiments()
     if not experiment_dict:
         print_normal('There is no experiment running...')
-        exit(1)
+        return None
     if not args.id:
         running_experiment_list = []
         for key in experiment_dict.keys():
@@ -58,14 +58,14 @@ def check_experiment_id(args):
             exit(1)
         elif not running_experiment_list:
             print_error('There is no experiment running!')
-            exit(1)
+            return None
         else:
             return running_experiment_list[0]
     if experiment_dict.get(args.id):
         return args.id
     else:
         print_error('Id not correct!')
-        exit(1)
+        return None
 
 def parse_ids(args):
     '''Parse the arguments for nnictl stop
@@ -116,20 +116,28 @@ def parse_ids(args):
         if len(result_list) > 1:
             print_error(args.id + ' is ambiguous, please choose ' + ' '.join(result_list) )
             return None
-    if not result_list:
-        print_error('There are no experiments matched, please check experiment id...')
+    if not result_list and args.id:
+        print_error('There are no experiments matched, please set correct experiment id...')
+    elif not result_list:
+        print_error('There is no experiment running...')
     return result_list
 
 def get_config_filename(args):
     '''get the file name of config file'''
     experiment_id = check_experiment_id(args)
+    if experiment_id is None:
+        print_error('Please set the experiment id!')
+        exit(1)
     experiment_config = Experiments()
     experiment_dict = experiment_config.get_all_experiments()
     return experiment_dict[experiment_id]['fileName']
 
 def get_experiment_port(args):
     '''get the port of experiment'''
     experiment_id = check_experiment_id(args)
+    if experiment_id is None:
+        print_error('Please set the experiment id!')
+        exit(1)
     experiment_config = Experiments()
     experiment_dict = experiment_config.get_all_experiments()
     return experiment_dict[experiment_id]['port']
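
Note on the `nnictl_utils.py` change above: `check_experiment_id` now mostly returns `None` instead of exiting, and callers such as `get_config_filename` decide whether a missing id is fatal. The pattern can be sketched in isolation as below. The function names, the `'status'` field, and the simplified handling of multiple running experiments are assumptions for illustration and do not mirror the real nnictl code.

```python
import sys


def resolve_experiment_id(experiments, requested_id=None):
    """Return a usable experiment id, or None when it cannot be determined."""
    if not experiments:
        return None
    if requested_id is None:
        running = [eid for eid, info in experiments.items()
                   if info.get('status') == 'running']
        # Simplification: only an unambiguous single running experiment is accepted.
        return running[0] if len(running) == 1 else None
    return requested_id if requested_id in experiments else None


def require_experiment_id(experiments, requested_id=None):
    """Caller-side policy: treat a missing id as a fatal error."""
    experiment_id = resolve_experiment_id(experiments, requested_id)
    if experiment_id is None:
        print('Please set the experiment id!', file=sys.stderr)
        sys.exit(1)
    return experiment_id
```

Keeping the lookup free of `exit` calls lets commands that can tolerate a missing experiment reuse it, while commands that need an id fail with a clear message.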