This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Fix the issue that date nanoseconds does not work under macOS #1434

Merged
merged 12 commits into from
Aug 12, 2019
12 changes: 6 additions & 6 deletions docs/en_US/AdvancedFeature/GeneralNasInterfaces.md
@@ -120,7 +120,7 @@ Here, `nni.training_update` is to do some update on the full graph. In enas_mode

**\*oneshot_mode\***: following the training approach in [this paper][6]. Different from enas_mode which trains the full graph by training large numbers of subgraphs, in oneshot_mode the full graph is built and dropout is added to candidate inputs and also added to candidate ops' outputs. Then this full graph is trained like other DL models. [Detailed Description](#OneshotMode). (currently only supported on tensorflow).

To use oneshot_mode, you should add one more field in the `trial` config as shown below. In this mode, no need to specify tuner in the config file as it does not need tuner. (Note that you still need to specify a tuner (any tuner) in the config file for now.) Also, no need to add `nni.training_update` in this mode, because no special processing (or update) is needed during training.
To use oneshot_mode, you should add one more field in the `trial` config as shown below. In this mode, though there is no need to use tuner, you still need to specify a tuner (any tuner) in the config file for now. Also, no need to add `nni.training_update` in this mode, because no special processing (or update) is needed during training.
```diff
trial:
command: your command to run the trial
@@ -132,7 +132,7 @@ trial:

**\*darts_mode\***: following the training approach in [this paper][3]. It is similar to oneshot_mode. There are two differences, one is that darts_mode only add architecture weights to the outputs of candidate ops, the other is that it trains model weights and architecture weights in an interleaved manner. [Detailed Description](#DartsMode).

To use darts_mode, you should add one more field in the `trial` config as shown below. In this mode, also no need to specify tuner in the config file as it does not need tuner. (Note that you still need to specify a tuner (any tuner) in the config file for now.)
To use darts_mode, you should add one more field in the `trial` config as shown below. In this mode, though there is no need to use tuner, you still need to specify a tuner (any tuner) in the config file for now.
```diff
trial:
command: your command to run the trial
@@ -156,9 +156,9 @@ for _ in range(num):

### enas_mode

In enas_mode, the compiled trial code builds the full graph (rather than subgraph), it receives a chosen architecture and training this architecture on the full graph for a mini-batch, then request another chosen architecture. It is supported by [NNI multi-phase](./multiPhase.md).
In enas_mode, the compiled trial code builds the full graph (rather than subgraph), it receives a chosen architecture and training this architecture on the full graph for a mini-batch, then request another chosen architecture. It is supported by [NNI multi-phase](./MultiPhase.md).

Specifically, for trials using tensorflow, we create and use tensorflow variable as signals, and tensorflow conditional functions to control the search space (full-graph) to be more flexible, which means it can be changed into different sub-graphs (multiple times) depending on these signals. [Here]() is an example for enas_mode.
Specifically, for trials using tensorflow, we create and use tensorflow variable as signals, and tensorflow conditional functions to control the search space (full-graph) to be more flexible, which means it can be changed into different sub-graphs (multiple times) depending on these signals. [Here](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas/enas_mode) is an example for enas_mode.

<a name="OneshotMode"></a>

@@ -168,7 +168,7 @@ Below is the figure to show where dropout is added to the full graph for one lay

![](../../img/oneshot_mode.png)

As suggested in the [paper][6], a dropout method is implemented to the inputs for every layer. The dropout rate is set to r^(1/k), where 0 < r < 1 is a hyper-parameter of the model (default to be 0.01) and k is number of optional inputs for a specific layer. The higher the fan-in, the more likely each possible input is to be dropped out. However, the probability of dropping out all optional_inputs of a layer is kept constant regardless of its fan-in. Suppose r = 0.05. If a layer has k = 2 optional_inputs then each one will independently be dropped out with probability 0.05^(1/2) ≈ 0.22 and will be retained with probability 0.78. If a layer has k = 7 optional_inputs then each one will independently be dropped out with probability 0.05^(1/7) ≈ 0.65 and will be retained with probability 0.35. In both cases, the probability of dropping out all of the layer's optional_inputs is 5%. The outputs of candidate ops are dropped out through the same way. [Here]() is an example for oneshot_mode.
As suggested in the [paper][6], a dropout method is implemented to the inputs for every layer. The dropout rate is set to r^(1/k), where 0 < r < 1 is a hyper-parameter of the model (default to be 0.01) and k is number of optional inputs for a specific layer. The higher the fan-in, the more likely each possible input is to be dropped out. However, the probability of dropping out all optional_inputs of a layer is kept constant regardless of its fan-in. Suppose r = 0.05. If a layer has k = 2 optional_inputs then each one will independently be dropped out with probability 0.05^(1/2) ≈ 0.22 and will be retained with probability 0.78. If a layer has k = 7 optional_inputs then each one will independently be dropped out with probability 0.05^(1/7) ≈ 0.65 and will be retained with probability 0.35. In both cases, the probability of dropping out all of the layer's optional_inputs is 5%. The outputs of candidate ops are dropped out through the same way. [Here](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas/oneshot_mode) is an example for oneshot_mode.
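
Reviewer note: the arithmetic in the paragraph above can be checked with a short sketch. The helper name `input_dropout_rate` is hypothetical and not part of NNI; it only evaluates the r^(1/k) formula stated in the text.

```python
def input_dropout_rate(r: float, k: int) -> float:
    """Per-input dropout rate r^(1/k) for a layer with k optional inputs."""
    if not 0 < r < 1:
        raise ValueError("r must satisfy 0 < r < 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    return r ** (1.0 / k)


for k in (2, 7):
    rate = input_dropout_rate(0.05, k)
    # Dropping all k inputs independently has probability rate ** k, which equals r.
    print(f"k={k}: drop={rate:.2f}, keep={1 - rate:.2f}, drop_all={rate ** k:.2f}")
```

For r = 0.05 this gives per-input rates that round to 0.22 (k = 2) and 0.65 (k = 7), while the probability of dropping every input stays at 0.05 in both cases.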

<a name="DartsMode"></a>

@@ -178,7 +178,7 @@ Below is the figure to show where architecture weights are added to the full gra

![](../../img/darts_mode.png)

In `nni.training_update`, tensorflow MomentumOptimizer is used to train the architecture weights based on the pass `loss` and `feed_dict`. [Here]() is an example for darts_mode.
In `nni.training_update`, tensorflow MomentumOptimizer is used to train the architecture weights based on the pass `loss` and `feed_dict`. [Here](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas/darts_mode) is an example for darts_mode.

### [__TODO__] Multiple trial jobs for One-Shot NAS

2 changes: 1 addition & 1 deletion docs/en_US/TrainingService/PaiMode.md
@@ -54,7 +54,7 @@ Compared with [LocalMode](LocalMode.md) and [RemoteMachineMode](RemoteMachineMod
* shmMB
* Optional key. Set the shmMB configuration of OpenPAI, it set the shared memory for one task in the task role.
* authFile
* Optional key, Set the auth file path for private registry while using PAI mode, [Refer](https://github.com/microsoft/pai/blob/2ea69b45faa018662bc164ed7733f6fdbb4c42b3/docs/faq.md#q-how-to-use-private-docker-registry-job-image-when-submitting-an-openpai-job).
* Optional key, Set the auth file path for private registry while using PAI mode, [Refer](https://github.com/microsoft/pai/blob/2ea69b45faa018662bc164ed7733f6fdbb4c42b3/docs/faq.md#q-how-to-use-private-docker-registry-job-image-when-submitting-an-openpai-job), you can prepare the authFile and simply provide the local path of this file, NNI will upload this file to HDFS for you.

Once complete to fill NNI experiment config file and save (for example, save as exp_pai.yml), then run the following command
```
2 changes: 1 addition & 1 deletion docs/en_US/Tutorial/WebUI.md
@@ -11,7 +11,7 @@ Click the tab "Overview".
* If your experiment have many trials, you can change the refresh interval on here.

![](../../img/webui-img/refresh-interval.png)
* Support to review and download the experiment result and nni-manager/dispatcher log file from the download.
* Support to review and download the experiment result and nni-manager/dispatcher log file from the "View" button.

![](../../img/webui-img/download.png)
* You can click the learn about in the error box to track experiment log message if the experiment's status is error.
Binary file modified docs/img/webui-img/download.png
Binary file modified docs/img/webui-img/over1.png
Binary file modified docs/img/webui-img/over2.png
Binary file modified docs/img/webui-img/refresh-interval.png
11 changes: 8 additions & 3 deletions src/nni_manager/training_service/local/localTrainingService.ts
@@ -521,9 +521,14 @@ class LocalTrainingService implements TrainingService {
`$NOW_DATE = "$NOW_DATE" + (Get-Date -Format fff).ToString()`,
`Write $LASTEXITCODE " " $NOW_DATE | Out-File ${path.join(workingDirectory, '.nni', 'state')} -NoNewline -encoding utf8`);
} else {
script.push(
`eval ${localTrialConfig.command} 2>${path.join(workingDirectory, 'stderr')}`,
`echo $? \`date +%s%3N\` >${path.join(workingDirectory, '.nni', 'state')}`);
script.push(`eval ${localTrialConfig.command} 2>${path.join(workingDirectory, 'stderr')}`);
if (process.platform === 'darwin') {
// https://superuser.com/questions/599072/how-to-get-bash-execution-time-in-milliseconds-under-mac-os-x
// Considering the worst case, write 999 to avoid negative duration
script.push(`echo $? \`date +%s999\` >${path.join(workingDirectory, '.nni', 'state')}`);
} else {
script.push(`echo $? \`date +%s%3N\` >${path.join(workingDirectory, '.nni', 'state')}`);
}
}

return script;
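
Reviewer note: a minimal sketch of why the macOS branch writes `999` instead of real milliseconds. It assumes, as the linked superuser question suggests, that `date` on macOS does not expand `%3N`, and it assumes the trial start time is recorded elsewhere with millisecond precision. The function name and the sample timestamps are hypothetical.

```python
def end_timestamp_ms(platform: str, epoch_seconds: int, true_millis: int) -> int:
    """Model the end timestamp that the generated script writes to the state file."""
    if not 0 <= true_millis <= 999:
        raise ValueError("true_millis must be in [0, 999]")
    if platform == "darwin":
        # `date +%s999`: whole seconds followed by a constant 999 (worst case).
        return epoch_seconds * 1000 + 999
    # `date +%s%3N`: whole seconds followed by the real milliseconds.
    return epoch_seconds * 1000 + true_millis


# Hypothetical trial that starts at .750 and ends at .900 of the same second.
start_ms = 1_565_600_000_750
end_seconds, end_millis = 1_565_600_000, 900

# Padding with 000 instead would place the end before the start.
truncated_duration = end_seconds * 1000 - start_ms
darwin_duration = end_timestamp_ms("darwin", end_seconds, end_millis) - start_ms
linux_duration = end_timestamp_ms("linux", end_seconds, end_millis) - start_ms
print(truncated_duration, darwin_duration, linux_duration)
```

The trade-off is that the macOS duration can be overstated by up to 999 ms, which the code comment accepts in exchange for never reporting a negative duration.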
2 changes: 1 addition & 1 deletion src/nni_manager/training_service/pai/hdfsClientUtility.ts
@@ -32,7 +32,7 @@ export namespace HDFSClientUtility {
* Get NNI experiment root directory
* @param hdfsUserName HDFS user name
*/
function hdfsExpRootDir(hdfsUserName: string): string {
export function hdfsExpRootDir(hdfsUserName: string): string {
// tslint:disable-next-line:prefer-template
return '/' + unixPathJoin(hdfsUserName, 'nni', 'experiments', getExperimentId());
}
14 changes: 13 additions & 1 deletion src/nni_manager/training_service/pai/paiTrainingService.ts
@@ -74,9 +74,11 @@ class PAITrainingService implements TrainingService {
private paiRestServerPort?: number;
private nniManagerIpConfig?: NNIManagerIpConfig;
private copyExpCodeDirPromise?: Promise<void>;
private copyAuthFilePromise?: Promise<void>;
private versionCheck: boolean = true;
private logCollection: string;
private isMultiPhase: boolean = false;
private authFileHdfsPath: string | undefined = undefined;

constructor() {
this.log = getLogger();
@@ -292,6 +294,12 @@ class PAITrainingService implements TrainingService {
HDFSClientUtility.getHdfsExpCodeDir(this.paiClusterConfig.userName),
this.hdfsClient
);

// Upload authFile to hdfs
if (this.paiTrialConfig.authFile) {
this.authFileHdfsPath = unixPathJoin(HDFSClientUtility.hdfsExpRootDir(this.paiClusterConfig.userName), 'authFile');
this.copyAuthFilePromise = HDFSClientUtility.copyFileToHdfs(this.paiTrialConfig.authFile, this.authFileHdfsPath, this.hdfsClient);
}

deferred.resolve();
break;
@@ -373,6 +381,10 @@ class PAITrainingService implements TrainingService {
await this.copyExpCodeDirPromise;
}

//Make sure authFile is copied from local to HDFS
if (this.paiTrialConfig.authFile) {
await this.copyAuthFilePromise;
}
// Step 1. Prepare PAI job configuration

const trialLocalTempFolder: string = path.join(getExperimentRootDir(), 'trials-local', trialJobId);
@@ -449,7 +461,7 @@ class PAITrainingService implements TrainingService {
// Add Virtual Cluster
this.paiTrialConfig.virtualCluster === undefined ? 'default' : this.paiTrialConfig.virtualCluster.toString(),
//Task auth File
this.paiTrialConfig.authFile
this.authFileHdfsPath
);

// Step 2. Upload code files in codeDir onto HDFS
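
Reviewer note: the hunks in this file start the authFile upload when the trial config is set and only wait for it when a trial job is submitted, then pass the HDFS path rather than the local path to the job config. A rough Python `asyncio` analogue of that start-early, await-later pattern is sketched below. Every name here is hypothetical, and the in-memory dictionary merely stands in for HDFS.

```python
import asyncio
from typing import Dict, Optional


async def copy_file_to_store(local_path: str, remote_path: str, store: Dict[str, str]) -> None:
    """Stand-in for an upload; yields once to mimic I/O."""
    await asyncio.sleep(0)
    store[remote_path] = local_path


class TrainingServiceSketch:
    def __init__(self) -> None:
        self.store: Dict[str, str] = {}
        self.auth_file_remote_path: Optional[str] = None
        self.copy_auth_file_task: Optional["asyncio.Task[None]"] = None

    def set_trial_config(self, auth_file: Optional[str], exp_root: str) -> None:
        # Kick off the upload without waiting for it (must run inside an event loop).
        if auth_file:
            self.auth_file_remote_path = f"{exp_root}/authFile"
            self.copy_auth_file_task = asyncio.create_task(
                copy_file_to_store(auth_file, self.auth_file_remote_path, self.store)
            )

    async def submit_trial_job(self) -> Optional[str]:
        # Make sure the upload has finished before the remote path is used.
        if self.copy_auth_file_task is not None:
            await self.copy_auth_file_task
        return self.auth_file_remote_path


async def demo():
    service = TrainingServiceSketch()
    service.set_trial_config("/home/me/auth.txt", "/alice/nni/experiments/exp1")
    remote_path = await service.submit_trial_job()
    return remote_path, service.store


remote_path, store = asyncio.run(demo())
print(remote_path, store)
```

When no authFile is configured, nothing is uploaded and the path stays `None`, which mirrors `authFileHdfsPath` remaining `undefined` in the TypeScript.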
4 changes: 4 additions & 0 deletions src/sdk/pynni/nni/hyperopt_tuner/hyperopt_tuner.py
@@ -315,6 +315,10 @@ def receive_trial_result(self, parameter_id, parameters, value, **kwargs):
rval = self.CL_rval
else:
rval = self.rval
# ignore duplicated reported final result (due to aware of intermediate result)
if parameter_id not in self.running_data:
logger.info("Received duplicated final result with parameter id: %s", parameter_id)
return
self.running_data.remove(parameter_id)

# update the reward of optimal_y
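
Reviewer note: a standalone sketch of the guard added in this hunk. It assumes `running_data` behaves like a list, in which case calling `remove` for an id that was already removed would raise `ValueError` without the membership check. The class and its `start`/`results` members are hypothetical and much simpler than the real tuner.

```python
import logging

logger = logging.getLogger("duplicate_result_sketch")


class RunningTrialsSketch:
    def __init__(self) -> None:
        self.running_data = []   # parameter ids still waiting for a final result
        self.results = {}        # parameter id -> accepted final value

    def start(self, parameter_id: int) -> None:
        self.running_data.append(parameter_id)

    def receive_trial_result(self, parameter_id: int, value: float) -> bool:
        """Accept the first final result for an id and ignore any repeats."""
        if parameter_id not in self.running_data:
            logger.info("Received duplicated final result with parameter id: %s", parameter_id)
            return False
        self.running_data.remove(parameter_id)
        self.results[parameter_id] = value
        return True


tracker = RunningTrialsSketch()
tracker.start(7)
first = tracker.receive_trial_result(7, 0.91)
second = tracker.receive_trial_result(7, 0.42)  # duplicate, ignored
print(first, second, tracker.results)
```

Returning early keeps the first reported value and leaves the rest of the update logic untouched for duplicates.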
48 changes: 43 additions & 5 deletions src/webui/src/App.tsx
@@ -1,11 +1,15 @@
import * as React from 'react';
import { Row, Col } from 'antd';
import axios from 'axios';
import { COLUMN, MANAGER_IP } from './static/const';
import './App.css';
import SlideBar from './components/SlideBar';

interface AppState {
interval: number;
whichPageToFresh: string;
columnList: Array<string>;
concurrency: number;
}

class App extends React.Component<{}, AppState> {
@@ -14,7 +18,9 @@ class App extends React.Component<{}, AppState> {
super(props);
this.state = {
interval: 10, // seconds
whichPageToFresh: ''
whichPageToFresh: '',
columnList: COLUMN,
concurrency: 1
};
}

@@ -31,25 +37,57 @@ class App extends React.Component<{}, AppState> {
}
}

changeColumn = (columnList: Array<string>) => {
if (this._isMounted === true) {
this.setState(() => ({ columnList: columnList }));
}
}

changeConcurrency = (val: number) => {
if (this._isMounted === true) {
this.setState(() => ({ concurrency: val }));
}
}

getConcurrency = () => {
axios(`${MANAGER_IP}/experiment`, {
method: 'GET'
})
.then(res => {
if (res.status === 200) {
const params = res.data.params;
if (this._isMounted) {
this.setState(() => ({ concurrency: params.trialConcurrency }));
}
}
});
}

componentDidMount() {
this._isMounted = true;
this.getConcurrency();
}

componentWillUnmount() {
this._isMounted = false;
}
render() {
const { interval, whichPageToFresh } = this.state;
const { interval, whichPageToFresh, columnList, concurrency } = this.state;
const reactPropsChildren = React.Children.map(this.props.children, child =>
// tslint:disable-next-line:no-any
React.cloneElement(child as React.ReactElement<any>, { interval, whichPageToFresh })
React.cloneElement(
// tslint:disable-next-line:no-any
child as React.ReactElement<any>, {
interval, whichPageToFresh,
columnList, changeColumn: this.changeColumn,
concurrency, changeConcurrency: this.changeConcurrency
})
);
return (
<Row className="nni" style={{ minHeight: window.innerHeight }}>
<Row className="header">
<Col span={1} />
<Col className="headerCon" span={22}>
<SlideBar changeInterval={this.changeInterval} changeFresh={this.changeFresh}/>
<SlideBar changeInterval={this.changeInterval} changeFresh={this.changeFresh} />
</Col>
<Col span={1} />
</Row>
3 changes: 1 addition & 2 deletions src/webui/src/components/Modal/ExperimentDrawer.tsx
@@ -42,8 +42,7 @@ class ExperimentDrawer extends React.Component<ExpDrawerProps, ExpDrawerState> {
let trialMessagesArr = res1.data;
const interResultList = res2.data;
Object.keys(trialMessagesArr).map(item => {
// transform hyperparameters as object to show elegantly
trialMessagesArr[item].hyperParameters = JSON.parse(trialMessagesArr[item].hyperParameters);
// not deal with trial's hyperParameters
const trialId = trialMessagesArr[item].id;
// add intermediate result message
trialMessagesArr[item].intermediate = [];
23 changes: 20 additions & 3 deletions src/webui/src/components/Overview.tsx
@@ -42,6 +42,8 @@ interface OverviewState {
interface OverviewProps {
interval: number; // user select
whichPageToFresh: string;
concurrency: number;
changeConcurrency: (val: number) => void;
}

class Overview extends React.Component<OverviewProps, OverviewState> {
@@ -61,7 +63,7 @@ class Overview extends React.Component<OverviewProps, OverviewState> {
id: '',
author: '',
experName: '',
runConcurren: 0,
runConcurren: 1,
maxDuration: 0,
execDuration: 0,
MaxTrialNum: 0,
@@ -264,7 +266,8 @@ class Overview extends React.Component<OverviewProps, OverviewState> {
profile.succTrial += 1;
const desJobDetail: Parameters = {
parameters: {},
intermediate: []
intermediate: [],
multiProgress: 1
};
const duration = (tableData[item].endTime - tableData[item].startTime) / 1000;
const acc = getFinal(tableData[item].finalMetricData);
@@ -273,6 +276,7 @@ class Overview extends React.Component<OverviewProps, OverviewState> {
if (tempara !== undefined) {
const tempLength = tempara.length;
const parameters = JSON.parse(tempara[tempLength - 1]).parameters;
desJobDetail.multiProgress = tempara.length;
if (typeof parameters === 'string') {
desJobDetail.parameters = JSON.parse(parameters);
} else {
@@ -462,6 +466,18 @@ class Overview extends React.Component<OverviewProps, OverviewState> {
accNodata, status, errorStr, trialNumber, bestAccuracy, isMultiPhase,
titleMaxbgcolor, titleMinbgcolor, isLogCollection, experimentAPI
} = this.state;
const { concurrency } = this.props;
trialProfile.runConcurren = concurrency;
Object.keys(experimentAPI).map(item => {
if (item === 'params') {
const temp = experimentAPI[item];
Object.keys(temp).map(index => {
if (index === 'trialConcurrency') {
temp[index] = concurrency;
}
});
}
});

return (
<div className="overview">
@@ -480,7 +496,8 @@ class Overview extends React.Component<OverviewProps, OverviewState> {
bestAccuracy={bestAccuracy}
status={status}
errors={errorStr}
updateFile={this.showSessionPro}
concurrency={concurrency}
changeConcurrency={this.props.changeConcurrency}
/>
</Col>
{/* experiment parameters search space tuner assessor... */}
4 changes: 2 additions & 2 deletions src/webui/src/components/SlideBar.tsx
@@ -136,7 +136,7 @@ class SlideBar extends React.Component<SliderProps, SliderState> {
onChange={this.handleVisibleChange}
title={
<span>
<span>Download</span>
<span>View</span>
</span>
}
>
@@ -234,7 +234,7 @@ class SlideBar extends React.Component<SliderProps, SliderState> {
>
<a className="ant-dropdown-link" href="#">
<Icon type="download" className="down-icon" />
<span>Download</span>
<span>View</span>
{
menuVisible
?