This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Remove duplicate data under /tmp folder, and other small changes. #2484

Merged
merged 13 commits into from
Jun 29, 2020


squirrelsc
Member

The dataset path included the trial ID to work around a concurrency issue when downloading the PyTorch MNIST data, but this leaves many copies of the data under the /tmp folder. This fix changes the folder to a relative path to avoid the duplicate data.

This may bring the concurrency issue back on the local platform (but not on other platforms), though it is already mitigated. First, if the data has already been downloaded, PyTorch verifies its MD5 and does not download it again. Second, the PyTorch examples run as a single instance, and the test cases are changed to run as a single instance as well.
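The first mitigation can be illustrated with a minimal sketch. This is not torchvision's implementation; `md5_of`, `ensure_file`, and the `fetch` callback are hypothetical names used only for illustration, and the atomic rename is an extra precaution that this PR does not describe.

```python
import hashlib
import os
import tempfile
from typing import Callable


def md5_of(path: str) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_file(path: str, expected_md5: str, fetch: Callable[[], bytes]) -> bool:
    """Hypothetical helper: download only if no valid cached copy exists.

    Returns True if a download happened, False if the cached copy was reused.
    """
    if os.path.isfile(path) and md5_of(path) == expected_md5:
        return False  # cached copy is intact, so skip the download

    folder = os.path.dirname(path) or "."
    os.makedirs(folder, exist_ok=True)

    # Assumption (not from the PR): write to a temporary file and rename it,
    # so a concurrent reader never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=folder)
    with os.fdopen(fd, "wb") as f:
        f.write(fetch())
    os.replace(tmp_path, path)
    return True
```

With a shared relative data folder, the second and later trials would take the early-return branch instead of writing another copy of the dataset.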

Small improvements:

  1. Remove the incorrect relative path in mnist-pbt-tuner-pytorch; it may have been copied from mnist-pytorch.
  2. Make the classic_nas example the same as above.

- maxTrialNum: 2
- trialConcurrency: 2
+ maxTrialNum: 1
+ trialConcurrency: 1
Contributor


Can we add a download script at the beginning of each IT?

@QuanluZhang QuanluZhang requested a review from ultmaster June 29, 2020 02:42
@squirrelsc squirrelsc merged commit 0fbaff6 into microsoft:master Jun 29, 2020
@squirrelsc squirrelsc deleted the dup-example-data branch June 29, 2020 07:55
squirrelsc added a commit that referenced this pull request Jun 30, 2020
Designed a new interface to support a reusable training service; it currently applies only to OpenPAI and is disabled by default.

Replace trial_keeper.py with trial_runner.py. The trial runner holds an environment, receives commands from the NNI manager to run or stop a trial, and returns events to the NNI manager.
Add a trial dispatcher, which inherits from the original training service interface. It is used to share as much code as possible across all training services and to keep that code isolated from the individual training services.
Add the EnvironmentService interface to manage environments, including starting and stopping an environment and refreshing the status of environments.
Add a command channel on both the NNI manager and trial runner sides; it supports different ways of passing messages between them. The currently supported channels are file and web sockets. The supported commands from the NNI manager are start and kill a trial, and send new parameters; those from the runner are initialized (to support channels that don't know which runner is connected), trial end, stdout (new type, including metrics as before), version check (new type), and GPU info (new type).
Add a storage service that wraps a storage backend, such as NFS or Azure Storage, in standard file operations.
Partially support running multiple trials in parallel on the runner side; this is not yet supported on the trial dispatcher side.
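
To make the command channel idea concrete, here is a rough sketch of a file-based channel under stated assumptions. It is not the code from this commit: the class names `CommandChannel` and `FileChannel`, the JSON-lines format, and the command strings `start_trial` and `trial_end` are all hypothetical. Each side appends to one file and reads new complete lines from the other.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple

Message = Tuple[str, Dict[str, Any]]


class CommandChannel(ABC):
    """Hypothetical transport-agnostic interface shared by manager and runner."""

    @abstractmethod
    def send(self, command: str, data: Dict[str, Any]) -> None: ...

    @abstractmethod
    def receive(self) -> List[Message]: ...


class FileChannel(CommandChannel):
    """Hypothetical file transport: one JSON object per line."""

    def __init__(self, out_path: str, in_path: str) -> None:
        self.out_path = out_path  # file this side appends to
        self.in_path = in_path    # file this side reads from
        self._offset = 0          # bytes of in_path already consumed

    def send(self, command: str, data: Dict[str, Any]) -> None:
        line = json.dumps({"command": command, "data": data})
        with open(self.out_path, "a", encoding="utf-8") as f:
            f.write(line + "\n")

    def receive(self) -> List[Message]:
        if not os.path.exists(self.in_path):
            return []
        with open(self.in_path, "rb") as f:
            f.seek(self._offset)
            chunk = f.read()
        end = chunk.rfind(b"\n")
        if end < 0:
            return []  # no complete line yet
        self._offset += end + 1  # consume only complete lines
        messages: List[Message] = []
        for raw in chunk[:end].split(b"\n"):
            if raw:
                msg = json.loads(raw.decode("utf-8"))
                messages.append((msg["command"], msg["data"]))
        return messages


# Usage: the two sides swap the in/out paths.
with tempfile.TemporaryDirectory() as folder:
    to_runner = os.path.join(folder, "manager_to_runner.jsonl")
    to_manager = os.path.join(folder, "runner_to_manager.jsonl")
    manager = FileChannel(out_path=to_runner, in_path=to_manager)
    runner = FileChannel(out_path=to_manager, in_path=to_runner)

    manager.send("start_trial", {"trial_id": "t1"})
    received_by_runner = runner.receive()
    runner.send("trial_end", {"trial_id": "t1", "code": 0})
    received_by_manager = manager.receive()
```

A web socket channel could implement the same two methods, so code written against the interface would not need to know which transport is in use.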
Other minor changes:

Add log_level to the TS UT so that the UT can show debug-level logs.
Expose the platform in the start info.
Add RouterTrainingService to keep the original OpenPAI training service and to support dynamic IoC binding.
Add more GPU info for future use, including GPU memory total/free/used and GPU type.
Make some license information consistent.
Fix async/await problems with Array.forEach; this method does not actually await async callbacks.
Fix IT errors on downloading data, which were caused by my #2484.
Speed up some run loop patterns by reducing the sleep time.
3 participants