Support monitor mode when creating or resuming a new experiment #1933
Changes from 26 commits
@@ -623,23 +623,41 @@ def show_experiment_info():
                 content[index].get('endTime'), content[index].get('status')))
     print(TRIAL_MONITOR_TAIL)
 
-def monitor_experiment(args):
-    '''monitor the experiment'''
-    if args.time <= 0:
-        print_error('please input a positive integer as time interval, the unit is second.')
-        exit(1)
+def set_monitor(auto_exit, time_interval, port=None, pid=None):
+    '''set the experiment monitor engine'''
     while True:
         try:
             os.system('clear')

Review comment: Is this
Reply: fixed.

             update_experiment()
             show_experiment_info()
-            time.sleep(args.time)
+            if auto_exit:
+                status = get_experiment_status(port)
+                if status in ['DONE', 'ERROR', 'STOPPED']:

Review comment: Maybe we could print the dispatcher and nnimanager logs here (if the status is ERROR). If the user runs this in a container, the container is destroyed when the program exits, so there is no way to retrieve the error information. Another option is to disable auto_exit in case
Reply: The nniManager.log content may be too long, so it may not be suitable to show it on screen. Users can mount NNI's logDir in the container to a local path; the logDir contains the log files.
Reply: I think the case is not for screen. Screen users will never seek the foreground, and container users don't care. Furthermore, I don't think

+                    print_normal('Experiment status is {0}.'.format(status))
+                    print_normal('Stopping experiment...')
+                    kill_command(pid)
+                    print_normal('Stop experiment success.')
+                    exit(0)
+            time.sleep(time_interval)
         except KeyboardInterrupt:
+            if auto_exit:
+                print_normal('Stopping experiment...')
+                kill_command(pid)
+                print_normal('Stop experiment success.')
+            else:
+                print_normal('Exiting...')
             exit(0)
         except Exception as exception:
             print_error(exception)
             exit(1)
+
+def monitor_experiment(args):
+    '''monitor the experiment'''
+    if args.time <= 0:
+        print_error('please input a positive integer as time interval, the unit is second.')
+        exit(1)
+    set_monitor(False, args.time)
 
 def export_trials_data(args):
     '''export experiment metadata to csv
     '''
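
The new `set_monitor` refreshes the view on every tick and, only when `auto_exit` is set, checks the experiment status and stops the experiment once it reaches DONE, ERROR or STOPPED (or when the user presses Ctrl+C). `monitor_experiment` keeps the old behavior by calling it with `auto_exit=False`. Below is a minimal, self-contained sketch of that control flow. It assumes the nnictl helpers (`update_experiment`, `get_experiment_status`, `kill_command`) are swapped for injected callables so the loop can run without a live experiment. `monitor_loop`, `refresh`, `get_status` and `stop_experiment` are hypothetical names, not part of nnictl.

```python
import time
from typing import Callable

# Terminal statuses, mirroring the list checked in set_monitor.
TERMINAL_STATUSES = ('DONE', 'ERROR', 'STOPPED')


def monitor_loop(
    refresh: Callable[[], None],
    get_status: Callable[[], str],
    stop_experiment: Callable[[], None],
    interval: float,
    auto_exit: bool,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a terminal status (auto_exit only) or Ctrl+C; return why the loop ended."""
    # Same validation monitor_experiment performs on args.time.
    if interval <= 0:
        raise ValueError('interval must be a positive number of seconds')
    try:
        while True:
            # Stands in for os.system('clear'); update_experiment(); show_experiment_info().
            refresh()
            if auto_exit:
                status = get_status()  # stands in for get_experiment_status(port)
                if status in TERMINAL_STATUSES:
                    stop_experiment()  # stands in for kill_command(pid)
                    return 'status:' + status
            sleep(interval)
    except KeyboardInterrupt:
        # Mirrors set_monitor: the experiment is stopped on Ctrl+C only in auto_exit mode.
        if auto_exit:
            stop_experiment()
        return 'interrupted'
```

Returning a reason string instead of calling `exit()` inside the loop leaves the exit-code decision to the caller and allows `sleep` to be replaced in tests. The PR's version calls `exit` directly, which is simpler for a CLI entry point.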
Review comment: Missing vertical separator
Reply: fixed.
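
The review thread on the `if status in ['DONE', 'ERROR', 'STOPPED']:` line suggests printing the dispatcher and nnimanager logs when the status is ERROR. The reply objects that nniManager.log may be too long to show on screen. One way to reconcile both points is to print only the last few lines of each log. Below is a minimal sketch, assuming the logs are plain-text files named `nnimanager.log` and `dispatcher.log` inside a single log directory. Those file names and the `log_dir` argument are assumptions that this PR does not confirm, and `tail_lines` and `error_log_excerpt` are hypothetical helpers.

```python
import os
from collections import deque
from typing import Iterable, List


def tail_lines(path: str, n: int = 20) -> List[str]:
    """Return the last n lines of a text file, streaming it instead of reading it all."""
    with open(path, 'r', encoding='utf-8', errors='replace') as f:
        # A deque with maxlen=n discards older lines as newer ones arrive.
        return [line.rstrip('\n') for line in deque(f, maxlen=n)]


def error_log_excerpt(
    log_dir: str,
    file_names: Iterable[str] = ('nnimanager.log', 'dispatcher.log'),  # assumed file names
    n: int = 20,
) -> str:
    """Join the last n lines of each log file that exists in log_dir, with a header per file."""
    parts: List[str] = []
    for name in file_names:
        path = os.path.join(log_dir, name)
        if os.path.isfile(path):
            parts.append('==> {} <=='.format(name))
            parts.extend(tail_lines(path, n))
    return '\n'.join(parts)
```

In `set_monitor`, such an excerpt could be printed just before `kill_command(pid)` when `status == 'ERROR'`. The bound `n` keeps the output short, which addresses the screen-length concern while still leaving some error context behind for a container that is about to be destroyed.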