-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Conversation
Does this pr support heterogeneousConfig? |
|
||
If not specified, this will be the default IPv4 address of outgoing connection. | ||
|
||
// there is an easy way to get the IP mentioned above, I will remove netifaces denendency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: dependency
use annotation | ||
-------------- | ||
|
||
Enable `annotation <https://nni.readthedocs.io/en/stable/Tutorial/AnnotationSpec.html>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest relative path
tuner | ||
----- | ||
|
||
Specify the tuner [TODO:link] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have the link now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find a doc mention built-in tuner and custom tuner at same time.
training service | ||
---------------- | ||
|
||
Specify `training service <https://nni.readthedocs.io/en/stable/TrainingService/Overview.html>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
This must be an absolute path. | ||
|
||
|
||
open pai config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think openpai is a word (at least OpenPAI is).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it will be openpaiConfig
and openpai_config
in yaml and python respectively. I think it looks weird. Discuss it in meeting.
|
||
If your are using desktop system with GUI, set this to ``True``. | ||
|
||
// need to discuss default value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the default value here?
|
||
If set to zero, trials will have no access to any GPU. | ||
|
||
If not specified, trials will be created and scheduled as if they do not use GPU, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the conclusion from last meeting? What does -1, 0, None mean respectively?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No change from v1.
nni/experiment/config/convert.py
Outdated
@@ -76,7 +76,7 @@ def to_v1_yaml(config: ExperimentConfig, skip_nnictl: bool = False) -> Dict[str, | |||
data['remoteConfig'] = {'reuse': ts['reuseMode']} | |||
data['machineList'] = [] | |||
for machine in ts['machineList']: | |||
machine = { | |||
data['machineList'].append({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about port?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mistake
nni/experiment/config/convert.py
Outdated
data['trial']['image'] = ts['docker_image'] | ||
data['trial']['nniManagerNFSMountPath'] = ts['local_storage_mount_point'] | ||
data['trial']['containerNFSMountPath'] = ts['container_storage_mount_point'] | ||
data['paiConfig'] = { | ||
'userName': ts['username'], | ||
'token': ts['token'], | ||
'host': 'https://' + ts['host'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The host does not contains prefix? how about http://
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean there are cases that OpenPAI cluster supports only HTTP and not HTTPS?
ret.append({'frameworkcontroller_config': experiment_config['frameworkcontrollerConfig']}) | ||
|
||
elif config.training_service.platform == 'adl': | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why pass adl
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/microsoft/nni/blob/master/nni/tools/nnictl/launcher.py#L133
ADL does not use cluster metadata.
nni/experiment/config/remote.py
Outdated
port: int = 22 | ||
user: str | ||
password: Optional[str] = None | ||
ssh_key_file: PathLike = '~/.ssh/id_rsa' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ssh_key is optional key, is used to connect remote machine, default is not '~/.ssh/id_rsa'. I think set it to None as default is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said in doc, the behavior is:
- when password is explicitly set, use password
- when password is not set and identity file is explicitly set, use specified identify file
- when both are not set, use default identity file of openssh
I expect this behavior because when using openssh directly, I don't need to specify anything when I put identify file at ~/.ssh/id_rsa
.
I know it can be other algorithms like id_edXXX
though, but I believe they are rarely used.
This PR defines the new experiment config schema.
Example Usage
Following code snippet will generate a valid
config.yaml
file:Or alternatively:
Design Notes
The schema should be expressible in both YAML/JSON object and Python dataclass.
In YAML/JSON the keys are expected to be camelCase and in Python they should be snake_case.
This technically is not a problem since
Config.load()
andConfig.json()
have routines to do the convert.I think the consistency with old schema is not a case. Because consistence is to help migration, while we should not push exist users to migrate at all.
The most important goal of this refactor is to reduce users' learning cost. And for exist users, to not learn the cost is zero.
I believe we will continue to support old schema for very long time, maybe until NNI v3.0. So it basically benefits no body to push the migration unless a user is unsatisfied old schema and proactively want a new one.
And in the other hand, the more difference between two schema, the smaller possibility NNI may fail to detect schema version.
Taking Yuge's advice, the config is divided into platform-independent part and nested training service config.
Trial config is no longer nested because I can't find a reason to do so. While tuner/assessor/advisor config is still nested because the same format may repeats for more than one algorithms.
There is a problem that whether training service config should share the same key. Using same key is definitely better in Python, but using "localCofing" / "remoteConfig" may looks better in YAML.
The problem might affect heterogeneous training service. If training services use same key "training service config", it will be a object/dict for heterogeneous. And if they use different keys, there will be multiple training service config objects.