schedule task by GpuType #1416
Comments
Hi Feng, there is a known bug in GpuType scheduling that currently prevents this feature from working. We are fixing it. |
Thanks for your reply! |
The root cause of this bug is that the node GPU type information was not uploaded to the etcd server, so at runtime the scheduler couldn't get the GPU type information :( As a workaround, you can try using the ClusterConfiguration API to write this setting into etcd manually :) |
I'm a little confused: as far as I know, frameworklauncher communicates with ZooKeeper.
So the ClusterConfiguration should look like this:
|
You are right, this GPU type information should be stored in ZooKeeper.
|
So the method you provided doesn't work right now? (=.=)! |
The method works. The issue is that the module that is supposed to call this method forgets to call it :(. If you use a REST API call to set this file, it should work. |
You can find the JSON template for the PUT request at https://github.com/Microsoft/pai/blob/master/src/cluster-configuration/deploy/gpu-configuration/gpu-configuration.json.template |
OK, I see. The format is the ConfigMap of gpu-configuration, and I succeeded. Thanks for your help. |
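For readers following the workaround above, here is a minimal Python sketch of the REST call, not the maintainers' implementation. It assumes the ClusterConfiguration endpoint shown in the original post accepts an HTTP PUT with a JSON body, which is inferred from this thread rather than confirmed. The function names are hypothetical, and the body should be filled in from the linked gpu-configuration.json.template, whose fields are not reproduced here.

```python
import json
import sys
import urllib.request

# Path taken from the curl command in this thread; host and port depend on the deployment.
CLUSTER_CONFIG_PATH = "/v1/LauncherRequest/ClusterConfiguration"


def build_put_request(launcher_base_url: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying the configuration as JSON.

    Assumption: the launcher accepts PUT with a JSON body on this path.
    """
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url=launcher_base_url.rstrip("/") + CLUSTER_CONFIG_PATH,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def put_cluster_configuration(launcher_base_url: str, config: dict) -> int:
    """Send the PUT request and return the HTTP status code."""
    request = build_put_request(launcher_base_url, config)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Usage: python put_config.py http://<launcher-ip>:9086 filled-in-template.json
if __name__ == "__main__" and len(sys.argv) == 3:
    base_url, config_path = sys.argv[1], sys.argv[2]
    with open(config_path, encoding="utf-8") as f:
        # The file is the template from the link above, filled in by hand.
        config = json.load(f)
    print(put_cluster_configuration(base_url, config))
```

Fetching the same URL with a plain GET afterwards, as the curl command in the original post does, is one way to check that the setting was stored.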
@DongZhaoYu, please reference your fix in this issue and close it. |
Hi,
I submitted a task that specifies GpuType, but PAI doesn't schedule the task by GpuType.
I got into the nodemanager pod and found the user log. Following the frameworklauncher code, I searched for the keyword "NodeGpuType" and found this log:
So I used the curl command
curl http://ip:9086/v1/LauncherRequest/ClusterConfiguration
to check the ClusterConfiguration, and it returns:
So I want to know: does the GpuType value in the task JSON relate to machine-type in cluster-configuration.yaml?
thx.