I am running the code on a single box with a TITAN and a 2080 Ti. The trainer is running only on the TITAN. The problem is that the system locks up the CPU and kills the local network. Very non-performant ...
It seems to be related to microsoft/DeepSpeed#679
Changing the optimizer section of the JSON file seems to allow it to run. It is a bit slower, but it does run.
The key change is setting "torch_adam" to true in the optimizer params.
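For anyone wanting to try the same workaround, here is a minimal sketch of the change, assuming the flag lives under `optimizer.params` as in the DeepSpeed config docs. The helper name `use_torch_adam` and the example values (`train_batch_size`, `lr`) are hypothetical and not taken from my actual config file.

```python
import json


def use_torch_adam(config: dict) -> dict:
    """Return a copy of a DeepSpeed-style config with torch_adam enabled.

    Assumption: the flag belongs under optimizer.params.
    """
    # Deep-copy through a JSON round-trip so the caller's dict is untouched.
    updated = json.loads(json.dumps(config))
    # Create the optimizer section if missing (the "Adam" default is an assumption).
    optimizer = updated.setdefault("optimizer", {"type": "Adam"})
    optimizer.setdefault("params", {})["torch_adam"] = True
    return updated


# Hypothetical config values, for illustration only.
example = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

print(json.dumps(use_torch_adam(example), indent=2))
```

The printed JSON can be saved as the config file passed to the trainer. This only switches the Adam implementation; it does not explain the lockup itself.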
Any ideas for regaining regular performance would be appreciated.