[BUG] Engine returned by deepspeed.initialize() on the wrong device #1761
Comments
I've checked the source code of … So I ran the other code snippet below.
Now the …
Hi @skpig, apologies for not replying sooner. Is this still a bug you're hitting? If so, could you confirm that it also happens with the latest DeepSpeed?
Hi @skpig, I'm closing this issue for now since it was reported against an older version of DeepSpeed. If you're still hitting it, please reply and we can reopen it and gladly work on debugging it. Apologies again for the long delay in replying the first time.
I'm facing the same problem. Even with `--include local_host`, some memory is still allocated on GPU:0.
Describe the bug
According to #662, the `--include` argument can set `CUDA_VISIBLE_DEVICES` properly. But the engine returned by `deepspeed.initialize()` is on the wrong device.
To Reproduce
Launch the code below with:
deepspeed --include="localhost:3" train.py --deepspeed --deepspeed_config config.json
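The command above passes `--deepspeed` and `--deepspeed_config config.json` to `train.py`, and the DeepSpeed launcher also appends a `--local_rank` argument to each process it starts (an assumption based on the launcher's usual behavior; check it against your installed version). Below is a minimal, hypothetical stand-in for the argument parsing in `train.py`. It uses plain `argparse` in place of `deepspeed.add_config_arguments(parser)` so that it runs without DeepSpeed installed, and it shows that with `--include="localhost:3"` only one process is started, so its local rank is 0, not 3.

```python
import argparse
from typing import List, Optional


def parse_train_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical stand-in for the argument parsing in train.py.

    A real DeepSpeed script would usually call
    deepspeed.add_config_arguments(parser) to register --deepspeed and
    --deepspeed_config; they are declared by hand here so the sketch runs
    without DeepSpeed installed.
    """
    parser = argparse.ArgumentParser(description="train.py argument sketch")
    # Assumption: the launcher appends --local_rank=<n> to every process it starts.
    parser.add_argument("--local_rank", type=int, default=-1)
    parser.add_argument("--deepspeed", action="store_true")
    parser.add_argument("--deepspeed_config", type=str, default=None)
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Roughly what the single process started by --include="localhost:3" receives.
    args = parse_train_args(
        ["--local_rank=0", "--deepspeed", "--deepspeed_config", "config.json"]
    )
    print(args.local_rank, args.deepspeed, args.deepspeed_config)  # prints: 0 True config.json
```

A script that calls `torch.cuda.set_device(args.local_rank)` would therefore select `cuda:0` in this setup.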
Expected behavior
I want to run the model on device 3, but the engine is on device 0.
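One possible reading of this symptom, not confirmed anywhere in this thread: the CUDA runtime renumbers the GPUs listed in `CUDA_VISIBLE_DEVICES` starting from 0, so in a process launched with `CUDA_VISIBLE_DEVICES=3`, a device reported as `cuda:0` is physical GPU 3. The hypothetical helper `physical_gpu_for` below sketches that mapping. It is simplified and does not model CUDA's rule that an invalid entry hides every entry after it.

```python
import os
from typing import Mapping, Optional


def physical_gpu_for(logical_index: int, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the CUDA_VISIBLE_DEVICES entry that cuda:<logical_index> refers to.

    Hypothetical helper: CUDA renumbers the listed devices from 0, so the
    first entry becomes cuda:0, the second cuda:1, and so on.
    """
    env = os.environ if env is None else env
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        # Variable unset: no filtering, so the logical index is returned unchanged.
        return str(logical_index)
    # Simplification: ignores CUDA's rule that an invalid entry hides later ones.
    entries = [e.strip() for e in visible.split(",") if e.strip()]
    if not 0 <= logical_index < len(entries):
        raise ValueError(
            f"cuda:{logical_index} is not visible with CUDA_VISIBLE_DEVICES={visible!r}"
        )
    return entries[logical_index]


if __name__ == "__main__":
    # Simulates the process started by --include="localhost:3".
    print(physical_gpu_for(0, {"CUDA_VISIBLE_DEVICES": "3"}))  # prints 3
```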