RuntimeError: expected scalar type Float but found Half #1233
@griff4692, can you share the log and stack trace? Also, can you check if the same error happens with other ZeRO stages?
Yes, the same error persists regardless of stage 1, 2, or 3.
Thanks for sharing these details. I would like to repro this problem. Can you please share the steps for me to do this?
Hi - yes, let me try with a toy example!
I'm tied up right now, but a simple forward pass with GATConv should hopefully reproduce the error. It will just need a dummy dataloader.
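In the meantime, a minimal sketch of the kind of toy repro being described (the module size, tensor shapes, and the explicit `.half()` calls are my assumptions, not code from the thread):

```python
import torch
from torch_geometric.nn import GATConv

# Mimic what DeepSpeed's fp16 mode does: halve the module weights
# and feed it fp16 activations.
conv = GATConv(in_channels=16, out_channels=32).half()
x = torch.randn(4, 16, dtype=torch.float16)  # 4 nodes, 16 features
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 0, 3, 2]])    # 4 directed edges
out = conv(x, edge_index)  # may raise "expected scalar type Float but found Half"
```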
@griff4692, got it. No rush, I can wait for your complete toy example. Thanks!
Running this actually produces a different error:
I've been able to reproduce this on my more complex model as well (see pyg-team/pytorch_geometric#2866). Maybe I just can't use DeepSpeed for this particular application, or I need to downgrade to earlier versions of either DeepSpeed or PyTorch Geometric for compatibility.
I don't have much experience with TorchScript, but I am curious whether the original issue can be repro'd without TorchScript. I suspect that in that case an appropriate cast would be the fix, but we can't know for sure unless we get a repro.
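For illustration only, one way such a cast could be applied is a small fp32 boundary wrapper (a hypothetical sketch, not a confirmed fix from this thread):

```python
import torch
from torch import nn

class Fp32Wrapper(nn.Module):
    """Run one submodule in fp32 inside an otherwise-fp16 model."""

    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module.float()  # keep this submodule's weights in fp32

    def forward(self, *args):
        # Upcast floating-point inputs at the boundary (leave index tensors
        # like edge_index alone), then downcast the output back to fp16.
        args = tuple(
            a.float() if torch.is_tensor(a) and a.is_floating_point() else a
            for a in args
        )
        return self.module(*args).half()
```

Usage would be something like `self.conv = Fp32Wrapper(GATConv(16, 32))`.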
Does DeepSpeed need everything to be half? If that's the case, it seems incompatible with pytorch_geometric.
@griff4692, not at all. DeepSpeed can work with fp32 or fp16 depending on the configuration. The problem here is that DeepSpeed does not attempt any automatic casting in the case of mixed-precision training. For example, can you try running in full fp32 by disabling fp16 in your DeepSpeed config? You can see the docs for fp16 configuration here.
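For reference, a minimal sketch of a DeepSpeed config with fp16 disabled (only the relevant fields; the batch size is an illustrative placeholder):

```python
# A DeepSpeed config is a JSON-style dict; disabling the fp16 section
# makes the engine run entirely in fp32.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder value
    "fp16": {
        "enabled": False,  # no half-precision casting
    },
}
```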
In PyTorch Lightning, if you use fp32 with the DeepSpeed plugin, you get the following error:

I'll look into circumventing this.
Looks like you can rewrite the config and it may work, but I'm not sure how it interacts with all the other settings.
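If it helps, passing a hand-written config to the Lightning plugin might look like this (a sketch; it assumes DeepSpeedPlugin accepts a config path, and the surrounding Trainer arguments are illustrative):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DeepSpeedPlugin

# Replace the 'deepspeed_stage_3_offload' shorthand with an explicit
# config file so fp16 can be turned off there.
trainer = Trainer(
    gpus=1,
    plugins=DeepSpeedPlugin(config="ds_config.json"),
)
```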
Thanks for checking this out @tjruwase, appreciate it! Lightning should be updated to allow FP32 support; let me make a branch for us to try, @griff4692!
Hi - I was able to manually call
This error only occurs with DeepSpeed, so maybe something fishy is going on with all my manual interventions.
We have a PR for getting FP32 support on the Lightning side: Lightning-AI/pytorch-lightning#8462. @griff4692 I'll sync with you offline over the remaining issue.
@SeanNaren, thanks for helping out on the Lightning side. Can you both please keep me in the loop if there are any issues to fix on DeepSpeed in order to close this? Thanks!
Closing as this seems to have been fixed on the Lightning side. |
Hi - I'm trying to use the DeepSpeed plugin with PyTorch Lightning. My code worked before, but changing the line in Trainer to add

plugins='deepspeed_stage_3_offload'

causes the error posted in the title. I've tried casting parameters and variables as float and half, but the error persists.
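For context, the change is roughly the following (the other Trainer arguments are illustrative assumptions, not my exact code):

```python
from pytorch_lightning import Trainer

# Adding the DeepSpeed plugin to an otherwise-working Trainer triggers
# the RuntimeError in the title.
trainer = Trainer(
    gpus=1,
    precision=16,  # the shorthand plugin expects fp16 precision
    plugins='deepspeed_stage_3_offload',
)
```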
Any suggestions would be much appreciated, as I'm really looking forward to seeing what DeepSpeed can do.
I should note that the error is happening in a call to a pytorch_geometric method (if that changes anything).