What's the latest version of PyTorch supported? #52
Comments
I have been using the NVIDIA container image for PyTorch, release 19.09, which corresponds to PyTorch version 1.2.0. I have not tried this with later versions.
Just an update on this issue: I checked all PyTorch releases, and up through NVIDIA PyTorch release 20.01, corresponding to PyTorch 1.4.0, PipeDream works. Starting with NVIDIA release 20.02, a runtime error occurs (#31, the same error log as this issue). I tried to locate the cause; it appears to come from an in-place version-checking feature added in PyTorch 1.5.0. The problem shows up when the second-to-last stage tries to start its backward pass: if load_old_params() is called before backward(), the error appears.
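For reference, the check being described is PyTorch autograd's saved-tensor version counter. A minimal, standalone sketch (not PipeDream code) of the kind of failure it produces, assuming only stock PyTorch: an in-place update of a tensor that autograd has saved for the backward pass.

```python
import torch

x = torch.randn(4, requires_grad=True)
y = x.sigmoid()     # autograd saves sigmoid's output for the backward pass
y.add_(1.0)         # the in-place op bumps y's version counter
y.sum().backward()  # RuntimeError: a variable needed for gradient computation
                    # has been modified by an inplace operation
```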
Thanks for doing this! This is helpful! I will look into this in the next couple of days!
I temporarily made PipeDream run on the latest PyTorch by removing the version check in unpack() in torch/csrc/autograd/saved_variable.cpp; the runtime errors seem to come from that version check (a really dirty solution). I have not fully understood PipeDream's manipulation of the back-propagated gradients, but I guess the problem comes from one more in-place operation on the tensors passed between stages. I hope this helps you solve the problem.
@SimonZsx Have you tried commenting out the version-checking code in PyTorch to see whether it works?
@deepakn94 @SimonZsx Sorry to bother you. I'm reproducing PipeDream's training process and hope to deploy it on torch >= 1.5.0. May I ask whether there are any solutions at the moment?
I have the same confusion. Have you made any progress? Maybe we can discuss it.
Hi, the commenting-and-recompiling solution works, but it's kind of dirty. The problem can also be avoided by not using weight stashing: the check appears to be a gradient version check, and weight stashing is what breaks it. One of my colleagues says the version can be set manually to avoid this error, but I have not verified that yet; just a small hint for you.
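To illustrate why weight stashing trips the check, here is a standalone sketch (not PipeDream's actual load_old_params() code): nn.Linear saves its weight for the backward pass, so restoring stashed weights in place between the forward and backward passes bumps the weight's version counter and fails the saved-tensor check.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 4)
# Keep a copy of the current parameters, as weight stashing would.
stashed = {k: v.detach().clone() for k, v in layer.state_dict().items()}

x = torch.randn(2, 4)
out = layer(x)                              # forward saves layer.weight for backward

with torch.no_grad():
    layer.weight.add_(0.1)                  # a later weight update happens first...
    layer.weight.copy_(stashed["weight"])   # ...then the stashed weights are restored in place

out.sum().backward()  # raises the "modified by an inplace operation" RuntimeError
```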
Thanks!
Same question to you. Have you made any progress?
Recomputing the forward pass before each backward pass would solve the tensor version issue. Simply call self._run_forward(tensors) right after line 571; a sketch of the idea follows below.
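A standalone sketch of that idea, with illustrative names (the real fix would call PipeDream's own self._run_forward(tensors) as described above): after restoring the stashed weights, re-run the forward pass so the tensors saved for backward match the parameters' current versions, at the cost of one extra forward computation per backward pass.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 4)
stashed = {k: v.detach().clone() for k, v in layer.state_dict().items()}

x = torch.randn(2, 4)
out = layer(x)                              # original forward pass

with torch.no_grad():
    layer.weight.add_(0.1)                  # parameters move on (later microbatches)
    layer.weight.copy_(stashed["weight"])   # load_old_params()-style restore

out = layer(x)        # recompute the forward with the restored weights;
                      # the saved tensors now match the current parameter versions
out.sum().backward()  # backward succeeds, no version error
```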
Hi, what's the latest stable PyTorch release supported? Which version is pre_hook_pytorch_latest.patch intended for? Thanks in advance for your reply.