What's the latest version of PyTorch supported? #52
Comments
I have been using the NVIDIA container image for PyTorch, release 19.09, which corresponds to PyTorch version 1.2.0. I have not tried this with later versions.
Just an update on this issue: I checked all PyTorch releases, and up through NVIDIA PyTorch release 20.01, corresponding to PyTorch 1.4.0, PipeDream works. Starting with NVIDIA release 20.02, a runtime error occurs (#31, the same error log as this issue). I tried to locate the cause; it appears to come from an in-place version-checking feature added in PyTorch 1.5.0. The problem shows up when the second-to-last stage tries to start its backward pass: if load_old_params() is called before backward(), the error appears.
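For reference, the check being described is PyTorch autograd's saved-tensor version counter. A minimal, standalone sketch (not PipeDream code) of the kind of failure it produces, assuming only stock PyTorch: an in-place update of a tensor that autograd has saved for the backward pass.

```python
import torch

x = torch.randn(4, requires_grad=True)
y = x.sigmoid()     # autograd saves sigmoid's output for the backward pass
y.add_(1.0)         # the in-place op bumps y's version counter
y.sum().backward()  # RuntimeError: a variable needed for gradient computation
                    # has been modified by an inplace operation
```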
Thanks for doing this! This is helpful! I will look into this in the next couple of days!
I temporarily made PipeDream run on the latest PyTorch by removing the version check in unpack() in torch/csrc/autograd/saved_variable.cpp; the runtime errors seem to come from that version check (a really dirty solution). I have not fully understood PipeDream's manipulation of the back-propagated gradients, but I guess the problem comes from one more in-place operation on the tensors passed between stages. I hope this helps you solve the problem.
@SimonZsx Have you tried commenting out the version-checking code in PyTorch to see whether it works?
@deepakn94 @SimonZsx Sorry to bother you. I'm reproducing PipeDream's training process and hope to deploy it on torch >= 1.5.0. May I ask whether there are any solutions at the moment?
I have the same confusion. Have you made any progress? Maybe we can discuss it.
Hi, the commenting-and-recompiling solution works, but it's kind of dirty. The problem can also be avoided by not using weight stashing: the check appears to be a gradient version check, and weight stashing is what breaks it. One of my colleagues says the version can be set manually to avoid this error, but I have not verified that yet; just a small hint for you.
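To illustrate why weight stashing trips the check, here is a standalone sketch (not PipeDream's actual load_old_params() code): nn.Linear saves its weight for the backward pass, so restoring stashed weights in place between the forward and backward passes bumps the weight's version counter and fails the saved-tensor check.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 4)
# Keep a copy of the current parameters, as weight stashing would.
stashed = {k: v.detach().clone() for k, v in layer.state_dict().items()}

x = torch.randn(2, 4)
out = layer(x)                              # forward saves layer.weight for backward

with torch.no_grad():
    layer.weight.add_(0.1)                  # a later weight update happens first...
    layer.weight.copy_(stashed["weight"])   # ...then the stashed weights are restored in place

out.sum().backward()  # raises the "modified by an inplace operation" RuntimeError
```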
Thanks!
Same question to you. Have you made any progress?
Recomputing the forward pass before each backward pass would solve the tensor version issue. Simply call self._run_forward(tensors) right after line 571; a sketch of the idea follows below.
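A standalone sketch of that idea, with illustrative names (the real fix would call PipeDream's own self._run_forward(tensors) as described above): after restoring the stashed weights, re-run the forward pass so the tensors saved for backward match the parameters' current versions, at the cost of one extra forward computation per backward pass.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 4)
stashed = {k: v.detach().clone() for k, v in layer.state_dict().items()}

x = torch.randn(2, 4)
out = layer(x)                              # original forward pass

with torch.no_grad():
    layer.weight.add_(0.1)                  # parameters move on (later microbatches)
    layer.weight.copy_(stashed["weight"])   # load_old_params()-style restore

out = layer(x)        # recompute the forward with the restored weights;
                      # the saved tensors now match the current parameter versions
out.sum().backward()  # backward succeeds, no version error
```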
Hi, what's the latest stable PyTorch release supported? Which version is pre_hook_pytorch_latest.patch intended for? Thanks in advance for your reply.