RuntimeError: derivative for aten::grid_sampler_2d_backward is not implemented #4

Closed
LeandreSassi opened this issue Jan 9, 2024 · 9 comments


@LeandreSassi

Hi Woctezuma,

thanks for your repo,

When I run the cell below with my custom dataset, I get the error posted in the title of this issue.

Also, do you really need to provide a resume file when training a model for the first time?

!python stylegan2-ada-pytorch/train.py \
 --outdir={output_folder} \
 --snap={snap} \
 --metrics=none \
 --data={custom_dataset} \
 --cfg=auto_norp \
 --cifar_tune=1 \
 --gamma={gamma} \
 --kimg={kimg} \
 --batch={mini_batch} \
 --cfg_map=8 \
 --augpipe=bg \
 --freezed=10 \
 --resume=ffhq256 

Thanks for your help,

L.

@woctezuma
Owner

woctezuma commented Jan 9, 2024

Hello!

Warning

First of all, know that training a model on Google Colab is painful; it is much more convenient to use a paid platform if you don't want to deal with sessions closing inadvertently, having to resume training frequently, underpowered machines, etc.

Important

Second, I did not have time to post the results which I had obtained. From what I can remember, they looked like the ones in my other repositories at steam-stylegan2 (game banners) and at steam-lightweight-gan (Steam-OneFace-small dataset). The illustration in the current repository is a projection using a model pre-trained by Nvidia, without game banners!

Tip

Third, there may be better methods nowadays, either via a newer version of StyleGAN or with diffusion models.

When I run the cell below with my custom dataset, I get the error posted in the title of this issue.

I will have a look.

Also, do you really need to provide a resume file when training a model for the first time?

This is done to perform transfer learning from FFHQ trained at 256x256. See the documentation by Nvidia at:

Transfer learning is not mandatory, but it may help reach a satisfying result without using too much computation time.

This is mostly relevant if the original dataset (here, FFHQ, which contains faces) is similar to your own dataset (e.g. banners of Steam games which feature a single prominent face). For the GIF illustration shown on the README, you can see projections obtained "with a network pre-trained by Nvidia on the LSUN DOG dataset", hence why I chose banners of games which featured a dog.
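For context, `--resume` accepts either a direct path/URL to a network pickle or one of a few shorthand aliases (such as `ffhq256`) that `train.py` resolves to Nvidia-hosted pre-trained pickles. A minimal sketch of that lookup, with placeholder URLs (the real table lives in `train.py` in the stylegan2-ada-pytorch repository):

```python
# Sketch of how train.py resolves --resume (assumption: modelled after the
# `resume_specs` table in stylegan2-ada-pytorch; URLs here are placeholders).
resume_specs = {
    'ffhq256': '<nvidia-hosted-ffhq-256x256-pickle-url>',
    'ffhq512': '<nvidia-hosted-ffhq-512x512-pickle-url>',
}

def resolve_resume(resume):
    """Return the pickle to load, or None to train from scratch."""
    if resume is None or resume == 'noresume':
        return None  # no transfer learning
    # A known alias maps to a pre-trained pickle; anything else is treated
    # as a custom path or URL.
    return resume_specs.get(resume, resume)

print(resolve_resume('ffhq256'))       # the alias used in the cell above
print(resolve_resume('my_model.pkl'))  # a custom checkpoint passes through
```

So `--resume=ffhq256` simply starts training from Nvidia's FFHQ 256x256 checkpoint instead of random weights.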

@woctezuma
Owner

Alright, I see the same error message as you saw.

Traceback (most recent call last):
  File "/content/stylegan2-ada-pytorch/train.py", line 556, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/content/stylegan2-ada-pytorch/train.py", line 549, in main
    subprocess_fn(rank=0, args=args, temp_dir=temp_dir)
  File "/content/stylegan2-ada-pytorch/train.py", line 399, in subprocess_fn
    training_loop.training_loop(rank=rank, **args)
  File "/content/stylegan2-ada-pytorch/training/training_loop.py", line 299, in training_loop
    loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, sync=sync, gain=gain)
  File "/content/stylegan2-ada-pytorch/training/loss.py", line 131, in accumulate_gradients
    (real_logits * 0 + loss_Dreal + loss_Dr1).mean().mul(gain).backward()
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: derivative for aten::grid_sampler_2d_backward is not implemented

As well as a few warnings:

Creating output directory...
Launching processes...
Loading training set...

/usr/local/lib/python3.10/dist-packages/torch/utils/data/sampler.py:64: UserWarning: `data_source` argument is not used and will be removed in 2.2.0.You may still have custom implementation that utilizes it.
  warnings.warn("`data_source` argument is not used and will be removed in 2.2.0."

/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(

Num images:  2472
Image shape: [3, 256, 256]
Label shape: [0]

And many times:

/content/stylegan2-ada-pytorch/torch_utils/ops/conv2d_gradfix.py:55: UserWarning: conv2d_gradfix not supported on PyTorch 2.1.0+cu121. Falling back to torch.nn.functional.conv2d().
  warnings.warn(f'conv2d_gradfix not supported on PyTorch {torch.__version__}. Falling back to torch.nn.functional.conv2d().')

And a few times:

/content/stylegan2-ada-pytorch/torch_utils/ops/grid_sample_gradfix.py:39: UserWarning: grid_sample_gradfix not supported on PyTorch 2.1.0+cu121. Falling back to torch.nn.functional.grid_sample().
  warnings.warn(f'grid_sample_gradfix not supported on PyTorch {torch.__version__}. Falling back to torch.nn.functional.grid_sample().')

@woctezuma
Owner

woctezuma commented Jan 9, 2024

Related issues:

Related pull request, with a simple fix which may work:

It seems that one has to use an earlier version of PyTorch (version 1), while version 2 is installed by default on Colab nowadays.
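For what it's worth, the simple fix boils down to loosening the PyTorch version check in `torch_utils/ops/grid_sample_gradfix.py` (and similarly in `conv2d_gradfix.py`), so that the repository's custom op, which implements the second derivative needed by the R1 penalty, is used instead of falling back to `torch.nn.functional.grid_sample()`, whose backward op has no derivative of its own. A hedged sketch of the idea (the exact upstream code differs; the helper below takes the version string as an argument so it can run without PyTorch installed):

```python
# Sketch of the version gate in torch_utils/ops/grid_sample_gradfix.py.
# The original whitelists a few PyTorch 1.x versions; on anything else it
# warns and falls back to torch.nn.functional.grid_sample(), which cannot
# be differentiated twice -- hence the RuntimeError during loss_Dr1.

enabled = True  # module-level switch, as in the original file

def should_use_custom_op(torch_version):
    """Decide whether to use the double-backward-capable custom op.

    torch_version: a string such as torch.__version__, e.g. '2.1.0+cu121'.
    Assumption: accepting any 1.x/2.x version (instead of the original
    narrow 1.x whitelist) reflects the community fix linked in this thread.
    """
    if not enabled:
        return False
    major = int(torch_version.split('.')[0])
    return major >= 1

print(should_use_custom_op('2.1.0+cu121'))  # True -> no fallback, no crash
```

With the gate loosened this way, the fallback warnings quoted above disappear and the backward pass goes through the custom op, which supports the double backward required by the R1 regularization term.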

@LeandreSassi
Author

Hi Woctezuma,

Thanks a lot for your response. I saw Jeff Heaton training his GANs on Colab Pro. Is this what you mean by a paid platform? If not, which would you recommend?

Thanks !

Leandre Sassi

@LeandreSassi
Author

I found the solution here dvschultz/stylegan2-ada-pytorch#45 (comment)

Thanks again. Maybe this could be helpful for updating your Colab!

Have a good day,

Leandre Sassi

@woctezuma
Owner

woctezuma commented Jan 10, 2024

Hello!

I saw Jeff Heaton training his GANs on Colab Pro. Is this what you mean by a paid platform? If not, which would you recommend?

I have only had some experience with OVH's AI Training and AI Notebooks, as I had a chance to try their platform for free when they were beta-testing their service near the official launch.

I don't have any experience with other platforms, so I don't know enough to recommend one over another. 😅

I found the solution here dvschultz/stylegan2-ada-pytorch#45 (comment)

Thanks again. Maybe this could be helpful for updating your Colab!

Thank you! I will have a look at:

as well as:

Have a nice day!

@woctezuma
Owner

For info, the following commit:

fixes this warning which appeared many times:

/content/stylegan2-ada-pytorch/torch_utils/ops/conv2d_gradfix.py:55: UserWarning: conv2d_gradfix not supported on PyTorch 2.1.0+cu121. Falling back to torch.nn.functional.conv2d().
  warnings.warn(f'conv2d_gradfix not supported on PyTorch {torch.__version__}. Falling back to torch.nn.functional.conv2d().')

@woctezuma
Owner

Alright, I believe the notebook works now. I am not sure if one has to downgrade the PyTorch version.

@LeandreSassi
Author

No need to downgrade Python or PyTorch ;) I just trained a model and it all went well.
