RuntimeError: derivative for aten::grid_sampler_2d_backward is not implemented #4

Closed
LeandreSassi opened this issue Jan 9, 2024 · 9 comments


@LeandreSassi

Hi Woctezuma,

thanks for your repo,

When I run the cell below with my custom dataset, I get the error posted in the title of this issue.

Also, do you really need to provide a resume file when training a model for the first time?

!python stylegan2-ada-pytorch/train.py \
 --outdir={output_folder} \
 --snap={snap} \
 --metrics=none \
 --data={custom_dataset} \
 --cfg=auto_norp \
 --cifar_tune=1 \
 --gamma={gamma} \
 --kimg={kimg} \
 --batch={mini_batch} \
 --cfg_map=8 \
 --augpipe=bg \
 --freezed=10 \
 --resume=ffhq256 

Thanks for your help,

L.

@woctezuma
Owner

woctezuma commented Jan 9, 2024

Hello!

Warning

First of all, know that training a model on Google Colab is painful; it is much more convenient to use a paid platform if you don't want to deal with sessions closing inadvertently, having to resume training frequently, underpowered machines, etc.

Important

Second, I did not have time to post the results which I had obtained. From what I can remember, they looked like the ones in my other repositories at steam-stylegan2 (game banners) and at steam-lightweight-gan (Steam-OneFace-small dataset). The illustration in the current repository is a projection using a model pre-trained by Nvidia, without game banners!

Tip

Third, there may be better methods nowadays, either via a newer version of StyleGAN or with diffusion models.

When I run the cell below with my custom dataset, I get the error posted in the title of this issue.

I will have a look.

Also, do you really need to provide a resume file when training a model for the first time?

This is done to perform transfer learning from FFHQ trained at 256x256. See the documentation by Nvidia at:

Transfer learning is not mandatory, but it may help reach a satisfying result without using too much computation time.

This is mostly relevant if the original dataset (here, FFHQ, which contains faces) is similar to your own dataset (e.g. banners of Steam games which feature a single prominent face). For the GIF illustration shown on the README, you can see projections obtained "with a network pre-trained by Nvidia on the LSUN DOG dataset", hence why I chose banners of games which featured a dog.
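For context, `--resume` accepts either a direct path/URL to a network pickle or one of a few shorthand aliases (such as `ffhq256`) that `train.py` resolves to Nvidia-hosted pre-trained pickles. A minimal sketch of that lookup, with placeholder URLs (the real table lives in `train.py` in the stylegan2-ada-pytorch repository):

```python
# Sketch of how train.py resolves --resume (assumption: modelled after the
# `resume_specs` table in stylegan2-ada-pytorch; URLs here are placeholders).
resume_specs = {
    'ffhq256': '<nvidia-hosted-ffhq-256x256-pickle-url>',
    'ffhq512': '<nvidia-hosted-ffhq-512x512-pickle-url>',
}

def resolve_resume(resume):
    """Return the pickle to load, or None to train from scratch."""
    if resume is None or resume == 'noresume':
        return None  # no transfer learning
    # A known alias maps to a pre-trained pickle; anything else is treated
    # as a custom path or URL.
    return resume_specs.get(resume, resume)

print(resolve_resume('ffhq256'))       # the alias used in the cell above
print(resolve_resume('my_model.pkl'))  # a custom checkpoint passes through
```

So `--resume=ffhq256` simply starts training from Nvidia's FFHQ 256x256 checkpoint instead of random weights.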

@woctezuma
Owner

Alright, I see the same error message as you saw.

Traceback (most recent call last):
  File "/content/stylegan2-ada-pytorch/train.py", line 556, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/content/stylegan2-ada-pytorch/train.py", line 549, in main
    subprocess_fn(rank=0, args=args, temp_dir=temp_dir)
  File "/content/stylegan2-ada-pytorch/train.py", line 399, in subprocess_fn
    training_loop.training_loop(rank=rank, **args)
  File "/content/stylegan2-ada-pytorch/training/training_loop.py", line 299, in training_loop
    loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, sync=sync, gain=gain)
  File "/content/stylegan2-ada-pytorch/training/loss.py", line 131, in accumulate_gradients
    (real_logits * 0 + loss_Dreal + loss_Dr1).mean().mul(gain).backward()
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: derivative for aten::grid_sampler_2d_backward is not implemented

As well as a few warnings:

Creating output directory...
Launching processes...
Loading training set...

/usr/local/lib/python3.10/dist-packages/torch/utils/data/sampler.py:64: UserWarning: `data_source` argument is not used and will be removed in 2.2.0.You may still have custom implementation that utilizes it.
  warnings.warn("`data_source` argument is not used and will be removed in 2.2.0."

/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(

Num images:  2472
Image shape: [3, 256, 256]
Label shape: [0]

And many times:

/content/stylegan2-ada-pytorch/torch_utils/ops/conv2d_gradfix.py:55: UserWarning: conv2d_gradfix not supported on PyTorch 2.1.0+cu121. Falling back to torch.nn.functional.conv2d().
  warnings.warn(f'conv2d_gradfix not supported on PyTorch {torch.__version__}. Falling back to torch.nn.functional.conv2d().')

And a few times:

/content/stylegan2-ada-pytorch/torch_utils/ops/grid_sample_gradfix.py:39: UserWarning: grid_sample_gradfix not supported on PyTorch 2.1.0+cu121. Falling back to torch.nn.functional.grid_sample().
  warnings.warn(f'grid_sample_gradfix not supported on PyTorch {torch.__version__}. Falling back to torch.nn.functional.grid_sample().')

@woctezuma
Owner

woctezuma commented Jan 9, 2024

Related issues:

Related pull request, with a simple fix which may work:

It seems that one has to use an earlier version of PyTorch (version 1), while version 2 is installed by default on Colab nowadays.
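For what it's worth, the simple fix boils down to loosening the PyTorch version check in `torch_utils/ops/grid_sample_gradfix.py` (and similarly in `conv2d_gradfix.py`), so that the repository's custom op, which implements the second derivative needed by the R1 penalty, is used instead of falling back to `torch.nn.functional.grid_sample()`, whose backward op has no derivative of its own. A hedged sketch of the idea (the exact upstream code differs; the helper below takes the version string as an argument so it can run without PyTorch installed):

```python
# Sketch of the version gate in torch_utils/ops/grid_sample_gradfix.py.
# The original whitelists a few PyTorch 1.x versions; on anything else it
# warns and falls back to torch.nn.functional.grid_sample(), which cannot
# be differentiated twice -- hence the RuntimeError during loss_Dr1.

enabled = True  # module-level switch, as in the original file

def should_use_custom_op(torch_version):
    """Decide whether to use the double-backward-capable custom op.

    torch_version: a string such as torch.__version__, e.g. '2.1.0+cu121'.
    Assumption: accepting any 1.x/2.x version (instead of the original
    narrow 1.x whitelist) reflects the community fix linked in this thread.
    """
    if not enabled:
        return False
    major = int(torch_version.split('.')[0])
    return major >= 1

print(should_use_custom_op('2.1.0+cu121'))  # True -> no fallback, no crash
```

With the gate loosened this way, the fallback warnings quoted above disappear and the backward pass goes through the custom op, which supports the double backward required by the R1 regularization term.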

@LeandreSassi
Author

Hi Woctezuma,

Thanks a lot for your response. I saw Jeff Heaton training his GANs on Colab Pro. Is this what you mean by a paid platform? If not, which would you recommend?

Thanks !

Leandre Sassi

@LeandreSassi
Author

I found the solution here dvschultz/stylegan2-ada-pytorch#45 (comment)

Thanks again. Maybe this could be helpful for updating your Colab!

Have a good day,

Leandre Sassi

@woctezuma
Owner

woctezuma commented Jan 10, 2024

Hello!

I saw Jeff Heaton training his GANs on Colab Pro. Is this what you mean by a paid platform? If not, which would you recommend?

I have only had some experience with OVH's AI Training and AI Notebooks, as I had a chance to try their platform for free when they were beta-testing their service near the official launch.

I don't have any experience with other platforms, so I don't know enough to recommend one over another. 😅

I found the solution here dvschultz/stylegan2-ada-pytorch#45 (comment)

Thanks again. Maybe this could be helpful for updating your Colab!

Thank you! I will have a look at:

as well as:

Have a nice day!

@woctezuma
Owner

For info, the following commit:

fixes this warning which appeared many times:

/content/stylegan2-ada-pytorch/torch_utils/ops/conv2d_gradfix.py:55: UserWarning: conv2d_gradfix not supported on PyTorch 2.1.0+cu121. Falling back to torch.nn.functional.conv2d().
  warnings.warn(f'conv2d_gradfix not supported on PyTorch {torch.__version__}. Falling back to torch.nn.functional.conv2d().')

@woctezuma
Owner

Alright, I believe the notebook works now. I am not sure if one has to downgrade the PyTorch version.

@LeandreSassi
Author

No need to downgrade Python or PyTorch ;) I just trained a model and it all went well.
