
Why does the model use only the first three channels of the last layer output? #9

Open
apple2373 opened this issue Apr 18, 2019 · 6 comments

@apple2373

z = self.conv_to_rgb(z)
z = z[:, :3, ...]

I'm trying to understand the model by reading the code. I noticed that conv_to_rgb actually has 128 output channels, but only the first three are used for the final RGB image. Why do you do this? What are the other 125 channels for?

@thomwolf
Member

They are dropped.
This is actually done several times in the model, for example here:

if self.drop_channels:
new_channels = x0.shape[1] // 2
x0 = x0[:, :new_channels, ...]

If you read the latest version of the BigGAN paper, you will see it is part of the changes in the new "deep" version of BigGAN.

@apple2373
Author

apple2373 commented Apr 18, 2019

Thanks for the reply! I think I am confused. If you simply drop the channels, why not use the smaller number of channels at training time? For example, in the last layer, why not use just nn.Conv2d(128, 3) instead of training nn.Conv2d(128, 128) and dropping 125 channels at inference time?
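To make the comparison concrete, here is a minimal sketch of the two alternatives I have in mind (the layer names are just placeholders and the spatial size is arbitrary):

import torch
import torch.nn as nn

z = torch.randn(1, 128, 8, 8)  # hypothetical 128-channel feature map before the RGB conversion

# what the repository does: a 128 -> 128 conv, then keep only the first three channels
conv_to_rgb = nn.Conv2d(128, 128, kernel_size=3, padding=1)
rgb_wide = conv_to_rgb(z)[:, :3, ...]

# the alternative I am asking about: a 128 -> 3 conv, so nothing needs to be dropped
conv_to_rgb_narrow = nn.Conv2d(128, 3, kernel_size=3, padding=1)
rgb_narrow = conv_to_rgb_narrow(z)

print(rgb_wide.shape, rgb_narrow.shape)  # both torch.Size([1, 3, 8, 8])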

Could you point to the specific page and line where the authors explain this part? I tried to find it in 1809.11096v2, but I could not. Table 9(a) just says BN, ReLU, 3 × 3 Conv ch → 3.

@apple2373
Author

apple2373 commented Apr 25, 2019

I still can't understand why this repository uses this strange channel-dropping trick. Is training with a larger number of channels and dropping them at inference time a trick invented by this repository's owner?

I checked the BigGAN author's implementation, but it does not seem to use channel dropping...
https://github.com/ajbrock/BigGAN-PyTorch/blob/ba3d05754120e9d3b68313ec7b0f9833fc5ee8bc/BigGANdeep.py#L68-L93

@thomwolf
Member

Well, I'm not very familiar with Andy's implementation, but I do see a channel-dropping part here: https://github.com/ajbrock/BigGAN-PyTorch/blob/ba3d05754120e9d3b68313ec7b0f9833fc5ee8bc/BigGANdeep.py#L54-L56

I'm not sure Andy's implementation can load the --deep models, which is what the present repo is based on (see ajbrock/BigGAN-PyTorch#10).

Maybe you would be better off asking in the issues of https://github.com/ajbrock/BigGAN-PyTorch ?

@apple2373
Author

Oh, I missed that part. If the original implementation uses channel dropping, it makes sense to use it here. Thanks! I'll ask the authors directly.

@apple2373
Author

The original author answered. It's because TensorFlow is faster when the number of input and output channels is the same. I think it's okay to delete the unused channels from this repository, since in PyTorch they just waste computational resources.
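For the last layer, the pruning could be as simple as copying the first three output channels of the trained conv into a smaller conv. A minimal sketch, assuming the layer is a plain nn.Conv2d (the helper name below is just a placeholder):

import torch
import torch.nn as nn

def prune_to_rgb(conv_to_rgb: nn.Conv2d) -> nn.Conv2d:
    # build a conv with only 3 output channels but the same input channels and geometry
    pruned = nn.Conv2d(
        conv_to_rgb.in_channels,
        3,
        kernel_size=conv_to_rgb.kernel_size,
        stride=conv_to_rgb.stride,
        padding=conv_to_rgb.padding,
        bias=conv_to_rgb.bias is not None,
    )
    with torch.no_grad():
        # weight shape is (out_channels, in_channels, kH, kW), so slice the output dimension
        pruned.weight.copy_(conv_to_rgb.weight[:3])
        if conv_to_rgb.bias is not None:
            pruned.bias.copy_(conv_to_rgb.bias[:3])
    return pruned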
