Why did you use MomentumOptimizer? and dropout... #28
Hello @taki0112

A1. As we mentioned in the paper, we directly followed ResNet's optimization settings (https://github.com/facebook/fb.resnet.torch), except that we train for 300 epochs instead of ~160 epochs. We didn't try any other optimizers.

A2. In our experiment, we applied dropout to every conv layer except the first one of the network (see the sketch below this comment). But I guess there should be no significant difference whether you apply dropout on transition layers or not.

A3. This depends on which package you are using. Sorry, I'm not familiar with Tensorflow's details.

A4. Global average pooling means you pool a feature map down to a single number by taking the average. For example, if you have an 8x8 feature map, you take the average of those 64 numbers and produce one number.

For Tensorflow usage questions like 3 and 4, you can probably find answers by looking at the third-party implementations.
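A hedged sketch of the dropout placement A2 describes, written with the TF 1.x layers API (the function and flag names here are illustrative, not the authors' code):

```python
import tensorflow as tf

def conv_block(x, filters, drop_rate, is_training, is_first=False):
    # BN-ReLU-Conv composite function, as in DenseNet.
    x = tf.layers.batch_normalization(x, training=is_training)
    x = tf.nn.relu(x)
    x = tf.layers.conv2d(x, filters=filters, kernel_size=3, padding='same', use_bias=False)
    # Per A2: dropout follows every conv layer except the first conv of the network.
    if not is_first and drop_rate > 0:
        x = tf.layers.dropout(x, rate=drop_rate, training=is_training)
    return x
```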
Thank you.

```python
import numpy as np
import tensorflow as tf

def Global_Average_Pooling(x, stride=1):
    # Uses the tensor's static (graph-time) shape to size the pooling window
    # so it covers the whole feature map.
    width = np.shape(x)[1]
    height = np.shape(x)[2]
    pool_size = [width, height]
    # The stride value does not matter here.
    return tf.layers.average_pooling2d(inputs=x, pool_size=pool_size, strides=stride)
```

But I have some questions. What is the reason?
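An aside on the snippet above: because pool_size equals the full height and width of the feature map, there is only one window position, so the stride cannot change the output. An equivalent formulation (a sketch, not from the thread) simply averages over the spatial axes:

```python
import tensorflow as tf

def global_avg_pool(x):
    # Mean over the height and width axes of an NHWC tensor: one value per
    # channel, the same result the single full-size pooling window produces.
    return tf.reduce_mean(x, axis=[1, 2])
```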
I think the author has a good explanation. Regarding the dropout: why didn't you use dropout in the ImageNet case? It is a big dataset, so we do not need it, right? Dropout is often used before the fully connected layer, but you did not use it there for either ImageNet or CIFAR10. Why? Thanks
@John1231983 Because ImageNet is big and because we also use heavy data augmentation, we don't use dropout. This also follows our base code framework, fb.resnet.torch. For CIFAR10, when we use data augmentation (C10+), we don't use dropout. When we don't use data augmentation (C10), we actually do use dropout. We've mentioned this in the paper.
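A small, hypothetical config sketch of the policy described above (the function and flag names are illustrative; the 0.2 rate is the value the paper reports for the no-augmentation setting):

```python
def dropout_rate(dataset, use_augmentation):
    # Dropout is only used for CIFAR trained without data augmentation (C10);
    # ImageNet and augmented CIFAR (C10+) train without dropout.
    if dataset == 'cifar10' and not use_augmentation:
        return 0.2
    return 0.0
```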
Hello
When I read about DenseNet, I implemented it with Tensorflow (using MNIST data).
The questions are:
1. When I experimented, AdamOptimizer performed better than MomentumOptimizer. Is this just because of MNIST? I have not yet run an experiment with CIFAR.
2. In the case of dropout, I apply it only to the bottleneck layers, not to the transition layers. Is this right?
3. Does Batch Normalization only apply during training, or does it apply to both testing and training? (See the sketch after this comment.)
4. I wonder what global average pooling is, and how to do it in Tensorflow.
Please advise if there was any special reason for these choices.
And if you can look at the Tensorflow code, I'd like you to check whether I implemented it correctly.
https://github.com/taki0112/Densenet-Tensorflow
Thank you
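Regarding question 3 above, a minimal sketch of the standard Tensorflow 1.x pattern (not code from this thread): batch normalization uses the current batch's statistics during training and the stored moving averages at test time, switched by a `training` flag.

```python
import tensorflow as tf

# Boolean placeholder that switches BN between training and inference behaviour.
is_training = tf.placeholder(tf.bool, name='is_training')
inputs = tf.placeholder(tf.float32, [None, 32, 32, 3])

# training=True  -> normalize with batch statistics and update moving averages
# training=False -> normalize with the stored moving averages
x = tf.layers.batch_normalization(inputs, training=is_training)
x = tf.nn.relu(x)

# The moving-average update ops are collected in UPDATE_OPS and must be run
# together with the training step (e.g. via tf.control_dependencies).
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
```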