Why did you use MomentumOptimizer? and dropout... #28

Open
taki0112 opened this issue Aug 8, 2017 · 5 comments

taki0112 commented Aug 8, 2017

Hello
After reading the DenseNet paper, I implemented it in TensorFlow (using MNIST data).

My questions are:

  1. In my experiments, AdamOptimizer performed better than MomentumOptimizer.
    Is this specific to MNIST? I have not run any experiments on CIFAR yet.

  2. For dropout, I apply it only to the bottleneck layers, not to the transition layers. Is this right?

  3. Is Batch Normalization applied only during training, or during both training and testing?

  4. What exactly is global average pooling,
    and how can I do it in TensorFlow?

Please let me know if there was a particular reason for these choices.
Also, if you can take a look at my TensorFlow code, I would appreciate it if you could check whether I implemented it correctly:
https://github.com/taki0112/Densenet-Tensorflow

Thank you

liuzhuang13 (Owner) commented Aug 8, 2017

Hello @taki0112

A1. As we mentioned in the paper, we directly followed ResNet's optimization settings (https://github.com/facebook/fb.resnet.torch), except that we train for 300 epochs instead of ~160. We didn't try any other optimizers.
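
For reference, here is a minimal TensorFlow sketch of that kind of schedule (SGD with Nesterov momentum 0.9, learning rate 0.1 divided by 10 at 50% and 75% of training, weight decay 1e-4, roughly matching the settings described in the paper). This is only an illustration, not code from either repository; `steps_per_epoch` and `cross_entropy` are assumed to be defined elsewhere:

    # Illustrative sketch of the SGD + Nesterov momentum schedule described above.
    import tensorflow as tf

    global_step = tf.train.get_or_create_global_step()
    # 50% / 75% of 300 epochs; steps_per_epoch is assumed to be defined.
    boundaries = [150 * steps_per_epoch, 225 * steps_per_epoch]
    learning_rate = tf.train.piecewise_constant(global_step, boundaries, [0.1, 0.01, 0.001])

    # L2 weight decay of 1e-4 added to the loss (cross_entropy is assumed).
    l2_loss = 1e-4 * tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])

    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9, use_nesterov=True)
    train_op = optimizer.minimize(cross_entropy + l2_loss, global_step=global_step)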

A2. In our experiments, we applied dropout after every conv layer except the first one of the network. But I would guess there is no significant difference whether or not you apply dropout in the transition layers.
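
As a small illustration only (not the authors' code), the conv + dropout pattern in TensorFlow 1.x could look like the sketch below; the 0.2 rate and the `is_training` placeholder are assumptions for this example:

    # Sketch: dropout applied after a conv layer, disabled at test time.
    # `x` and `is_training` (a tf.placeholder(tf.bool)) are assumed to exist.
    import tensorflow as tf

    x = tf.layers.conv2d(x, filters=48, kernel_size=3, padding='same', use_bias=False)
    x = tf.layers.dropout(x, rate=0.2, training=is_training)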

A3. This depends on the package you are using. Sorry, I'm not familiar with TensorFlow's details.
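
For reference, the usual TensorFlow 1.x pattern (a generic sketch, not specific to this repo) is to pass a boolean `training` flag: batch statistics are used during training and the accumulated moving averages at test time, and the moving-average update ops have to be run together with the train step. Here `x`, `is_training`, `optimizer`, and `loss` are assumed to exist:

    # Sketch: batch norm behaves differently at train and test time.
    import tensorflow as tf

    x = tf.layers.batch_normalization(x, training=is_training)

    # The moving mean/variance are updated through ops in the UPDATE_OPS
    # collection, so run them together with the training step.
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = optimizer.minimize(loss)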

A4. Global average pooling means you pool each feature map down to a single number by taking its average. For example, if you have an 8x8 feature map, you take the average of those 64 numbers and produce one number.

For TensorFlow usage questions like 3 and 4, you can probably find answers by looking at the third-party TensorFlow implementations we posted on our README page. Thanks

taki0112 (Author) commented Aug 11, 2017

Thank you
I think I can do global average pooling as follows:

    import numpy as np
    import tensorflow as tf

    def Global_Average_Pooling(x, stride=1):
        # x: 4-D tensor [batch, height, width, channels] with a known spatial size.
        width = int(np.shape(x)[1])
        height = int(np.shape(x)[2])
        pool_size = [width, height]
        # The stride value does not matter: the window already covers the whole feature map.
        return tf.layers.average_pooling2d(inputs=x, pool_size=pool_size, strides=stride)
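
Equivalently (just a sketch), the same thing can be done by averaging over the spatial axes directly:

    # Global average pooling over an NHWC tensor:
    # input shape [batch, H, W, C] -> output shape [batch, C].
    def global_average_pooling(x):
        return tf.reduce_mean(x, axis=[1, 2])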

But I have some questions.

  1. I experimented on MNIST with a total of 100 layers and growth_k = 12. However, the result is worse than with 20 layers: training is very slow and the gain in accuracy is very small.

  2. Why is there no Transition Layer (4) in the paper?
    There are only three (Dense Block + Transition Layer) pairs, followed by the final dense block and the classification layer.

What is the reason?

liuzhuang13 (Owner) commented Aug 11, 2017

@taki0112

  1. Most people train networks with fewer than 5 layers and still achieve very high accuracy on MNIST, because it is such a simple dataset. If you train too large a network on MNIST, it may overfit the training set and the accuracy may get worse. Thanks

  2. Because transition layers serve the purpose of downsampling. After the last dense block we use global average pooling to do the downsampling, but we don't call it a transition layer.
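
To make the layout concrete, here is a rough skeleton of the forward pass (illustrative only, not code from either repository; bottleneck and compression details are omitted): each transition sits between two dense blocks, so with four dense blocks there are only three transitions, and the last block is followed by BN, global average pooling, and the classifier.

    # Illustrative DenseNet skeleton: no transition after the last dense block.
    import tensorflow as tf

    def dense_block(x, num_layers, growth_rate, training):
        for _ in range(num_layers):
            y = tf.layers.batch_normalization(x, training=training)
            y = tf.nn.relu(y)
            y = tf.layers.conv2d(y, growth_rate, 3, padding='same', use_bias=False)
            x = tf.concat([x, y], axis=3)  # dense connectivity
        return x

    def transition_layer(x, training):
        x = tf.layers.batch_normalization(x, training=training)
        x = tf.nn.relu(x)
        x = tf.layers.conv2d(x, int(x.get_shape()[3]), 1, use_bias=False)
        return tf.layers.average_pooling2d(x, pool_size=2, strides=2)  # downsampling

    def densenet(x, training, num_classes=10, growth_rate=12,
                 layers_per_block=(6, 12, 24, 16)):
        x = tf.layers.conv2d(x, 2 * growth_rate, 3, padding='same', use_bias=False)
        for i, n in enumerate(layers_per_block):
            x = dense_block(x, n, growth_rate, training)
            if i != len(layers_per_block) - 1:  # no transition after the last block
                x = transition_layer(x, training)
        x = tf.nn.relu(tf.layers.batch_normalization(x, training=training))
        x = tf.reduce_mean(x, axis=[1, 2])      # global average pooling
        return tf.layers.dense(x, num_classes)  # classification layer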

John1231983 commented

I think the author gave a good explanation. Regarding dropout: why didn't you use it in the ImageNet case? It is a big dataset, so we do not need it, right? Dropout is often used before the fully connected layer, but you did not use it for either ImageNet or CIFAR10. Why? Thanks

liuzhuang13 (Owner) commented

@John1231983 Because ImageNet is big, and also because we use heavy data augmentation, we don't use dropout. This also follows our base code framework, fb.resnet.torch.

For CIFAR10, when we use data augmentation (C10+), we don't use dropout. When we don't use data augmentation (C10), we actually use dropout. We've mentioned this in the paper.
