The consistency between the code and the description of the paper. #11

sang-yc · 2021-09-29T13:31:01Z

Hello, I found that the M0 in the code and the M0 in the paper do not have the same structure. Is the code for Micro-Block-A, Micro-Block-B, and Micro-Block-C consistent with the description in the paper, or are there any differences?
Thank you.

liyunsheng13 · 2021-09-29T17:12:54Z

All the models are consistent with the description in the paper. Using M0 as an example, if you take a look at Table 1 in the paper, you can find the hidden dimension C/R is {8,12,16,32,64,96}, which is exactly the same as the config file used for M0.

sang-yc · 2021-09-30T13:49:29Z

First of all, thank you for your reply!
I still have questions about Dynamic Shift-Max. I have carefully studied your Dynamic ReLU paper. As your paper says, when J = 1, Dynamic Shift-Max and Dynamic ReLU are the same.
Why is J set to 2? Shouldn't J equal the number of groups? When the number of groups changes, shouldn't J change accordingly?
Thank you!

liyunsheng13 · 2021-09-30T21:14:39Z

As you mentioned, when J=1, Dynamic Shift-Max is just Dynamic ReLU, with an expression like y=max(a1x1+b1x1) (x1 is the first channel of the feature map; a1 and b1 are the dynamic coefficients). When J=2, the output y becomes max(a1x1+a2x2, b1x1+b2x2), where channels x1 and x2 are fused. This is the key difference from Dynamic ReLU. In our implementation, we actually found that x2 should be obtained with a group shift instead of a channel shift, so it is x_{jC/G}. The value of J therefore has nothing to do with the number of groups; it depends only on how many channels you want to fuse, and of course J<=G.
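To make the channel fusion concrete, here is a minimal pure-Python sketch of the max-over-branches computation described above. It is not the repository's implementation: the function name `dynamic_shift_max`, the scalar-per-channel input, the fixed (non-dynamic) coefficients, and the index convention `(i + j*C/G) mod C` with j starting at 0 are all assumptions made for illustration.

```python
def dynamic_shift_max(x, coeffs, groups):
    """Illustrative sketch only (hypothetical helper, not the repo's code).

    x      : list of C channel values (one scalar per channel for simplicity).
    coeffs : coeffs[i][k][j] is the coefficient for output channel i,
             max-branch k, fused term j. In the real method these are
             produced dynamically from the input; here they are fixed.
    groups : G, the number of groups. The j-th fused channel for output i
             is assumed to be x[(i + j * C / G) mod C], with j from 0.
    """
    num_channels = len(x)
    if num_channels % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    shift = num_channels // groups  # group shift of C/G channels

    out = []
    for i in range(num_channels):
        branch_values = []
        for branch in coeffs[i]:
            if len(branch) > groups:
                raise ValueError("J must not exceed G")
            # Weighted sum of channel i and its group-shifted partners.
            fused = sum(a * x[(i + j * shift) % num_channels]
                        for j, a in enumerate(branch))
            branch_values.append(fused)
        out.append(max(branch_values))
    return out


x = [1.0, -2.0, 3.0, 4.0]

# J=2, K=2: each branch fuses channel i with the channel C/G positions away.
fused_coeffs = [[[1.0, 0.5], [-1.0, 0.0]] for _ in x]
print(dynamic_shift_max(x, fused_coeffs, groups=2))

# J=1, K=2: no fusion, each output is max(a * x_i, b * x_i).
single_coeffs = [[[1.0], [0.0]] for _ in x]
print(dynamic_shift_max(x, single_coeffs, groups=2))
```

With J=1 and coefficients 1 and 0, the sketch behaves like an ordinary ReLU on each channel, which matches the statement that J=1 involves no fusion across channels. J only sets how many terms each branch sums, so it can be chosen independently of G as long as J<=G.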

sang-yc · 2021-10-02T06:51:27Z

Thank you for your reply!
Dynamic ReLU: y=max(a1x1+b1x1). According to your code and paper, I think the expression for Dynamic ReLU should be y=max(a1x1+b1) (x1 is the first channel of the feature map; a1 and b1 are the dynamic coefficients). I am not sure whether I have misunderstood or whether this is a typo on your side.
Some parts of the code are still difficult to understand. What do the parameters of the Dynamic Shift-Max class mean?
As follows: activation.py, line 111
def __init__(self, inp, oup, reduction=4, act_max=1.0, act_relu=True, init_a=[0.0, 0.0], init_b=[0.0, 0.0], relu_before_pool=False, g=None, expansion=False)
The parameters are much more complex than those of Dynamic ReLU. I hope you can tell me what they represent in Dynamic Shift-Max. I understand inp and oup.
Also, Dynamic ReLU has 2KC parameters, so Dynamic Shift-Max should have 2KCJ. Why does your paper give the number of parameters as KCJ?
Thank you!

liyunsheng13 · 2021-10-02T07:08:14Z

Oh sorry, my writing was incorrect. The expression for Dynamic ReLU is y=max(a1x1, b1x1). It just picks the feature point with the stronger activation.

As for the meaning of the input parameters, unfortunately they concern implementation details such as initialization (init_a=[0.0, 0.0], init_b=[0.0, 0.0]), and it is hard for me to explain them. Besides, they are not needed to understand Dynamic Shift-Max. I suggest you run the code step by step; you will easily see how the parameters influence the implementation.

As for the number of parameters in Dynamic Shift-Max: since it considers channel shifting, it has to be implemented with more parameters. For J=2, Dynamic Shift-Max is max(a1x1+a2x2, b1x1+b2x2) with parameters a1, a2, b1 and b2, which doubles the number of parameters compared with Dynamic ReLU (y=max(a1x1, b1x1)).
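The counting argument can be checked with a few lines of Python. This is only a sketch under the bias-free expressions used in this reply, where each of the K max-branches is a sum of J scaled channels; the helper name `coeffs_per_layer` is hypothetical. The 2KC figure quoted in the question appears to count a slope and an intercept per branch, which this sketch does not model.

```python
def coeffs_per_layer(channels, k, j=1):
    """Count dynamic coefficients for a layer (hypothetical helper).

    Assumes each of `channels` outputs takes a max over `k` branches and
    each branch is a bias-free sum of `j` scaled channels, as in
    max(a1*x1 + a2*x2, b1*x1 + b2*x2) for k=2, j=2.
    """
    return k * channels * j


C = 16
relu_like = coeffs_per_layer(C, k=2, j=1)   # max(a1*x1, b1*x1) per channel
shift_max = coeffs_per_layer(C, k=2, j=2)   # a1, a2, b1, b2 per channel
print(relu_like, shift_max, shift_max // relu_like)
```

Under this convention the count scales as K*C*J, so going from J=1 to J=2 doubles it.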

FlyMoonSky · 2021-10-11T15:39:35Z

For M0, the number of output channels of the stem layer is 4 in the code, while it is 6 in the paper. I'm confused.

liyunsheng13 · 2021-10-11T17:52:09Z

There is no inconsistency for M0. You may have read an old version of our paper.

FlyMoonSky · 2021-10-12T02:09:21Z

> There is no inconsistency for M0. You might read the old version of our paper.

Thank you for your kind reply! It was indeed a problem with the paper version. I have now referred to the latest version.
