The consistency between the code and the description of the paper. #11
All the models are consistent with the description in the paper. Taking M0 as an example, if you look at Table 1 in the paper, you can see that the hidden dimension C/R is {8,12,16,32,64,96}, which is exactly the same as in the config file used for M0.
First of all, thank you for your reply!
As you mentioned, when J=1, Dynamic Shift-Max is just Dynamic ReLU, with an expression like y=max(a1x1+b1x1) (x1 is the first channel of the feature map; a1 and b1 are the dynamic coefficients). When J=2, the output y becomes max(a1x1+a2x2, b1x1+b2x2), where channels x1 and x2 are fused. This is the key difference compared to Dynamic ReLU. In our implementation, we actually found that x2 should be obtained with a group shift instead of a channel shift, so it is x_{jC/G}. So the value of J has nothing to do with the number of groups; it only depends on how many channels you want to fuse, and of course J<=G.
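A minimal pure-Python sketch of the formula described above, for a single spatial position, may help. It assumes the coefficients are already computed and passed in as a[k][j][i] (K branches, J shifts, C channels); in the paper they are generated dynamically from the input, which is omitted here. The function name `dynamic_shift_max` is hypothetical, not the repository's API, and wrapping the group-shifted index modulo C is an assumption of this sketch.

```python
from typing import List


def dynamic_shift_max(x: List[float], a: List[List[List[float]]], groups: int) -> List[float]:
    """Hypothetical sketch of Dynamic Shift-Max at one spatial position.

    x: input features, one value per channel (length C).
    a: coefficients indexed a[k][j][i] for branch k, shift j, channel i.
       Assumed precomputed here; the dynamic hyper-function is omitted.
    groups: number of groups G; shift j moves by j * (C // G) channels.
    """
    C = len(x)
    K = len(a)
    J = len(a[0])
    assert C % groups == 0 and J <= groups, "need C divisible by G and J <= G"
    step = C // groups  # size of one group shift
    y = []
    for i in range(C):
        branch_values = []
        for k in range(K):
            # Fuse J group-shifted channels; the index wraps modulo C (assumption).
            s = sum(a[k][j][i] * x[(i + j * step) % C] for j in range(J))
            branch_values.append(s)
        # Take the maximum over the K branches, as in Dynamic ReLU.
        y.append(max(branch_values))
    return y
```

With J=1 and K=2 the inner sum has a single term, so each output is max(a1x1, b1x1), i.e. Dynamic ReLU; with J=2 each output mixes a channel with the channel one group (C/G positions) away.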
Thank you for your reply!
Oh sorry, what I wrote was incorrect. The expression of Dynamic ReLU is y=max(a1x1, b1x1); it simply picks the stronger of the two activations. As for the meaning of the input parameters, unfortunately they are implementation details such as initialization (init_a=[0.0, 0.0], init_b=[0.0, 0.0]), and they are hard for me to explain. Besides, they have nothing to do with understanding Dynamic Shift-Max. I suggest you just run the code step by step; you will easily see how the parameters influence the implementation. As for the number of parameters in Dynamic Shift-Max: since it also shifts channels, it has to be implemented with more parameters. For J=2, Dynamic Shift-Max is max(a1x1+a2x2, b1x1+b2x2) with parameters a1, a2, b1 and b2, which is double the number in Dynamic ReLU (y=max(a1x1, b1x1)).
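To make the parameter-count comparison concrete, here is a small sketch that counts the dynamic coefficients a layer needs, assuming one coefficient per output channel, per shift j and per branch k. The function name and the channel count C=64 are illustrative only, and the count ignores the weights of the hyper-function that generates the coefficients.

```python
def num_dynamic_coefficients(channels: int, J: int, K: int) -> int:
    # Hypothetical helper: one coefficient per channel, per shift j (J total),
    # per branch k (K total). Hyper-function weights are not counted.
    return channels * J * K


C = 64  # illustrative channel count, not taken from any model config
relu = num_dynamic_coefficients(C, J=1, K=2)   # Dynamic ReLU: max(a1x1, b1x1)
shift = num_dynamic_coefficients(C, J=2, K=2)  # Shift-Max: max(a1x1+a2x2, b1x1+b2x2)
```

Going from J=1 to J=2 with K fixed at 2 doubles the count (128 vs. 256 for C=64), matching the a1, a2, b1, b2 versus a1, b1 comparison above.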
For M0, the number of output channels of the stem layer is 4 in the code, while it is 6 in the paper. I'm confused.
There is no inconsistency for M0. You might be reading an old version of our paper.
Thank you for your kind reply! It was indeed a paper-version issue. I have now referred to the latest version.
Hello, I found that M0 in the code and M0 in the paper do not have the same structure. I would like to ask whether the code for Micro-Block-A, Micro-Block-B, and Micro-Block-C is consistent with the description in the paper, and whether there are any differences.
Thank you.