The official PyTorch implementation is available at https://github.com/nmaac/acon.
This repository contains the MegEngine implementation of the paper Activate or Not: Learning Customized Activation.
We propose a novel activation function, termed ACON, that explicitly learns whether or not to activate the neurons. Below we show the ACON activation function and its first derivative. β controls how fast the first derivative approaches its upper and lower bounds, which are determined by p1 and p2.
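As a minimal sketch of the behaviour described above, assume the ACON-C form from the paper, f(x) = (p1 - p2)·x·σ(β(p1 - p2)x) + p2·x. The function names `sigmoid`, `acon_c` and `acon_c_grad` are hypothetical and are not part of this repository, and the parameters are plain scalars rather than the learnable per-channel parameters a real layer would use.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function for scalar inputs.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def acon_c(x: float, p1: float = 1.0, p2: float = 0.0, beta: float = 1.0) -> float:
    # Assumed ACON-C form: (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x
    d = (p1 - p2) * x
    return d * sigmoid(beta * d) + p2 * x


def acon_c_grad(x: float, p1: float = 1.0, p2: float = 0.0, beta: float = 1.0) -> float:
    # Analytic first derivative of acon_c with respect to x.
    d = (p1 - p2) * x
    s = sigmoid(beta * d)
    return (p1 - p2) * s + beta * (p1 - p2) * d * s * (1.0 - s) + p2


if __name__ == "__main__":
    for beta in (0.0, 1.0, 10.0):
        values = [round(acon_c(x, 1.2, -0.8, beta), 4) for x in (-2.0, 0.0, 2.0)]
        print(f"beta={beta}: {values}")
```

Under this assumed form, p1 = 1, p2 = 0, β = 1 gives Swish, a large β approaches max(p1·x, p2·x), and β = 0 collapses to the linear function (p1 + p2)/2 · x. The derivative tends to p1 for large positive x and to p2 for large negative x, and a larger β makes it reach those values faster.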
We show the training curves of different activations here.
To show the effectiveness of the proposed ACON family, we also provide an extremely simple toy funnel network (TFNet) built only from pointwise convolutions and ACON-FReLU operators.
The following results show the ImageNet top-1 accuracy improvements relative to the ReLU baselines. The improvements of Meta-ACON are about twice those of SENet.
The table below compares ReLU, Swish and ACON-C. ACON-C improves accuracy without additional FLOPs or parameters:
Model | FLOPs | #Params. | top-1 err. (ReLU) | top-1 err. (Swish) | top-1 err. (ACON) |
---|---|---|---|---|---|
ShuffleNetV2 0.5x | 41M | 1.4M | 39.4 | 38.3 (+1.1) | 37.0 (+2.4) |
ShuffleNetV2 1.5x | 299M | 3.5M | 27.4 | 26.8 (+0.6) | 26.5 (+0.9) |
ResNet 50 | 3.9G | 25.5M | 24.0 | 23.5 (+0.5) | 23.2 (+0.8) |
ResNet 101 | 7.6G | 44.4M | 22.8 | 22.7 (+0.1) | 21.8 (+1.0) |
ResNet 152 | 11.3G | 60.0M | 22.3 | 22.2 (+0.1) | 21.2 (+1.1) |
Next, by adding a negligible amount of FLOPs and parameters, Meta-ACON shows significant improvements:
Model | FLOPs | #Params. | top-1 err. |
---|---|---|---|
ShuffleNetV2 0.5x (meta-acon) | 41M | 1.7M | 34.8 (+4.6) |
ShuffleNetV2 1.5x (meta-acon) | 299M | 3.9M | 24.7 (+2.7) |
ResNet 50 (meta-acon) | 3.9G | 25.7M | 22.0 (+2.0) |
ResNet 101 (meta-acon) | 7.6G | 44.8M | 21.0 (+1.8) |
ResNet 152 (meta-acon) | 11.3G | 60.5M | 20.5 (+1.8) |
The simple TFNet without SE modules can outperform state-of-the-art lightweight networks without SE modules:
Model | FLOPs | #Params. | top-1 err. |
---|---|---|---|
MobileNetV2 0.17 | 42M | 1.4M | 52.6 |
ShuffleNetV2 0.5x | 41M | 1.4M | 39.4 |
TFNet 0.5 | 43M | 1.3M | 36.6 (+2.8) |
MobileNetV2 0.6 | 141M | 2.2M | 33.3 |
ShuffleNetV2 1.0x | 146M | 2.3M | 30.6 |
TFNet 1.0 | 135M | 1.9M | 29.7 (+0.9) |
MobileNetV2 1.0 | 300M | 3.4M | 28.0 |
ShuffleNetV2 1.5x | 299M | 3.5M | 27.4 |
TFNet 1.5 | 279M | 2.7M | 26.0 (+1.4) |
MobileNetV2 1.4 | 585M | 5.5M | 25.3 |
ShuffleNetV2 2.0x | 591M | 7.4M | 25.0 |
TFNet 2.0 | 474M | 3.8M | 24.3 (+0.7) |
Download the ImageNet dataset and move validation images to labeled subfolders. To do this, you can use the following script: https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh
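If you prefer not to run the shell script, the same reorganization can be sketched in Python. This is only an illustration: the function name `sort_val_images` and the `mapping` argument (a dictionary from image filename to class label that you would have to build yourself from the validation annotations) are assumptions, not something this repository provides.

```python
import shutil
from pathlib import Path


def sort_val_images(val_dir, mapping):
    """Move each image in val_dir into a subfolder named after its label.

    mapping: dict of image filename -> class label (hypothetical input).
    Returns the number of files that were moved.
    """
    val_dir = Path(val_dir)
    moved = 0
    for filename, label in mapping.items():
        src = val_dir / filename
        if not src.is_file():
            continue  # image missing or already moved; skip it
        dst_dir = val_dir / label
        dst_dir.mkdir(exist_ok=True)
        shutil.move(str(src), str(dst_dir / filename))
        moved += 1
    return moved
```

Calling the function a second time with the same mapping moves nothing, because files that are no longer at the top level of the validation directory are skipped.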
Train:
python train.py --train-dir YOUR_TRAINDATASET_PATH --val-dir YOUR_VALDATASET_PATH
Eval:
python train.py --eval --eval-resume YOUR_WEIGHT_PATH --train-dir YOUR_TRAINDATASET_PATH --val-dir YOUR_VALDATASET_PATH
If you use these models in your research, please cite:
@inproceedings{ma2021activate,
title={Activate or Not: Learning Customized Activation},
author={Ma, Ningning and Zhang, Xiangyu and Liu, Ming and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}