This train_net.py script reproduces the MGD experiment of distilling RetinaNet-R18 from RetinaNet-R50 on COCO.
The results in the paper were produced with code written on top of maskrcnn-benchmark, which has since been deprecated. This is a re-implementation using Detectron2. The improvement margin brought by MGD here is slightly larger than that reported in the paper.
- Detectron2 tree 369a57d333 was used for this paper.
- Install Detectron2 in this directory: ./detectron2
Since there is no ResNet-18 pretrained model released by MSRA, we use the torchvision model here for initialization. Download the torchvision ResNet models and then use detectron2/tools/convert-torchvision-to-d2.py to convert them into Detectron2's format.
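For reference, the two torchvision checkpoints can be fetched directly; the URLs below are the ones published in the torchvision model zoo at the time of writing, so verify them if they have moved:
#!/usr/bin/env bash
# download the torchvision-pretrained ResNet checkpoints
wget https://download.pytorch.org/models/resnet50-19c8e357.pth
wget https://download.pytorch.org/models/resnet18-5c106cde.pth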
#!/usr/bin/env bash
# ResNet-50
python detectron2/tools/convert-torchvision-to-d2.py \
    resnet50-19c8e357.pth \
    r-50.pkl
# ResNet-18
python detectron2/tools/convert-torchvision-to-d2.py \
    resnet18-5c106cde.pth \
    r-18.pkl
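To sanity-check a converted file, you can peek at the pickled dict; this quick check assumes the converter writes a top-level "model" dict, as detectron2's convert-torchvision-to-d2.py does:
#!/usr/bin/env bash
# print how many weight tensors the converted checkpoint contains
python -c "import pickle; d = pickle.load(open('r-18.pkl', 'rb')); print(len(d['model']), 'tensors')"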
To train the base RetinaNet-R50/R18, run
#!/usr/bin/env bash
DEPTH=50 # or 18
cp configs/retinanet_torchvision_R_${DEPTH}_FPN_1x.yaml detectron2/configs/COCO-Detection/
cd detectron2
python tools/train_net.py \
    --num-gpus 8 \
    --config-file configs/COCO-Detection/retinanet_torchvision_R_${DEPTH}_FPN_1x.yaml \
    MODEL.WEIGHTS r-${DEPTH}.pkl
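If you have fewer than 8 GPUs, Detectron2's linear scaling rule applies: shrink the batch size and learning rate together (the 1x RetinaNet schedule defaults to IMS_PER_BATCH 16 and BASE_LR 0.01), and stretch the iteration counts accordingly if you keep the smaller batch. A sketch for 2 GPUs:
#!/usr/bin/env bash
# example: 2 GPUs, with batch size and LR both scaled down by 4x
python tools/train_net.py \
    --num-gpus 2 \
    --config-file configs/COCO-Detection/retinanet_torchvision_R_${DEPTH}_FPN_1x.yaml \
    MODEL.WEIGHTS r-${DEPTH}.pkl \
    SOLVER.IMS_PER_BATCH 4 \
    SOLVER.BASE_LR 0.0025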
model | lr sched | AP50 | AP | AP75 | model id | download |
---|---|---|---|---|---|---|
RetinaNet-R50 | 1x | 56.03 | 36.70 | 39.04 | 9db3bf173 | Google Drive / Baidu Pan [code: kkap] |
RetinaNet-R18 | 1x | 49.84 | 31.81 | 33.60 | - | - |
Note:
- The torchvision pretrained models produce slightly worse results than the MSRA models. Refer to the comments in the conversion script.
We put the two pretrained models together and save them as a single pkl file in Detectron2's format for MGD training.
These two pretrained models are:
- the RetinaNet-R50 model produced by step 2 of base training. Its model id is 9db3bf173 and it can be downloaded from the results table above.
- the ResNet-18 model from torchvision; follow step 1 of base training to download and convert it into r-18.pkl.
To merge the teacher model_9db3bf173.pth and the student r-18.pkl into one file T50-S18.pkl, run
#!/usr/bin/env bash
python scripts/merge-models-into-one.py \
    [path/to/retinanet-r50/base/model_9db3bf173.pth] \
    r-18.pkl \
    T50-S18.pkl
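For intuition, here is a minimal sketch of what such a merge could do; the "teacher."/"student." key prefixes and the output layout are assumptions for illustration only, and scripts/merge-models-into-one.py remains the authoritative implementation:
#!/usr/bin/env bash
python - <<'EOF'
# hypothetical merge sketch: prefix each model's weights so a distillation
# trainer can tell teacher and student apart inside one checkpoint
import pickle
import torch

teacher = torch.load("model_9db3bf173.pth", map_location="cpu")["model"]
with open("r-18.pkl", "rb") as f:
    student = pickle.load(f)["model"]

merged = {"model": {}, "matching_heuristics": True}
for k, v in teacher.items():
    # detectron2 .pth checkpoints hold torch tensors; store numpy for .pkl
    merged["model"]["teacher." + k] = v.detach().cpu().numpy()
for k, v in student.items():
    merged["model"]["student." + k] = v  # already numpy from the converter

with open("T50-S18.pkl", "wb") as f:
    pickle.dump(merged, f)
EOF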
To distill RetinaNet-R18 from RetinaNet-R50 with MGD, run the following script
#!/usr/bin/env bash
# copy necessary files into detectron2
cp configs/* detectron2/configs/COCO-Detection/
cp d2/modeling/backbone/* detectron2/detectron2/modeling/backbone/
cp d2/modeling/meta_arch/* detectron2/detectron2/modeling/meta_arch/
cp d2/engine/* detectron2/detectron2/engine/
cp ../mgd/builder.py mgd/mgd.py
# start training
python train_net.py \
    --num-gpus 8 \
    --config-file detectron2/configs/Base-RetinaNet.yaml \
    MODEL.WEIGHTS T50-S18.pkl
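If training is interrupted, and assuming this train_net.py keeps Detectron2's default argument parser, you can pick up from the last checkpoint recorded in the output directory with --resume:
#!/usr/bin/env bash
# resume from the last checkpoint in ./output; MODEL.WEIGHTS is only used
# when no checkpoint exists yet
python train_net.py \
    --num-gpus 8 \
    --resume \
    --config-file detectron2/configs/Base-RetinaNet.yaml \
    MODEL.WEIGHTS T50-S18.pkl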
We need to split the student model out of the checkpoint, for example the final saved checkpoint ./output/model_final.pth. Run the following script:
#!/usr/bin/env bash
python scripts/convert-output-to-d2.py \
    ./output/model_final.pth \
    ./output/model_final.pkl \
    # --eval-teacher # uncomment this line if you would like to check teacher performance
If you would like to convert all the checkpoints at once, this script may help:
Script for processing all checkpoints
#!/usr/bin/env bash
OUTPUT_DIR=output
for out in $(ls $OUTPUT_DIR); do
    if [[ $out == *"pth"* ]]; then
        old=$out
        new="${out/pth/pkl}"
        echo $old "->" $new
        python scripts/convert-output-to-d2.py \
            ${OUTPUT_DIR}/$old \
            ${OUTPUT_DIR}/$new \
            # --eval-teacher # uncomment this line if you would like to check teacher performance
    fi
done
To evaluate the student model, run
#!/usr/bin/env bash
# copy necessary files into detectron2
cp configs/* detectron2/configs/COCO-Detection/
cp d2/modeling/backbone/* detectron2/detectron2/modeling/backbone/
cp d2/modeling/meta_arch/* detectron2/detectron2/modeling/meta_arch/
cp d2/engine/* detectron2/detectron2/engine/
cp ../mgd/builder.py mgd/mgd.py
# evaluate student model distilled by MGD
python train_net.py \
    --num-gpus 8 \
    --eval-only \
    --config-file detectron2/configs/Base-RetinaNet.yaml \
    MODEL.WEIGHTS ./output/model_final.pkl
model | method | lr sched | AP50 | AP | AP75 | model id | download |
---|---|---|---|---|---|---|---|
RetinaNet-R50 | Teacher | 1x | 56.03 | 36.70 | 39.04 | 9db3bf173 | Google Drive / Baidu Pan [code: kkap] |
RetinaNet-R18 | Student | 1x | 49.84 | 31.81 | 33.60 | - | - |
RetinaNet-R18 (mgd/config.py) | MGD - AMP | 1x | 51.09 | 32.47 | 34.29 | 73c2534e9 | Google Drive / Baidu Pan [code: n2te] |
The figure above illustrates the distillation positions for a detection model with the FPN architecture.
In FPN, the intermediate output features have specific names, as shown in the figure: C2, C3, C4, C5 in the bottom-up pathway and P5, P4, P3 in the top-down pathway.
Following the same experimental setting as the classification task, we use C2, C3, C4, C5 from the ResNet backbone.
This is why we set cfg.MODEL.BACKBONE.FREEZE_AT = 1 for the student model; using the default value (2) may produce worse results.
Here is the comparison.
model | method | FREEZE_AT | cfg.MGD.IGNORE_INDS | AP |
---|---|---|---|---|
RetinaNet-R18 | MGD - AMP | 1 | [4, 5, 6, 7, 8] | 32.47 |
RetinaNet-R18 | MGD - AMP | 2 | [0, 4, 5, 6, 7, 8] | 32.28 |
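FREEZE_AT does not require editing the config file: Detectron2 accepts config overrides as trailing key-value pairs, so the setting can be passed on the command line, e.g.:
#!/usr/bin/env bash
# override the backbone freezing stage for the student from the command line
python train_net.py \
    --num-gpus 8 \
    --config-file detectron2/configs/Base-RetinaNet.yaml \
    MODEL.WEIGHTS T50-S18.pkl \
    MODEL.BACKBONE.FREEZE_AT 1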
We enable the norm layers in the last block of each ResNet stage to be trained for the student. Since we use the feature maps before ReLU to distill the student and calculate the loss margins from the affine parameters of the teacher's norm layers, we want the student not only to learn better feature maps from the teacher, but also to learn the batch statistics and affine parameters of the teacher's norm layers.
By default, we use torch.nn.GroupNorm for MGD training due to the small batch size.
If you would like to use the synchronized version of the norm layer instead, set cfg.MGD.SYNC_BN = True.
As with classification, this is only for experimental and research purposes.
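Like any other config key, this can also be flipped from the command line, assuming cfg.MGD.SYNC_BN is registered in this repo's config as described above:
#!/usr/bin/env bash
# opt in to the synchronized norm layer for MGD
python train_net.py \
    --num-gpus 8 \
    --config-file detectron2/configs/Base-RetinaNet.yaml \
    MODEL.WEIGHTS T50-S18.pkl \
    MGD.SYNC_BN True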
Since the spatial shapes change during training, the loss factors can be a little tricky to set.
We follow a simple rule to choose them: pick each stage's loss factor so that, at the beginning of training, the stage's distillation loss divided by its loss factor and the batch size lands around 0.0x.
We hope this tip is helpful in your own projects.
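As a worked example with made-up numbers: if a stage's raw distillation loss starts around 40.0 with a batch size of 16, a loss factor of 128 brings it into the 0.0x range:
#!/usr/bin/env bash
# hypothetical numbers, just to illustrate the rule of thumb
python -c "raw, factor, bs = 40.0, 128.0, 16; print(raw / (factor * bs))"  # ~0.02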
If you run Detectron2 with PyTorch 1.4, you may encounter
RuntimeError: No such operator torchvision::nms
According to the related issue1 and issue2, there are two solutions:
- Upgrade PyTorch and torchvision together to avoid using torchvision==0.5.0.
- Manually pip uninstall torchvision, download the source code from releases/tag/v0.5.0, and compile it yourself.
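For the first solution, upgrading both packages to a matched pair is enough; for example (a version pair published by PyTorch; verify it against your CUDA setup):
#!/usr/bin/env bash
# upgrade to a matched torch/torchvision pair, skipping torchvision==0.5.0
pip install --upgrade torch==1.5.0 torchvision==0.6.0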