This folder contains RetinaNet and Mask R-CNN results on top of Detectron2.
🚀 All models are trained using ImageNet-1K pretrained weights.
☀️ MS denotes the same multi-scale training augmentation as in Swin Transformer, which in turn follows the MS augmentation used in DETR and Sparse R-CNN. We therefore follow the official implementations of DETR and Sparse R-CNN, which are also based on Detectron2.
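Concretely, this MS scheme resizes the shorter image side to a randomly chosen scale between 480 and 800 (longer side capped at 1333). The released `*_ms_3x` configs are expected to set this already; the command below is only an illustrative sketch of the corresponding stock Detectron2 options, written as a command-line override.

```bash
# Illustration only (not something you need to run): in Detectron2, this MS scheme corresponds
# to sampling the shorter training side from a set of scales via INPUT.MIN_SIZE_TRAIN.
python train_net.py --config-file <config-file> --num-gpus <num_gpus> \
    INPUT.MIN_SIZE_TRAIN "(480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)" \
    INPUT.MIN_SIZE_TRAIN_SAMPLING "choice"
```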
Backbone | Method | lr Schd | box mAP | mask mAP | #params | FLOPs | weight | metrics |
---|---|---|---|---|---|---|---|---|
MPViT-T | RetinaNet | 1x | 41.8 | - | 17M | 196G | model | metrics |
MPViT-XS | RetinaNet | 1x | 43.8 | - | 20M | 211G | model | metrics |
MPViT-S | RetinaNet | 1x | 45.7 | - | 32M | 248G | model | metrics |
MPViT-B | RetinaNet | 1x | 47.0 | - | 85M | 482G | model | metrics |
MPViT-T | RetinaNet | MS+3x | 44.4 | - | 17M | 196G | model | metrics |
MPViT-XS | RetinaNet | MS+3x | 46.1 | - | 20M | 211G | model | metrics |
MPViT-S | RetinaNet | MS+3x | 47.6 | - | 32M | 248G | model | metrics |
MPViT-B | RetinaNet | MS+3x | 48.3 | - | 85M | 482G | model | metrics |
MPViT-T | Mask R-CNN | 1x | 42.2 | 39.0 | 28M | 216G | model | metrics |
MPViT-XS | Mask R-CNN | 1x | 44.2 | 40.4 | 30M | 231G | model | metrics |
MPViT-S | Mask R-CNN | 1x | 46.4 | 42.4 | 43M | 268G | model | metrics |
MPViT-B | Mask R-CNN | 1x | 48.2 | 43.5 | 95M | 503G | model | metrics |
MPViT-T | Mask R-CNN | MS+3x | 44.8 | 41.0 | 28M | 216G | model | metrics |
MPViT-XS | Mask R-CNN | MS+3x | 46.6 | 42.3 | 30M | 231G | model | metrics |
MPViT-S | Mask R-CNN | MS+3x | 48.4 | 43.9 | 43M | 268G | model | metrics |
MPViT-B | Mask R-CNN | MS+3x | 49.5 | 44.5 | 95M | 503G | model | metrics |
We test all models with `pytorch==1.7.0`, `detectron2==0.5`, and `cuda==10.1` on NVIDIA V100 GPUs.

To install the detectron2 library, please refer to Detectron2's INSTALL.md:
```bash
# Install `detectron2`
python -m pip install detectron2==0.5 -f \
  https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html

# Install `shapely`
conda install shapely
```
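As an optional sanity check, you can verify that the expected detectron2 version is importable:

```bash
# Optional: confirm detectron2 is installed and print its version.
python -c "import detectron2; print(detectron2.__version__)"
```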
For COCO data preparation, please refer to CoaT's guide.
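If you use Detectron2's builtin COCO loader, the data is expected under `datasets/` (or the directory pointed to by `$DETECTRON2_DATASETS`) in roughly the following layout; see Detectron2's datasets README for details:

```
datasets/
  coco/
    annotations/
      instances_train2017.json
      instances_val2017.json
    train2017/   # training images
    val2017/     # validation images
```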
The following commands provide an example of evaluating RetinaNet / Mask R-CNN with an MPViT backbone on a single GPU. You can either download the trained checkpoint yourself and pass its local path, or pass the checkpoint link directly in the command:
```bash
cd MPViT/detectron2
python train_net.py --config-file <config-file> --eval-only --num-gpus <num_gpus> \
  MODEL.WEIGHTS <checkpoint_file_path or link>
```
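Evaluation logs and metrics are written to the config's output directory. If you want to keep runs separate, you can override `OUTPUT_DIR` (a stock Detectron2 config key) on the command line; the path below is just an example:

```bash
# Optional: write evaluation logs/metrics to a separate directory.
python train_net.py --config-file <config-file> --eval-only --num-gpus <num_gpus> \
  MODEL.WEIGHTS <checkpoint_file_path or link> OUTPUT_DIR ./output/mpvit_eval
```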
For RetinaNet with MPViT-Small:

```bash
python train_net.py --config-file configs/retinanet/retinanet_mpvit_small_ms_3x.yaml --eval-only --num-gpus 1 \
  MODEL.WEIGHTS https://dl.dropbox.com/s/gh00mdtqxoic64e/retinanet_mpvit_small_ms_3x.pth
```
This should give the following result:

```
Task: bbox
AP,AP50,AP75,APs,APm,APl
47.5802,68.7466,51.2814,32.0966,51.8934,61.1945
```
For Mask R-CNN with MPViT-Small:

```bash
python train_net.py --config-file configs/maskrcnn/mask_rcnn_mpvit_small_ms_3x.yaml --eval-only --num-gpus 1 \
  MODEL.WEIGHTS https://dl.dropbox.com/s/b0fohmjmggahnny/mask_rcnn_mpvit_small_ms_3x.pth
```
This should give the following result:

```
Task: bbox
AP,AP50,AP75,APs,APm,APl
48.4422,70.5305,52.5705,32.4423,51.5775,62.6640

Task: segm
AP,AP50,AP75,APs,APm,APl
43.9366,67.6408,47.5103,25.2489,46.4255,62.0025
```
The following command provides an example (Mask R-CNN, 8 GPUs) of training Mask R-CNN with an MPViT backbone:

```bash
python train_net.py --config-file <config-file> --num-gpus <num_gpus>
```
For Mask R-CNN with MPViT-Small:

```bash
python train_net.py --config-file configs/maskrcnn/mask_rcnn_mpvit_small_ms_3x.yaml --num-gpus 8
```
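If you train with fewer than 8 GPUs, the usual practice is to scale the total batch size and base learning rate linearly. The sketch below assumes the released configs use Detectron2's default total batch size of 16; check the values in your config before overriding them.

```bash
# Example for 4 GPUs: halve the total batch size and learning rate relative to the assumed
# 8-GPU / 16-image setup (replace <base_lr / 2> with the actual value from your config).
python train_net.py --config-file configs/maskrcnn/mask_rcnn_mpvit_small_ms_3x.yaml --num-gpus 4 \
  SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR <base_lr / 2>

# If train_net.py uses Detectron2's default argument parser, an interrupted run can be
# resumed from the last checkpoint in OUTPUT_DIR by adding --resume.
python train_net.py --config-file configs/maskrcnn/mask_rcnn_mpvit_small_ms_3x.yaml --num-gpus 8 --resume
```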
Detectron2's documentation may help you with more details.
Thanks to Detectron2 for the RetinaNet and Mask R-CNN implementation.