add GN documentation

Summary: Add GN documentation.

Reviewed By: rbgirshick, ppwwyyxx

Differential Revision: D7745878

fbshipit-source-id: 8a49160b1e4026fbcf41082df4a3a8d0f8e90d85
facebookresearch · Apr 24, 2018 · 0dbea62 · 0dbea62
1 parent c7692eb
commit 0dbea62

Showing 3 changed files with 269 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -2,7 +2,7 @@
 
 Detectron is Facebook AI Research's software system that implements state-of-the-art object detection algorithms, including [Mask R-CNN](https://arxiv.org/abs/1703.06870). It is written in Python and powered by the [Caffe2](https://github.com/caffe2/caffe2) deep learning framework.
 
-At FAIR, Detectron has enabled numerous research projects, including: [Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144), [Mask R-CNN](https://arxiv.org/abs/1703.06870), [Detecting and Recognizing Human-Object Interactions](https://arxiv.org/abs/1704.07333), [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002), [Non-local Neural Networks](https://arxiv.org/abs/1711.07971), [Learning to Segment Every Thing](https://arxiv.org/abs/1711.10370), [Data Distillation: Towards Omni-Supervised Learning](https://arxiv.org/abs/1712.04440), and [DensePose: Dense Human Pose Estimation In The Wild](https://arxiv.org/abs/1802.00434).
+At FAIR, Detectron has enabled numerous research projects, including: [Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144), [Mask R-CNN](https://arxiv.org/abs/1703.06870), [Detecting and Recognizing Human-Object Interactions](https://arxiv.org/abs/1704.07333), [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002), [Non-local Neural Networks](https://arxiv.org/abs/1711.07971), [Learning to Segment Every Thing](https://arxiv.org/abs/1711.10370), [Data Distillation: Towards Omni-Supervised Learning](https://arxiv.org/abs/1712.04440), [DensePose: Dense Human Pose Estimation In The Wild](https://arxiv.org/abs/1802.00434), and [Group Normalization](https://arxiv.org/abs/1803.08494).
 
 <div align="center">
   <img src="demo/output/33823288584_1d21cf0a26_k_example_output.jpg" width="700px" />
@@ -32,6 +32,10 @@ using the following backbone network architectures:
 
 Additional backbone architectures may be easily implemented. For more details about these models, please see [References](#references) below.
 
+## Update
+
+- 4/2018: Support Group Normalization - see [`GN/README.md`](./projects/GN/README.md)
+
 ## License
 
 Detectron is released under the [Apache 2.0 license](https://github.com/facebookresearch/detectron/blob/master/LICENSE). See the [NOTICE](https://github.com/facebookresearch/detectron/blob/master/NOTICE) file for additional details.

diff --git a/projects/GN/README.md b/projects/GN/README.md
@@ -0,0 +1,264 @@
+# Group Normalization for Mask R-CNN
+
+<div align="center">
+  <img src="gn.jpg" width="700px" />
+</div>
+
+## Introduction
+
+This file provides Mask R-CNN baseline results and models trained with [Group Normalization](https://arxiv.org/abs/1803.08494):
+
+```
+@article{GroupNorm2018,
+  title={Group Normalization},
+  author={Yuxin Wu and Kaiming He},
+  journal={arXiv:1803.08494},
+  year={2018}
+}
+```
+
+**Note:** This code uses the GroupNorm op implemented in CUDA, which is included in the Caffe2 repo. At the time of writing, Caffe2 is being merged into PyTorch, and the GroupNorm op is located [here](https://github.com/pytorch/pytorch/blob/master/caffe2/operators/group_norm_op.cu). Make sure your Caffe2 is up to date.
+
+## Pretrained Models with GN
+
+These models are trained in Caffe2 on the standard ImageNet-1k dataset, using GroupNorm with 32 groups (G=32).
+
+- [R-50-GN.pkl](https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/47261647/R-50-GN.pkl): ResNet-50 with GN, 24.0\% top-1 error (center-crop).
+- [R-101-GN.pkl](https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/47592356/R-101-GN.pkl): ResNet-101 with GN, 22.6\% top-1 error (center-crop).
+
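To make the "32 groups (G=32)" setting concrete, here is a minimal plain-Python sketch of the group normalization computation for a single image. It is an illustration only, not the CUDA op used by this code: the function name `group_norm`, the list-of-channels layout, and the `eps=1e-5` default are assumptions made for the example.

```python
import math


def group_norm(x, num_groups, gamma, beta, eps=1e-5):
    """Group-normalize one sample (illustrative sketch, not the Caffe2 op).

    x: list of C channels, each a flat list of H*W floats.
    gamma, beta: per-channel scale and shift, each of length C.
    """
    num_channels = len(x)
    if num_channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    per_group = num_channels // num_groups

    out = []
    for g in range(num_groups):
        channels = range(g * per_group, (g + 1) * per_group)
        # Mean and variance are shared by all values in the group,
        # i.e. across its channels and all spatial positions.
        values = [v for c in channels for v in x[c]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        for c in channels:
            # The learned scale and shift stay per-channel.
            out.append([gamma[c] * (v - mean) * inv_std + beta[c] for v in x[c]])
    return out


# Toy input: 4 channels of 2 values each, normalized in 2 groups.
x = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
y = group_norm(x, num_groups=2, gamma=[1.0] * 4, beta=[0.0] * 4)
```

Because the statistics are computed per sample, the result does not depend on the batch size, which is why GN suits the 2 images/GPU setting used in the tables in this file.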
+## Results
+
+### Baselines with BN
+
+<table><tbody>
+<!-- START E2E MASK RCNN BN TABLE -->
+<!-- TABLE HEADER -->
+<!-- Info: we wrap text in <sup><sub></sub></sup> to make it small -->
+<th valign="bottom"><sup><sub>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;case&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</sub></sup></th>
+<th valign="bottom"><sup><sub>type</sub></sup></th>
+<th valign="bottom"><sup><sub>lr<br/>schd</sub></sup></th>
+<th valign="bottom"><sup><sub>im/<br/>gpu</sub></sup></th>
+<th valign="bottom"><sup><sub>train<br/>mem<br/>(GB)</sub></sup></th>
+<th valign="bottom"><sup><sub>train<br/>time<br/>(s/iter)</sub></sup></th>
+<th valign="bottom"><sup><sub>train<br/>time<br/>total<br/>(hr)</sub></sup></th>
+<th valign="bottom"><sup><sub>inference<br/>time<br/>(s/im)</sub></sup></th>
+<th valign="bottom"><sup><sub>box<br/>AP</sub></sup></th>
+<th valign="bottom"><sup><sub>mask<br/>AP</sub></sup></th>
+<th valign="bottom"><sup><sub>model id</sub></sup></th>
+<tr>
+<td align="left"><sup><sub>R-50-FPN, BN*</sub></sup></td>
+<td align="left"><sup><sub>Mask R-CNN</sub></sup></td>
+<td align="left"><sup><sub>2x</sub></sup></td>
+<td align="right"><sup><sub>2</sub></sup></td>
+<td align="right"><sup><sub>8.6</sub></sup></td>
+<td align="right"><sup><sub>0.897</sub></sup></td>
+<td align="right"><sup><sub>44.9</sub></sup></td>
+<td align="right"><sup><sub>0.099&nbsp;+&nbsp;0.018</sub></sup></td>
+<td align="right"><sup><sub>38.6</sub></sup></td>
+<td align="right"><sup><sub>34.5</sub></sup></td>
+<td align="right"><sup><sub>35859007</sub></sup></td>
+</tr>
+<tr>
+<td align="left"><sup><sub>R-101-FPN, BN*</sub></sup></td>
+<td align="left"><sup><sub>Mask R-CNN</sub></sup></td>
+<td align="left"><sup><sub>2x</sub></sup></td>
+<td align="right"><sup><sub>2</sub></sup></td>
+<td align="right"><sup><sub>10.2</sub></sup></td>
+<td align="right"><sup><sub>0.993</sub></sup></td>
+<td align="right"><sup><sub>49.7</sub></sup></td>
+<td align="right"><sup><sub>0.126&nbsp;+&nbsp;0.017</sub></sup></td>
+<td align="right"><sup><sub>40.9</sub></sup></td>
+<td align="right"><sup><sub>36.4</sub></sup></td>
+<td align="right"><sup><sub>35861858</sub></sup></td>
+</tr>
+<!-- END E2E MASK RCNN BN TABLE -->
+</tbody></table>
+
+**Notes:**
+
+- This table is copied from [Detectron Model Zoo](https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md#end-to-end-faster--mask-r-cnn-baselines).
+- BN<sup>*</sup> means that BatchNorm (BN) is used for pre-training and is frozen and turned into a per-channel linear layer when fine-tuning. This is the default of Faster/Mask R-CNN and Detectron.
+
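The BN<sup>*</sup> note above says a frozen BatchNorm layer becomes a per-channel linear layer. A minimal sketch of that folding, assuming the usual BN parameters (scale `gamma`, shift `beta`, running mean and variance); the function name `fold_frozen_bn` and the `eps` default are hypothetical and chosen for illustration:

```python
import math


def fold_frozen_bn(gamma, beta, running_mean, running_var, eps=1e-5):
    """Fold frozen BN statistics into a per-channel scale and bias.

    With fixed statistics, gamma * (x - mean) / sqrt(var + eps) + beta
    is the same as scale * x + bias, so no statistics are needed at
    fine-tuning time.
    """
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, running_var)]
    bias = [b - m * s for b, m, s in zip(beta, running_mean, scale)]
    return scale, bias


# Two-channel example with made-up parameters.
scale, bias = fold_frozen_bn(
    gamma=[1.0, 0.5],
    beta=[0.0, 2.0],
    running_mean=[3.0, -1.0],
    running_var=[4.0, 0.25],
)
```

The folded layer has no dependence on the batch, so it behaves the same at any batch size, but it also cannot adapt its statistics during fine-tuning.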
+### Mask R-CNN with GN
+
+#### Standard Mask R-CNN recipe
+<table><tbody>
+<!-- START E2E MASK RCNN GN TABLE -->
+<!-- TABLE HEADER -->
+<!-- Info: we wrap text in <sup><sub></sub></sup> to make it small -->
+<th valign="bottom"><sup><sub>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;case&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</sub></sup></th>
+<th valign="bottom"><sup><sub>type</sub></sup></th>
+<th valign="bottom"><sup><sub>lr<br/>schd</sub></sup></th>
+<th valign="bottom"><sup><sub>im/<br/>gpu</sub></sup></th>
+<th valign="bottom"><sup><sub>train<br/>mem<br/>(GB)</sub></sup></th>
+<th valign="bottom"><sup><sub>train<br/>time<br/>(s/iter)</sub></sup></th>
+<th valign="bottom"><sup><sub>train<br/>time<br/>total<br/>(hr)</sub></sup></th>
+<th valign="bottom"><sup><sub>inference<br/>time<br/>(s/im)</sub></sup></th>
+<th valign="bottom"><sup><sub>box<br/>AP</sub></sup></th>
+<th valign="bottom"><sup><sub>mask<br/>AP</sub></sup></th>
+<th valign="bottom"><sup><sub>model id</sub></sup></th>
+<th valign="bottom"><sup><sub>download<br/>links</sub></sup></th>
+<!-- TABLE BODY -->
+<tr>
+<td align="left"><sup><sub>R-50-FPN, GN</sub></sup></td>
+<td align="left"><sup><sub>Mask R-CNN</sub></sup></td>
+<td align="left"><sup><sub>2x</sub></sup></td>
+<td align="right"><sup><sub>2</sub></sup></td>
+<td align="right"><sup><sub>10.5</sub></sup></td>
+<td align="right"><sup><sub>1.017</sub></sup></td>
+<td align="right"><sup><sub>50.8</sub></sup></td>
+<td align="right"><sup><sub>0.146&nbsp;+&nbsp;0.017</sub></sup></td>
+<td align="right"><sup><sub>40.3</sub></sup></td>
+<td align="right"><sup><sub>35.7</sub></sup></td>
+<td align="right"><sup><sub>48616381</sub></sup></td>
+<td align="left"><sup><sub>
+  <a href="https://s3-us-west-2.amazonaws.com/detectron/GN/48616381/04_2018_gn_baselines/e2e_mask_rcnn_R-50-FPN_2x_gn_0416.13_23_38.bTlTI97Q/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl">model</a>
+  &nbsp;|&nbsp;
+  <a href="https://s3-us-west-2.amazonaws.com/detectron/GN/48616381/04_2018_gn_baselines/e2e_mask_rcnn_R-50-FPN_2x_gn_0416.13_23_38.bTlTI97Q/output/test/coco_2014_minival/generalized_rcnn/bbox_coco_2014_minival_results.json">boxes</a>
+  &nbsp;|&nbsp;
+  <a href="https://s3-us-west-2.amazonaws.com/detectron/GN/48616381/04_2018_gn_baselines/e2e_mask_rcnn_R-50-FPN_2x_gn_0416.13_23_38.bTlTI97Q/output/test/coco_2014_minival/generalized_rcnn/segmentations_coco_2014_minival_results.json">masks</a></sub></sup></td>
+</tr>
+<tr>
+<td align="left"><sup><sub>R-101-FPN, GN</sub></sup></td>
+<td align="left"><sup><sub>Mask R-CNN</sub></sup></td>
+<td align="left"><sup><sub>2x</sub></sup></td>
+<td align="right"><sup><sub>2</sub></sup></td>
+<td align="right"><sup><sub>12.4</sub></sup></td>
+<td align="right"><sup><sub>1.151</sub></sup></td>
+<td align="right"><sup><sub>57.5</sub></sup></td>
+<td align="right"><sup><sub>0.180&nbsp;+&nbsp;0.015</sub></sup></td>
+<td align="right"><sup><sub>41.8</sub></sup></td>
+<td align="right"><sup><sub>36.8</sub></sup></td>
+<td align="right"><sup><sub>48616724</sub></sup></td>
+<td align="left"><sup><sub>
+  <a href="https://s3-us-west-2.amazonaws.com/detectron/GN/48616724/04_2018_gn_baselines/e2e_mask_rcnn_R-101-FPN_2x_gn_0416.13_26_34.GLnri4GR/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl">model</a>
+  &nbsp;|&nbsp;
+  <a href="https://s3-us-west-2.amazonaws.com/detectron/GN/48616724/04_2018_gn_baselines/e2e_mask_rcnn_R-101-FPN_2x_gn_0416.13_26_34.GLnri4GR/output/test/coco_2014_minival/generalized_rcnn/bbox_coco_2014_minival_results.json">boxes</a>
+  &nbsp;|&nbsp;
+  <a href="https://s3-us-west-2.amazonaws.com/detectron/GN/48616724/04_2018_gn_baselines/e2e_mask_rcnn_R-101-FPN_2x_gn_0416.13_26_34.GLnri4GR/output/test/coco_2014_minival/generalized_rcnn/segmentations_coco_2014_minival_results.json">masks</a></sub></sup></td>
+</tr>
+<!-- END E2E MASK RCNN GN TABLE -->
+</tbody></table>
+
+**Notes:**
+- GN is applied on: (i) ResNet layers inherited from pre-training, (ii) the FPN-specific layers, (iii) the RoI bbox head, and (iv) the RoI mask head.
+- These GN models use a 4conv+1fc RoI box head. The BN<sup>*</sup> counterpart with this head performs similarly to the one with the default 2fc head: using this codebase, R-50-FPN BN<sup>\*</sup> with 4conv+1fc has 38.8/34.4 box/mask AP.
+- 2x is the default schedule (180k iterations) in Detectron.
+
+#### Longer training schedule
+<table><tbody>
+<!-- START E2E MASK RCNN GN 3X TABLE -->
+<!-- TABLE HEADER -->
+<!-- Info: we wrap text in <sup><sub></sub></sup> to make it small -->
+<th valign="bottom"><sup><sub>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;case&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</sub></sup></th>
+<th valign="bottom"><sup><sub>type</sub></sup></th>
+<th valign="bottom"><sup><sub>lr<br/>schd</sub></sup></th>
+<th valign="bottom"><sup><sub>im/<br/>gpu</sub></sup></th>
+<th valign="bottom"><sup><sub>train<br/>mem<br/>(GB)</sub></sup></th>
+<th valign="bottom"><sup><sub>train<br/>time<br/>(s/iter)</sub></sup></th>
+<th valign="bottom"><sup><sub>train<br/>time<br/>total<br/>(hr)</sub></sup></th>
+<th valign="bottom"><sup><sub>inference<br/>time<br/>(s/im)</sub></sup></th>
+<th valign="bottom"><sup><sub>box<br/>AP</sub></sup></th>
+<th valign="bottom"><sup><sub>mask<br/>AP</sub></sup></th>
+<th valign="bottom"><sup><sub>model id</sub></sup></th>
+<th valign="bottom"><sup><sub>download<br/>links</sub></sup></th>
+<!-- TABLE BODY -->
+<tr>
+<td align="left"><sup><sub>R-50-FPN, GN</sub></sup></td>
+<td align="left"><sup><sub>Mask R-CNN</sub></sup></td>
+<td align="left"><sup><sub><b>3x</b></sub></sup></td>
+<td align="right"><sup><sub>2</sub></sup></td>
+<td align="right"><sup><sub>10.5</sub></sup></td>
+<td align="right"><sup><sub>1.033</sub></sup></td>
+<td align="right"><sup><sub>77.4</sub></sup></td>
+<td align="right"><sup><sub>0.145&nbsp;+&nbsp;0.015</sub></sup></td>
+<td align="right"><sup><sub>40.8</sub></sup></td>
+<td align="right"><sup><sub>36.1</sub></sup></td>
+<td align="right"><sup><sub>48734751</sub></sup></td>
+<td align="left"><sup><sub>
+  <a href="https://s3-us-west-2.amazonaws.com/detectron/GN/48734751/04_2018_gn_baselines/e2e_mask_rcnn_R-50-FPN_3x_gn_0417.09_54_59.nwCTtPVk/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl">model</a>
+  &nbsp;|&nbsp;
+  <a href="https://s3-us-west-2.amazonaws.com/detectron/GN/48734751/04_2018_gn_baselines/e2e_mask_rcnn_R-50-FPN_3x_gn_0417.09_54_59.nwCTtPVk/output/test/coco_2014_minival/generalized_rcnn/bbox_coco_2014_minival_results.json">boxes</a>
+  &nbsp;|&nbsp;
+  <a href="https://s3-us-west-2.amazonaws.com/detectron/GN/48734751/04_2018_gn_baselines/e2e_mask_rcnn_R-50-FPN_3x_gn_0417.09_54_59.nwCTtPVk/output/test/coco_2014_minival/generalized_rcnn/segmentations_coco_2014_minival_results.json">masks</a></sub></sup></td>
+</tr>
+<tr>
+<td align="left"><sup><sub>R-101-FPN, GN</sub></sup></td>
+<td align="left"><sup><sub>Mask R-CNN</sub></sup></td>
+<td align="left"><sup><sub><b>3x</b></sub></sup></td>
+<td align="right"><sup><sub>2</sub></sup></td>
+<td align="right"><sup><sub>12.4</sub></sup></td>
+<td align="right"><sup><sub>1.171</sub></sup></td>
+<td align="right"><sup><sub>87.9</sub></sup></td>
+<td align="right"><sup><sub>0.180&nbsp;+&nbsp;0.014</sub></sup></td>
+<td align="right"><sup><sub>42.3</sub></sup></td>
+<td align="right"><sup><sub>37.2</sub></sup></td>
+<td align="right"><sup><sub>48734779</sub></sup></td>
+<td align="left"><sup><sub>
+  <a href="https://s3-us-west-2.amazonaws.com/detectron/GN/48734779/04_2018_gn_baselines/e2e_mask_rcnn_R-101-FPN_3x_gn_0417.09_55_23.HMtcR8wg/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl">model</a>
+  &nbsp;|&nbsp;
+  <a href="https://s3-us-west-2.amazonaws.com/detectron/GN/48734779/04_2018_gn_baselines/e2e_mask_rcnn_R-101-FPN_3x_gn_0417.09_55_23.HMtcR8wg/output/test/coco_2014_minival/generalized_rcnn/bbox_coco_2014_minival_results.json">boxes</a>
+  &nbsp;|&nbsp;
+  <a href="https://s3-us-west-2.amazonaws.com/detectron/GN/48734779/04_2018_gn_baselines/e2e_mask_rcnn_R-101-FPN_3x_gn_0417.09_55_23.HMtcR8wg/output/test/coco_2014_minival/generalized_rcnn/segmentations_coco_2014_minival_results.json">masks</a></sub></sup></td>
+</tr>
+<!-- END E2E MASK RCNN GN 3X TABLE -->
+</tbody></table>
+
+**Notes:**
+- 3x is a longer schedule (270k iterations). GN improves further with the longer schedule, whereas its BN<sup>*</sup> counterpart stays about the same (R-50-FPN BN<sup>\*</sup>: 38.9/34.3 box/mask AP).
+- These models are trained **without** any scale augmentation, which can further [improve results](https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md#mask-r-cnn-with-bells--whistles).
+
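The "train time total (hr)" column can be related to the "train time (s/iter)" column through the schedule length. The sketch below assumes the iteration counts stated in the notes (2x = 180k, 3x = 270k); the helper name `total_train_hours` is made up for this example.

```python
# Iteration counts stated in the notes: 2x = 180k, 3x = 270k.
SCHEDULE_ITERS = {"2x": 180000, "3x": 270000}


def total_train_hours(sec_per_iter, schedule):
    """Convert a per-iteration time into total training hours."""
    return sec_per_iter * SCHEDULE_ITERS[schedule] / 3600.0


# R-50-FPN, GN rows from the 2x and 3x tables.
hours_2x = total_train_hours(1.017, "2x")
hours_3x = total_train_hours(1.033, "3x")
print(round(hours_2x, 2), round(hours_3x, 2))
```

The computed values agree with the tabulated 50.8 and 77.4 hours to within about 0.1 hr; the small differences come from rounding in the reported figures.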
+
+### Explorations
+
+#### Training Mask R-CNN from scratch
+
+GN makes it possible to train Mask R-CNN *from scratch*, without ImageNet pre-training, despite the small batch size.
+
+<table><tbody>
+<!-- START E2E MASK RCNN GN SCRATCH TABLE -->
+<!-- TABLE HEADER -->
+<!-- Info: we wrap text in <sup><sub></sub></sup> to make it small -->
+<th valign="bottom"><sup><sub>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;case&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</sub></sup></th>
+<th valign="bottom"><sup><sub>type</sub></sup></th>
+<th valign="bottom"><sup><sub>lr<br/>schd</sub></sup></th>
+<th valign="bottom"><sup><sub>im/<br/>gpu</sub></sup></th>
+<th valign="bottom"><sup><sub>train<br/>mem<br/>(GB)</sub></sup></th>
+<th valign="bottom"><sup><sub>train<br/>time<br/>(s/iter)</sub></sup></th>
+<th valign="bottom"><sup><sub>train<br/>time<br/>total<br/>(hr)</sub></sup></th>
+<th valign="bottom"><sup><sub>inference<br/>time<br/>(s/im)</sub></sup></th>
+<th valign="bottom"><sup><sub>box<br/>AP</sub></sup></th>
+<th valign="bottom"><sup><sub>mask<br/>AP</sub></sup></th>
+<th valign="bottom"><sup><sub>model id</sub></sup></th>
+<!-- TABLE BODY -->
+<tr>
+<td align="left"><sup><sub>R-50-FPN, GN, scratch</sub></sup></td>
+<td align="left"><sup><sub>Mask R-CNN</sub></sup></td>
+<td align="left"><sup><sub>3x</sub></sup></td>
+<td align="right"><sup><sub>2</sub></sup></td>
+<td align="right"><sup><sub>10.5</sub></sup></td>
+<td align="right"><sup><sub>0.990</sub></sup></td>
+<td align="right"><sup><sub>74.3</sub></sup></td>
+<td align="right"><sup><sub>0.146&nbsp;+&nbsp;0.020</sub></sup></td>
+<td align="right"><sup><sub>36.2</sub></sup></td>
+<td align="right"><sup><sub>32.5</sub></sup></td>
+<td align="right"><sup><sub>49025460</sub></sup></td>
+</tr>
+<tr>
+<td align="left"><sup><sub>R-101-FPN, GN, scratch</sub></sup></td>
+<td align="left"><sup><sub>Mask R-CNN</sub></sup></td>
+<td align="left"><sup><sub>3x</sub></sup></td>
+<td align="right"><sup><sub>2</sub></sup></td>
+<td align="right"><sup><sub>12.4</sub></sup></td>
+<td align="right"><sup><sub>1.124</sub></sup></td>
+<td align="right"><sup><sub>84.3</sub></sup></td>
+<td align="right"><sup><sub>0.180&nbsp;+&nbsp;0.019</sub></sup></td>
+<td align="right"><sup><sub>37.5</sub></sup></td>
+<td align="right"><sup><sub>33.3</sub></sup></td>
+<td align="right"><sup><sub>49024951</sub></sup></td>
+</tr>
+<!-- END E2E MASK RCNN GN SCRATCH TABLE -->
+</tbody></table>
+
+**Notes:**
+- To reproduce these results, see the config yaml files whose names start with `scratch`.
diff --git a/projects/GN/gn.jpg b/projects/GN/gn.jpg