Commit: [Enhance] Add doc for ScanNet semantic segmentation data (#743)

* scannet sem seg data docs
* minor fix
* fix typo
* fix typo in scannet_det

Showing 3 changed files with 142 additions and 10 deletions.
@@ -6,7 +6,7 @@ For the overall process, please refer to the [README](https://github.com/open-mm
### Export ScanNet point cloud data

By exporting ScanNet data, we load the raw point cloud data and generate the relevant annotations, including semantic labels, instance labels and ground truth bounding boxes.

```shell
python batch_load_scannet_data.py
```
@@ -32,7 +32,7 @@ mmdetection3d

Under the folder `scans` there are in total 1201 train and 312 validation folders, in which raw point cloud data and relevant annotations are saved. For instance, under the folder `scene0001_01` the files are as below:
- `scene0001_01_vh_clean_2.ply`: Mesh file storing the coordinates and colors of each vertex. The mesh's vertices are taken as raw point cloud data.
- `scene0001_01.aggregation.json`: Aggregation file including object ids, segment ids and labels.
- `scene0001_01_vh_clean_2.0.010000.segs.json`: Segmentation file including segment ids and vertices.
- `scene0001_01.txt`: Meta file including the axis-aligned matrix, etc.

@@ -44,7 +44,7 @@ Export ScanNet data by running `python batch_load_scannet_data.py`. The main ste

- Downsample the raw point cloud and filter invalid classes.
- Save point cloud data and relevant annotation files.

The core function `export` in `load_scannet_data.py` is as follows:

```python
def export(mesh_file,
```

@@ -125,7 +125,7 @@ def export(mesh_file,
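The ground truth boxes generated by this export step are axis-aligned. For intuition, here is a minimal sketch of deriving such a box from one instance's points; `instance_aabb` is a hypothetical helper, not the actual `export` code:

```python
import numpy as np

def instance_aabb(points):
    """Axis-aligned box (cx, cy, cz, dx, dy, dz) enclosing one
    instance's (N, 3) points; the yaw is implicitly zero."""
    mins, maxs = points.min(axis=0), points.max(axis=0)
    return np.concatenate([(mins + maxs) / 2.0, maxs - mins])
```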
After exporting each scan, the raw point cloud can be downsampled, e.g. to 50000 points, if the number of points is too large (the raw point cloud won't be downsampled if it's also used in the 3d semantic segmentation task). In addition, invalid semantic labels outside the `nyu40id` standard or optional `DONOT CARE` classes should be filtered. Finally, the point cloud data, semantic labels, instance labels and ground truth bounding boxes are saved in `.npy` files.
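The downsampling and label filtering just described can be sketched as follows; the random sampling and the valid-id set are illustrative assumptions, not the actual script logic:

```python
import numpy as np

def downsample_and_filter(points, sem_labels, num_samples, valid_ids):
    # randomly downsample when the scan has too many points
    if points.shape[0] > num_samples:
        choice = np.random.choice(points.shape[0], num_samples, replace=False)
        points, sem_labels = points[choice], sem_labels[choice]
    # reset labels outside the valid id set to 0 (unannotated)
    sem_labels = np.where(np.isin(sem_labels, list(valid_ids)), sem_labels, 0)
    return points, sem_labels
```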

### Export ScanNet RGB data
@@ -192,7 +192,7 @@ The directory structure after process should be as below

```
scannet
├── meta_data
├── batch_load_scannet_data.py
├── load_scannet_data.py
├── scannet_utils.py
```

@@ -221,7 +221,7 @@ scannet

```
├── scannet_infos_test.pkl
```

- `points/xxxxx.bin`: The `axis-unaligned` point cloud data after downsampling. Since the ScanNet 3d detection task takes axis-aligned point clouds as input, while the ScanNet 3d semantic segmentation task takes unaligned points, we choose to store the unaligned points and their axis-align transformation matrix. Note: the points will be axis-aligned in the pre-processing pipeline `GlobalAlignment` of the 3d detection task.
- `instance_mask/xxxxx.bin`: The instance label for each point, value range: [0, NUM_INSTANCES], where 0 means unannotated.
- `semantic_mask/xxxxx.bin`: The semantic label for each point, value range: [1, 40], i.e. the `nyu40id` standard. Note: the `nyu40id` ids will be mapped to train ids in the train pipeline `PointSegClassMapping`.
- `posed_images/scenexxxx_xx`: The set of `.jpg` images with `.txt` 4x4 poses and the single `.txt` file with the camera intrinsic matrix.
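The alignment applied by `GlobalAlignment` amounts to a homogeneous transform of the points by the stored 4x4 axis-align matrix; a sketch, not the actual mmdetection3d implementation:

```python
import numpy as np

def axis_align(points, axis_align_matrix):
    """Apply a 4x4 axis-align matrix (from the scene meta file)
    to an (N, 3) array of point coordinates."""
    homo = np.hstack([points, np.ones((points.shape[0], 1))])
    return (homo @ axis_align_matrix.T)[:, :3]
```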

@@ -243,9 +243,9 @@ scannet

- annotations['class']: The train class id of each bounding box, value range: [0, 18), shape: [K, ].
## Training pipeline

A typical training pipeline of ScanNet for 3d detection is as below.

```python
train_pipeline = [
```

@@ -287,8 +287,9 @@ train_pipeline = [

```python
    ])
]
```

- `GlobalAlignment`: The previous point cloud will be axis-aligned using the axis-aligned matrix.
- `PointSegClassMapping`: Only the valid category ids will be mapped to class label ids like [0, 18) during training.
- Data augmentation:
    - `IndoorPointSample`: downsample the input point cloud.
    - `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.

@@ -297,4 +298,5 @@ train_pipeline = [

## Metrics

Typically mean average precision (mAP) is used for evaluation on ScanNet, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic function to compute precision and recall for 3d object detection for multiple classes is called; please refer to [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/indoor_eval.py).

As introduced in the section `Export ScanNet data`, all ground truth 3d bounding boxes are axis-aligned, i.e. the yaw is zero. So the yaw target of the network-predicted 3d bounding boxes is also zero, and axis-aligned 3d non-maximum suppression (NMS) is adopted during post-processing without regard to rotation.
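With zero yaw, the overlap test inside axis-aligned NMS reduces to a plain axis-aligned 3d IoU; a sketch (the corner-pair box format here is an assumption for illustration):

```python
import numpy as np

def aabb_iou(box1, box2):
    """IoU of two axis-aligned 3d boxes given as (x1, y1, z1, x2, y2, z2)
    corner pairs. Sketch of the overlap test behind axis-aligned NMS."""
    box1, box2 = np.asarray(box1), np.asarray(box2)
    lo = np.maximum(box1[:3], box2[:3])   # intersection lower corner
    hi = np.minimum(box1[3:], box2[3:])   # intersection upper corner
    inter = np.prod(np.clip(hi - lo, 0, None))
    vol1 = np.prod(box1[3:] - box1[:3])
    vol2 = np.prod(box2[3:] - box2[:3])
    return inter / (vol1 + vol2 - inter)
```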
@@ -0,0 +1,130 @@
# ScanNet for 3D Semantic Segmentation

## Dataset preparation

The overall process is similar to the ScanNet 3d detection task. Please refer to this [section](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/datasets/scannet_det.md#dataset-preparation). Only a few differences and additional pieces of information about the 3d semantic segmentation data will be listed below.
### Export ScanNet data

Since ScanNet provides an online benchmark for 3d semantic segmentation evaluation on the test set, we also need to download the test scans and put them under the `scannet` folder.

The directory structure before data preparation should be as below:
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── scannet
│   │   ├── meta_data
│   │   ├── scans
│   │   │   ├── scenexxxx_xx
│   │   ├── scans_test
│   │   │   ├── scenexxxx_xx
│   │   ├── batch_load_scannet_data.py
│   │   ├── load_scannet_data.py
│   │   ├── scannet_utils.py
│   │   ├── README.md
```
Under the folder `scans_test` there are 100 test folders, in which only raw point cloud data is saved. For instance, under the folder `scene0707_00` the files are as below:

- `scene0707_00_vh_clean_2.ply`: Mesh file storing the coordinates and colors of each vertex. The mesh's vertices are taken as raw point cloud data.
- `scene0707_00.txt`: Meta file including sensor parameters, etc. Note: different from the data under `scans`, the axis-aligned matrix is not provided for test scans.
Export ScanNet data by running `python batch_load_scannet_data.py`. Note: only point cloud data will be saved for test set scans, because no annotations are provided.

### Create dataset

The directory structure after processing should be as below:
```
scannet
├── meta_data
├── batch_load_scannet_data.py
├── load_scannet_data.py
├── scannet_utils.py
├── README.md
├── scans
├── scans_test
├── scannet_instance_data
├── points
│   ├── xxxxx.bin
├── instance_mask
│   ├── xxxxx.bin
├── semantic_mask
│   ├── xxxxx.bin
├── seg_info
│   ├── train_label_weight.npy
│   ├── train_resampled_scene_idxs.npy
│   ├── val_label_weight.npy
│   ├── val_resampled_scene_idxs.npy
├── scannet_infos_train.pkl
├── scannet_infos_val.pkl
├── scannet_infos_test.pkl
```
- `seg_info`: The generated infos to support semantic segmentation model training.
    - `train_label_weight.npy`: Weighting factor for each semantic class. Since the numbers of points in different classes vary greatly, it's a common practice to use label re-weighting to get better performance.
    - `train_resampled_scene_idxs.npy`: Re-sampling indices for each scene. Different rooms will be sampled multiple times, according to their numbers of points.
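A common way to derive such per-class weights is an inverse-log-frequency heuristic, so that rarer classes get larger weights; this is an illustrative assumption, and the exact formula behind `train_label_weight.npy` may differ:

```python
import numpy as np

def compute_label_weights(sem_labels, num_classes=20):
    """Per-class weights from label frequencies: rarer classes get
    larger weights. Heuristic sketch, not necessarily the exact formula."""
    counts = np.bincount(sem_labels, minlength=num_classes)[:num_classes]
    freq = counts / max(counts.sum(), 1)
    return 1.0 / np.log(1.2 + freq)
```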
## Training pipeline

A typical training pipeline of ScanNet for 3d semantic segmentation is as below.

```python
train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='DEPTH',
        shift_height=False,
        use_color=True,
        load_dim=6,
        use_dim=[0, 1, 2, 3, 4, 5]),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=False,
        with_label_3d=False,
        with_mask_3d=False,
        with_seg_3d=True),
    dict(
        type='PointSegClassMapping',
        valid_cat_ids=(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, 28,
                       33, 34, 36, 39),
        max_cat_id=40),
    dict(
        type='IndoorPatchPointSample',
        num_points=num_points,
        block_size=1.5,
        ignore_index=len(class_names),
        use_normalized_coord=False,
        enlarge_size=0.2,
        min_unique_num=None),
    dict(type='NormalizePointsColor', color_mean=None),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['points', 'pts_semantic_mask'])
]
```
- `PointSegClassMapping`: Only the valid category ids will be mapped to class label ids like [0, 20) during training. Other class ids will be converted to `ignore_index`, which equals `20`.
- `IndoorPatchPointSample`: Crop a patch containing a fixed number of points from the input point cloud. `block_size` indicates the size of the cropped block, typically `1.5` for ScanNet.
- `NormalizePointsColor`: Normalize the RGB color values of the input point cloud by dividing by `255`.
||
## Metrics | ||
|
||
Typically mean intersection over union (mIoU) is used for evaluation on ScanNet. In detail, we first compute IoU for multiple classes and then average them to get mIoU, please refer to [seg_eval](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/seg_eval.py). | ||
|
||
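The per-class IoU and their average can be sketched as below; this simplified version omits details such as ignored-label handling in the real evaluation code:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Average per-class IoU over classes that appear in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```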
## Testing and Making a Submission

By default, our codebase evaluates semantic segmentation results on the validation set.
If you would like to test the model performance on the online benchmark, add the `--format-only` flag in the evaluation script and change `ann_file=data_root + 'scannet_infos_val.pkl'` to `ann_file=data_root + 'scannet_infos_test.pkl'` in the ScanNet dataset's [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/_base_/datasets/scannet_seg-3d-20class.py#L126). Remember to specify `txt_prefix` as the directory to save the testing results.
Taking PointNet++ (SSG) on ScanNet as an example, the following command can be used to do inference on the test set:

```shell
./tools/dist_test.sh configs/pointnet2/pointnet2_ssg_16x2_cosine_200e_scannet_seg-3d-20class.py \
    work_dirs/pointnet2_ssg/latest.pth --format-only \
    --eval-options txt_prefix=work_dirs/pointnet2_ssg/test_submission
```

After generating the results, you can simply compress the folder and upload it to the [ScanNet evaluation server](http://kaldir.vc.in.tum.de/scannet_benchmark/semantic_label_3d).
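For the compression step, the Python standard library is enough; the helper name below and the paths in the comment are illustrative, not part of the codebase:

```python
import shutil

def make_submission_zip(result_dir, out_name='test_submission'):
    """Zip the txt results produced with --format-only so the archive
    can be uploaded to the benchmark server. Returns the archive path,
    e.g. make_submission_zip('work_dirs/pointnet2_ssg/test_submission')
    writes test_submission.zip in the current directory."""
    return shutil.make_archive(out_name, 'zip', result_dir)
```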