NEWS:🔥3D-GRES is accepted at ACM MM 2024 (Oral)!🔥
Changli Wu, Yihang Liu, Jiayi Ji, Yiwei Ma, Haowei Wang, Gen Luo, Henghui Ding, Xiaoshuai Sun, Rongrong Ji
3D Referring Expression Segmentation (3D-RES) is dedicated to segmenting a specific instance within a 3D space based on a natural language description. However, current approaches are limited to segmenting a single target, restricting the versatility of the task. To overcome this limitation, we introduce Generalized 3D Referring Expression Segmentation (3D-GRES), which extends the capability to segment any number of instances based on natural language instructions. To address this broader task, we propose the Multi-Query Decoupled Interaction Network (MDIN), designed to break down multi-object segmentation tasks into simpler, individual segmentations. MDIN comprises two fundamental components: Text-driven Sparse Queries (TSQ) and Multi-object Decoupling Optimization (MDO). TSQ generates sparse point cloud features distributed over key targets as the initialization for queries. Meanwhile, MDO is tasked with assigning each target in multi-object scenarios to different queries while maintaining their semantic consistency. To adapt to this new task, we build a new dataset, namely Multi3DRes. Our comprehensive evaluations on this dataset demonstrate substantial enhancements over existing models, thus charting a new path for intricate multi-object 3D scene comprehension.
Requirements
- Python 3.7 or higher
- PyTorch 1.12
- CUDA 11.3 or higher
The following installation steps assume python=3.8, pytorch=1.12.1, and cuda=11.3.
- Create a conda virtual environment
conda create -n 3d-gres python=3.8
conda activate 3d-gres
- Clone the repository
git clone https://github.com/sosppxo/MDIN.git
- Install the dependencies
Install PyTorch 1.12.1, then install the remaining packages:
pip install spconv-cu113
conda install pytorch-scatter -c pyg # or pip install https://data.pyg.org/whl/torch-1.12.0%2Bcu113/torch_scatter-2.0.9-cp38-cp38-linux_x86_64.whl
pip install -r requirements.txt
Install segmentator from this repo (We wrap the segmentator in ScanNet).
- Set up and install mdin and pointgroup_ops.
sudo apt-get install libsparsehash-dev
python setup.py develop
cd gres_model/lib/
python setup.py develop
- Compile pointnet++
cd pointnet2
python setup.py install --user
cd ..
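Once the steps above are done, it can help to confirm that the installed versions satisfy the Requirements list. The following is a minimal sketch, not part of the repository: the helper names (parse_version, meets_minimum, check_environment) are hypothetical, and the minimum versions are simply copied from the list above.

```python
import re

# Minimum versions copied from the Requirements list above.
MINIMUMS = {"python": "3.7", "torch": "1.12", "cuda": "11.3"}


def parse_version(text):
    """Turn '1.12.1+cu113' into (1, 12, 1); non-numeric suffixes are ignored."""
    numbers = re.match(r"\d+(?:\.\d+)*", text.strip())
    if numbers is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in numbers.group(0).split("."))


def meets_minimum(found, minimum):
    """Compare two dotted versions numerically, component by component."""
    return parse_version(found) >= parse_version(minimum)


def check_environment(found):
    """Return the names in MINIMUMS whose version is missing or too old."""
    problems = []
    for name, minimum in MINIMUMS.items():
        version = found.get(name)
        if version is None or not meets_minimum(version, minimum):
            problems.append(name)
    return problems


if __name__ == "__main__":
    import platform

    found = {"python": platform.python_version()}
    try:
        import torch  # only available after the PyTorch install step

        found["torch"] = torch.__version__
        if torch.version.cuda is not None:
            found["cuda"] = torch.version.cuda
    except ImportError:
        pass
    print(check_environment(found) or "environment looks fine")
```

Versions are compared as integer tuples rather than strings so that, for example, 3.10 is correctly treated as newer than 3.7.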
Download the ScanNet v2 dataset.
Put the downloaded scans folder as follows.
MDIN
├── data
│ ├── scannetv2
│ │ ├── scans
Split and preprocess point cloud data
cd data/scannetv2
bash prepare_data.sh
The script splits the data into train/val folders and preprocesses it. After running the script, the ScanNet dataset structure should look like this:
MDIN
├── data
│ ├── scannetv2
│ │ ├── scans
│ │ ├── train
│ │ ├── val
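A quick way to confirm that the layout above is in place is sketched below. It only checks that the three folders exist, not their contents; the function name missing_scannet_dirs is hypothetical and not part of the repository.

```python
from pathlib import Path

# Sub-directories of data/scannetv2 named in the tree above.
EXPECTED_DIRS = ("scans", "train", "val")


def missing_scannet_dirs(repo_root):
    """Return the expected ScanNet folders that do not exist under repo_root."""
    base = Path(repo_root) / "data" / "scannetv2"
    return [name for name in EXPECTED_DIRS if not (base / name).is_dir()]


if __name__ == "__main__":
    # Run from the MDIN repository root.
    missing = missing_scannet_dirs(".")
    print("missing:", missing if missing else "none")
```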
Download ScanRefer annotations following the instructions.
In the original ScanRefer annotations, every ann_id within a scene was assigned separately for each object_id, so the same ann_id can appear more than once in a scene. We have modified the ScanRefer annotations so that each ann_id within a scene is unique; the revised annotation data can be accessed here.
Put the downloaded ScanRefer folder as follows.
MDIN
├── data
│ ├── ScanRefer
│ │ ├── ScanRefer_filtered_train_new.json
│ │ ├── ScanRefer_filtered_val_new.json
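The uniqueness property described above can be checked with a few lines of Python. This is a minimal sketch that assumes each annotation is a JSON object with scene_id and ann_id keys, as in the ScanRefer format; the function name and the sample records are made up for illustration and are not real ScanRefer data.

```python
from collections import Counter


def duplicate_ann_ids(annotations):
    """Return sorted (scene_id, ann_id) pairs that occur more than once."""
    counts = Counter((a["scene_id"], str(a["ann_id"])) for a in annotations)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Hand-made records mimicking the original numbering, where ann_id
# restarts from 0 for every object in a scene.
original_style = [
    {"scene_id": "scene0000_00", "object_id": "1", "ann_id": "0"},
    {"scene_id": "scene0000_00", "object_id": "1", "ann_id": "1"},
    {"scene_id": "scene0000_00", "object_id": "2", "ann_id": "0"},
]
print(duplicate_ann_ids(original_style))  # [('scene0000_00', '0')]
```

Loading data/ScanRefer/ScanRefer_filtered_train_new.json with json.load and passing the resulting list to duplicate_ann_ids should return an empty list if every ann_id is unique within its scene.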
Download the Multi3DRefer annotations.
Put the downloaded Multi3DRefer folder as follows.
MDIN
├── data
│ ├── Multi3DRefer
│ │ ├── multi3drefer_train.json
│ │ ├── multi3drefer_val.json
The original annotation text contains some typos. Correct them according to Issue #6 to prevent syntax parsing errors, or download the modified Multi3DRefer (New).
Download the ReferIt3D annotations and convert the .csv files into a .json format consistent with the Multi3DRefer format.
Put the downloaded ReferIt3D folder as follows.
MDIN
├── data
│ ├── ReferIt3D
│ │ ├── sr3d_train.json
│ │ ├── sr3d_val.json
│ │ ├── nr3d_train.json
│ │ ├── nr3d_val.json
Or download the modified ReferIt3D (.json).
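For readers doing the .csv to .json conversion themselves, the sketch below shows one possible shape of it. It rests on several assumptions that should be checked against the actual files: that the ReferIt3D CSVs provide scan_id, target_id, instance_type, and utterance columns; that Multi3DRefer-style records use the keys scene_id, object_ids, object_name, ann_id, and description; and that a running index is an acceptable ann_id. It is not the authors' conversion script, and the downloadable modified files remain the safer option.

```python
import csv
import io
import json


def referit3d_rows_to_records(rows):
    """Map ReferIt3D-style rows to Multi3DRefer-style dicts (assumed schema)."""
    records = []
    for ann_id, row in enumerate(rows):
        records.append(
            {
                "scene_id": row["scan_id"],
                # ReferIt3D has one target per utterance; wrap it in a list.
                "object_ids": [int(row["target_id"])],
                "object_name": row["instance_type"],
                # Running index, so ann_id is unique across the whole file.
                "ann_id": ann_id,
                "description": row["utterance"],
            }
        )
    return records


def convert_csv_text(csv_text):
    """Parse CSV text with a header row and convert every data row."""
    return referit3d_rows_to_records(csv.DictReader(io.StringIO(csv_text)))


# Made-up sample row, not real ReferIt3D data.
sample = (
    "scan_id,target_id,instance_type,utterance\n"
    "scene0000_00,12,chair,the chair next to the desk\n"
)
print(json.dumps(convert_csv_text(sample), indent=2))
```

To convert a real file, read it with open(path, newline="") and csv.DictReader, then write the result with json.dump to the matching name under data/ReferIt3D.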
Download SPFormer pretrained model (We only use the Sparse 3D U-Net backbone for training).
Move the pretrained model to backbones.
mkdir backbones
mv ${Download_PATH}/sp_unet_backbone.pth backbones/
Download the pretrained models and move them to checkpoints.
Benchmark | Task | mIoU | Acc@0.25 | Acc@0.5 | Model |
---|---|---|---|---|---|
Multi3DRes | 3D-GRES | 47.5 | 66.9 | 44.7 | Model |
ScanRefer | 3D-RES | 48.3 | 58.0 | 53.1 | Model |
Nr3D | 3D-RES | 38.6 | 48.4 | 42.2 | Model |
Sr3D | 3D-RES | 46.4 | 56.6 | 51.3 | Model |
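For orientation, the three metric columns can be computed from per-sample IoU values roughly as follows. This is a minimal sketch under common conventions, not the repository's evaluation code: it assumes mIoU is the mean IoU over samples, that Acc@k is the share of samples whose IoU is at least k (whether the comparison is >= or > is an assumption), and that values are reported as percentages. It also ignores any special handling of zero-target samples in 3D-GRES.

```python
def summarize_ious(ious, thresholds=(0.25, 0.5)):
    """Return mIoU and Acc@k (both in percent) for a list of IoUs in [0, 1]."""
    if not ious:
        raise ValueError("need at least one IoU value")
    n = len(ious)
    summary = {"mIoU": 100.0 * sum(ious) / n}
    for t in thresholds:
        # Count samples whose IoU reaches the threshold.
        summary[f"Acc@{t}"] = 100.0 * sum(iou >= t for iou in ious) / n
    return summary


# Made-up IoU values for four samples, for illustration only.
example = [0.8, 0.3, 0.1, 0.6]
print(summarize_ious(example))
```

With these four made-up values, three samples reach IoU 0.25 and two reach IoU 0.5, so the two accuracy figures are 75 and 50 percent.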
To train for 3D-GRES:
bash scripts/train_3dgres.sh
To train for 3D-RES:
bash scripts/train_3dres.sh
To test for 3D-GRES:
bash scripts/test_3dgres.sh
To test for 3D-RES:
bash scripts/test_3dres.sh
If you find this work useful in your research, please cite:
@misc{wu20243dgresgeneralized3dreferring,
title={3D-GRES: Generalized 3D Referring Expression Segmentation},
author={Changli Wu and Yihang Liu and Jiayi Ji and Yiwei Ma and Haowei Wang and Gen Luo and Henghui Ding and Xiaoshuai Sun and Rongrong Ji},
year={2024},
eprint={2407.20664},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2407.20664},
}
Sincere thanks to the ReLA, M3DRef-CLIP, EDA, SceneGraphParser, SoftGroup, SSTNet, and SPFormer repos. This repo is built upon them.