[ECCV 2024] 4D Contrastive Superflows are Dense 3D Representation Learners

Xiangxu-0103/SuperFlow

4D Contrastive Superflows are Dense 3D Representation Learners

Xiang Xu [1,*], Lingdong Kong [2,3,*], Hui Shuai [4], Wenwei Zhang [2], Liang Pan [2], Kai Chen [2], Ziwei Liu [5], Qingshan Liu [4]

[1] Nanjing University of Aeronautics and Astronautics
[2] Shanghai AI Laboratory
[3] National University of Singapore
[4] Nanjing University of Posts and Telecommunications
[5] S-Lab, Nanyang Technological University

About

SuperFlow harnesses consecutive LiDAR-camera pairs to establish spatiotemporal pretraining objectives. It integrates two key designs: 1) a dense-to-sparse consistency regularization, which makes the learned features less sensitive to variations in point cloud density, and 2) a flow-based contrastive learning module, which extracts temporal cues from readily available sensor calibrations.
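The exact objectives are defined in the paper and the codebase. As a rough illustration of what a contrastive objective over temporally paired features can look like, here is a minimal NumPy sketch of a generic InfoNCE loss. The function name `info_nce`, the temperature value, and the toy feature shapes are assumptions made for illustration; this is not SuperFlow's actual implementation.

```python
import numpy as np


def info_nce(anchor, positive, temperature=0.07):
    """Generic InfoNCE loss (illustrative only, not SuperFlow's loss).

    Row i of `anchor` is treated as matching row i of `positive`;
    every other row of `positive` acts as a negative.
    """
    # L2-normalise so the dot product is a cosine similarity.
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)

    logits = a @ p.T / temperature
    # Subtract the row-wise max for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The matching pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


# Toy features standing in for two consecutive timestamps (hypothetical data).
rng = np.random.default_rng(0)
feats_t = rng.normal(size=(8, 16))
feats_t1_aligned = feats_t + 0.01 * rng.normal(size=(8, 16))
feats_t1_shuffled = feats_t1_aligned[::-1].copy()  # breaks the pairing

loss_aligned = info_nce(feats_t, feats_t1_aligned)
loss_shuffled = info_nce(feats_t, feats_t1_shuffled)
print(loss_aligned, loss_shuffled)
```

In this toy setup the loss is low when row i at one timestamp is paired with its own slightly perturbed copy at the next, and high when the pairing is broken, which is the behaviour a temporal contrastive objective relies on.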

Updates

  • [2024.07] - Our paper was accepted to ECCV 2024.

⚙️ Installation

For details related to installation and environment setups, kindly refer to INSTALL.md.

♨️ Data Preparation

Kindly refer to DATA_PREPAER.md for details on preparing the datasets.

🚀 Getting Started

To learn more about using this codebase, kindly refer to GET_STARTED.md.

📊 Main Results

Comparisons of state-of-the-art pretraining methods

| Method | Distill | nuScenes LP | nuScenes 1% | nuScenes 5% | nuScenes 10% | nuScenes 25% | nuScenes Full | KITTI 1% | Waymo 1% |
|---|---|---|---|---|---|---|---|---|---|
| Random | - | 8.10 | 30.30 | 47.84 | 56.15 | 65.48 | 74.66 | 39.50 | 39.41 |
| PPKT | ViT-S | 38.60 | 40.60 | 52.06 | 59.99 | 65.76 | 73.97 | 43.25 | 47.44 |
| SLidR | ViT-S | 44.70 | 41.16 | 53.65 | 61.47 | 66.71 | 74.20 | 44.67 | 47.57 |
| Seal | ViT-S | 45.16 | 44.27 | 55.13 | 62.46 | 67.64 | 75.58 | 46.51 | 48.67 |
| SuperFlow | ViT-S | 46.44 | 47.81 | 59.44 | 64.47 | 69.20 | 76.54 | 47.97 | 49.94 |
| PPKT | ViT-B | 39.95 | 40.91 | 53.21 | 60.87 | 66.22 | 74.07 | 44.09 | 47.57 |
| SLidR | ViT-B | 45.35 | 41.64 | 55.83 | 62.68 | 67.61 | 74.98 | 45.50 | 48.32 |
| Seal | ViT-B | 46.59 | 45.98 | 57.15 | 62.79 | 68.18 | 75.41 | 47.24 | 48.91 |
| SuperFlow | ViT-B | 47.66 | 48.09 | 59.66 | 64.52 | 69.79 | 76.57 | 48.40 | 50.20 |
| PPKT | ViT-L | 41.57 | 42.05 | 55.75 | 61.26 | 66.88 | 74.33 | 45.87 | 47.82 |
| SLidR | ViT-L | 45.70 | 42.77 | 57.45 | 63.20 | 68.13 | 75.51 | 47.01 | 48.60 |
| Seal | ViT-L | 46.81 | 46.27 | 58.14 | 63.27 | 68.67 | 75.66 | 47.55 | 50.02 |
| SuperFlow | ViT-L | 48.01 | 49.95 | 60.72 | 65.09 | 70.01 | 77.19 | 49.07 | 50.67 |
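
To read the comparison as margins rather than raw scores, the short sketch below subtracts the Seal (ViT-S) row from the SuperFlow (ViT-S) row. The numbers are copied from the table above; the variable names and the choice of Seal as the reference are only for illustration.

```python
# Column settings, in the order they appear in the table.
settings = [
    "nuScenes LP", "nuScenes 1%", "nuScenes 5%", "nuScenes 10%",
    "nuScenes 25%", "nuScenes Full", "KITTI 1%", "Waymo 1%",
]

# ViT-S rows copied from the table.
seal = [45.16, 44.27, 55.13, 62.46, 67.64, 75.58, 46.51, 48.67]
superflow = [46.44, 47.81, 59.44, 64.47, 69.20, 76.54, 47.97, 49.94]

# Absolute margin of SuperFlow over Seal for each setting.
gains = {
    name: round(ours - ref, 2)
    for name, ours, ref in zip(settings, superflow, seal)
}

largest = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name:>14}: +{gain:.2f}")
print("largest margin:", largest)
```

With the ViT-S rows, the margin is largest in the low-data nuScenes settings (1% and 5%) and smallest with full supervision.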

Domain generalization study

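| Method | ScriKITTI 1% | ScriKITTI 10% | Rellis-3D 1% | Rellis-3D 10% | SemPOSS Half | SemPOSS Full | SemSTF Half | SemSTF Full | SynLiDAR 1% | SynLiDAR 10% | DAPS-3D Half | DAPS-3D Full | Synth4D 1% | Synth4D 10% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Random | 23.81 | 47.60 | 38.46 | 53.60 | 46.26 | 54.12 | 48.03 | 48.15 | 19.89 | 44.74 | 74.32 | 79.38 | 20.22 | 66.87 |
| PPKT | 36.50 | 51.67 | 49.71 | 54.33 | 50.18 | 56.00 | 50.92 | 54.69 | 37.57 | 46.48 | 78.90 | 84.00 | 61.10 | 62.41 |
| SLidR | 39.60 | 50.45 | 49.75 | 54.57 | 51.56 | 55.36 | 52.01 | 54.35 | 42.05 | 47.84 | 81.00 | 85.40 | 63.10 | 62.67 |
| Seal | 40.64 | 52.77 | 51.09 | 55.03 | 53.26 | 56.89 | 53.46 | 55.36 | 43.58 | 49.26 | 81.88 | 85.90 | 64.50 | 66.96 |
| SuperFlow | 42.70 | 54.00 | 52.83 | 55.71 | 54.41 | 57.33 | 54.72 | 56.57 | 44.85 | 51.38 | 82.43 | 86.21 | 65.31 | 69.43 |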

Out-of-distribution 3D robustness study

| Setting | Initial | Backbone | mCE | mRR | Fog | Rain | Snow | Blur | Beam | Cross | Echo | Sensor | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Full | Random | MinkU-18 | 115.61 | 70.85 | 53.90 | 71.10 | 48.22 | 51.85 | 62.21 | 37.73 | 57.47 | 38.97 | 52.68 |
| Full | SuperFlow | MinkU-18 | 109.00 | 75.66 | 54.95 | 72.79 | 49.56 | 57.68 | 62.82 | 42.45 | 59.61 | 41.77 | 55.21 |
| Full | Random | MinkU-34 | 112.20 | 72.57 | 62.96 | 70.65 | 55.48 | 51.71 | 62.01 | 31.56 | 59.64 | 39.41 | 54.18 |
| Full | SuperFlow | MinkU-34 | 91.67 | 83.17 | 70.32 | 75.77 | 65.41 | 61.05 | 68.09 | 60.02 | 58.36 | 50.41 | 63.68 |
| Full | Random | MinkU-50 | 113.76 | 72.81 | 49.95 | 71.16 | 45.36 | 55.55 | 62.84 | 36.94 | 59.12 | 43.15 | 53.01 |
| Full | SuperFlow | MinkU-50 | 107.35 | 74.02 | 54.36 | 73.08 | 50.07 | 56.92 | 64.05 | 38.10 | 62.02 | 47.02 | 55.70 |
| Full | Random | MinkU-101 | 109.10 | 74.07 | 50.45 | 73.02 | 48.85 | 58.48 | 64.18 | 43.86 | 59.82 | 41.47 | 55.02 |
| Full | SuperFlow | MinkU-101 | 96.44 | 78.57 | 56.92 | 76.29 | 54.70 | 59.35 | 71.89 | 55.13 | 60.27 | 51.60 | 60.77 |
| LP | PPKT | MinkU-34 | 183.44 | 78.15 | 30.65 | 35.42 | 28.12 | 29.21 | 32.82 | 19.52 | 28.01 | 20.71 | 28.06 |
| LP | SLidR | MinkU-34 | 179.38 | 77.18 | 34.88 | 38.09 | 32.64 | 26.44 | 33.73 | 20.81 | 31.54 | 21.44 | 29.95 |
| LP | Seal | MinkU-34 | 166.18 | 75.38 | 37.33 | 42.77 | 29.93 | 37.73 | 40.32 | 20.31 | 37.73 | 24.94 | 33.88 |
| LP | SuperFlow | MinkU-34 | 161.78 | 75.52 | 37.59 | 43.42 | 37.60 | 39.57 | 41.40 | 23.64 | 38.03 | 26.69 | 35.99 |
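
The README does not spell out how the Avg column is obtained. Judging from the numbers, it appears to be the plain mean of the eight per-corruption scores (Fog through Sensor). The sketch below checks that reading on two rows; the helper name `average_score` is hypothetical, and the interpretation is inferred from the table, not confirmed by the source.

```python
def average_score(scores):
    """Plain arithmetic mean of the per-corruption scores."""
    return sum(scores) / len(scores)


# Per-corruption scores in table order:
# Fog, Rain, Snow, Blur, Beam, Cross, Echo, Sensor.
rows = {
    # (scores, Avg value reported in the table)
    "Full / Random / MinkU-18": (
        [53.90, 71.10, 48.22, 51.85, 62.21, 37.73, 57.47, 38.97],
        52.68,
    ),
    "Full / SuperFlow / MinkU-34": (
        [70.32, 75.77, 65.41, 61.05, 68.09, 60.02, 58.36, 50.41],
        63.68,
    ),
}

for name, (scores, reported) in rows.items():
    computed = average_score(scores)
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```

Some rows can differ from the reported value in the last digit, presumably because the reported averages were taken before the per-corruption scores were rounded.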

License

This work is released under the Apache 2.0 license.

Citation

If you find this work helpful for your research, please kindly consider citing our paper:

@inproceedings{xu2024superflow,
    title = {4D Contrastive Superflows are Dense 3D Representation Learners},
    author = {Xu, Xiang and Kong, Lingdong and Shuai, Hui and Zhang, Wenwei and Pan, Liang and Chen, Kai and Liu, Ziwei and Liu, Qingshan},
    booktitle = {European Conference on Computer Vision},
    pages = {58--80},
    year = {2024}
}

Acknowledgements

This work is developed based on the MMDetection3D codebase.


MMDetection3D is an open-source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D perception. It is a part of the OpenMMLab project developed by MMLab.

We acknowledge the use of the following public resources during the course of this work: nuScenes, nuScenes-devkit, SemanticKITTI, SemanticKITTI-API, WaymoOpenDataset, Synth4D, ScribbleKITTI, RELLIS-3D, SemanticPOSS, SemanticSTF, SynLiDAR, DAPS-3D, Robo3D, SLidR, DINOv2, Segment-Any-Point-Cloud, OpenSeeD, torchsparse. 💟
