Liang Pan², Kai Chen², Ziwei Liu⁵, Qingshan Liu⁴
¹Nanjing University of Aeronautics and Astronautics, ²Shanghai AI Laboratory, ³National University of Singapore, ⁴Nanjing University of Posts and Telecommunications, ⁵S-Lab, Nanyang Technological University
SuperFlow is introduced to harness consecutive LiDAR-camera pairs for establishing spatiotemporal pretraining objectives. It stands out by integrating two key designs: 1) a dense-to-sparse consistency regularization, which promotes insensitivity to point cloud density variations during feature learning, and 2) a flow-based contrastive learning module, carefully crafted to extract meaningful temporal cues from readily available sensor calibrations.
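To make the two designs above more concrete, here is a minimal, dependency-free sketch. It is not the implementation in this repository: the helper names (`pool_by_segment`, `info_nce`, `consistency_penalty`), the temperature of 0.07, and the use of a cosine penalty for the dense-to-sparse term are all illustrative assumptions. It pools per-point features into segment-level features, scores them against matching image-side features with a generic InfoNCE loss, and penalizes disagreement between features computed from a dense and a sparse version of the same scene.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def pool_by_segment(features, segment_ids, num_segments):
    """Average per-point features inside each segment (a stand-in for superpoints)."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_segments)]
    counts = [0] * num_segments
    for feat, seg in zip(features, segment_ids):
        counts[seg] += 1
        for d in range(dim):
            sums[seg][d] += feat[d]
    # Empty segments keep a zero vector instead of dividing by zero.
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


def info_nce(queries, keys, temperature=0.07):
    """Generic InfoNCE: queries[i] should match keys[i]; all other keys are negatives."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    loss = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(q)


def consistency_penalty(dense_feats, sparse_feats):
    """Mean (1 - cosine similarity) between matched dense and sparse segment features.

    Assumption: a cosine penalty is used here purely for illustration.
    """
    total = 0.0
    for dense, sparse in zip(dense_feats, sparse_feats):
        d, s = l2_normalize(dense), l2_normalize(sparse)
        total += 1.0 - sum(a * b for a, b in zip(d, s))
    return total / len(dense_feats)


# Toy example: four LiDAR points grouped into two segments, matched to two image regions.
point_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
superpoints = pool_by_segment(point_feats, [0, 0, 1, 1], num_segments=2)
superpixels = [[1.0, 0.0], [0.0, 1.0]]
contrastive_loss = info_nce(superpoints, superpixels)
```

In this sketch, swapping the order of `superpixels` raises the contrastive loss, because each segment is then paired with the wrong image region. The same `info_nce` helper could, under the same assumptions, take segment features from a neighboring frame as keys to express a temporal objective.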
- [2024.07] - Our paper has been accepted to ECCV 2024.
For installation and environment setup, kindly refer to INSTALL.md.
Kindly refer to DATA_PREPAER.md for details on preparing the datasets.
To learn more about using this codebase, kindly refer to GET_STARTED.md.
Method | Distill | nuScenes LP | nuScenes 1% | nuScenes 5% | nuScenes 10% | nuScenes 25% | nuScenes Full | KITTI 1% | Waymo 1% |
---|---|---|---|---|---|---|---|---|---|
Random | - | 8.10 | 30.30 | 47.84 | 56.15 | 65.48 | 74.66 | 39.50 | 39.41 |
PPKT | ViT-S | 38.60 | 40.60 | 52.06 | 59.99 | 65.76 | 73.97 | 43.25 | 47.44 |
SLidR | ViT-S | 44.70 | 41.16 | 53.65 | 61.47 | 66.71 | 74.20 | 44.67 | 47.57 |
Seal | ViT-S | 45.16 | 44.27 | 55.13 | 62.46 | 67.64 | 75.58 | 46.51 | 48.67 |
SuperFlow | ViT-S | 46.44 | 47.81 | 59.44 | 64.47 | 69.20 | 76.54 | 47.97 | 49.94 |
PPKT | ViT-B | 39.95 | 40.91 | 53.21 | 60.87 | 66.22 | 74.07 | 44.09 | 47.57 |
SLidR | ViT-B | 45.35 | 41.64 | 55.83 | 62.68 | 67.61 | 74.98 | 45.50 | 48.32 |
Seal | ViT-B | 46.59 | 45.98 | 57.15 | 62.79 | 68.18 | 75.41 | 47.24 | 48.91 |
SuperFlow | ViT-B | 47.66 | 48.09 | 59.66 | 64.52 | 69.79 | 76.57 | 48.40 | 50.20 |
PPKT | ViT-L | 41.57 | 42.05 | 55.75 | 61.26 | 66.88 | 74.33 | 45.87 | 47.82 |
SLidR | ViT-L | 45.70 | 42.77 | 57.45 | 63.20 | 68.13 | 75.51 | 47.01 | 48.60 |
Seal | ViT-L | 46.81 | 46.27 | 58.14 | 63.27 | 68.67 | 75.66 | 47.55 | 50.02 |
SuperFlow | ViT-L | 48.01 | 49.95 | 60.72 | 65.09 | 70.01 | 77.19 | 49.07 | 50.67 |
Method | ScriKITTI 1% | ScriKITTI 10% | Rellis-3D 1% | Rellis-3D 10% | SemPOSS Half | SemPOSS Full | SemSTF Half | SemSTF Full | SynLiDAR 1% | SynLiDAR 10% | DAPS-3D Half | DAPS-3D Full | Synth4D 1% | Synth4D 10% |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Random | 23.81 | 47.60 | 38.46 | 53.60 | 46.26 | 54.12 | 48.03 | 48.15 | 19.89 | 44.74 | 74.32 | 79.38 | 20.22 | 66.87 |
PPKT | 36.50 | 51.67 | 49.71 | 54.33 | 50.18 | 56.00 | 50.92 | 54.69 | 37.57 | 46.48 | 78.90 | 84.00 | 61.10 | 62.41 |
SLidR | 39.60 | 50.45 | 49.75 | 54.57 | 51.56 | 55.36 | 52.01 | 54.35 | 42.05 | 47.84 | 81.00 | 85.40 | 63.10 | 62.67 |
Seal | 40.64 | 52.77 | 51.09 | 55.03 | 53.26 | 56.89 | 53.46 | 55.36 | 43.58 | 49.26 | 81.88 | 85.90 | 64.50 | 66.96 |
SuperFlow | 42.70 | 54.00 | 52.83 | 55.71 | 54.41 | 57.33 | 54.72 | 56.57 | 44.85 | 51.38 | 82.43 | 86.21 | 65.31 | 69.43 |
Setting | Initial | Backbone | mCE | mRR | Fog | Rain | Snow | Blur | Beam | Cross | Echo | Sensor | Avg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Full | Random | MinkU-18 | 115.61 | 70.85 | 53.90 | 71.10 | 48.22 | 51.85 | 62.21 | 37.73 | 57.47 | 38.97 | 52.68 |
Full | SuperFlow | MinkU-18 | 109.00 | 75.66 | 54.95 | 72.79 | 49.56 | 57.68 | 62.82 | 42.45 | 59.61 | 41.77 | 55.21 |
Full | Random | MinkU-34 | 112.20 | 72.57 | 62.96 | 70.65 | 55.48 | 51.71 | 62.01 | 31.56 | 59.64 | 39.41 | 54.18 |
Full | SuperFlow | MinkU-34 | 91.67 | 83.17 | 70.32 | 75.77 | 65.41 | 61.05 | 68.09 | 60.02 | 58.36 | 50.41 | 63.68 |
Full | Random | MinkU-50 | 113.76 | 72.81 | 49.95 | 71.16 | 45.36 | 55.55 | 62.84 | 36.94 | 59.12 | 43.15 | 53.01 |
Full | SuperFlow | MinkU-50 | 107.35 | 74.02 | 54.36 | 73.08 | 50.07 | 56.92 | 64.05 | 38.10 | 62.02 | 47.02 | 55.70 |
Full | Random | MinkU-101 | 109.10 | 74.07 | 50.45 | 73.02 | 48.85 | 58.48 | 64.18 | 43.86 | 59.82 | 41.47 | 55.02 |
Full | SuperFlow | MinkU-101 | 96.44 | 78.57 | 56.92 | 76.29 | 54.70 | 59.35 | 71.89 | 55.13 | 60.27 | 51.60 | 60.77 |
LP | PPKT | MinkU-34 | 183.44 | 78.15 | 30.65 | 35.42 | 28.12 | 29.21 | 32.82 | 19.52 | 28.01 | 20.71 | 28.06 |
LP | SLidR | MinkU-34 | 179.38 | 77.18 | 34.88 | 38.09 | 32.64 | 26.44 | 33.73 | 20.81 | 31.54 | 21.44 | 29.95 |
LP | Seal | MinkU-34 | 166.18 | 75.38 | 37.33 | 42.77 | 29.93 | 37.73 | 40.32 | 20.31 | 37.73 | 24.94 | 33.88 |
LP | SuperFlow | MinkU-34 | 161.78 | 75.52 | 37.59 | 43.42 | 37.60 | 39.57 | 41.40 | 23.64 | 38.03 | 26.69 | 35.99 |
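
The summary columns of the robustness table can be cross-checked from its own entries. The sketch below assumes that `Avg` is the plain mean of the eight corruption columns and that mRR is this mean expressed as a percentage of the clean mIoU, following the resilience-rate idea used by the Robo3D benchmark. It further assumes that the clean score for the randomly initialized MinkU-34 model is the nuScenes `Full` value of the `Random` row in the first results table (74.66). mCE is not reproduced here, since it additionally needs a baseline model's per-corruption errors, which the table does not list.

```python
# Per-corruption mIoU (%) for the "Full | Random | MinkU-34" row,
# in column order: Fog, Rain, Snow, Blur, Beam, Cross, Echo, Sensor.
corrupted_miou = [62.96, 70.65, 55.48, 51.71, 62.01, 31.56, 59.64, 39.41]

# Assumption: clean mIoU is the nuScenes "Full" score of the Random baseline.
clean_miou = 74.66


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def resilience_rate(corrupted, clean):
    """Mean corrupted mIoU expressed as a percentage of the clean mIoU."""
    return 100.0 * mean(corrupted) / clean


avg = mean(corrupted_miou)
mrr = resilience_rate(corrupted_miou, clean_miou)
print(f"Avg = {avg:.2f}, mRR = {mrr:.2f}")
```

Under these assumptions, both values agree with the reported 54.18 (Avg) and 72.57 (mRR) to within rounding. Other rows can differ from the table by about 0.01, because the per-corruption entries are themselves rounded.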
This work is under the Apache 2.0 license.
If you find this work helpful for your research, please kindly consider citing our paper:
@inproceedings{xu2024superflow,
title = {4D Contrastive Superflows are Dense 3D Representation Learners},
author = {Xu, Xiang and Kong, Lingdong and Shuai, Hui and Zhang, Wenwei and Pan, Liang and Chen, Kai and Liu, Ziwei and Liu, Qingshan},
booktitle = {European Conference on Computer Vision},
pages = {58--80},
year = {2024}
}
This work is developed based on the MMDetection3D codebase.
MMDetection3D is an open-source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D perception. It is a part of the OpenMMLab project developed by MMLab.
We acknowledge the use of the following public resources during the course of this work: nuScenes, nuScenes-devkit, SemanticKITTI, SemanticKITTI-API, WaymoOpenDataset, Synth4D, ScribbleKITTI, RELLIS-3D, SemanticPOSS, SemanticSTF, SynLiDAR, DAPS-3D, Robo3D, SLidR, DINOv2, Segment-Any-Point-Cloud, OpenSeeD, torchsparse. 💟