Download the Lyft Level 5 AV Dataset (Perception) from https://level5.lyft.com/dataset/ and decompress it into LYFT_ROOT. Install the Lyft Dataset SDK (https://github.com/lyft/nuscenes-devkit):
[Update]: please follow this issue to resolve a corruption issue with one point cloud file (train/train_lidar/host-a011_lidar1_1233090652702363606.bin).
pip install -U git+https://github.com/lyft/nuscenes-devkit
We have been experimenting with an earlier version of the Lyft dataset (obtained on Aug 16, 2019; this version no longer appears to be available from the official website), which contains fewer sequences than the latest version and slightly different localization data. The sample tokens of the old version were dumped in data_preprocessing/lyft/lyft_2019_train_sample_tokens.txt.
Convert the Lyft dataset into KITTI format under LYFT_KITTI_FORMAT by:
ln -s LYFT_ROOT/train/train_maps LYFT_ROOT/train/maps
ln -s LYFT_ROOT/train/train_lidar LYFT_ROOT/train/lidar
ln -s LYFT_ROOT/train/train_images LYFT_ROOT/train/images
cd data_preprocessing/lyft
python lyft2kitti.py --store_dir LYFT_KITTI_FORMAT --lyft_dataroot LYFT_ROOT/train \
--table_folder LYFT_ROOT/train/train_data \
--sample_token_list ./lyft_2019_train_sample_tokens.txt --meta_info_prefix trainset_
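After conversion, each LiDAR scan follows the KITTI convention: a flat binary file of float32 values laid out as (x, y, z, intensity) rows. A minimal sketch of reading such a .bin file (the file name below is synthetic, written just to illustrate the round trip):

```python
import os
import tempfile

import numpy as np

def load_kitti_bin(path):
    """Load a KITTI-format LiDAR scan: flat float32 (x, y, z, intensity) rows."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

# Round-trip a synthetic scan to illustrate the layout.
scan = np.random.rand(100, 4).astype(np.float32)
path = os.path.join(tempfile.gettempdir(), "000000.bin")
scan.tofile(path)
assert np.allclose(load_kitti_bin(path), scan)
```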
The indices of traversals were dumped in data_preprocessing/lyft/meta_data/lyft_2019_train_sample_tracks.pkl.
You can skip the following steps by directly using the files provided in data_preprocessing/lyft/meta_data/. We include the corresponding scripts to generate the splits as follows.
The train/test split of the traversals by their geo-location is generated by the following commands:
cd data_preprocessing/lyft
python split_traintest.py --data_root LYFT_KITTI_FORMAT --track_list_file meta_data/lyft_2019_train_sample_tracks.pkl
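split_traintest.py partitions traversals by geo-location so that test-set locations are never seen during training. The script's exact criterion is not reproduced here; the sketch below only illustrates the idea, using a hypothetical boundary on the mean global x-coordinate of each track:

```python
import numpy as np

def split_by_geolocation(track_positions, boundary_x=0.0):
    """Assign each track to train/test by the mean x-coordinate of its poses.

    track_positions: dict mapping track id -> (N, 2) array of global xy positions.
    Splitting by location (rather than randomly) keeps test locations disjoint
    from training locations. The boundary here is a made-up placeholder.
    """
    train_ids, test_ids = [], []
    for track_id, xy in track_positions.items():
        if xy.mean(axis=0)[0] < boundary_x:
            train_ids.append(track_id)
        else:
            test_ids.append(track_id)
    return train_ids, test_ids

tracks = {
    "a": np.array([[-5.0, 1.0], [-4.0, 2.0]]),  # lies west of the boundary
    "b": np.array([[3.0, 0.0], [4.0, 1.0]]),    # lies east of the boundary
}
train_ids, test_ids = split_by_geolocation(tracks)
```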
It will generate
data_preprocessing/lyft/meta_data/train_track_list.pkl
data_preprocessing/lyft/meta_data/valid_train_idx_info.pkl
data_preprocessing/lyft/meta_data/train_idx.txt # 12407 samples
data_preprocessing/lyft/meta_data/test_track_list.pkl
data_preprocessing/lyft/meta_data/valid_test_idx_info.pkl
data_preprocessing/lyft/meta_data/test_idx.txt # 2917 samples
Note that test_idx.txt is not used; please use data_preprocessing/lyft/meta_data/time_valid_test_idx.txt (2274 samples, a subset of test_idx.txt) as the test set to reproduce the results. time_valid_test_idx.txt contains sample ids whose traversals predate the sample's collection time. We used this test set during early development and report the evaluation results on it. We observe a similar trend on data_preprocessing/lyft/meta_data/test_idx.txt.
cd data_preprocessing
python gather_historical_traversals.py --track_path lyft/meta_data/train_track_list.pkl \
--idx_info lyft/meta_data/valid_train_idx_info.pkl --idx_list lyft/meta_data/train_idx.txt \
--data_root LYFT_KITTI_FORMAT --traversal_ptc_save_root LYFT_KITTI_FORMAT/training/combined_lidar \
--trans_mat_save_root LYFT_KITTI_FORMAT/training/trans_mat
python gather_historical_traversals.py --track_path lyft/meta_data/test_track_list.pkl \
--idx_info lyft/meta_data/valid_test_idx_info.pkl --idx_list lyft/meta_data/time_valid_test_idx.txt \
--data_root LYFT_KITTI_FORMAT --traversal_ptc_save_root LYFT_KITTI_FORMAT/training/combined_lidar \
--trans_mat_save_root LYFT_KITTI_FORMAT/training/trans_mat
The dense traversals are stored in LYFT_KITTI_FORMAT/training/combined_lidar, and the relative transformations from the current scan to the traversals are stored in LYFT_KITTI_FORMAT/training/trans_mat.
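A sketch of consuming these outputs, assuming each trans_mat entry is a 4x4 homogeneous matrix mapping points between the current scan frame and a traversal frame (the direction and on-disk layout are assumptions, not verified against the script):

```python
import numpy as np

def transform_points(points, trans_mat):
    """Apply a 4x4 homogeneous transform to an (N, 3) point cloud."""
    homo = np.hstack([points, np.ones((len(points), 1))])  # (N, 4) homogeneous coords
    return (homo @ trans_mat.T)[:, :3]

# Sanity check with a pure +2 m translation along x.
T = np.eye(4)
T[0, 3] = 2.0
moved = transform_points(np.zeros((5, 3)), T)
assert np.allclose(moved[:, 0], 2.0)
```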
cd data_preprocessing
python RANSAC.py --calib_dir LYFT_KITTI_FORMAT/training/calib \
--lidar_dir LYFT_KITTI_FORMAT/training/velodyne \
--planes_dir LYFT_KITTI_FORMAT/training/planes --min_h 1.5 --max_h 2.5
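RANSAC.py estimates a ground plane per scan (the --min_h/--max_h flags presumably bound the expected sensor height above the ground). Below is a minimal, generic RANSAC plane fit on synthetic points, not the repo's implementation:

```python
import numpy as np

def ransac_plane(points, n_iters=100, threshold=0.05, rng=None):
    """Fit a plane n·p + d = 0 by RANSAC; return (unit normal, d, inlier mask)."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_inliers, best_model = None, None
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-8:  # degenerate (near-collinear) sample
            continue
        normal /= norm
        d = -normal @ sample[0]
        inliers = np.abs(points @ normal + d) < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (normal, d)
    return best_model[0], best_model[1], best_inliers

# Synthetic scene: flat ground at z = -1.7 m plus scattered non-ground clutter.
rng = np.random.default_rng(0)
ground = np.column_stack([rng.uniform(-10, 10, (500, 2)), np.full(500, -1.7)])
clutter = rng.uniform(-10, 10, (50, 3))
normal, d, inliers = ransac_plane(np.vstack([ground, clutter]))
```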
OpenPCDet typically uses copy-paste object augmentation, which copies objects from other scenes into the current scene during training. However, this may paste objects into background regions such as static bushes, which can confuse the model when the scene is compared with past traversals. We therefore generate a background sample point cloud for each scene in the training set, built from the dense traversals with dynamic objects removed. The background sample is used only during training to prevent augmented objects from being pasted into background regions, so that the augmentation remains valid in the presence of past traversals.

For a fair comparison, we apply this augmentation strategy to both the base detectors and Hindsight. We observe that preventing objects from being pasted into the background yields better detection performance on Car, and slightly worse performance on Pedestrian and Cyclist. The trend reported in the paper remains consistent when this modification is not applied.
To generate the background sample:
cd data_preprocessing
python generate_background_samples.py --save_dir LYFT_KITTI_FORMAT/training/bg_samples \
--voxel_size 0.4 --data_root LYFT_KITTI_FORMAT \
--label_dir LYFT_KITTI_FORMAT/training/label_2_full_range \
--calib_dir LYFT_KITTI_FORMAT/training/calib \
--trans_mat_dir LYFT_KITTI_FORMAT/training/trans_mat \
--track_path lyft/meta_data/train_track_list.pkl \
--idx_info lyft/meta_data/valid_train_idx_info.pkl \
--idx_list lyft/meta_data/train_idx.txt
The background samples are stored in LYFT_KITTI_FORMAT/training/bg_samples.
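A sketch of how such a background sample can veto copy-paste locations during training (hypothetical logic; the actual check lives in the training-time augmentation code): voxelize the background points at the same 0.4 m grid, and reject any paste whose box center lands in an occupied voxel.

```python
import numpy as np

def occupied_voxels(points, voxel_size=0.4):
    """Set of voxel indices occupied by the background point cloud."""
    return set(map(tuple, np.floor(points / voxel_size).astype(int)))

def paste_allowed(box_center, bg_voxels, voxel_size=0.4):
    """Reject a copy-paste location whose voxel already contains background structure."""
    key = tuple(np.floor(np.asarray(box_center) / voxel_size).astype(int))
    return key not in bg_voxels

# Static background structure, e.g. a bush captured by the dense traversals.
bush = np.array([[5.0, 5.0, 0.0], [5.1, 5.2, 0.1]])
bg = occupied_voxels(bush)
# Pasting onto the bush is rejected; open road is fine.
assert not paste_allowed([5.05, 5.1, 0.05], bg)
assert paste_allowed([0.0, 0.0, 0.0], bg)
```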
Create soft links to the Lyft dataset:
cd downstream/OpenPCDet/data/lyft
ln -s LYFT_KITTI_FORMAT/training
After installing OpenPCDet, run:
cd downstream/OpenPCDet/
python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/lyft_dataset.yaml