LYFT_PREPROCESSING.md
Download Lyft dataset and convert it to KITTI format

Download the Lyft Level 5 AV Dataset (Perception) from https://level5.lyft.com/dataset/ and decompress it into LYFT_ROOT. Install the Lyft Dataset SDK (https://github.com/lyft/nuscenes-devkit):

[Update]: please follow this issue to resolve a corrupted point cloud file (train/train_lidar/host-a011_lidar1_1233090652702363606.bin).

pip install -U git+https://github.com/lyft/nuscenes-devkit

We experimented with an earlier version of the Lyft dataset (obtained on Aug 16, 2019; this version no longer appears to be available from the official website), which contains fewer sequences than the latest version and slightly different localization data. The sample tokens of this older version are dumped in data_preprocessing/lyft/lyft_2019_train_sample_tokens.txt.

Convert the Lyft dataset into KITTI format under LYFT_KITTI_FORMAT:

ln -s LYFT_ROOT/train/train_maps LYFT_ROOT/train/maps
ln -s LYFT_ROOT/train/train_lidars LYFT_ROOT/train/lidars
ln -s LYFT_ROOT/train/train_images LYFT_ROOT/train/images
cd data_processing/lyft
python lyft2kitti.py --store_dir LYFT_KITTI_FORMAT --lyft_dataroot LYFT_ROOT/train \
    --table_folder LYFT_ROOT/train/train_data \
    --sample_token_list ./lyft_2019_train_sample_tokens.txt --meta_info_prefix trainset_
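KITTI stores each LiDAR scan as a flat float32 binary file of (x, y, z, intensity) tuples, which is the layout lyft2kitti.py must produce. A minimal sketch of writing and reading a scan in this format (the array contents and file name here are purely illustrative):

```python
import os
import tempfile

import numpy as np

def write_kitti_bin(points: np.ndarray, path: str) -> None:
    """Write an (N, 4) float32 array of (x, y, z, intensity) as a KITTI .bin."""
    points.astype(np.float32).tofile(path)

def read_kitti_bin(path: str) -> np.ndarray:
    """Read a KITTI-format .bin back into an (N, 4) float32 array."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

# Round-trip a small synthetic scan.
scan = np.array([[1.0, 2.0, 0.5, 0.9],
                 [3.0, -1.0, 0.2, 0.4]], dtype=np.float32)
path = os.path.join(tempfile.gettempdir(), "000000.bin")
write_kitti_bin(scan, path)
restored = read_kitti_bin(path)
```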

The indices of the traversals are dumped in data_preprocessing/lyft/meta_data/lyft_2019_train_sample_tracks.pkl.

[Optional] Obtain the train/test split

You can skip the following steps by directly using the files provided in data_preprocessing/lyft/meta_data/. We include the scripts that generate the splits as follows.

The train/test split of the traversals by geo-location is generated by the following commands.

cd data_processing/lyft
python split_traintest.py --data_root LYFT_KITTI_FORMAT --track_list_file meta_data/lyft_2019_train_sample_tracks.pkl
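The exact split criterion lives in split_traintest.py; purely as an illustration, a geo-location split can be done by thresholding each track's mean ego position along one map axis (the axis and threshold below are made up):

```python
import numpy as np

def split_by_location(track_positions: dict, x_threshold: float = 1000.0):
    """Assign each track to train or test by the mean x of its ego poses.

    track_positions: dict mapping track id -> (N, 2) array of (x, y) poses.
    Returns (train_ids, test_ids). The threshold is illustrative only.
    """
    train_ids, test_ids = [], []
    for track_id, poses in track_positions.items():
        if np.asarray(poses)[:, 0].mean() < x_threshold:
            train_ids.append(track_id)
        else:
            test_ids.append(track_id)
    return train_ids, test_ids

# Two toy tracks on opposite sides of the (hypothetical) split boundary.
tracks = {"a": [[100.0, 0.0]], "b": [[2000.0, 5.0]]}
train, test = split_by_location(tracks)
```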

It will generate

data_preprocessing/lyft/meta_data/train_track_list.pkl
data_preprocessing/lyft/meta_data/valid_train_idx_info.pkl
data_preprocessing/lyft/meta_data/train_idx.txt # 12407 samples

data_preprocessing/lyft/meta_data/test_track_list.pkl
data_preprocessing/lyft/meta_data/valid_test_idx_info.pkl
data_preprocessing/lyft/meta_data/test_idx.txt # 2917 samples

Note that test_idx.txt is not used; to reproduce the results, use data_preprocessing/lyft/meta_data/time_valid_test_idx.txt (2274 samples, a subset of test_idx.txt) as the test set. time_valid_test_idx.txt contains only the sample IDs whose traversals predate the collection time. We used this test set during early development and report evaluation results on it; we observe a similar trend on data_preprocessing/lyft/meta_data/test_idx.txt.

Gather dense historical traversals

cd data_processing
python gather_historical_traversals.py --track_path lyft/meta_data/train_track_list.pkl \
    --idx_info lyft/meta_data/valid_train_idx_info.pkl --idx_list lyft/meta_data/train_idx.txt \
    --data_root LYFT_KITTI_FORMAT --traversal_ptc_save_root LYFT_KITTI_FORMAT/training/combined_lidar \
    --trans_mat_save_root LYFT_KITTI_FORMAT/training/trans_mat

python gather_historical_traversals.py --track_path lyft/meta_data/test_track_list.pkl \
    --idx_info lyft/meta_data/valid_test_idx_info.pkl --idx_list lyft/meta_data/time_valid_test_idx.txt \
    --data_root LYFT_KITTI_FORMAT --traversal_ptc_save_root LYFT_KITTI_FORMAT/training/combined_lidar \
    --trans_mat_save_root LYFT_KITTI_FORMAT/training/trans_mat

Dense traversals are stored in LYFT_KITTI_FORMAT/training/combined_lidar, and the relative transformations from the current scan to the traversals are stored in LYFT_KITTI_FORMAT/training/trans_mat.
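Assuming each trans_mat entry is a 4x4 homogeneous rigid transform between the current frame and a traversal, past traversal points can be brought into the current scan's frame as follows (the function name is illustrative, not from the repository):

```python
import numpy as np

def transform_points(points: np.ndarray, trans_mat: np.ndarray) -> np.ndarray:
    """Apply a 4x4 homogeneous transform to an (N, 3) point cloud."""
    homo = np.hstack([points, np.ones((points.shape[0], 1))])  # (N, 4)
    return (homo @ trans_mat.T)[:, :3]

# Translate a single point at the origin by (1, 2, 3).
T = np.eye(4)
T[:3, 3] = [1.0, 2.0, 3.0]
pts = np.array([[0.0, 0.0, 0.0]])
moved = transform_points(pts, T)
```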

Generate ground planes for detection training

cd data_processing
python RANSAC.py --calib_dir LYFT_KITTI_FORMAT/training/calib \
    --lidar_dir LYFT_KITTI_FORMAT/training/velodyne \
    --planes_dir LYFT_KITTI_FORMAT/training/planes --min_h 1.5 --max_h 2.5
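Setting RANSAC.py's internals aside, the core idea of RANSAC ground estimation is: repeatedly fit a plane to three random points, count inliers within a distance tolerance, and keep the best plane; --min_h and --max_h restrict candidates to a plausible ground-height band below the sensor. A self-contained sketch under those assumptions:

```python
import numpy as np

def ransac_plane(points: np.ndarray, n_iters: int = 100, tol: float = 0.05,
                 rng=None):
    """Fit a plane (unit normal n, offset d with n.p + d = 0) via RANSAC."""
    rng = rng or np.random.default_rng(0)
    best_inliers, best_plane = 0, None
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-8:  # degenerate (collinear) sample, try again
            continue
        n = n / norm
        d = -n @ sample[0]
        inliers = np.sum(np.abs(points @ n + d) < tol)
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, (n, d)
    return best_plane

# Noisy points on the plane z = -1.7 (a ground plane ~1.7 m below the sensor,
# i.e. inside the --min_h 1.5 / --max_h 2.5 band).
rng = np.random.default_rng(1)
pts = np.column_stack([rng.uniform(-10, 10, (500, 2)),
                       np.full(500, -1.7) + rng.normal(0, 0.01, 500)])
n, d = ransac_plane(pts)
```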

Generate background sample for detection training

OpenPCDet typically uses copy-paste object augmentation, which copies objects from other scenes into the current scene during training. However, this may paste objects into background regions such as static bushes, which can confuse the model when the scene is compared with past traversals. We therefore generate a background sample point cloud for each scene in the training set, built from the dense traversals with dynamic objects removed. The background sample is used only during training, to prevent augmented objects from being pasted into background regions so that the augmentation remains valid in the presence of past traversals. For a fair comparison, we apply this augmentation strategy to both the base detectors and Hindsight. We observe that preventing pastes into background regions yields better detection performance on Car objects and slightly worse performance on Pedestrian and Cyclist objects. The trend reported in the paper remains consistent when this modification is not applied.

To generate the background sample:

cd data_processing
python generate_background_samples.py --save_dir LYFT_KITTI_FORMAT/training/bg_samples \
    --voxel_size 0.4 --data_root LYFT_KITTI_FORMAT \
    --label_dir LYFT_KITTI_FORMAT/training/label_2_full_range \
    --calib_dir LYFT_KITTI_FORMAT/training/calib \
    --trans_mat_dir LYFT_KITTI_FORMAT/training/trans_mat \
    --track_path lyft/meta_data/train_track_list.pkl \
    --idx_info lyft/meta_data/valid_train_idx_info.pkl \
    --idx_list lyft/meta_data/train_idx.txt

The background samples are stored in LYFT_KITTI_FORMAT/training/bg_samples.
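The rejection logic described above can be sketched as a voxel-occupancy test: background points are hashed into voxels at the given --voxel_size, and a candidate paste location is rejected when its voxel is occupied. The function names are illustrative; the actual script may differ:

```python
import numpy as np

def build_bg_voxels(bg_points: np.ndarray, voxel_size: float = 0.4) -> set:
    """Hash background points into a set of occupied voxel indices."""
    return set(map(tuple, np.floor(bg_points / voxel_size).astype(int)))

def paste_is_valid(center: np.ndarray, bg_voxels: set,
                   voxel_size: float = 0.4) -> bool:
    """Reject a copy-paste location whose voxel is occupied by background."""
    return tuple(np.floor(center / voxel_size).astype(int)) not in bg_voxels

bg = np.array([[5.0, 5.0, 0.0]])  # e.g. a static bush from past traversals
voxels = build_bg_voxels(bg)
ok = paste_is_valid(np.array([20.0, 0.0, 0.0]), voxels)       # empty region
blocked = paste_is_valid(np.array([5.1, 5.1, 0.1]), voxels)   # on the bush
```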

Setup for detection training

Create soft links to the Lyft dataset:

cd downstream/OpenPCDet/data/lyft
ln -s LYFT_KITTI_FORMAT/training

After installing OpenPCDet, run

cd downstream/OpenPCDet/
python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/lyft_dataset.yaml