OpenPCDet is a general PyTorch-based codebase for 3D object detection from point clouds. It currently supports multiple state-of-the-art 3D object detection methods with highly refactored code for both one-stage and two-stage 3D detection frameworks.
Based on the OpenPCDet toolbox, we won the Waymo Open Dataset challenge in three tracks (3D Detection, 3D Tracking, and Domain Adaptation) among all LiDAR-only methods, and the Waymo-related models will be released to OpenPCDet soon.
We are actively updating this repo, and more datasets and models will be supported soon. Contributions are also welcome.
- Data-model separation with a unified point cloud coordinate system, making it easy to extend to custom datasets.
- Unified 3D box definition: (x, y, z, dx, dy, dz, heading).
All the code is tested in the following environment:
- Linux (tested on Ubuntu 20.04)
- Python 3.9.16
- PyTorch 1.13.0
- CUDA 11.7
- spconv v2.x
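Before installing, it can help to confirm which versions are actually present. The following is a minimal sketch, not part of OpenPCDet: the helper name `report_environment` is hypothetical, and it only reads each package's `__version__` attribute (it does not check the CUDA toolkit).

```python
import importlib
import platform


def report_environment(packages=("torch", "spconv")):
    """Return the Python version and the versions of the listed packages.

    Packages that cannot be imported are reported as None.
    """
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Import can fail because the package is missing or because a
            # native dependency could not be loaded; both count as unavailable.
            report[name] = None
            continue
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in report_environment().items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing the printed versions against the list above is a quick first check when a build or import fails.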
PointPillars combines high inference speed with competitive accuracy. On the KITTI autonomous driving dataset, it runs at 62 fps, substantially faster than many contemporary models.
In 3D object detection, the raw data usually arrives as a point cloud: a collection of points that represent the external surfaces of objects in three-dimensional space. The PointPillars approach views these points from a top-down perspective and groups them into the cells of an x-y grid, called 'pillars'. The partitioning is governed by two parameters:
- P: Number of non-empty pillars per sample.
- N: Number of points encapsulated in each pillar.
Each point in the cloud starts as a 4D vector (x, y, z, reflectance). It is then extended to a 9D vector by adding:
- Xc, Yc, Zc: Offsets from the arithmetic mean of all points in the pillar.
- Xp, Yp: Offsets from the center of the pillar in the x-y plane.
This gives the point representation D = [x, y, z, r, Xc, Yc, Zc, Xp, Yp]. From these vectors, a dense tensor of dimensions (D, P, N) is formed. Pillars that hold more than N points are randomly sampled down to N, and pillars that hold fewer are zero-padded.
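The pillar construction described above can be sketched with NumPy. This is an illustrative sketch, not OpenPCDet's implementation: the function name `build_pillar_tensor`, the default `pillar_size` and `max_points` values, and the grid origin at (0, 0) are all assumptions made for the example.

```python
import numpy as np


def build_pillar_tensor(points, pillar_size=0.16, max_points=32, seed=0):
    """Group points into x-y pillars and return a (D, P, N) tensor with D = 9.

    points: (M, 4) array of (x, y, z, reflectance).
    Returns the tensor and the (P, 2) integer grid index of each pillar.
    """
    rng = np.random.default_rng(seed)
    # Integer grid cell of every point in the x-y plane (grid origin assumed at 0, 0).
    cells = np.floor(points[:, :2] / pillar_size).astype(np.int64)
    # Non-empty pillars and, for each point, the pillar it belongs to.
    pillar_cells, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    num_pillars = len(pillar_cells)

    tensor = np.zeros((9, num_pillars, max_points), dtype=np.float32)
    for p in range(num_pillars):
        pts = points[inverse == p]
        if len(pts) > max_points:
            # Too many points: keep a random sample of N.
            keep = rng.choice(len(pts), size=max_points, replace=False)
            pts = pts[keep]
        n = len(pts)
        # Xc, Yc, Zc: offset from the arithmetic mean of the (sampled) pillar points.
        offset_mean = pts[:, :3] - pts[:, :3].mean(axis=0)
        # Xp, Yp: offset from the x-y center of the pillar cell.
        center_xy = (pillar_cells[p] + 0.5) * pillar_size
        offset_center = pts[:, :2] - center_xy
        features = np.hstack([pts, offset_mean, offset_center])  # (n, 9)
        # Too few points: the remaining slots stay zero (zero-padding).
        tensor[:, p, :n] = features.T
    return tensor, pillar_cells
```

The Python loop keeps the logic readable; a production implementation would typically vectorize this step or run it on the GPU.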
PointNet is used for feature extraction. Unlike conventional networks that expect regularly structured input, PointNet can process point clouds directly, respecting their unordered and irregular nature. It treats each point individually, applying the same series of transformations to every point to derive high-level features.
After the PointNet step, the tensor has dimensions (C, P, N). A subsequent max pooling operation over the points of each pillar condenses this to (C, P). Using the grid index of each pillar, the features are then scattered back to the original pillar locations. In effect, the D-dimensional vectors of the points in a pillar are replaced by a single C-dimensional feature vector at that pillar's position.
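The encode, pool, and scatter steps can be sketched as follows. This is a simplified illustration under stated assumptions: a single shared linear layer with ReLU stands in for the PointNet step (batch normalization is omitted), and the function names `encode_pillars` and `scatter_to_pseudo_image` are hypothetical.

```python
import numpy as np


def encode_pillars(pillar_tensor, weight, bias):
    """Apply a shared linear layer + ReLU to every point, then max-pool per pillar.

    pillar_tensor: (D, P, N); weight: (C, D); bias: (C,).
    Returns a (C, P) array of pillar features.
    """
    # The same weights are applied to every point, so point order does not matter.
    per_point = np.einsum("cd,dpn->cpn", weight, pillar_tensor) + bias[:, None, None]
    per_point = np.maximum(per_point, 0.0)  # ReLU, shape (C, P, N)
    return per_point.max(axis=2)            # max over the N points, shape (C, P)


def scatter_to_pseudo_image(pillar_features, pillar_cells, grid_shape):
    """Place each pillar's C-dim feature at its grid cell of a (C, H, W) array.

    pillar_cells: (P, 2) integer indices into the (H, W) grid.
    Cells without a pillar stay zero.
    """
    channels = pillar_features.shape[0]
    image = np.zeros((channels, *grid_shape), dtype=pillar_features.dtype)
    image[:, pillar_cells[:, 0], pillar_cells[:, 1]] = pillar_features
    return image
```

Because the max is taken over the point axis, shuffling the points inside a pillar leaves the pillar feature unchanged, which is the property that makes this kind of encoder suitable for unordered point sets.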
The backbone consists of cascaded 2D convolutional layers that extract features from the pseudo-image at multiple scales. For localizing objects, SSD (Single Shot MultiBox Detector) is used as the detection head. SSD originates from 2D image detection and is adapted here to predict the additional parameters needed for 3D boxes, specifically height and elevation. Each predicted box is described by:
- x: Center coordinate on the x-axis (lengthwise).
- y: Center coordinate on the y-axis (widthwise).
- z: Vertical bottom-center coordinate.
- l: Box length (longest dimension on the ground).
- w: Box width (shortest dimension on the ground).
- h: Box height (vertical measurement).
- θ (theta): Object's rotational angle around the z-axis, signifying its orientation.
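To make the box parameters concrete, the sketch below turns them into corner coordinates. It assumes that at θ = 0 the length runs along the x-axis and that positive θ rotates counter-clockwise when viewed from above; both conventions, and the function names, are assumptions for illustration and should be checked against the dataset in use.

```python
import math


def box_bev_corners(x, y, l, w, theta):
    """Return the four ground-plane corners of a box rotated by theta about the z-axis."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    for sx, sy in ((1, 1), (1, -1), (-1, -1), (-1, 1)):
        # Corner offset in the box's own frame (length along x, width along y).
        dx, dy = sx * l / 2.0, sy * w / 2.0
        # Rotate the offset by theta and shift it to the box center.
        corners.append((x + dx * cos_t - dy * sin_t,
                        y + dx * sin_t + dy * cos_t))
    return corners


def box_vertical_extent(z, h):
    """Return (z_min, z_max), treating z as the bottom-center coordinate."""
    return z, z + h
```

For example, `box_bev_corners(0.0, 0.0, 4.0, 2.0, 0.0)` gives the corners of an axis-aligned 4 by 2 rectangle centered at the origin.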
The insights and implementations here are inspired by and derived from the OpenPCDet repository.
First, we discuss the difference between the KITTI benchmark dataset and the raw KITTI dataset.
Aspect | KITTI benchmark | Raw KITTI |
---|---|---|
Subset of the KITTI dataset | This subset is specifically organized and labeled for the task of 3D object detection. | The raw data does not come with task-specific labels such as 3D bounding boxes for objects. |
Data included | It includes LiDAR point clouds and corresponding camera images, along with calibration data. | The raw dataset provides the full data for every drive. This includes all sensor data such as LiDAR, images from multiple cameras, GPS, IMU, etc. |
- PointPillars is designed to be efficient: it converts the 3D point cloud into a pseudo-image (a 2D representation) and then performs 2D convolutions. This simpler design can be more robust to noise and variations, especially when the more complex models are not fine-tuned.
- PV-RCNN and PointRCNN are more intricate and might perform better when fine-tuned on the specific dataset. When applied directly to raw data without fine-tuning, they might not reach their peak performance.
- The raw KITTI dataset might contain more noise or unfiltered points from the LiDAR. The PointPillars design, which involves grid-based pooling, inherently acts as a noise filter, making it more robust to such unprocessed data.
- If the models (PV-RCNN, PointRCNN) were trained intensively on the benchmark KITTI dataset, they might have unintentionally learned some specific characteristics of that dataset that do not generalize well to the raw KITTI dataset or other datasets. This over-optimization for the benchmark data can hinder performance when the model encounters raw or different data distributions.
Selected supported methods are shown in the table below. The results are the 3D detection performance at moderate difficulty on the val set of the KITTI dataset.
- All LiDAR-based models are trained with 8 GTX 1080Ti GPUs and are available for download.
- The training time is measured with 8 TITAN XP GPUs and PyTorch 1.5.
Model | Training time | Car@R11 | Pedestrian@R11 | Cyclist@R11 | Download |
---|---|---|---|---|---|
PointPillar | ~1.2 hours | 77.28 | 52.29 | 62.68 | model-18M |
PointRCNN | ~3 hours | 78.70 | 54.41 | 72.11 | model-16M |
PV-RCNN | ~5 hours | 83.61 | 57.90 | 70.47 | model-50M |
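The `@R11` suffix refers to average precision evaluated at 11 recall positions (0.0, 0.1, ..., 1.0). The sketch below shows that interpolation scheme in plain Python; it is an illustration rather than the evaluation code used to produce the table, the function name is hypothetical, and it returns a value in [0, 1] whereas the table appears to report percentages.

```python
def average_precision_r11(recalls, precisions):
    """11-point interpolated average precision.

    recalls, precisions: equal-length sequences of points on a precision-recall curve.
    """
    thresholds = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    total = 0.0
    for t in thresholds:
        # Interpolated precision: the best precision at any recall >= t
        # (0 if the curve never reaches this recall level).
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        total += max(candidates, default=0.0)
    return total / 11
```

A detector that reaches perfect precision but only 50% recall would score 6/11 under this scheme, since only the thresholds from 0.0 to 0.5 contribute.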
Please refer to INSTALL.md for the installation of OpenPCDet.
Please refer to DEMO.md for a quick demo to test with a pretrained model and visualize the predicted results on your custom data or the original KITTI data.
Please refer to GETTING_STARTED.md to learn more about using this project.
OpenPCDet is released under the Apache 2.0 license.
OpenPCDet is an open source project for LiDAR-based 3D scene perception that supports multiple LiDAR-based perception models as shown above. Some parts of PCDet are learned from the officially released code of the supported methods above. We would like to thank the authors for their proposed methods and official implementations.
We hope that this repo could serve as a strong and flexible codebase to benefit the research community by speeding up the process of reimplementing previous works and/or developing new methods.
If you find this project useful in your research, please consider citing:
@misc{openpcdet2020,
title={OpenPCDet: An Open-source Toolbox for 3D Object Detection from Point Clouds},
author={OpenPCDet Development Team},
howpublished = {\url{https://github.com/open-mmlab/OpenPCDet}},
year={2020}
}
You are welcome to join the OpenPCDet development team by contributing to this repo. Feel free to contact us about any potential contributions.