We present Ca3DPose (Canonical 3D Pose), a method that estimates the canonical 3D pose of an object via object-level classification, together with a reimplementation of an NPCS (Normalized Part Coordinate Space) based method at the object level as a baseline. We use the newly released 3D scene dataset MultiScan, which contains over 200 scans. Our model extracts point-level features with a MinkowskiEngine-powered U-Net backbone or a PAConv backbone, aggregates them into object-level features by voxelization or max pooling, and performs object-level classification. We formalize the 3D pose as a combination of an up-direction class, a front-direction latitude class, and a front-direction longitude class. Using the NPCS method as the baseline, our Ca3DPose outperforms it on the MultiScan dataset, and the PAConv backbone outperforms the U-Net backbone.
We train our model on the newly released MultiScan dataset.
The main contribution is predicting the canonical 3D pose (front and up directions) of an object from its point cloud via object-level classification.
The basic code architecture of the W&B logger, the Hydra configuration, and the backbone model are from MINSU3D.
- ObjectClassifier is an efficient, MinkowskiEngine-based framework for point-cloud object-level pose estimation. It voxelizes the per-point features from the U-Net to obtain object-level features, discretizes the front/up directions into latitude and longitude classes, and then recovers the directions from the predicted classes (see the sketch after this list). Canonical pose estimation is thereby simplified to a classification problem solved with a 3-layer MLP.
- The Normalized Object Coordinate Space (NOCS) model is our reimplementation of the Normalized Part Coordinate Space model at the object level. We modified some model details and the loss function to fit the MultiScan dataset and the U-Net features from MinkowskiEngine.
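To make the direction discretization concrete, below is a minimal sketch of how directions might be mapped to latitude/longitude classes and recovered from them. The bin counts (`N_LAT`, `N_LON`) and the binning scheme are our assumptions for illustration, not the repo's exact implementation.

```python
import numpy as np

# Illustrative bin counts; the actual class granularity in the repo may differ.
N_LAT, N_LON = 18, 36  # 10-degree bins

def direction_to_class(d):
    """Map a unit direction vector to (latitude class, longitude class)."""
    d = d / np.linalg.norm(d)
    lat = np.arccos(np.clip(d[2], -1.0, 1.0))     # polar angle in [0, pi]
    lon = np.arctan2(d[1], d[0]) % (2 * np.pi)    # azimuth in [0, 2*pi)
    lat_cls = min(int(lat / np.pi * N_LAT), N_LAT - 1)
    lon_cls = min(int(lon / (2 * np.pi) * N_LON), N_LON - 1)
    return lat_cls, lon_cls

def class_to_direction(lat_cls, lon_cls):
    """Recover a direction from the predicted classes (bin centers)."""
    lat = (lat_cls + 0.5) / N_LAT * np.pi
    lon = (lon_cls + 0.5) / N_LON * 2 * np.pi
    return np.array([np.sin(lat) * np.cos(lon),
                     np.sin(lat) * np.sin(lon),
                     np.cos(lat)])
```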
- AC_(angle): accuracy under an angular threshold; a prediction counts as correct if the angle between the predicted and ground-truth direction is within (angle) degrees
- Rerr: the average angle between the predicted and ground-truth directions, in radians
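A short sketch of how these metrics can be computed (the function names are ours, not the repo's):

```python
import numpy as np

def angle_error(pred, gt):
    """Angle in radians between predicted and ground-truth directions."""
    cos = np.dot(pred, gt) / (np.linalg.norm(pred) * np.linalg.norm(gt))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def evaluate(preds, gts, thresholds_deg=(5, 10, 20)):
    """Return the AC_(angle) accuracies and Rerr over a set of predictions."""
    errs = np.array([angle_error(p, g) for p, g in zip(preds, gts)])
    acc = {f"AC_{t}": float((errs <= np.deg2rad(t)).mean()) for t in thresholds_deg}
    return acc, float(errs.mean())  # Rerr in radians
```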
Category | AC_5 | AC_10 | AC_20 | Rerr | Count |
---|---|---|---|---|---|
wall | 0.764 | 0.809 | 0.828 | 0.543 | 157 |
door | 0.610 | 0.659 | 0.683 | 0.937 | 41 |
table | 0.517 | 0.583 | 0.633 | 0.849 | 60 |
chair | 0.529 | 0.657 | 0.857 | 0.236 | 70 |
cabinet | 0.652 | 0.710 | 0.783 | 0.595 | 69 |
window | 0.824 | 0.941 | 1.000 | 0.046 | 17 |
sofa | 0.636 | 0.727 | 0.864 | 0.274 | 22 |
microwave | 0.500 | 0.667 | 0.667 | 1.061 | 6 |
pillow | 0.727 | 0.788 | 0.939 | 0.157 | 33 |
tv_monitor | 0.455 | 0.500 | 0.545 | 1.427 | 22 |
curtain | 0.591 | 0.591 | 0.682 | 0.791 | 22 |
trash_can | 0.875 | 0.875 | 0.875 | 0.393 | 8 |
suitcase | 0.594 | 0.625 | 0.688 | 0.669 | 32 |
sink | 0.286 | 0.500 | 0.500 | 1.479 | 14 |
backpack | 0.000 | 0.250 | 0.250 | 1.808 | 4 |
bed | 0.750 | 0.750 | 0.750 | 0.588 | 8 |
refrigerator | 0.600 | 0.600 | 0.600 | 0.909 | 10 |
toilet | 0.333 | 0.444 | 0.444 | 1.052 | 9 |
average | 0.631 | 0.697 | 0.763 | 0.621 | 604 |
These per-category results are on the newly released MultiScan dataset using our ObjectClassifier model. Note that the model is trained only on the 8 object categories with articulated parts.
The NOCS model yields lower results than our model, as shown below.
Method | AC_5 | AC_10 | AC_20 | Rerr |
---|---|---|---|---|
Ca3DPose (U-Net) | 0.328 | 0.349 | 0.387 | 1.812 |
Ca3DPose (PAConv) | 0.631 | 0.697 | 0.763 | 0.621 |
NOCS (baseline) | 0.004 | 0.040 | 0.175 | 1.359 |
- Designed a new object-level classifier method based on latitude and longitude classes; the model architecture is shown below (see also the sketch after this list).
- Preprocessed MultiScan to extract the objects with annotated canonical poses
- Highly modularized design that enables researchers to switch between the NOCS model and our ObjectClassifier model easily.
- Better logging with W&B, periodic evaluation during training, and easy experiment configuration with Hydra.
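As referenced in the list above, here is a hypothetical PyTorch sketch of the object-level classification head: per-point backbone features are max-pooled into a single object feature, and a 3-layer MLP predicts the up-direction, front-latitude, and front-longitude classes. All dimensions and class counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ObjectPoseHead(nn.Module):
    """Illustrative object-level classification head (not the repo's exact code)."""

    def __init__(self, feat_dim=32, n_up=18, n_lat=18, n_lon=36):
        super().__init__()
        self.splits = (n_up, n_lat, n_lon)
        self.mlp = nn.Sequential(               # 3-layer MLP
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, sum(self.splits)),
        )

    def forward(self, point_feats):              # (num_points, feat_dim)
        obj_feat = point_feats.max(dim=0).values  # max-pool to object level
        logits = self.mlp(obj_feat)
        return torch.split(logits, self.splits)   # (up, lat, lon) logits
```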
Environment requirements
- CUDA 11.X
- Python 3.8
We recommend the use of miniconda to manage system dependencies.
# create and activate the conda environment
conda create -n min3dcapose python=3.8
conda activate min3dcapose
# install PyTorch 1.8.2
conda install pytorch cudatoolkit=11.1 -c pytorch-lts -c nvidia
# install Python libraries
pip install -e .
# verify the Python libraries installation
python -c "import min3dcapose"
# install OpenBLAS and SparseHash via conda
conda install openblas-devel -c anaconda
conda install -c bioconda google-sparsehash
export CPATH=$CONDA_PREFIX/include:$CPATH
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
# install MinkowskiEngine
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps \
--install-option="--blas_include_dirs=${CONDA_PREFIX}/include" --install-option="--blas=openblas"
# install C++ extensions
cd minsu3d/common_ops
python setup.py develop
Note: Setting up with pip (without conda) requires OpenBLAS and SparseHash to be pre-installed on your system.
# create and activate the virtual environment
virtualenv --no-download env
source env/bin/activate
# install PyTorch 1.8.2
pip install torch==1.8.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
# install Python libraries
pip install -e .
# install OpenBLAS and SparseHash via APT
sudo apt install libopenblas-dev libsparsehash-dev
# install MinkowskiEngine
pip install MinkowskiEngine
# install C++ extensions
cd minsu3d/common_ops
python setup.py develop
- Download the MultiScan dataset and repo. To acquire access to the dataset, please refer to their instructions; you will get a download script once your request is approved. The downloaded dataset follows their file system structure.
- Substitute the `MULTISCAN/dataset/preprocess/gen_instsegm_dataset.py` file in the downloaded MultiScan repo with our `gen_instsegm_dataset.py`, and set up the environment following the MultiScan instructions.
- Preprocess the data: this converts the objects with annotated poses into `.pth` files and splits the dataset in the default MultiScan way.
# the raw MultiScan dataset is about 406.3GB in total
python gen_instsegm_dataset.py
# the processed data is about 5.9GB in total
Each `.pth` file is named after its scan and contains all the objects in that scan. Each object dictionary has the following keys:
- `"xyz"`: point coordinates
- `"rgb"`: point colors
- `"normal"`: point normals
- `"obb"`: oriented bounding box
- `"front"`: annotated front direction
- `"up"`: annotated up direction
- `"instance_ids"`: per-point instance IDs
- `"sem_labels"`: per-point semantic labels
Download the split MultiScan objects with metadata from Multiscan_objects.
Note: Configuration files are managed by Hydra; you can easily add or override any configuration attribute by passing it as an argument.
# log in to WandB
wandb login
# train a model from scratch
python train.py model={model_name} data={dataset_name}
# train a model from a checkpoint
python train.py model={model_name} data={dataset_name} model.ckpt_path={checkpoint_path}
# test a pretrained model
python test.py model={model_name} data={dataset_name} model.ckpt_path={pretrained_model_path}
# evaluate inference results
python eval.py model={model_name} data={dataset_name} model.model.experiment_name={experiment_name}
# examples:
# python train.py model=nocs data=multiscan model.trainer.max_epochs=120
# python test.py model=object_classifier data=multiscan model.ckpt_path=Object_Classifier_best.ckpt
# python eval.py model=nocs data=multiscan model.model.experiment_name=run_1
We provide pretrained models for MultiScan. The pretrained models and corresponding config files are given below. Note that all NOCS models are trained from scratch, while the ObjectClassifier model is fine-tuned from the pretrained HAIS-MultiScanObj-epoch=55.ckpt model (itself trained on the MultiScan dataset) and reuses the backbone U-Net hyper-parameters to accelerate training. After downloading a pretrained model, run `test.py` to do inference as described in the section above.
Model | Code | AC_5 | AC_10 | AC_20 | Rerr | Download |
---|---|---|---|---|---|---|
ObjectClassifier | config model | 0.318 | 0.337 | 0.348 | 1.337 | link |
NOCS | config model | | | | | link |
We provide scripts to visualize the predicted and ground-truth canonical 3D pose of an object. When testing or running inference, add the following option to generate visualizations:
model.show_visualization=True
The default visualization results are saved in the following file structure:
min3dcapose
├── visualization_results
# results whose average angle error is below 5 degrees
│ ├── Ac5-
# input object
│ │ ├── [object_name].png
# object in predicted canonical pose
│ │ ├── [object_name]_r.png
# results whose average angle error is over 30 degrees
│ ├── Ac30-
│ │ ├── [object_name].png
│ │ ├── [object_name]_r.png
Some result visualizations are shown below.
# the red arrow is the predicted front direction
# left: the original object with the default OBB from `pcd.get_oriented_bounding_box()` in Open3D; the pose is randomly rotated
# right: the object rotated to the predicted canonical pose, with the OBB aligned to the canonical axes
Good predictions (angle error < 5 degrees):
object name | uncanonicalized pose | predicted canonical pose |
---|---|---|
toilet | ||
chair | ||
cabinet | ||
door |
Bad predictions (angle error > 30 degrees):
object name | uncanonicalized pose | predicted canonical pose |
---|---|---|
toilet | ||
chair | ||
cabinet | ||
door |
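The side-by-side renders above could be reproduced along these lines, assuming the predicted front/up directions are available; the axis convention (front mapped to +X, up to +Z) and the input path are our assumptions:

```python
import numpy as np
import open3d as o3d

def canonicalize(pcd, front, up):
    """Rotate a point cloud so `front` maps to +X and `up` maps to +Z."""
    up = up / np.linalg.norm(up)
    front = front - np.dot(front, up) * up   # make front orthogonal to up
    front = front / np.linalg.norm(front)
    R = np.stack([front, np.cross(up, front), up])  # rows become +X, +Y, +Z
    pcd.rotate(R, center=pcd.get_center())
    return pcd

pcd = o3d.io.read_point_cloud("object.ply")  # illustrative input
pcd = canonicalize(pcd, front=np.array([0.3, 0.9, 0.1]), up=np.array([0.0, 0.0, 1.0]))
obb = pcd.get_oriented_bounding_box()        # OBB now aligned to canonical axes
o3d.visualization.draw_geometries([pcd, obb])
```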
We report the time it takes to train on the MultiScan data (134 scans).
Test environment
- CPU: Intel Core i7-12700 @ 2.10-4.90GHz × 12
- RAM: 32GB
- GPU: NVIDIA GeForce RTX 3090 Ti 24GB
- System: Ubuntu 20.04.2 LTS
Training time in total (without validation)
Model | Epochs | Batch Size | Time |
---|---|---|---|
ObjectClassifier | 15 | 8 | 12hr4min |
NOCS | 91 | 8 | 30hr10min |
Inference time per object (avg)
Model | Time |
---|---|
ObjectClassifier | 1.004s |
- It is hard to predict the canonical pose of some object categories due to annotation limitations. For instance, the front direction of some windows is defined as pointing into the room, so the front direction is hard to predict without the background.
- The results of our reimplementation of the NOCS model still need to be improved.
This repo is built upon MinkowskiEngine and MINSU3D. We train our models on MultiScan. If you use this repo and the pretrained models, please cite the original papers.
@inproceedings{mao2022multiscan,
author = {Mao, Yongsen and Zhang, Yiming and Jiang, Hanxiao and Chang, Angel X. and Savva, Manolis},
title = {MultiScan: Scalable RGBD scanning for 3D environments with articulated objects},
booktitle = {Advances in Neural Information Processing Systems},
year = {2022}
}
@article{Zhou2018,
author = {Qian-Yi Zhou and Jaesik Park and Vladlen Koltun},
title = {{Open3D}: {A} Modern Library for {3D} Data Processing},
journal = {arXiv:1801.09847},
year = {2018},
}
@article{ravi2020pytorch3d,
author = {Nikhila Ravi and Jeremy Reizenstein and David Novotny and Taylor Gordon
and Wan-Yen Lo and Justin Johnson and Georgia Gkioxari},
title = {Accelerating 3D Deep Learning with PyTorch3D},
journal = {arXiv:2007.08501},
year = {2020},
}
@inproceedings{choy20194d,
title={4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks},
author={Choy, Christopher and Gwak, JunYoung and Savarese, Silvio},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={3075--3084},
year={2019}
}
@article{geng2022gapartnet,
author = {Geng, Haoran and Xu, Helin and Zhao, Chengyang and Xu, Chao and Yi, Li and Huang, Siyuan and Wang, He},
title = {GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts},
journal = {arXiv:2211.05272},
year = {2022},
}