This repo provides the code for the publication "Particle-based 6D Object Pose Estimation from Point Clouds using Diffusion Models" (https://arxiv.org/abs/2412.00835) by Christian Möller, Niklas Funk, and Jan Peters. It contains the functionality to train and test a noise-conditioned score model for 6D pose estimation. Training is conducted with a denoising score matching objective, and inference is performed using Langevin dynamics.
For logging we use Weights & Biases (wandb). Please set it up following https://docs.wandb.ai/quickstart.
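For reference, a minimal sketch of the one-time wandb setup (the project name below is just a placeholder, not part of this repo):

import wandb

# Authenticate this machine once; reads the API key from WANDB_API_KEY or prompts for it.
wandb.login()

# Quick sanity check that logging works (placeholder project name, offline mode).
run = wandb.init(project="6d-pose-estimation-test", mode="offline")
run.log({"sanity_check": 1.0})
run.finish()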
If you use this project in your research, please cite the paper:
@misc{moeller24particlebased6dobjectpose,
title={Particle-based 6D Object Pose Estimation from Point Clouds using Diffusion Models},
author={Christian M\"oller and Niklas Funk and Jan Peters},
year={2024},
eprint={2412.00835},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.00835},
}
Create conda environment
conda env create -f 6Dpose_estimation_env.yml
Activate the environment
conda activate 6Dpose_estimation_env
This repo relies on Theseus. Please refer to https://github.com/AI-App/Theseus in case of installation issues.
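As a quick sanity check that Theseus is installed correctly, a small script like the following should run without errors (a minimal sketch, not part of this repo):

import torch
import theseus as th

# Build a random SE(3) transform and apply it to a 3D point; this exercises
# the Lie-group functionality the pose estimation code builds on.
pose = th.SE3.rand(1)        # batch of one random rigid-body transform
points = torch.zeros(1, 3)   # a single point at the origin
print(pose.transform_from(points))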
We assume the data to follow the format of the BOP challenge (https://github.com/thodan/bop_toolkit/blob/master/docs/bop_datasets_format.md).
E.g., to use the Linemod dataset, download the respective files from https://bop.felk.cvut.cz/datasets/ and place them in the following structure:
├── ...
├── bop_data
│   ├── lm
│   │   ├── <subdir name>
│   │   │   ├── <block_id>          # e.g. 000001
│   │   │   │   ├── depth           # folder with depth images
│   │   │   │   ├── mask            # folder with masks
│   │   │   │   ├── mask_visib      # folder with visible masks
│   │   │   │   ├── rgb             # folder with RGB images
│   │   │   │   ├── scene_gt.json   # ground-truth poses
│   │   │   │   ├── scene_gt_info   # further ground-truth info
│   │   │   │   ├── train_idxs.txt  # file with train idxs (optional)
│   │   │   │   └── test_idxs.txt   # file with test idxs (optional)
│   │   │   ├── <block_id>          # e.g. 000002
│   │   │   └── ...
│   │   ├── camera.json             # camera intrinsics
│   │   └── models                  # 3D object models
│   └── ...
├── scripts
└── pose_estimation
Block ids can be any 6-digit string, and the image names in depth/mask/mask_visib/rgb should be 6-digit numbers starting at 000000, 000001, ....
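For orientation, a minimal sketch of how a single sample can be read from this structure; the paths, the block id, and the image id below are placeholders, and the json keys follow the BOP format linked above:

import json
from pathlib import Path

import numpy as np
import imageio.v3 as iio

block_dir = Path("bop_data/lm/<subdir name>/000001")  # placeholder path
image_id = 0

# scene_gt.json maps the image id (as a string) to a list of object annotations
# with rotation "cam_R_m2c" (row-major 3x3), translation "cam_t_m2c" (in mm), and "obj_id".
with open(block_dir / "scene_gt.json") as f:
    scene_gt = json.load(f)
annotation = scene_gt[str(image_id)][0]
R = np.array(annotation["cam_R_m2c"]).reshape(3, 3)
t = np.array(annotation["cam_t_m2c"])

# Images are named by their 6-digit id.
depth = iio.imread(block_dir / "depth" / f"{image_id:06d}.png")
rgb = iio.imread(block_dir / "rgb" / f"{image_id:06d}.png")
print(R, t, depth.shape, rgb.shape)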
Configuration files for training are stored in scripts/config/. The folder contains one example configuration for training the model on object 8 (driller).
Training requires two command line arguments:
- config_file: scripts/config/model_config/<config file name>.yaml, defining the training configuration
- wandb_mode: if 'online', the run is logged to the wandb server; if 'offline', it is only logged locally.
python scripts/train_pose_estimation.py --config_file train_config_example --wandb_mode offline
To sample 6D poses given an RGB-D image and the 3D model of the object, we use Langevin dynamics as an iterative inference process. It starts by randomly sampling a pose and then gradually walks along the gradient (the output of the trained model) towards an accurate solution.
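For intuition, a schematic sketch of annealed Langevin sampling; the pose parameterization, score_model, and noise schedule below are placeholders, and the actual implementation in this repo operates on SE(3) poses:

import torch

def langevin_sampling(score_model, observation, sigmas, steps_per_sigma=20, step_lr=1e-4):
    # Schematic annealed Langevin dynamics over a generic 6D pose parameterization.
    # score_model(pose, observation, sigma) is assumed to return the conditional
    # score, i.e. the gradient of the log pose density at the current noise level.
    pose = torch.randn(1, 6)  # random initial pose hypothesis
    for sigma in sigmas:  # anneal from the largest to the smallest noise level
        step_size = step_lr * (sigma / sigmas[-1]) ** 2
        for _ in range(steps_per_sigma):
            noise = torch.randn_like(pose)
            score = score_model(pose, observation, sigma)
            # Langevin update: step along the score plus scaled Gaussian noise.
            pose = pose + 0.5 * step_size * score + (step_size ** 0.5) * noise
    return pose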
Inference requires two command line arguments:
- config_file: scripts/config/inference_config/<config file name>.yaml, defining the inference configuration
- wandb_mode: if 'online', the run is logged to the wandb server; if 'offline', it is only logged locally.
python scripts/train_pose_estimation.py --config_file inference_config_example --wandb_mode offline
Inference samples multiple particles as pose hypotheses. All pose hypotheses are stored as a dictionary in a .npy file in results/<wandb_run_name>. Load the results dict using
results_dict = np.load(res_path, allow_pickle=True).item()
The results dict contains, among other things, the sampled poses, the history of poses throughout the inference process, the history of scores, and the scene and object latents from the final iteration.
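To get an overview of what exactly was stored for your run, you can inspect the dictionary directly (the key names are determined by the run configuration and are not listed here):

# Print every stored entry together with its type and, if available, its shape.
for key, value in results_dict.items():
    shape = getattr(value, "shape", None)
    print(key, type(value).__name__, shape if shape is not None else "")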
To obtain a final pose prediction from the sampled particles, you can use the selection strategies in pose_estimation/samplers/pose_sampler.py. The utility functions in visualizations/utils.py make this convenient. You can use
from visualizations.utils import results_as_df
poses_df = results_as_df(results_dict)
to apply random selection, selection by score, selection by latent, and selection by ground truth to the sampled particles and obtain a dataframe containing the final pose prediction of each of the four methods for each sample of the inference dataset.
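Since the exact column layout of the dataframe is defined by results_as_df, it is easiest to inspect it once before relying on a particular selection method:

# Show which columns (pose entries and selection methods) are available per sample.
print(poses_df.columns)
print(poses_df.head())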