This repository provides the source code of the DVSOD baseline.
The code requires python>=3.8, as well as pytorch>=1.11 and torchvision>=0.12. Please follow the official PyTorch installation instructions to install both the PyTorch and TorchVision dependencies. Installing both with CUDA support is strongly recommended.
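Once the environment is set up, a quick check like the one below can confirm that the installed versions meet the minimums listed above. This is a minimal sketch and not part of the repository: the helper names `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical, and the script assumes only that `torch` and `torchvision` expose a `__version__` string.

```python
import importlib
import re
import sys

# Minimum versions stated in this README.
MIN_PYTHON = (3, 8)
MIN_PACKAGES = {"torch": "1.11", "torchvision": "0.12"}


def version_tuple(version):
    """Turn '1.11.0+cu113' into (1, 11, 0), ignoring local build suffixes."""
    release = version.split("+")[0]
    parts = []
    for piece in release.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment():
    """Collect readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python >= 3.8 is required")
    for name, minimum in MIN_PACKAGES.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append(f"{name} is not installed")
            continue
        if not meets_minimum(module.__version__, minimum):
            problems.append(f"{name} {module.__version__} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
    try:
        import torch
        # torch.cuda.is_available() reports whether a usable CUDA device was found.
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        pass
```

Saving this as a standalone script (for example under a name of your choosing) and running it inside the activated environment prints a warning for each unmet requirement and reports whether CUDA is visible to PyTorch.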
- Clone this repo.

      $ git clone https://github.com/DVSOD/DVSOD-Baseline.git
      $ cd DVSOD-Baseline

- Install dependencies.

      $ conda env create -f dvsod.yaml
      $ conda activate dvsod

First download the DViSal dataset. Training then needs only a few adjustments:
- Set your DViSal dataset path and ckpt save path in train.py
- Run training with python train.py
Saliency maps can then be generated by loading a trained model checkpoint:
- Set your DViSal dataset path and ckpt save path in test.py
- Specify the ckpt name and test set name in test.py
- Run inference with python test.py
Instructions for the key parameters in train.py and test.py:
- set '--is_ResNet' as **bool** # whether to use ResNet
- set '--ckpt_load' as **bool** # whether to load a checkpoint
- set '--snapshot' as **int** # e.g. 100, which loads the 100th checkpoint
- set '--baseline_mode' as **bool** # whether to apply baseline mode
- set '--sample_rate' as **int** # e.g. 3, the sample rate
- set '--stm_queue_size' as **int** # e.g. 3, the number of memory frames
- set '--batchsize' as **int** # e.g. 2, the batch size
- set '--trainsize' as **int** # e.g. 320, the training data size
- set '--save_interval' as **int** # e.g. 2, which saves a ckpt every 2 epochs
- set '--epoch' as **int** # e.g. 200, the number of training epochs
- set '--lr' as **float** # e.g. 1e-4, the learning rate
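For orientation, the parameters above could be declared with `argparse` roughly as follows. This is an illustrative sketch, not the code in train.py or test.py: the flag names and types come from the list, the integer and float defaults are the "e.g." values given there, and the boolean defaults, the `str2bool` helper, and `build_parser` are assumptions made for the example.

```python
import argparse


def str2bool(value):
    """Parse 'true'/'false' style strings.

    A plain type=bool would turn any non-empty string, including 'False', into True.
    """
    if isinstance(value, bool):
        return value
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    # Hypothetical parser: the repository's actual defaults may differ.
    parser = argparse.ArgumentParser(description="DVSOD baseline options (illustrative)")
    parser.add_argument("--is_ResNet", type=str2bool, default=True)       # assumed default
    parser.add_argument("--ckpt_load", type=str2bool, default=False)      # assumed default
    parser.add_argument("--snapshot", type=int, default=100)              # checkpoint to load
    parser.add_argument("--baseline_mode", type=str2bool, default=False)  # assumed default
    parser.add_argument("--sample_rate", type=int, default=3)
    parser.add_argument("--stm_queue_size", type=int, default=3)          # number of memory frames
    parser.add_argument("--batchsize", type=int, default=2)
    parser.add_argument("--trainsize", type=int, default=320)
    parser.add_argument("--save_interval", type=int, default=2)           # epochs between saved ckpts
    parser.add_argument("--epoch", type=int, default=200)
    parser.add_argument("--lr", type=float, default=1e-4)
    return parser


# Example: override two options and leave the rest at their defaults.
args = build_parser().parse_args(["--lr", "5e-5", "--baseline_mode", "false"])
print(args.lr, args.baseline_mode, args.epoch)
```

Passing an explicit list to `parse_args` keeps the example self-contained; a real script would normally call `parse_args()` with no arguments so that the values are read from the command line.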
@InProceedings{li2023dvsod,
title={DVSOD: RGB-D Video Salient Object Detection},
author={Li, Jingjing and Ji, Wei and Wang, Size and Li, Wenbo and Cheng, Li},
booktitle={Advances in Neural Information Processing Systems},
year={2023},
month={December}
}
We sincerely thank CPD, CRM, and STM for their outstanding project contributions!
@inproceedings{wu2019cascaded,
title={Cascaded partial decoder for fast and accurate salient object detection},
author={Wu, Zhe and Su, Li and Huang, Qingming},
booktitle={CVPR},
pages={3907--3916},
year={2019}
}
@inproceedings{ji2021calibrated,
title={Calibrated RGB-D salient object detection},
author={Ji, Wei and Li, Jingjing and Yu, Shuang and Zhang, Miao and Piao, Yongri and Yao, Shunyu and Bi, Qi and Ma, Kai and Zheng, Yefeng and Lu, Huchuan and others},
booktitle={CVPR},
pages={9471--9481},
year={2021}
}
@inproceedings{oh2019video,
title={Video object segmentation using space-time memory networks},
author={Oh, Seoung Wug and Lee, Joon-Young and Xu, Ning and Kim, Seon Joo},
booktitle={ICCV},
pages={9226--9235},
year={2019}
}