The TransOAR project was initially developed for Transformer-based organs-at-risk detection and contains code of three 3D Detection Transformers, namely Focused Decoder, DETR, and Deformable DETR. Additionally, we adopted RetinaNet/Retina U-Net from nnDetection into our training pipeline to ensure comparability of results with traditional CNN-based detectors.
To access the featured detectors and their detailed configs, please check out the linked branches:
Focused Decoder: A novel medical Detection Transformer restricting the cross-attention’s field of view.
DETR [1]: A 3D implementation of the original Detection Transformer DETR.
Deformable DETR [2]: A 3D implementation of Deformable DETR.
RetinaNet/Retina U-Net [3][4]: Adapted from the cited sources to fit our training pipeline.
February 23: Focused Decoding Enables 3D Anatomical Detection by Transformers has been accepted at MELBA!
May 22: SwinFPN: Leveraging Vision Transformers for 3D Organs-At-Risk Detection has been accepted at MIDL 22!
TL;DR: Focused Decoder leverages information from an anatomical region atlas to simultaneously deploy query anchors and restrict the cross-attention’s field of view to RoIs, alleviating the need for large-scale annotated datasets. Focused Decoder not only delivers competitive results but also facilitates the accessibility of explainable results via attention weights.
The usage remains the same for all branches and, therefore, all featured detectors.
Create a new virtual environment using, for example, anaconda:
conda create -n transoar python=3.8
and run:
pip install -e .
The installation was tested using:
- Python 3.8
- Ubuntu 20.04
- CUDA 11.4
To compile the CUDA operations of the deformable attention module, run:
cd ./transoar/models/ops
python setup.py install
Alternatively, one can experiment with the Python implementation by deactivating the flag use_cuda in the respective config file.
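Flags such as use_cuda can be edited by hand; for scripted experiments, the following is a minimal sketch of toggling a boolean flag directly in the YAML text. It assumes the flag appears as a plain `key: True/False` line. The helper name `set_yaml_flag` and the keys `neck` and `hidden_dim` in the example are hypothetical and not taken from the repository's configs.

```python
import re


def set_yaml_flag(text: str, key: str, value: bool) -> str:
    """Return `text` with every `key: <bool>` line set to `value`.

    Indentation and trailing comments are preserved.
    Raises KeyError if no matching line is found.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*:[ \t]*)(?:true|false)\b",
        re.IGNORECASE | re.MULTILINE,
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), text)
    if count == 0:
        raise KeyError(f"flag {key!r} not found")
    return new_text


# Hypothetical config excerpt; the real config layout may differ.
example = "neck:\n  use_cuda: True  # compiled ops\n  hidden_dim: 384\n"
print(set_yaml_flag(example, "use_cuda", False))
```

Working on the raw text rather than parsing and re-dumping the YAML keeps comments and ordering in the config file untouched.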
To compile NMS used in RetinaNet/Retina U-Net, check out this branch and follow the general installation steps described above.
We provide exemplary preprocessing scripts for two publicly available datasets. These scripts are intended as templates that can be adapted to additional datasets.
AMOS22 challenge [5]
- Download the training data of the challenge's first stage. The structure should be as follows:
AMOS22/
├── imagesTr/
│   └── <case_id>.nii.gz
├── imagesTs/
│   └── <case_id>.nii.gz
├── labelsTr/
│   └── <case_id>.nii.gz
├── task1_dataset.json
└── task2_dataset.json
- Update paths to the raw data in ./config/preprocessing_amos.yaml.
- Run python prepare_dataset_amos.py to generate the preprocessed dataset, which will be stored under ./dataset.
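Before running the preprocessing script, it can help to confirm that the download matches the expected layout. The following is a minimal sketch of such a check; the function name `check_amos_layout` is hypothetical, and the check assumes that each training image has a label file with the same file name.

```python
from pathlib import Path


def check_amos_layout(root):
    """Return a list of problems found under `root` (empty if none)."""
    root = Path(root)
    problems = []

    # Directories and metadata files named in the expected structure.
    for sub in ("imagesTr", "imagesTs", "labelsTr"):
        if not (root / sub).is_dir():
            problems.append(f"missing directory: {sub}/")
    for name in ("task1_dataset.json", "task2_dataset.json"):
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")

    # Assumption: image and label of a case share the same file name.
    images_dir, labels_dir = root / "imagesTr", root / "labelsTr"
    if images_dir.is_dir() and labels_dir.is_dir():
        images = {p.name for p in images_dir.glob("*.nii.gz")}
        labels = {p.name for p in labels_dir.glob("*.nii.gz")}
        for name in sorted(images - labels):
            problems.append(f"no label for training image: {name}")

    return problems
```

Calling `check_amos_layout("/path/to/AMOS22")` with the actual download location returns an empty list when the structure is complete.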
VISCERAL anatomy benchmark [6]
- Download the CT images contained in the Gold Corpus (GC) and Silver Corpus (SC) subsets. The structure of the GC and SC subsets should be as follows:
GC/SC subset/
└── <case_id>/
    ├── <case_id>_CT_wb.nii.gz
    └── <case_id>_CT_wb_seg.nii.gz
- Update paths to GC and SC subsets in ./config/preprocessing_visceral.yaml.
- Run python prepare_dataset_visceral.py to generate the preprocessed dataset, which will be stored under ./dataset.
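Since each VISCERAL case lives in its own folder, pairing every CT image with its segmentation is a useful first step when adapting the preprocessing template. The sketch below collects these pairs for one subset; the function name `collect_visceral_cases` is hypothetical, and the file naming follows the structure listed above.

```python
from pathlib import Path


def collect_visceral_cases(subset_root):
    """Pair images and segmentations for every case folder in a subset.

    Returns (complete, incomplete): a dict mapping case_id to
    (image_path, seg_path), and a list of case_ids missing either file.
    """
    complete, incomplete = {}, []
    for case_dir in sorted(Path(subset_root).iterdir()):
        if not case_dir.is_dir():
            continue  # ignore stray files next to the case folders
        case_id = case_dir.name
        image = case_dir / f"{case_id}_CT_wb.nii.gz"
        seg = case_dir / f"{case_id}_CT_wb_seg.nii.gz"
        if image.is_file() and seg.is_file():
            complete[case_id] = (image, seg)
        else:
            incomplete.append(case_id)
    return complete, incomplete
```

Running it once on the GC subset and once on the SC subset shows which cases are usable before any paths are written into the preprocessing config.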
First, set the dataset flag in the respective config file to the name of the preprocessed dataset. If necessary, modify the config file accordingly.
To train on a specific dataset, run:
CUDA_VISIBLE_DEVICES=<gpu_id> python scripts/train.py --config attn_fpn_<detector>_<dataset>.yaml
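When launching several runs from a script, the same call can be assembled in Python. The sketch below builds the command and an environment that restricts the process to one GPU through the standard CUDA_VISIBLE_DEVICES variable. The helper name `build_train_command` and the example values `my_detector` and `my_dataset` are hypothetical placeholders; the actual config names are those shipped in the respective branch.

```python
import os
import sys


def build_train_command(detector, dataset, gpu_id, python=sys.executable):
    """Return (cmd, env) for one training run on a single GPU."""
    # Config naming follows the pattern attn_fpn_<detector>_<dataset>.yaml.
    config = f"attn_fpn_{detector}_{dataset}.yaml"
    cmd = [python, "scripts/train.py", "--config", config]
    # Copy the current environment and expose only the selected GPU.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    return cmd, env


# Placeholder values for illustration only.
cmd, env = build_train_command("my_detector", "my_dataset", gpu_id=0)
print(" ".join(cmd[1:]))
# To actually start training from the repository root:
# import subprocess; subprocess.run(cmd, env=env, check=True)
```

Setting the variable in the child environment, rather than inside the training script, ensures it takes effect before any CUDA context is created.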
To evaluate performances of created checkpoints on the test sets, run:
python scripts/test.py --run <name_of_checkpoint_in_folder_runs> --num_gpu <gpu_id> --full_labeled
For visualization of results and attention maps, please check additional flags in scripts/test.py.
This repository also contains code for SwinFPN. To include 3D Swin Transformer blocks in the FPN backbone, please activate the flag use_encoder_attn in the respective config files.
We additionally experimented with 3D Deformable DETR encoder blocks as additional refinement stages after the FPN backbone. To activate these 3D Deformable DETR encoder blocks, activate the flag use_encoder_attn.
If you find our repository useful in your research, please consider citing:
@article{wittmann2023focused,
title={Focused Decoding Enables 3D Anatomical Detection by Transformers},
author={Wittmann, Bastian and Navarro, Fernando and Shit, Suprosanna and Menze, Bjoern},
journal={Machine Learning for Biomedical Imaging},
volume={2},
issue={February 2023 issue},
year={2023},
pages={72--95},
issn={2766-905X},
url={https://melba-journal.org/2023:003}
}
@inproceedings{wittmann2022swinfpn,
title={Swin{FPN}: Leveraging Vision Transformers for 3D Organs-At-Risk Detection},
author={Wittmann, Bastian and Shit, Suprosanna and Navarro, Fernando and Peeken, Jan C and Combs, Stephanie E and Menze, Bjoern},
booktitle={Medical Imaging with Deep Learning},
year={2022},
url={https://openreview.net/forum?id=yiIz7DhgRU5}
}
[1] Carion et al., "End-to-end object detection with transformers," ECCV, 2020, https://github.com/facebookresearch/detr.
[2] Zhu et al., "Deformable DETR: Deformable transformers for end-to-end object detection," ICLR, 2021, https://github.com/fundamentalvision/Deformable-DETR.
[3] Baumgartner et al., "nnDetection: A self-configuring method for medical object detection," MICCAI, 2021, https://github.com/MIC-DKFZ/nnDetection.
[4] Jaeger et al., "Retina U-Net: Embarrassingly simple exploitation of segmentation supervision for medical object detection," PMLR ML4H, 2020, https://github.com/MIC-DKFZ/medicaldetectiontoolkit.
[5] AMOS 2022: Multi-Modality Abdominal Multi-Organ Segmentation Challenge 2022, MICCAI, 2022, https://amos22.grand-challenge.org/.
[6] Jimenez-del Toro et al., "Cloud-based evaluation of anatomical structure segmentation and landmark detection algorithms: VISCERAL anatomy benchmarks," IEEE TMI, 2016, https://visceral.eu/benchmarks.