Fix some typos and issues, and improve docs (#2)
some small fixes
eeulig authored Sep 3, 2024
1 parent 977908d commit 8be1bcc
Showing 11 changed files with 103 additions and 46 deletions.
37 changes: 37 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,37 @@
<!-- Please complete this template entirely. -->


Changes proposed in this pull request:
<!-- Please list all changes/additions here. -->
-

<!-- Please complete the following checklist! Only leave the relevant subsection(s) based on what your PR implements. -->
### Checklist
<!-- Please only keep the relevant subsection
Documentation: If you've added or updated documentation.
Fix: If you've fixed a bug or issue.
Feature: If you've added a new feature.
Denoising Method: If you've added a new LDCT denoising method.
-->
- [ ] I've read and followed all steps in the [contributing guide](https://github.com/eeulig/ldct-benchmark/blob/main/CONTRIBUTING.md).

#### Documentation
- [ ] I've checked that the docs build correctly locally by running `mkdocs serve`.

#### Fix
- [ ] I've added unit tests and given them meaningful names. Ideally, I added a test that fails without my fix and passes with it.
- [ ] I've updated or added meaningful docstrings in [numpy format](https://numpydoc.readthedocs.io/en/latest/format.html).
- [ ] I ran `poe verify` and checked that all tests pass.

#### Feature
- [ ] I've added unit tests and given them meaningful names.
- [ ] I've updated or added meaningful docstrings in [numpy format](https://numpydoc.readthedocs.io/en/latest/format.html).
- [ ] I ran `poe verify` and checked that all tests pass.

#### Denoising Method
- [ ] I've added unit tests and given them meaningful names.
- [ ] I've updated or added meaningful docstrings in [numpy format](https://numpydoc.readthedocs.io/en/latest/format.html). The docstring of the main trainer class contains a reference to the original publication.
- [ ] I ran `poe verify` and checked that all tests pass.
- [ ] I've added the method to the [table of implemented algorithms](https://github.com/eeulig/ldct-benchmark/blob/main/docs/denoising_algorithms.md#implemented-algorithms) **including a reference to the original publication**.
- [ ] I've evaluated my algorithm and reported its performance [here](https://github.com/eeulig/ldct-benchmark/blob/main/docs/denoising_algorithms.md#test-set-performance).
- [ ] I would like to contribute weights for the trained model to the model hub.
25 changes: 17 additions & 8 deletions CONTRIBUTING.md
@@ -3,14 +3,22 @@
Thank you for your interest in contributing! Here are some guidelines:

## Installation
Install the package locally using pip and in interactive mode. This way, you can immediately test your changes to the codebase.
Install the package locally using pip and in editable mode. This way, you can immediately test your changes to the codebase.
```bash
pip install -e .[dev]
pip install -e .[dev,docs]
```

## Updating the documentation
The documentation is built using [MkDocs](https://www.mkdocs.org/). To build the documentation locally, run:
```bash
mkdocs serve
```
Please make sure that all Markdown is rendered correctly and that all links work.

## Contributing denoising algorithms
1. Create a branch `git checkout -b method/fancy-method` for your new method.
2. Create a folder for the new method in `ldctbench/methods`. The folder must contain the following files
1. Fork and clone this repository
2. Create a branch `git checkout -b method/fancy-method` for your new method.
3. Create a folder for the new method in `ldctbench/methods`. The folder must contain the following files
- `__init__.py`
- `argparser.py`: Should implement a method `add_args()` that takes as input an `argparse.ArgumentParser`, adds custom arguments and returns it. If your method has an argument `fancy_arg`, then `argparser.py` should look like this:
```python
@@ -28,12 +36,13 @@ pip install -e .[dev]
```
- `network.py`: Should implement the model as `class Model(torch.nn.Module)`.
- `Trainer.py`: Should implement a `Trainer` class. This class should be initialized with `Trainer(args: argparse.Namespace, device: torch.device)` and implement a `fit()` method that trains the network. A base class is provided in `methods/base.py`.
3. Add the method to `METHODS` in `argparser.py`.
4. Add the method to `docs/denoising_algorithms.md`.
5. Add a `fancy-method.yaml` config file containing all hyperparameters to `configs/`.
4. Add the method to `METHODS` in `argparser.py`.
5. Add the method to `docs/denoising_algorithms.md`.
6. Add a `fancy-method.yaml` config file containing all hyperparameters to `configs/`.
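The `argparser.py` contract described in step 3 can be sketched as follows. This is only an illustrative sketch: the `--fancy_arg` flag, its type, and its default are assumptions, not the repository's actual code.

```python
import argparse


def add_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Add method-specific hyperparameters and return the parser."""
    # Illustrative argument; replace with your method's real hyperparameters.
    parser.add_argument(
        "--fancy_arg",
        type=float,
        default=0.5,
        help="A hyperparameter of fancy-method",
    )
    return parser


# The framework would call this on its shared parser:
parser = add_args(argparse.ArgumentParser())
args = parser.parse_args(["--fancy_arg", "1.5"])
```

Returning the parser (rather than mutating it silently) keeps the call site explicit when several methods register their arguments on the same parser.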

## Contributing other features or bug fixes
1. Create a new branch based on your change type:
1. Fork and clone this repository
2. Create a new branch based on your change type:
- `fix/some-fix` for bug fixes
- `feat/some-feature` for adding new features

3 changes: 2 additions & 1 deletion README.md
@@ -6,6 +6,7 @@
# Benchmarking Deep Learning-Based Low Dose CT Image Denoising Algorithms
![Release Workflow Status](https://img.shields.io/github/actions/workflow/status/eeulig/ldct-benchmark/release.yml?label=release)
![Development Workflow Status](https://img.shields.io/github/actions/workflow/status/eeulig/ldct-benchmark/development.yml?label=dev)
[![PyPI - Version](https://img.shields.io/pypi/v/ldct-benchmark?color=blue&cacheSeconds=!%5BPyPI%20-%20Version%5D(https%3A%2F%2Fimg.shields.io%2Fpypi%2Fv%2Fldct-benchmark))](https://pypi.org/project/ldct-benchmark/)
![License](https://img.shields.io/badge/MIT-blue?label=License)
[![arXiv](https://img.shields.io/badge/2401.04661-red?label=arXiv)](https://arxiv.org/abs/2401.04661)

@@ -47,7 +48,7 @@ Please read our [documentation](https://eeulig.github.io/ldct-benchmark/) for de
We welcome contributions of novel denoising algorithms. For details on how to do so, please check out our [contributing guide](https://github.com/eeulig/ldct-benchmark/blob/main/CONTRIBUTING.md) or reach out to [me](mailto:[email protected]).

## Reference
If you find this project useful for you work, please cite our [arXiv preprint](https://arxiv.org/abs/2401.04661):
If you find this project useful for your work, please cite our [arXiv preprint](https://arxiv.org/abs/2401.04661):
> Elias Eulig, Björn Ommer, & Marc Kachelrieß (2024). Benchmarking Deep Learning-Based Low Dose CT Image Denoising Algorithms. arXiv, 2401.04661.
```bibtex
2 changes: 1 addition & 1 deletion README_PYPI.md
@@ -35,7 +35,7 @@ Please read our [documentation](https://eeulig.github.io/ldct-benchmark/) for de
We welcome contributions of novel denoising algorithms. For details on how to do so, please check out our [contributing guide](https://github.com/eeulig/ldct-benchmark/blob/main/CONTRIBUTING.md) or reach out to [me](mailto:[email protected]).

## Reference
If you find this project useful for you work, please cite our [arXiv preprint](https://arxiv.org/abs/2401.04661):
If you find this project useful for your work, please cite our [arXiv preprint](https://arxiv.org/abs/2401.04661):
> Elias Eulig, Björn Ommer, & Marc Kachelrieß (2024). Benchmarking Deep Learning-Based Low Dose CT Image Denoising Algorithms. arXiv, 2401.04661.
```bibtex
File renamed without changes
58 changes: 31 additions & 27 deletions docs/examples/denoise_dicoms.md
@@ -3,58 +3,59 @@

The pretrained models provided as part of the [model hub](../model_hub.md) can be used to denoise any CT DICOM dataset. See [here][implemented-algorithms] for a list of all available algorithms.

Let's use two of these models, RED-CNN[^1] and DU-GAN[^2] to denoise DICOM slices from the openly available *Visible Human CT Dataset*[^3]
Let's use two of these models, RED-CNN[^1] and DU-GAN[^2] to denoise DICOM slices from the CT data of the *NLM Visible Human Project*[^3] which can be downloaded from [NCI Imaging Data Commons](https://portal.imaging.datacommons.cancer.gov/explore/filters/?collection_id=nlm_visible_human_project){:target="_blank"}.
!!! warning "Warning"
    This is an **out-of-distribution** setting as data of the *Visible Human CT Dataset* were acquired with a (29-year-old) scanner and scan protocols that are far from the training data distribution. The results of the models on such data should be interpreted with caution.
    This is an **out-of-distribution** setting as data of the *Visible Human CT Dataset* were acquired with a (31-year-old) scanner and scan protocols that are far from the training data distribution. The results of the models on such data should be interpreted with caution.

[^1]: H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang, “Low-dose CT with a residual encoder-decoder convolutional neural network,” IEEE Transactions on Medical Imaging, vol. 36, no. 12, pp. 2524–2535, Dec. 2017
[^2]: Z. Huang, J. Zhang, Y. Zhang, and H. Shan, “DU-GAN: Generative adversarial networks with dual-domain U-Net-based discriminators for low-dose CT denoising,” IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–12, 2022.
[^3]: McCarville, Alan, 2023, "Visible Human Project CT Datasets Male and Female", <https://doi.org/10.7910/DVN/3JDZCT>, Harvard Dataverse, V1
[^3]: M. J. Ackerman, "The Visible Human Project," in Proceedings of the IEEE, vol. 86, no. 3, pp. 504-511, March 1998, doi: 10.1109/5.662875.

Start by importing some modules:

```python
# Make sure that s5cmd is installed! (pip install s5cmd)
import os
import requests # For downloading the data
from ldctbench.hub.utils import denoise_dicom # For applying the models
import torch
from ldctbench.hub import Methods
from ldctbench.hub.utils import denoise_dicom
```

We'll download the first 10 slices of the female pelvis data to a new folder `./visible-human/orig`:
We'll download 10 slices of the female pelvis data to a new folder `./visible-human/orig`:

```python
# Create folders
folder = "./visible-human"
orig_data = os.path.join(folder, "orig")

if not os.path.exists(orig_data):
os.mkdir(orig_data)
os.makedirs(orig_data)

# Define base url and data ids of first 10 pelvis slices
data_url = "https://dataverse.harvard.edu/api/access/datafile/"
data_ids = [
"7576771",
"7576821",
"7576849",
"7576854",
"7576780",
"7576778",
"7576766",
"7576779",
"7576720",
"7576771",
# Filenames of 10 pelvis slices
files = [
"496788de-f0f0-41fd-b19a-6da82268fd0a.dcm",
"a535613b-de28-4080-850a-f5647ee33c96.dcm",
"9f7ef52e-c93d-430a-9038-970a47e95e3a.dcm",
"0c7ac013-41e3-404f-9081-9e0cc18f4f67.dcm",
"2591aad8-7673-4a12-98e0-8984dafa5175.dcm",
"28494e9b-d274-4310-a0ed-15d4220e1dc1.dcm",
"f5e41514-d30a-4cef-81ac-fce50b4743d8.dcm",
"21292f8c-072c-4223-859a-1e70bbc87a42.dcm",
"5a214c6b-6898-43c1-89f3-52c967dff39e.dcm",
"cd90f914-2b13-4cd9-9119-976a3c5721c1.dcm",
]

# Download the data
for i, data_id in enumerate(data_ids):
r = requests.get(data_url + data_id)
with open(os.path.join(orig_data, f"{i}.dcm"), "wb") as f:
f.write(r.content)
# Download the data
for file in files:
os.system(
f's5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com cp "s3://idc-open-data/b9cf8e7a-2505-4137-9ae3-f8d0cf756c13/{file}" visible-human/orig'
)
```

The function [ldctbench.hub.utils.denoise_dicom][] can be used to apply a pretrained model either to a single DICOM file or to a folder containing multiple DICOM files. The processed DICOMs differ only in the `PixelData`; all other DICOM tags are identical to those in the source files. We'll use this function to apply the RED-CNN and DU-GAN models to all 10 slices we just downloaded:

```python
# Apply RED-CNN and DU-GAN and store the processed DICOMs
# Apply RED-CNN and DU-GAN and store the processed DICOMs
# to ./visible-human/redcnn and ./visible-human/dugan
for method in [Methods.REDCNN, Methods.DUGAN]:
denoise_dicom(
@@ -66,6 +67,9 @@ for method in [Methods.REDCNN, Methods.DUGAN]:
```
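Because the processed files should differ only in `PixelData`, that property is easy to verify. Here is a minimal sketch of the comparison idea using plain `(tag, value)` tuples as stand-ins for `pydicom` datasets; the tag names and values are made up for illustration:

```python
def differing_tags(ds1, ds2):
    """Return the tags whose values differ between two (tag, value) sequences."""
    return [tag1 for (tag1, v1), (tag2, v2) in zip(ds1, ds2) if v1 != v2]


# Stand-ins for an original and a denoised DICOM dataset:
original = [("PatientID", "VHP-F"), ("Modality", "CT"), ("PixelData", b"\x00\x10")]
denoised = [("PatientID", "VHP-F"), ("Modality", "CT"), ("PixelData", b"\x7f\x22")]

changed = differing_tags(original, denoised)  # only PixelData should differ
```

With real files one would iterate over the data elements of two `pydicom` datasets instead; only the `PixelData` tag is expected to show up.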
The denoised DICOMs can be loaded with any DICOM viewer. Below we show a comparison of the original and denoised images:

![Original](../assets/getting_started.gif)
<figure markdown="span">
![Denoised DICOMs](../assets/example_denoise_dicoms.gif)
<figcaption>Ten slices of the NLM Visible Human Project denoised using RED-CNN and DU-GAN. Original data is courtesy of the U.S. National Library of Medicine.</figcaption>
</figure>

Here we find that both RED-CNN and DU-GAN reduce the noise in the images. Additionally, RED-CNN smooths the images more than DU-GAN does, which can be attributed to the fact that DU-GAN is trained in an adversarial fashion, whereas RED-CNN is trained with a simple mean squared error loss.
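One simple way to quantify such smoothing differences is to compare the pixel standard deviation in a flat region of interest. The following self-contained sketch uses synthetic data; the noise levels and the mapping to specific methods are illustrative assumptions, not measured values:

```python
import random
import statistics

random.seed(0)
flat_roi = [40.0] * 4096  # homogeneous soft-tissue region, values in HU

# Synthetic stand-ins: a low-dose input and two denoised outputs.
low_dose = [v + random.gauss(0, 25) for v in flat_roi]
strong_smoothing = [v + random.gauss(0, 5) for v in flat_roi]   # e.g. MSE-trained
mild_smoothing = [v + random.gauss(0, 12) for v in flat_roi]    # e.g. adversarial

noise = statistics.pstdev  # population standard deviation as a noise estimate
```

A lower standard deviation in the flat region indicates stronger smoothing; on real data one would of course also inspect whether anatomical detail is preserved.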
2 changes: 1 addition & 1 deletion docs/examples/train_custom_model.md
@@ -155,6 +155,6 @@ and should take approximately 25 minutes on a single GPU.
The training logs are stored to a folder `./wandb/offline-run-<timestamp>/files` (relative to the folder from which `ldctbench-train` was called). Let's have a look at the plot of training and validation loss that we find in that folder:

<figure markdown="span">
![Image title](../assets/loss_curves.png){ width="500" }
![Loss curves](../assets/loss_curves.png){ width="500" }
<figcaption>Training and validation loss for the 'simplecnn' method</figcaption>
</figure>
2 changes: 1 addition & 1 deletion docs/index.md
@@ -38,7 +38,7 @@ Therefore, the **aim** of this project is to
We welcome contributions of novel denoising algorithms. For details on how to do so, please check out our [contributing guide](https://github.com/eeulig/ldct-benchmark/blob/main/CONTRIBUTING.md){:target="_blank"} or reach out to [me](mailto:[email protected]).

## Reference
If you find this project useful for you work, please cite our [arXiv preprint](https://arxiv.org/abs/2401.04661){:target="_blank"}:
If you find this project useful for your work, please cite our [arXiv preprint](https://arxiv.org/abs/2401.04661){:target="_blank"}:
> Elias Eulig, Björn Ommer, & Marc Kachelrieß (2024). Benchmarking Deep Learning-Based Low Dose CT Image Denoising Algorithms. arXiv, 2401.04661.
```bibtex
2 changes: 1 addition & 1 deletion ldctbench/scripts/download_data.py
@@ -102,7 +102,7 @@ def main():
# (https://github.com/kirbyju/tcia_utils). We do this to reduce package dependencies
# (tcia_utils wants a lot of packages we don't need here), show progress bars, and
# store data in same folder structure as the nbia-data-retriever would do.

global metadata_df
parser = argparse.ArgumentParser()
parser.add_argument("--manifest", default="", help="nbia manifest file")
parser.add_argument(
4 changes: 4 additions & 0 deletions pyproject.toml
@@ -32,6 +32,7 @@ dev = [
"pytest",
"poethepoet",
"flake8",
"s5cmd",
]
docs = [
"mkdocs",
@@ -44,6 +45,9 @@ docs = [
ldctbench-download-data = "ldctbench.scripts.download_data:main"
ldctbench-train = "ldctbench.scripts.train:main"
ldctbench-test = "ldctbench.scripts.test:main"
[project.urls]
GitHub = "https://github.com/eeulig/ldct-benchmark"
Documentation = "https://eeulig.github.io/ldct-benchmark/"

[tool.setuptools.packages.find]
include = ["ldctbench*"]
14 changes: 8 additions & 6 deletions tests/test_hub_utils.py
@@ -3,7 +3,6 @@

import numpy as np
import pydicom
import requests
import torch

from ldctbench.hub import Methods
@@ -45,9 +44,10 @@ def test_denoise_dicom():
def test_denoise_dicom():
# Download a single DICOM to tempdir
tempdir = tempfile.TemporaryDirectory()
r = requests.get("https://dataverse.harvard.edu/api/access/datafile/7576771")
with open(os.path.join(tempdir.name, "0.dcm"), "wb") as f:
f.write(r.content)
file_id = "496788de-f0f0-41fd-b19a-6da82268fd0a"
os.system(
f's5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com cp "s3://idc-open-data/b9cf8e7a-2505-4137-9ae3-f8d0cf756c13/{file_id}.dcm" {tempdir.name}'
)

# Apply network
denoise_dicom(
@@ -58,8 +58,10 @@ )
)

# Test that dicoms are identical except for PixelData DICOM tag
ds1 = pydicom.read_file(os.path.join(tempdir.name, "0.dcm"))
ds2 = pydicom.read_file(os.path.join(tempdir.name, f"0_{Methods.CNN10.value}.dcm"))
ds1 = pydicom.read_file(os.path.join(tempdir.name, f"{file_id}.dcm"))
ds2 = pydicom.read_file(
os.path.join(tempdir.name, f"{file_id}_{Methods.CNN10.value}.dcm")
)
diffs = [
(elem1.tag.group, elem1.tag.element)
for (elem1, elem2) in zip(ds1, ds2)
