
Dp sf pap #544

Merged: 47 commits into master from dp_sf_pap on Nov 22, 2023

Conversation

daniellepace (Collaborator):

Improvements for 2D segmentation and the pap project.

Includes:

  • Conventional U-Net architecture
  • Dice loss import and use (a reference sketch follows this list)
  • Plotting Dice scores for comparisons
  • Median computations for structures of interest, written to .tsv and scatter plots
  • 2D image data augmentation
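
The Dice loss itself is imported rather than defined in this PR; as a reference sketch (not the PR's actual implementation), a soft Dice loss for one-hot 2D segmentation maps looks roughly like this:

import tensorflow as tf

def soft_dice_loss(y_true, y_pred, smooth=1e-6):
    # Assumes channels-last one-hot masks of shape (batch, height, width, channels).
    # Sum over the spatial axes, keeping the batch and channel axes.
    spatial_axes = tuple(range(1, len(y_pred.shape) - 1))
    intersection = tf.reduce_sum(y_true * y_pred, axis=spatial_axes)
    denominator = tf.reduce_sum(y_true + y_pred, axis=spatial_axes)
    dice = (2.0 * intersection + smooth) / (denominator + smooth)
    # Average Dice over batch and channels; the loss is 1 - Dice.
    return 1.0 - tf.reduce_mean(dice)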

Questions:

  1. Includes changes to docker/vm_boot_images/config/tensorflow-requirements.txt. Shall we push these everywhere?
  2. There are remaining issues with explorations/infer_medians, around line 740, where I refactored it to remove hard-coded items. Previously, I ran recipes with no --output-tensors when I wanted to run inference on all subjects. However, I now need the tensor_maps_out for its channel_map, output_name, etc. I tried removing the output tensors when calling test_train_valid_tensor_generators so that I don't filter on images with segmentations and instead run inference on all subjects, but the code still gives me a .tsv with only ~500 subjects rather than ~60K.

@lucidtronix (Collaborator) left a comment:

@daniellepace This is tremendous! Thank you for the excellent additions! All my comments are small aesthetic things, mostly around documentation.

@@ -269,6 +272,11 @@ def parse_args():
help='If true saves the model weights from the last training epoch, otherwise the model with best validation loss is saved.',
)

# 2D image data augmentation parameters
parser.add_argument('--rotation_factor', default=0., type=float, help='a float represented as fraction of 2 Pi, e.g., rotation_factor = 0.014 results in an output rotated by a random amount in the range [-5 degrees, 5 degrees]')
Collaborator:

Add to the help strings that these are specific to data augmentation, e.g., "For data augmentation, a float...".
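
A minimal sketch of what the revised help string might look like (the exact wording is illustrative):

# Hypothetical rewording of the help string to flag it as a data augmentation parameter.
parser.add_argument(
    '--rotation_factor', default=0., type=float,
    help='For 2D image data augmentation: a float given as a fraction of 2*pi, e.g., '
         'rotation_factor = 0.014 rotates the image by a random amount in the range [-5 degrees, 5 degrees]',
)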

@@ -376,6 +384,16 @@ def parse_args():
default='3M',
)

# Arguments for explorations/infer_medians
parser.add_argument('--analyze_ground_truth', default=True, help='Whether or not to filter by images with ground truth segmentations, for comparison')
parser.add_argument('--dates_file', help='File containing dates for each sample_id')
Collaborator:

Can we re-use the existing, more general app_csv argument for this? Just trying to mitigate argument explosion.
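
A minimal sketch of that reuse, assuming app_csv points at a CSV containing the dates (the column names below are hypothetical):

import pandas as pd

# Hypothetical: read per-sample dates from the existing app_csv argument
# instead of adding a dedicated --dates_file flag.
dates = pd.read_csv(args.app_csv)
dates_by_sample = dict(zip(dates['sample_id'], dates['date']))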

parser.add_argument('--intensity_thresh', type=float, help='Threshold value for preprocessing')
parser.add_argument('--intensity_thresh_in_structures', nargs='*', default=[], help='Structure names whose pixels should be replaced if the image has intensity above the threshold')
parser.add_argument('--intensity_thresh_out_structure', help='Replacement structure name')
parser.add_argument('--results_to_plot', nargs='*', default=[], help='Structure names to make scatter plots')
Collaborator:

Can we re-use structures_to_analyze for this?
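
A one-line sketch of that reuse (purely illustrative):

# Hypothetical: drive the scatter plots off the existing argument instead of a new flag.
results_to_plot = args.structures_to_analyze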

return y

def infer_medians(args):
assert(args.batch_size == 1) # no support here for iterating over larger batches
Collaborator:

As this is "public", add a docstring.

Collaborator:

Also, maybe rename it to make it a little more specific, e.g., infer_medians_from_segmented_regions or something similar...
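
A minimal sketch combining the two suggestions above; the name follows the proposal and the docstring wording is illustrative:

def infer_medians_from_segmented_regions(args):
    """Run single-image inference and report median intensities per segmented structure.

    For each subject, computes the median intensity within the predicted segmentation
    for each structure of interest, writes the results to a .tsv, and optionally
    produces scatter plots for the requested structures.
    """
    assert args.batch_size == 1, 'no support here for iterating over larger batches'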


def infer_medians(args):
assert(args.batch_size == 1) # no support here for iterating over larger batches
assert(len(args.tensor_maps_in) == 1) # no support here for multiple input maps
Collaborator:

You can bake the comment into the assert with assert(len(args.tensor_maps_in) == 1, 'no support here for multiple input maps'), and then it is a bit easier to debug.
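
A sketch of the message form; note that with parentheses around both arguments, the assert statement sees a single two-element tuple, which is always truthy, so the parentheses need to be dropped for the message to take effect:

# assert(len(args.tensor_maps_in) == 1, 'no support here for multiple input maps')  # tuple is always truthy; never fires
assert len(args.tensor_maps_in) == 1, 'no support here for multiple input maps'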

@@ -666,13 +666,6 @@ def get_train_valid_test_paths(

logging.info(f'Found {len(train_paths)} train, {len(valid_paths)} validation, and {len(test_paths)} testing tensors at: {tensors}')
logging.debug(f'Discarded {len(discard_paths)} tensors due to given ratios')
if len(train_paths) == 0 or len(valid_paths) == 0 or len(test_paths) == 0:
Collaborator:

Can we just replace the or with and? If there is no data at all, we still want to fail fast, right?
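
A minimal sketch of the suggested condition (the raised error below is illustrative; the original handling is not shown in this hunk):

# Hypothetical: fail fast only when there is no data at all.
if len(train_paths) == 0 and len(valid_paths) == 0 and len(test_paths) == 0:
    raise ValueError(f'No tensors found at: {tensors}')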

@@ -732,6 +725,51 @@ def get_train_valid_test_paths_split_by_csvs(
logging.info(f"CSV:{balance_csvs[i-1]}\nhas: {len(train_paths[i])} train, {len(valid_paths[i])} valid, {len(test_paths[i])} test tensors.")
return train_paths, valid_paths, test_paths

# https://stackoverflow.com/questions/65475057/keras-data-augmentation-pipeline-for-image-segmentation-dataset-image-and-mask
def augment_using_layers(images, mask, in_shapes, out_shapes, rotation_factor, zoom_factor, translation_factor):

Collaborator:

Add a docstring and type hints.
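
A minimal sketch of the annotated signature and docstring; the types are assumptions based on how the function is used, and the body is omitted:

from typing import Dict, Tuple
import tensorflow as tf

def augment_using_layers(
    images: tf.Tensor,
    mask: tf.Tensor,
    in_shapes: Dict[str, Tuple[int, ...]],
    out_shapes: Dict[str, Tuple[int, ...]],
    rotation_factor: float,
    zoom_factor: float,
    translation_factor: float,
) -> Tuple[tf.Tensor, tf.Tensor]:
    """Apply the same random rotation, zoom, and translation to an image and its segmentation mask.

    Follows the Keras preprocessing-layer pattern from the linked Stack Overflow answer,
    so that the image and mask stay spatially aligned after augmentation.
    """
    ...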

if wrap_with_tf_dataset:

do_augmentation = bool(rotation_factor or zoom_factor or translation_factor)
logging.info(f'doing_augmentation {do_augmentation}')
Collaborator:

It's already in the log at the top with all of the arguments, so this is not needed; but if you want to keep the info statement here, I would add the actual values of the augmentation arguments.
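
If the statement stays, a sketch of including the values (variable names taken from the hunk above):

logging.info(
    f'do_augmentation={do_augmentation} (rotation_factor={rotation_factor}, '
    f'zoom_factor={zoom_factor}, translation_factor={translation_factor})',
)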

logging.info(f'doing_augmentation {do_augmentation}')

if do_augmentation:
assert(len(tensor_maps_in) == 1) # no support for multiple input tensors
Collaborator:

Bake comments into asserts

@@ -15,6 +15,7 @@ DOCKER_COMMAND="docker"
PORT="8888"
SCRIPT_NAME=$( echo $0 | sed 's#.*/##g' )
GPU_DEVICE="--gpus all"
ENV=""
Collaborator:

I'm curious: what do you use this for?

Collaborator (author):

I use this when I want to use a notebook that relies on ml4h code I am actively developing. I can set my PYTHONPATH to my ml4h directory so that I can edit the ml4h code and have it accessible to the notebook. I use something similar when I run Docker, so that I can edit the ml4h code and have it show up in the container without having to restart it.

@lucidtronix (Collaborator) left a comment:

Awesome work!

daniellepace merged commit f65ae7a into master on Nov 22, 2023
3 checks passed
daniellepace deleted the dp_sf_pap branch on November 22, 2023 at 18:56
daniellepace restored the dp_sf_pap branch on December 8, 2023 at 16:49