
Dp sf pap #544

Merged: 47 commits into master from dp_sf_pap on Nov 22, 2023

Conversation

daniellepace (Collaborator):

Improvements for 2D segmentation and the pap project.

Includes:

  • Conventional U-Net architecture
  • Dice loss import and use (a reference sketch follows this list)
  • Plotting Dice scores for comparisons
  • Median computations for structures of interest, written to .tsv and scatter plots
  • 2D image data augmentation
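
The Dice loss itself is imported rather than defined in this PR; as a reference sketch (not the PR's actual implementation), a soft Dice loss for one-hot 2D segmentation maps looks roughly like this:

import tensorflow as tf

def soft_dice_loss(y_true, y_pred, smooth=1e-6):
    # Assumes channels-last one-hot masks of shape (batch, height, width, channels).
    # Sum over the spatial axes, keeping the batch and channel axes.
    spatial_axes = tuple(range(1, len(y_pred.shape) - 1))
    intersection = tf.reduce_sum(y_true * y_pred, axis=spatial_axes)
    denominator = tf.reduce_sum(y_true + y_pred, axis=spatial_axes)
    dice = (2.0 * intersection + smooth) / (denominator + smooth)
    # Average Dice over batch and channels; the loss is 1 - Dice.
    return 1.0 - tf.reduce_mean(dice)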

Questions:

  1. Includes changes to docker/vm_boot_images/config/tensorflow-requirements.txt. Shall we push these everywhere?
  2. There are remaining issues with explorations/infer_medians, around line 740, where I refactored it to remove hard-coded items. Previously, I ran recipes with no --output-tensors when I wanted to run inference on all subjects. However, I now need the tensor_maps_out for its channel_map, output_name, etc. I tried removing the output tensors when calling test_train_valid_tensor_generators so that I don't filter on images with segmentations and instead run inference on all subjects, but the code still gives me a .tsv with only ~500 subjects rather than ~60K.

@lucidtronix (Collaborator) left a comment:

@daniellepace This is tremendous! Thank you for the excellent additions! All my comments are small aesthetic things, mostly around documentation.

@@ -269,6 +272,11 @@ def parse_args():
help='If true saves the model weights from the last training epoch, otherwise the model with best validation loss is saved.',
)

# 2D image data augmentation parameters
parser.add_argument('--rotation_factor', default=0., type=float, help='a float represented as fraction of 2 Pi, e.g., rotation_factor = 0.014 results in an output rotated by a random amount in the range [-5 degrees, 5 degrees]')
Collaborator:

Add to the help strings that these are specific to data augmentation, e.g., "For data augmentation, a float...".
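
A minimal sketch of what the revised help string might look like (the exact wording is illustrative):

# Hypothetical rewording of the help string to flag it as a data augmentation parameter.
parser.add_argument(
    '--rotation_factor', default=0., type=float,
    help='For 2D image data augmentation: a float given as a fraction of 2*pi, e.g., '
         'rotation_factor = 0.014 rotates the image by a random amount in the range [-5 degrees, 5 degrees]',
)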

@@ -376,6 +384,16 @@ def parse_args():
default='3M',
)

# Arguments for explorations/infer_medians
parser.add_argument('--analyze_ground_truth', default=True, help='Whether or not to filter by images with ground truth segmentations, for comparison')
parser.add_argument('--dates_file', help='File containing dates for each sample_id')
Collaborator:

Can we re-use the existing, more general app_csv argument for this? Just trying to mitigate argument explosion.
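
A minimal sketch of that reuse, assuming app_csv points at a CSV containing the dates (the column names below are hypothetical):

import pandas as pd

# Hypothetical: read per-sample dates from the existing app_csv argument
# instead of adding a dedicated --dates_file flag.
dates = pd.read_csv(args.app_csv)
dates_by_sample = dict(zip(dates['sample_id'], dates['date']))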

parser.add_argument('--intensity_thresh', type=float, help='Threshold value for preprocessing')
parser.add_argument('--intensity_thresh_in_structures', nargs='*', default=[], help='Structure names whose pixels should be replaced if the image has intensity above the threshold')
parser.add_argument('--intensity_thresh_out_structure', help='Replacement structure name')
parser.add_argument('--results_to_plot', nargs='*', default=[], help='Structure names to make scatter plots')
Collaborator:

Can we re-use structures_to_analyze for this?
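
A one-line sketch of that reuse (purely illustrative):

# Hypothetical: drive the scatter plots off the existing argument instead of a new flag.
results_to_plot = args.structures_to_analyze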

return y

def infer_medians(args):
assert(args.batch_size == 1) # no support here for iterating over larger batches
Collaborator:

As this is "public", add a docstring.

Collaborator:

Also, maybe rename it to make it a little more specific, e.g., infer_medians_from_segmented_regions or something similar...
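
A minimal sketch combining the two suggestions above; the name follows the proposal and the docstring wording is illustrative:

def infer_medians_from_segmented_regions(args):
    """Run single-image inference and report median intensities per segmented structure.

    For each subject, computes the median intensity within the predicted segmentation
    for each structure of interest, writes the results to a .tsv, and optionally
    produces scatter plots for the requested structures.
    """
    assert args.batch_size == 1, 'no support here for iterating over larger batches'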


def infer_medians(args):
assert(args.batch_size == 1) # no support here for iterating over larger batches
assert(len(args.tensor_maps_in) == 1) # no support here for multiple input maps
Collaborator:

You can bake the comment into the assert with assert(len(args.tensor_maps_in) == 1, 'no support here for multiple input maps'), and then it is a bit easier to debug.
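
A sketch of the message form; note that with parentheses around both arguments, the assert statement sees a single two-element tuple, which is always truthy, so the parentheses need to be dropped for the message to take effect:

# assert(len(args.tensor_maps_in) == 1, 'no support here for multiple input maps')  # tuple is always truthy; never fires
assert len(args.tensor_maps_in) == 1, 'no support here for multiple input maps'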

@@ -666,13 +666,6 @@ def get_train_valid_test_paths(

logging.info(f'Found {len(train_paths)} train, {len(valid_paths)} validation, and {len(test_paths)} testing tensors at: {tensors}')
logging.debug(f'Discarded {len(discard_paths)} tensors due to given ratios')
if len(train_paths) == 0 or len(valid_paths) == 0 or len(test_paths) == 0:
Collaborator:

Can we just replace the or with and? If there is no data at all, we still want to fail fast, right?
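
A minimal sketch of the suggested condition (the raised error below is illustrative; the original handling is not shown in this hunk):

# Hypothetical: fail fast only when there is no data at all.
if len(train_paths) == 0 and len(valid_paths) == 0 and len(test_paths) == 0:
    raise ValueError(f'No tensors found at: {tensors}')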

@@ -732,6 +725,51 @@ def get_train_valid_test_paths_split_by_csvs(
logging.info(f"CSV:{balance_csvs[i-1]}\nhas: {len(train_paths[i])} train, {len(valid_paths[i])} valid, {len(test_paths[i])} test tensors.")
return train_paths, valid_paths, test_paths

# https://stackoverflow.com/questions/65475057/keras-data-augmentation-pipeline-for-image-segmentation-dataset-image-and-mask
def augment_using_layers(images, mask, in_shapes, out_shapes, rotation_factor, zoom_factor, translation_factor):

Collaborator:

Add a docstring and type hints.
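
A minimal sketch of the annotated signature and docstring; the types are assumptions based on how the function is used, and the body is omitted:

from typing import Dict, Tuple
import tensorflow as tf

def augment_using_layers(
    images: tf.Tensor,
    mask: tf.Tensor,
    in_shapes: Dict[str, Tuple[int, ...]],
    out_shapes: Dict[str, Tuple[int, ...]],
    rotation_factor: float,
    zoom_factor: float,
    translation_factor: float,
) -> Tuple[tf.Tensor, tf.Tensor]:
    """Apply the same random rotation, zoom, and translation to an image and its segmentation mask.

    Follows the Keras preprocessing-layer pattern from the linked Stack Overflow answer,
    so that the image and mask stay spatially aligned after augmentation.
    """
    ...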

if wrap_with_tf_dataset:

do_augmentation = bool(rotation_factor or zoom_factor or translation_factor)
logging.info(f'doing_augmentation {do_augmentation}')
Collaborator:

It's already in the log at the top with all of the arguments, so this is not needed; but if you want to keep the info statement here, I would add the actual values of the augmentation arguments.
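
If the statement stays, a sketch of including the values (variable names taken from the hunk above):

logging.info(
    f'do_augmentation={do_augmentation} (rotation_factor={rotation_factor}, '
    f'zoom_factor={zoom_factor}, translation_factor={translation_factor})',
)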

logging.info(f'doing_augmentation {do_augmentation}')

if do_augmentation:
assert(len(tensor_maps_in) == 1) # no support for multiple input tensors
Collaborator:

Bake comments into asserts

@@ -15,6 +15,7 @@ DOCKER_COMMAND="docker"
PORT="8888"
SCRIPT_NAME=$( echo $0 | sed 's#.*/##g' )
GPU_DEVICE="--gpus all"
ENV=""
Collaborator:

I'm curious: what do you use this for?

Collaborator (author):

I use this when I want to use a notebook that relies on ml4h code I am actively developing. I can set my PYTHONPATH to my ml4h directory so that I can edit the ml4h code and have it accessible to the notebook. I use something similar when I run Docker, so that I can edit the ml4h code and have it show up in the container without having to restart it.

@lucidtronix (Collaborator) left a comment:

Awesome work!

daniellepace merged commit f65ae7a into master on Nov 22, 2023
3 checks passed
daniellepace deleted the dp_sf_pap branch on November 22, 2023 at 18:56
daniellepace restored the dp_sf_pap branch on December 8, 2023 at 16:49