This repository has been archived by the owner on Feb 22, 2020. It is now read-only.

#3 Segmentation Algorithm #147

Merged
merged 1 commit into from
Oct 24, 2017

Conversation

@WGierke (Contributor) commented Oct 1, 2017

As requested in the ticket, I implemented a very simple segmentation algorithm that can be trained using Keras, whose model is saved in the assets folder, and that can predict nodules. The model is still pretty weak, so hopefully there are some more experienced data scientists than me around to improve it :) However, the "environment" to train the model should be provided by this PR.

Description

I created training data pairs based on the LIDC dataset using pylidc. The label image is a binary mask whose pixels are 1 if there is an annotation in the LIDC dataset whose malignancy is greater than or equal to 3 (out of 5). The label images are saved in the prediction/src/algorithms/segment/assets/ directory. Since some 3D arrays are hard to imagine, I added two methods that help to visualize these arrays by navigating through one dimension while simultaneously displaying the other two:
screenshot from 2017-10-01 14-52-21
display_training_pair() shows the input image and the binary output mask. Note the segmented annotation on the right-hand side and the slider for the Z axis at the bottom.
When the mask is applied to the input data, cuboid_show_slider shows us:
screenshot from 2017-10-01 14-51-05
prepare_training_data() in prediction/src/algorithms/segment/src/data_generation.py saves a npy file including the binary segmented mask of a lung for each patient.
train() in prediction/src/algorithms/segment/src/training.py trains the model based on the previously generated binary masks and the rescaled volume images of each DICOM scan. The model configuration is saved in prediction/src/algorithms/segment/src/model.py.
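A rough sketch of how such a binary label mask could be built. This is only an illustration with hypothetical annotation data; the real implementation queries the LIDC dataset via pylidc, and the name `build_label_mask` is made up here:

```python
import numpy as np

INTERMEDIATE_MALICIOUS = 3  # annotations rated below this malignancy are ignored

def build_label_mask(volume_shape, annotations):
    """Build a binary mask that is 1 wherever a sufficiently
    malignant annotation covers a voxel.

    `annotations` is a list of (malignancy, voxel_index_array) pairs;
    in the real code these would come from pylidc.
    """
    mask = np.zeros(volume_shape, dtype=np.uint8)
    for malignancy, voxels in annotations:
        if malignancy >= INTERMEDIATE_MALICIOUS:
            # voxels has shape (N, 3); transpose it to index the 3 axes at once
            mask[tuple(voxels.T)] = 1
    return mask

# Toy example: one benign and one malignant annotation in a 4x4x4 volume.
benign = (1, np.array([[0, 0, 0]]))
malignant = (4, np.array([[1, 1, 1], [1, 1, 2]]))
mask = build_label_mask((4, 4, 4), [benign, malignant])
# The real code would then persist the mask, e.g.
# np.save('prediction/src/algorithms/segment/assets/<patient_id>.npy', mask)
```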

Reference to official issue

This addresses #3

Motivation and Context

We want to suggest that the radiologist have a look at nodules that are automatically found by one of our algorithms.

How Has This Been Tested?

I wrote a test that checks whether the response of the segmentation endpoint has the correct format.

CLA

  • I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well

I'd highly appreciate any feedback!

@@ -1,4 +1,4 @@
 -r base.txt
 flake8==3.3.0
 pytest==3.1.3
-pylidc==0.1.8
+pylidc==0.1.9
Contributor Author:

This fixed a bug in to_volume, which we'll need later.

Contributor:

Hm, I wonder if there's a better way of storing these kind of comments. Can we do

pylidc==0.1.9 # comment here

or at least put it in the commit. Putting in the PR review is really a poor place for it IMHO.

@@ -221,9 +220,6 @@ def predict_cubes(model_path, patient_id, magnification=1, ext_name=""):  # noqa
         if cube_mask.sum() < 2000:
             skipped_count += 1
         else:
-            if CROP_SIZE != CUBE_SIZE:
-                cube_img = helpers.rescale_patient_images2(cube_img, (CUBE_SIZE, CUBE_SIZE, CUBE_SIZE))
Contributor Author (@WGierke, Oct 1, 2017):

This is the only case in which rescale_patient_images2 would be executed, and it never happens (CROP_SIZE is set to CUBE_SIZE once and not modified thereafter). Thus, this branch and rescale_patient_images2 could be removed.

Contributor:

Again, comments here are great but...

Contributor Author:

I'm not completely sure how I should document in the code base that I removed a method that wasn't used at all.

Contributor:

Well, if there is no need for a branch/code, but someone later might think there should be, that is surely worth a comment near where it "could" be...

Contributor Author:

I added a comment in 8416d58

if scan is None:
print("Scan for path '{}' was not found".format(path))
continue
vol = scan.to_volume(verbose=False)
Contributor Author:

In order to get the volume of a patient scan, the leading zeros of the dicom file names had to be removed.

def get_data_shape():
dicom_paths = get_dicom_paths()
max_x, max_y, max_z = get_max_scaled_dimensions(dicom_paths)
return 128, 128, 128, 1 # TODO: dynamically crop shape based on maximal axes
Contributor Author:

We need a fixed input (and thus output) size for the model. Something like taking the biggest extent in every dimension would be possible (as get_max_scaled_dimensions would implement it), but we would then have to pad all images to this size. My memory was not sufficient for this, so I decided to temporarily set the shape to a smaller, fixed one.
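Padding every volume to a common fixed shape (with cropping as a fallback for larger scans) could look roughly like this. A sketch only; `pad_or_crop` is not a name used in the PR:

```python
import numpy as np

def pad_or_crop(volume, target=(128, 128, 128)):
    """Zero-pad (and, if necessary, crop) a 3D volume to a fixed shape
    so that all training examples share the model's input size."""
    out = np.zeros(target, dtype=volume.dtype)
    # Copy only the overlapping region; anything beyond `target` is cropped.
    region = tuple(slice(0, min(s, t)) for s, t in zip(volume.shape, target))
    out[region] = volume[region]
    return out

small = np.ones((100, 110, 90))   # gets zero-padded up to 128^3
big = np.ones((200, 128, 300))    # gets cropped down to 128^3
assert pad_or_crop(small).shape == (128, 128, 128)
assert pad_or_crop(big).shape == (128, 128, 128)
```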

return model


def unet_model_3d(input_shape, downsize_filters_factor=1, pool_size=(2, 2, 2), n_labels=1,
Contributor Author:

The U-Net uses a fully convolutional architecture consisting of an encoder and a decoder. The encoder captures contextual information while the decoder enables precise localization. Using a 3D U-Net might enable the algorithm to segment nodules better, since it also sees the Z axis of the data. However, we clearly have to resize the image before passing it to the U-Net, since the author states that images of size 144x144x144 result in 32 GB of memory consumption. We could also use 2D approaches, of course, but it's questionable how we would make the algorithm aware of preceding and succeeding scan slices (perhaps using LSTMs?).
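A back-of-the-envelope estimate of why 3D inputs blow up memory so quickly. The channel count below is an assumption for illustration, not the actual model configuration:

```python
def feature_map_bytes(side, channels, dtype_bytes=4):
    """Memory for one float32 feature map of shape side^3 x channels."""
    return side ** 3 * channels * dtype_bytes

# A single 64-channel float32 feature map at full 144^3 resolution:
mb = feature_map_bytes(144, 64) / 2 ** 20  # size in MiB

# A U-Net keeps many such maps alive at once (skip connections, plus all
# intermediate activations for backpropagation), which is how training
# memory can reach tens of GB for 3D inputs.
```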

Contributor:

These review comments are great - wouldn't we all be better served if they were in the code itself? That way they will last as this PR will eventually be somewhat difficult to find...? Just want to make sure we get good value from your effort of writing them!

@WGierke WGierke force-pushed the 3_segmentation_algorithm_lung branch 2 times, most recently from 707e012 to e7321ef Compare October 2, 2017 07:57
@dchansen (Contributor) commented Oct 2, 2017

This looks great.

I think a 3D U-Net (sometimes called V-Net?) would probably take the prize for segmentation, but as you say, memory is a concern with 1 mm voxel sizes. I don't think 32 GB of memory use for evaluation can be right, though - I'm currently training a V-Net on 2 GPUs for an unrelated project, and with a batch size of 4 I'm still not out of memory. For inference, the memory use for a 144x144x144 volume should be less than 1 GB.

I am wondering how we benchmark these approaches fairly. We can't really evaluate on LIDC, as that's what everyone will use to train the models. Do we have a good validation dataset? I only have access to radiotherapy ones, and that's not quite the same thing.

@WGierke (Contributor Author) commented Oct 2, 2017

Thanks @dchansen !
Do you mind sharing some stats about the input shape of your V-Net and how much GPU memory it consumes? Because I think that the model took more than 8 GB of GPU memory for an input shape of 512x512x512.
As described in #3 I think using cross validation on the LIDC dataset would make sense. The evaluation metrics introduced by #146 could be used to give a broader overview of how well the different approaches perform. Maybe some radiologists prefer precision (How many selected nodules are relevant?) over recall (How many relevant nodules are selected?) because they don't want to waste time with false positives, or vice versa because they have the time to evaluate false positives and prefer to have as many potential nodules in the suggestion as possible.
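For binary nodule masks, the two metrics mentioned above reduce to simple voxel counts. A minimal sketch (not the implementation from #146):

```python
import numpy as np

def precision_recall(pred, truth):
    """Voxel-wise precision and recall for two binary masks."""
    tp = np.logical_and(pred, truth).sum()  # true positives
    precision = tp / pred.sum() if pred.sum() else 0.0   # selected that are relevant
    recall = tp / truth.sum() if truth.sum() else 0.0    # relevant that are selected
    return precision, recall

truth = np.array([1, 1, 0, 0])
pred = np.array([1, 0, 1, 0])
p, r = precision_recall(pred, truth)  # one TP, one FP, one FN
```

A radiologist who dislikes false positives would optimize for the first number; one who fears missed nodules, for the second.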

@dchansen (Contributor) commented Oct 2, 2017

I'll check my exact configuration and memory as soon as I can, though it might not be before Wednesday. Sorry. I think it was 128x128x128, but I'll check. Total memory was 2x12 GB.

I believe part of the efficiency of V-Net comes from the use of strided convolutions rather than pooling layers. I do know that strided convolutions can lead to checkerboard artefacts, though, so it might not give as nice results. The original V-Net paper used a size of 128x128x64 and was able to fit 2 batches in 8 GB of memory for training. That should be enough to fit most (all?) nodules.

My concern with LIDC is that it might encourage overfitting to that dataset. Doing something like 5-fold cross validation would be quite difficult, as some of these models literally take weeks to train on a single Titan X.

'1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192']"""
if in_docker:
return glob.glob("../images_full/LIDC-IDRI-*/**/**")
else:
Contributor:

Why the else if we have an unconditional return on the previous line?

Contributor Author:

If the method is not executed from within the Docker environment, the LIDC images aren't at /images_full but at test/assets/... . If you want to train the model, it's recommended to do that outside of the Docker environment since it lacks CUDA support. To also be able to use the GPU from within Docker, we'd have to use nvidia-docker. Thus, I thought it would make sense to have a switch in the method to also get correct DICOM paths if you're outside of the Docker environment, inside the prediction package, and trying to train your model. Does that make sense?

Contributor:

I think what @lamby is saying is that you could refactor to something like this

    if in_docker:
        return glob.glob("../images_full/LIDC-IDRI-*/**/**")

    return glob.glob("../tests/assets/test_image_data/full/LIDC-IDRI-*/**/**")

or (even better) just use one return

    if in_docker:
        path = glob.glob("../images_full/LIDC-IDRI-*/**/**")
    else:
        path = glob.glob("../tests/assets/test_image_data/full/LIDC-IDRI-*/**/**")

    return path

Contributor:

ack

Contributor Author:

Done in 28660bf

e.g. ['../images_full/LIDC-IDRI-0001/1.3.6.1.4.1.14519.5.2.1.6279.6001.298806137288633453246975630178/' \
'1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192']"""
if in_docker:
return glob.glob("../images_full/LIDC-IDRI-*/**/**")
Contributor:

We should always be using os.path.join :)

Contributor:

ping on this one? :)

Contributor Author:

Done in 28660bf

axis: index of the axis that should be changeable using the slider
is_mask: whether the cuboid that should be plotted is a binary mask
"""
import matplotlib.pyplot as plt
Contributor:

How come the inline imports? (please explain in a comment, etc.)

Contributor Author:

It's unnecessary, thanks ;)

Contributor Author:

Done in a1d18ab

INTERMEDIATE_MALICIOUS = 3

current_dir = os.path.dirname(os.path.realpath(__file__))
assets_dir = os.path.abspath(os.path.join(current_dir, '../assets'))
Contributor:

how about something like this?

...
parent_dir = os.path.dirname(current_dir)
assets_dir = os.path.abspath(os.path.join(parent_dir, 'assets'))

Contributor:

this same code block is also used in 2 other places... you could consider making it a top-level constant.

Contributor Author:

added get_assets_dir() in 28660bf


for index, path in enumerate(dicom_paths[batch_index:batch_index + CUBOID_BATCH]):
lidc_id = path.split('/')[5]
patient_id = path.split('/')[-1]
Contributor:

minor nitpick, but how about assigning the result of path.split('/') to a variable so that the split method is only called once?

Contributor:

Also, take a look at os.path.splitext, os.path.split, or PurePath.parts which should be more platform independent than split('/'). If none of those work, use os.path.sep instead of hardcoding /.
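The platform-independent variant suggested here could look like this; the example path mirrors the LIDC layout used in the PR, and the part indices are for this specific path shape:

```python
from pathlib import PurePosixPath

path = ('../images_full/LIDC-IDRI-0001/'
        '1.3.6.1.4.1.14519.5.2.1.6279.6001.298806137288633453246975630178')

# PurePosixPath.parts splits once, without hardcoding '/' in our own code.
parts = PurePosixPath(path).parts
lidc_id = parts[2]       # 'LIDC-IDRI-0001' for this path layout
patient_id = parts[-1]   # the series UID is always the last component
```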

Contributor Author:

Thanks, implemented in 28660bf

@WGierke (Contributor Author) commented Oct 4, 2017

Thanks @lamby and @reubano . Your proposed changes are way cleaner :)

dicom_paths = get_dicom_paths()
max_x, max_y, max_z = get_max_scaled_dimensions(dicom_paths)
return 128, 128, 128, 1 # TODO: dynamically crop shape based on maximal axes
return max_x, max_y, max_z, 1
Contributor (@reubano, Oct 4, 2017):

It's still a bit odd having two return statements. If you want to convey the same message, I'd suggest you move the first return statement to the end, and then comment out the other parts of the function. Also, could you add your above comment to the code itself so that others know why it's been commented out?

Contributor Author:

You're right, I commented the method properly now :)

Contributor (@isms, Oct 4, 2017):

I'd suggest you move the first return statement to the end, and then comment out the other parts of the function.

Leaving commented-out code in PRs doesn't seem like a good practice. If there's a piece that is broken or semi-working but still merits being merged, perhaps the problems and TODOs can be tracked as issues instead.

A # TODO: [...] comment used sparingly may be okay *to explain to readers* why something is the way it is -- it's not a helpful way to track outstanding work.

Contributor Author:

So I should add an issue for that?

Contributor Author:

@isms I filed #151 for that. Feel free to add comments to that issue if something is still missing.

@WGierke (Contributor Author) commented Oct 5, 2017

@lamby I already updated the get_dicom_paths method here (didn't mention that in the commit message, sorry).
Why does removing the

if CROP_SIZE != CUBE_SIZE:
    cube_img = helpers.rescale_patient_images2

stuff need comments? If we're honest, this shouldn't even have made it into the current code base - I introduced it in #118 but only noticed now that it currently isn't executed; it was probably used by Julian de Wit to play around with the CROP_SIZE.

@lamby (Contributor) commented Oct 6, 2017

By "comments" I mean that, in various places in this PR, you have added extremely helpful descriptive comments on the changes, but they should live in the code, otherwise your investment of time and energy will very likely be "lost" in the depths of GitHub :(

@WGierke WGierke force-pushed the 3_segmentation_algorithm_lung branch from 0fe0893 to 25ba711 Compare October 6, 2017 16:37
in_docker: whether this method is invoked from within docker or from the prediction directory
"""
if in_docker:
paths = glob.glob(os.path.join('..', 'images_full', 'LIDC-IDRI-*', '**', '**'))
Contributor:

Hm, relative paths? Smells wrong...

Contributor Author:

fixed in 8416d58


input_data = padded_data

output_data = model.predict(input_data)
segment_path = os.path.join(os.path.dirname(__file__), 'assets', "lung-mask.npy")
Contributor:

Would prefer more constants than the use of __file__ :)

Contributor Author:

fixed in 8416d58

@WGierke (Contributor Author) commented Oct 9, 2017

@lamby Would you mind reviewing again?



def get_assets_dir():
return os.path.abspath(os.path.join(os.getcwd(), 'src', 'algorithms', 'segment', 'assets'))
Contributor:

This is surely just the same as __file__?
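The distinction being pointed out: a path built from os.getcwd() depends on where the process happened to be started, while one built from __file__ stays correct relative to the module itself. A sketch of both variants:

```python
import os

# Fragile: resolves relative to wherever the interpreter was launched.
assets_via_cwd = os.path.abspath(
    os.path.join(os.getcwd(), 'src', 'algorithms', 'segment', 'assets'))

# Stable: resolves relative to this source file, wherever it lives.
here = os.path.dirname(os.path.abspath(__file__))
assets_via_file = os.path.abspath(os.path.join(here, '..', 'assets'))
```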

Contributor:

I also see some other uses of ../images still lingering around? Not sure if an oversight, or...?

Contributor Author:

Do the constants introduced in 773df14 make sense?

in_docker: whether this method is invoked from within docker or from the prediction directory
"""
if in_docker:
paths = glob.glob(os.path.join('/images_full', 'LIDC-IDRI-*', '**', '**'))
Contributor:

Is /images_full correct here? Looks like an absolute path. Again, we should be using some kind of constant, not hardcoding these deep under prediction/src/algorithms/segment/src/data_generation.py :)

Contributor Author:

In the docker container, the full LIDC images are mounted under /images_full/... . test_correct_paths should ensure that the folders exist. However, you're right - I'll add constants to our settings - or should I define them somewhere else?

Contributor (@lamby) left a comment:

Like it, except for the

SEGMENT_ASSETS_DIR = os.path.abspath(os.path.join(os.getcwd(), 'src', 'algorithms', 'segment', 'assets'))
DICOM_PATHS_DOCKER = glob.glob(os.path.join('/images_full', 'LIDC-IDRI-*', '**', '**'))
DICOM_PATHS_LOCAL = glob.glob(
os.path.join(os.getcwd(), 'src', 'tests', 'assets', 'test_image_data', 'full', 'LIDC-IDRI-*', '**', '**'))
Contributor:

Hm, do we want to glob.glob on module import? Doesn't feel right.
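One way to avoid globbing at import time: keep only the wildcard strings as module-level constants and defer the filesystem walk to call time. A sketch (the constant name follows the later diff; `get_dicom_paths` mirrors the PR's helper):

```python
import glob
import os

# Only the wildcard string is evaluated at import time...
DICOM_PATHS_DOCKER_WILDCARD = os.path.join('/images_full', 'LIDC-IDRI-*', '**', '**')

def get_dicom_paths(in_docker=True):
    # ...and the (potentially slow) glob happens only when paths are needed.
    if in_docker:
        return glob.glob(DICOM_PATHS_DOCKER_WILDCARD)
    return glob.glob(os.path.join(os.getcwd(), 'src', 'tests', 'assets',
                                  'test_image_data', 'full',
                                  'LIDC-IDRI-*', '**', '**'))
```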

Contributor Author:

Does using constant wildcards as introduced in cd7d39a make more sense?

@lamby (Contributor) commented Oct 13, 2017

LGTM. @reubano ?

DICOM_PATHS_LOCAL = glob.glob(
os.path.join(os.getcwd(), 'src', 'tests', 'assets', 'test_image_data', 'full', 'LIDC-IDRI-*', '**', '**'))
DICOM_PATHS_DOCKER_WILDCARD = os.path.join('/images_full', 'LIDC-IDRI-*', '**', '**')
DICOM_PATHS_LOCAL_WILDCARD = os.path.join(os.getcwd(), 'src', 'tests', 'assets', 'test_image_data', 'full',
Contributor:

I'd juuust add a brief comment along the lines of "these are expanded at runtime" so that nobody gets too confused.

(Also I don't quite like the use of getcwd - is that "fixed"? Seems more sensible to use __file__)

Contributor Author:

You have to be in the prediction directory, otherwise none of the code imports for the prediction code would work, since this directory is copied into the prediction docker container. However, I can use __file__ here as well.

Contributor:

You have to be in the prediction directory

Let's fix that another time, but let's at least not make the situation "worse" ;)

@rracinskij commented:

Hi,
must a 3D segmentation model necessarily be used, or is it reasonable to use a 2D model on every slice? If so, I could try an FCN that seems to work decently with digital pathology slides.

@WGierke (Contributor Author) commented Oct 16, 2017

Hi @rracinskij ,
Thanks for your question! I expected a 3D model to perform well since it can also capture the context between scan slices, in contrast to a vanilla 2D model. But I think that any model that yields good results is a valuable contribution (especially since we don't have a nicely working segmentation model so far) :)

@rracinskij commented:

Thank you, @WGierke. Another newbie question - is there a training dataset with segmentation masks available or should the masks be derived from the annotations first?

@WGierke (Contributor Author) commented Oct 16, 2017

@rracinskij We use the scans of the LIDC dataset. You can already find 3 patient scans in tests/assets/test_image_data/full. The neat thing is that there is a library (pylidc) that makes it really easy to query scan and annotation information. After downloading the dataset, set up pylidc by creating a .pylidc file locally that points to your local LIDC directory (like here) and run prepare_training_data, which will save binary lung masks for each patient in which 1 indicates that the voxel was annotated as cancerous by the radiologist(s). For further questions please use the gitter chat, so future contributors can also find that information and don't have to dig through closed pull requests ;)
( @lamby The aforementioned workflow might fit well into the docs as soon as the PR is merged, don't you think so?)

@lamby (Contributor) commented Oct 17, 2017

@WGierke Sounds reasonable :)

@WGierke (Contributor Author) commented Oct 19, 2017

@lamby @reubano Would you mind reviewing again, please?

@lamby (Contributor) commented Oct 20, 2017

LGTM (as before). @reubano ?

@reubano (Contributor) commented Oct 21, 2017

It's looking good! With a few tweaks I got everything to run and was able to check the output of the routes via requests. And the cuboid_show_slider is pretty neat as well. It could use an example, though, to make it clear how it should be used (same goes for display_training_pair, which I'm still a bit unclear about). Overall, I'm fine with merging, but would like to address @dchansen's comment.

Even though #142 won't be merged, I did like its simplicity, since it doesn't seem to require training a model. I also like that it includes a diameter measurement in the output. That being said, there is a bit of duplication, since it bypasses calculate_volume in favor of its own volume calculation.

So my question to the both of you is this: Is there value in porting the predict and calculate_volume functions to use SimpleITK (aside from gaining diameter that is)?

Also, one thing I wasn't able to resolve was the inability to directly compare both algorithms. This one errors when passed nodule_locations = [{'x': 317, 'y': 367, 'z': 7}, {'x': 187, 'y': 217, 'z': 7}]. And the other errors when passed nodule_locations = [{'x': 50, 'y': 50, 'z': 21}].

@reubano (Contributor) commented Oct 21, 2017

aforementioned tweaks

 prediction/src/algorithms/identify/prediction.py  |  3 ++-
 prediction/src/algorithms/segment/src/training.py |  8 ++++++--
 prediction/src/preprocess/lung_segmentation.py    |  2 --
# diff --git a/prediction/src/algorithms/identify/prediction.py b/prediction/src/algorithms/identify/prediction.py
# --- a/prediction/src/algorithms/identify/prediction.py
# +++ b/prediction/src/algorithms/identify/prediction.py
 from keras.layers import Input, Convolution3D, MaxPooling3D, Flatten, AveragePoo
 from keras.metrics import binary_accuracy, binary_crossentropy, mean_absolute_error
 from keras.models import Model
 from keras.optimizers import SGD
-from src.preprocess.lung_segmentation import rescale_patient_images
+
+from ...preprocess.lung_segmentation import rescale_patient_images
 
 CUBE_SIZE = 32
 MEAN_PIXEL_VALUE = 41
# diff --git a/prediction/src/algorithms/segment/src/training.py b/prediction/src/algorithms/segment/src/training.py
# --- a/prediction/src/algorithms/segment/src/training.py
# +++ b/prediction/src/algorithms/segment/src/training.py
 import os
 
 import numpy as np
 import pylidc as pl
-from config import Config
 from keras.callbacks import ModelCheckpoint
 from keras.models import load_model
-from src.preprocess.lung_segmentation import save_lung_segments
 
 from .model import simple_model_3d
+from ....preprocess.lung_segmentation import save_lung_segments
+
+try:
+    from .....config import Config
+except ValueError:
+    from config import Config
 
 
 def get_data_shape():
# diff --git a/prediction/src/preprocess/lung_segmentation.py b/prediction/src/preprocess/lung_segmentation.py
# --- a/prediction/src/preprocess/lung_segmentation.py
# +++ b/prediction/src/preprocess/lung_segmentation.py
 from skimage.measure import label, regionprops
 from skimage.morphology import disk, binary_erosion, binary_closing
 from skimage.segmentation import clear_border
 
-from ..algorithms.identify.helpers import rescale_patient_images

@lamby (Contributor) commented Oct 22, 2017

ValueError ?

@reubano (Contributor) commented Oct 23, 2017

@WGierke This should make merging a bit easier...

after downloading patch.txt just perform the following:

# assuming you are on commit #9123a2
git apply --check patch.txt
git apply patch.txt
git commit -m 'Fix conflicts and bugs'  # or whatever other message you prefer

edit: updated patch file

@WGierke (Contributor Author) commented Oct 23, 2017

Thanks a lot @reubano . Much appreciated :)

@reubano (Contributor) commented Oct 23, 2017

Minor error on Travis. Just hold tight...

@reubano (Contributor) commented Oct 23, 2017

Thanks a lot @reubano . Much appreciated :)

No prob. And here's a new patch file which should pass travis...

@WGierke (Contributor Author) commented Oct 23, 2017

Thanks @reubano - works like a charm :)

@reubano (Contributor) commented Oct 24, 2017

Awesome! Can you squash to 1 commit?

@WGierke WGierke force-pushed the 3_segmentation_algorithm_lung branch 2 times, most recently from 8d412a0 to 0445e5b Compare October 24, 2017 07:42
@WGierke WGierke force-pushed the 3_segmentation_algorithm_lung branch from 0445e5b to 9a42000 Compare October 24, 2017 08:05
@WGierke (Contributor Author) commented Oct 24, 2017

@reubano Done. Travis had some temporary issues, so I had to git commit --amend && git push -f several times to restart the builds - that's in case you received any emails about me pushing the same commit message to the branch again and again ;)

@reubano reubano merged commit a71f7a1 into drivendataorg:master Oct 24, 2017
@reubano (Contributor) commented Oct 25, 2017

ValueError ?

@lamby IndexError
