This repository has been archived by the owner on Feb 22, 2020. It is now read-only.

Converted Team 'grt123' identification and classification algorithms #122

Merged (1 commit, Sep 29, 2017)

Conversation

dchansen
Contributor

Description

This pull request implements the grt123 identification algorithm.
It is based on PyTorch, which currently conflicts with Tensorflow, so the pull request also disables the Keras import.

While the algorithm is quite fast on GPU, it is painfully slow (but working) on the CPU due to a bug in PyTorch. The bug is fixed in the PyTorch development branch, but the next release is about 2 months off.

Reference to official issue

#4
#1

Motivation and Context

This pull request implements part of the prediction service (https://concept-to-clinic.readthedocs.io/en/latest/design-doc.html#detect-prediction-service).

How Has This Been Tested?

The algorithm has largely been tested by manually confirming the output on about five random LUNA2016 datasets; as the loaded model has already been tested extensively elsewhere, this was deemed sufficient.
It has been tested both outside Docker on a Titan X GPU and inside Docker on an Intel i7 CPU.

Screenshots (if appropriate):

CLA

  • I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well




def test_classify_predict_model_load(dicom_path):
@WGierke (Contributor) commented Sep 18, 2017

As @lamby already stated here, it'd be nice to still have some tests that validate the output of the models. I'm currently working on some tags to execute these long-running tests only if one explicitly desires to do so, so feel free to make this test a bit more detailed :) Edit: I added this decorator in c3c626f
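A minimal sketch of what such a guard could look like, assuming an environment-variable switch; the variable name and the use of unittest.skipUnless are assumptions, not the actual decorator added in c3c626f:

```python
import os
import unittest

# Hypothetical guard: run the expensive model tests only when the
# RUN_SLOW_TESTS environment variable is set.
skip_slow_test = unittest.skipUnless(
    os.environ.get("RUN_SLOW_TESTS"),
    "slow test; set RUN_SLOW_TESTS=1 to run it",
)

@skip_slow_test
def test_classify_predict_model_load():
    ...  # would load the model and check its output here
```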

@dchansen (Contributor, author)

Sure, I'll have a look at that tomorrow.

@dchansen (Contributor, author)

I updated the test cases to be a little more comprehensive.

@isms
Contributor

isms commented Sep 18, 2017

[...] the pull request also disables the Keras import.

@dchansen I'm not sure this is a feasible compromise. Is there a way to reimplement this in Tensorflow?

@dchansen
Contributor Author

@isms I agree that it's not a feasible compromise at all. I did it mainly to highlight the issue.

I think we should have a single deep learning framework for the project as a whole, as they don't really play well together. I am not sure if Keras is the best choice, but either way, the project leads should make the decisions. Perhaps we should open an issue?

Converting the model itself to Keras would be relatively easy, but converting the weights is somewhat more involved. Converting to Caffe2 or CNTK would be easy, as they all support the ONNX format.

@dchansen dchansen changed the title Converted Team 'grt123' identification algorithm Converted Team 'grt123' identification and classification algorithms Sep 19, 2017
@reubano (Contributor) left a comment

Nice job with this! Here are some other thoughts...

  • use git-lfs for the dsb2017_detector.ckpt model
  • adopt Google-style docstrings
  • use new-style classes
  • fix spacing issues (have you run flake8?)
  • integrate skip_slow_test


reader = sitk.ImageSeriesReader()
filenames = reader.GetGDCMSeriesFileNames(dicom_path)
if len(filenames) == 0:
Contributor

if not filenames:

if np.average([min_distance[i] for i in range(label.shape[0]) if slice_area[i] > area_th]) < dist_th:
valid_label.add(vol.label)

if len(valid_label) == 0:
@reubano (Contributor), Sep 19, 2017

if not valid_label:

bw3 = bw1 & bw2
label = measure.label(bw, connectivity=1)
label3 = measure.label(bw3, connectivity=1)
l_list = list(set(np.unique(label)) - {0})
Contributor

You should be able to iterate fine without needing to cast this to a list

sum = 0
while sum < np.sum(area) * cover:
sum = sum + area[count]
count = count + 1
Contributor

sum += area[count] and count += 1
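Beyond the augmented assignments, the loop also shadows the builtin sum; a hedged alternative using np.cumsum, with placeholder data since the real `area` and `cover` come from the surrounding function:

```python
import numpy as np

# Placeholder values standing in for the real `area` and `cover`.
area = np.array([50, 30, 15, 5])
cover = 0.8

# Equivalent of the while-loop: count how many components are needed
# before the cumulative area reaches cover * total.
count = int(np.searchsorted(np.cumsum(area), area.sum() * cover) + 1)
print(count)  # → 2
```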

@dchansen
Contributor Author

dchansen commented Sep 19, 2017

I am a little unsure why, but the conflict between TensorFlow and PyTorch seems to be gone. It was related to this issue in TensorFlow: tensorflow/models#523

@dchansen dchansen mentioned this pull request Sep 19, 2017
1 task
@lamby
Contributor

lamby commented Sep 20, 2017

This should block on #118 due to the slow test stuff.

@reubano
Contributor

reubano commented Sep 22, 2017

Looking pretty good! Just a few questions for you. I noticed that in classify/trained_model/predict, you commented out the previous code instead of deleting it. Just curious: why did you decide to keep it around? Also, there are still some spacing errors. Can you run flake8 prediction in your terminal? Finally, I see this Travis error:

$ git lfs pull
Git LFS: (540 of 540 files, 2 skipped) 497.08 MB / 497.08 MB, 41.20 MB skipped 
[eb2c037dd55e2f49da95657f7a7851cbcfe2f2b516848ed03f8c5c820f3e16b4] Object does not exist on the server: [404] Object does not exist on the server
[3a46182c1c4883e170d5b576d3a2645309f34817c53c16d4f7f29fbdcf799308] Object does not exist on the server: [404] Object does not exist on the server

I'll try to pinpoint which files exactly are missing. Meanwhile, can you verify that the model you added to git-lfs was properly uploaded? Thanks for your work!

@dchansen
Contributor Author

The git lfs issue should be resolved now.
I kept the previous code because I was unsure what the long-term plans were regarding the style of the project; conceivably, this pull request could be changed to fit the style of the previous one.
While I don't agree with all the choices made, it was in many ways nicer than what I contributed here. The main reason I did not follow it more closely was time: I just wanted a working adaptation of the gtr123 model as soon as possible.

I have fixed all the flake8 issues, except the complexity ones. Hopefully, I will have them fixed by the end of the day.

@reubano
Contributor

reubano commented Sep 22, 2017

The git lfs issue should be resolved now.

Awesome!

I kept the previous code because I was unsure what the long-term plans were regarding the style of the project; conceivably, this pull request could be changed to fit the style of the previous one. While I don't agree with all the choices made, it was in many ways nicer than what I contributed here. The main reason I did not follow it more closely was time: I just wanted a working adaptation of the gtr123 model as soon as possible.

Ok, I understand. I'm taking a look to see how to integrate both versions.

I have fixed all the flake8 issues, except the complexity ones. Hopefully, I will have them fixed by the end of the day.

Great!

@reubano
Contributor

reubano commented Sep 22, 2017

Ok... here's my attempt. What do you think?

import SimpleITK as sitk
from src.algorithms.classify.src import gtr123_model
from src.preprocess.load_ct import load_ct, MetaData


def predict(dicom_path, centroids, model_path=None,
            preprocess_ct=None, preprocess_model_input=None):
    reader = sitk.ImageSeriesReader()
    filenames = reader.GetGDCMSeriesFileNames(dicom_path)

    if not filenames:
        raise FileNotFoundError()

    reader.SetFileNames(filenames)
    image = reader.Execute()

    if preprocess_ct:
        meta = load_ct(dicom_path)[1]
        voxel_data = preprocess_ct(image, MetaData(meta))
    else:
        voxel_data = image

    if preprocess_model_input:
        preprocessed = preprocess_model_input(voxel_data, centroids)
    else:
        preprocessed = voxel_data

    return gtr123_model.predict(preprocessed, centroids)

Since the Keras model is now gone, do you see any reason to keep model_path? If not, I can help you refactor it out.

@dchansen
Contributor Author

This would work for now.
I think the long term solution should be to move the gtr123 preprocessing into preprocess_ct. I will probably not have time to do that until next week though.

@reubano
Contributor

reubano commented Sep 22, 2017

I think the long term solution should be to move the gtr123 preprocessing into preprocess_ct.

Agreed. However, I think your PR can work without that at this stage (MVP). I looked around some more and saw that gtr123_model.predict actually takes a model_path param. So the last line could be return gtr123_model.predict(preprocessed, centroids, model_path). I think then, the only change would have to be #L14-L16 here in test_classify_trained_model_predict.py.

@reubano
Contributor

reubano commented Sep 26, 2017

Looking good! What are your thoughts about incorporating something like I mentioned here? Main reason is just so that we don't disable current functionality with this PR. I think with that, we can get this merged.

@dchansen
Contributor Author

I'm trying to do a reasonable merge right now, and algorithms.identify.trained_model is giving me a headache.
I think we really need a standardized way of specifying a model that is independent of frameworks (Keras, Torch, TensorFlow, etc.).

I would suggest that each model independently implements a predict method taking only a DICOM path (for identification) or a DICOM path and a list of candidates. trained_model.predict would then take the same arguments, plus a string identifying the algorithm to be used. How does this sound?

@vessemer
Contributor

@dchansen I'm working on that right now. I've also tried to convert the grt123 classification model into Keras (TF), but I've run into trouble with the BatchNorm layer: despite the weights transferring correctly, it gives me different results (if I throw away each BatchNorm from the graph, the outputs become identical, so I need to dig deeper).
As for standardizing the interface without transferring the model, I've almost finished that.

@reubano
Contributor

reubano commented Sep 26, 2017

I'm trying to do a reasonable merge right now, and algorithms.identify.trained_model is giving me a headache.

Do you mind elaborating? What specifically is causing issues?

I think we really need a standardized way of specifying a model that is independent of frameworks (Keras, Torch, TensorFlow, etc.).

I would suggest that each model independently implements a predict method taking only a DICOM path (for identification) or a DICOM path and a list of candidates. trained_model.predict would then take the same arguments, plus a string identifying the algorithm to be used. How does this sound?

So if I understand you correctly, algorithms.identify.src.gtr123_model would look like this:

def predict(dicom_path):
    model_path = 'src/algorithms/identify/assets/dsb2017_detector.ckpt'
    ...

and algorithms.identify.trained_model would look like this:

from src.algorithms.identify import src 

def predict(dicom_path, algorithm='gtr123_model'):
    model = func(src, algorithm)
    return model.predict(dicom_path)

Is this correct?
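A hedged sketch of how the `func(src, algorithm)` lookup above could work with plain getattr; the namespace below is a stand-in for the real src.algorithms.identify.src package:

```python
from types import SimpleNamespace

def predict(src, dicom_path, algorithm='gtr123_model'):
    # Pick the algorithm module by name, e.g. src.gtr123_model.
    model = getattr(src, algorithm)
    return model.predict(dicom_path)

# Stand-in namespace demonstrating the dispatch.
fake_src = SimpleNamespace(
    gtr123_model=SimpleNamespace(predict=lambda path: ('gtr123', path)),
)
print(predict(fake_src, '/data/scan'))  # → ('gtr123', '/data/scan')
```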

@dchansen
Contributor Author

@vessemer I am a little unsure about the benefit of moving the model to Keras now that Tensorflow and Torch seem to work without disturbing one another. Different approaches will require different enough post- and pre-processing that they can't be represented as a Keras model anyway.
Long term, I do see the advantage of having all models on a single deep-learning framework, but is Keras the right choice? Using ONNX as the way to save the model would allow for easy use of many frameworks, including Caffe2 and CNTK.
Either way, as soon as there is a consensus, I would be happy to help convert the models to that particular format.

One difference between the Torch and Keras batchnorm is that they use a different epsilon value (1e-5 vs 1e-3), so maybe that's the issue?
Could you make your WIP available on the model standardization?
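The effect of that epsilon mismatch is easy to demonstrate numerically; a small sketch with made-up batch-norm statistics:

```python
import numpy as np

# Made-up batch-norm parameters; only the epsilon differs below.
x = np.array([0.1, 0.5, 2.0])
mean, var, gamma, beta = 0.4, 0.01, 1.0, 0.0

def batchnorm(x, eps):
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# PyTorch default eps=1e-5 vs the Keras default eps=1e-3.
diff = np.max(np.abs(batchnorm(x, 1e-5) - batchnorm(x, 1e-3)))
print(diff)  # clearly nonzero when the variance is small
```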

@dchansen
Contributor Author

@reubano Yes, that was my thought exactly. The problem right now is that all the models take ownership of trained_model.predict, and there is no good way of switching between particular models. Ideally, we might also like to build a meta-model combining many different ones, and a uniform interface would help greatly.

@vessemer
Contributor

@dchansen, thank you for your input, but I had already taken the different default values into account; I used nn-transfer for the model conversion.

@dchansen
Contributor Author

@vessemer Well, if you can share the code, I can have a look at it.

@reubano
Contributor

reubano commented Sep 26, 2017

The problem right now is that all the models take ownership of trained_model.predict, and there is no good way of switching between particular models. Ideally, we might also like to build a meta-model combining many different ones, and a uniform interface would help greatly.

For now, I think this is the preferred course of action. We want to focus on one main model for each of (classify, identify, and segment). Later on, we may consider making it easier to switch out the models. The current area of flexibility is in the serialized trained data, e.g., assets/dsb2017_detector.ckpt. So while it is still tied to the underlying Python ML model, you can tweak the training parameters as needed. See my comment on #2 for more info.

Also, let me know what exactly is giving you a headache so I can try and help you out.

@dchansen
Contributor Author

dchansen commented Sep 26, 2017

My issue was that I had to disable a lot of code in identify.trained_model
But I think this is what you want?

@reubano
Contributor

reubano commented Sep 26, 2017

I was literally suggesting you just copy/paste my code from here.

else:
preprocessed = voxel_data

return gtr123_model.predict(preprocessed, centroids)
@reubano (Contributor), Sep 26, 2017

...
model_path = model_path or 'src/algorithms/classify/assets/gtr123_model.ckpt'
...
return gtr123_model.predict(image, centroids, model_path)

@dchansen (Contributor, author)

Not sure what you're asking here?

@reubano (Contributor), Sep 26, 2017

Just reintroducing the model_path part from here. Also, I just noticed a typo, which I've fixed. Maybe that tripped you up... sorry!

@reubano
Contributor

reubano commented Sep 26, 2017

Just about there! Does this solve your headache with algorithms.identify.trained_model?

@reubano
Contributor

reubano commented Sep 26, 2017

From the Travis log:

$ flake8 prediction
prediction/src/algorithms/identify/trained_model.py:13:1: F401 'glob' imported but unused
prediction/src/algorithms/identify/trained_model.py:15:1: F401 'dicom' imported but unused
prediction/src/algorithms/identify/trained_model.py:16:1: F401 'src.preprocess.errors.EmptyDicomSeriesException' imported but unused
prediction/src/algorithms/identify/trained_model.py:17:1: F401 'src.preprocess.lung_segmentation.save_lung_segments' imported but unused
prediction/src/algorithms/classify/trained_model.py:68:56: E231 missing whitespace after ','
prediction/src/preprocess/extract_lungs.py:57:19: E127 continuation line over-indented for visual indent
prediction/src/preprocess/extract_lungs.py:66:18: E127 continuation line over-indented for visual indent
prediction/src/preprocess/extract_lungs.py:67:18: E127 continuation line over-indented for visual indent
prediction/src/preprocess/extract_lungs.py:68:18: E127 continuation line over-indented for visual indent
prediction/src/tests/test_endpoints.py:10:1: F811 redefinition of unused 'json' from line 7
The command "flake8 prediction" exited with 1.

$ pycodestyle prediction
prediction/src/algorithms/classify/trained_model.py:68:56: E231 missing whitespace after ','
prediction/src/preprocess/extract_lungs.py:57:19: E127 continuation line over-indented for visual indent
prediction/src/preprocess/extract_lungs.py:66:18: E127 continuation line over-indented for visual indent
prediction/src/preprocess/extract_lungs.py:67:18: E127 continuation line over-indented for visual indent
prediction/src/preprocess/extract_lungs.py:68:18: E127 continuation line over-indented for visual indent

@dchansen
Contributor Author

The tests run in my local environment (outside Docker). Inside Docker they pass when run individually, but not when running them all at once. I suspect it's TensorFlow/Torch shenanigans, but it's quite tricky to debug.

@dchansen
Contributor Author

Using a different docker image solved the problem for me.
I cannot for the life of me figure out where the segfault in the python3:6 image is coming from, however.

@reubano
Contributor

reubano commented Sep 27, 2017

Yea that is odd... which image works? And what is the one that doesn't?

@dchansen
Contributor Author

pytorch:36 segfaults.
varunagrawal/pytorch does not, but I had to manually install opencv and tkinter in it.

@dchansen
Contributor Author

I changed the image to be Ubuntu based instead. All tests should pass now.

@reubano
Contributor

reubano commented Sep 27, 2017

Ok.... looks like the tests passed, but there are some new lint errors now

$ flake8 prediction
prediction/src/tests/__init__.py:1:1: F401 'torch' imported but unused
prediction/src/tests/__init__.py:1:13: E261 at least two spaces before inline comment
prediction/src/tests/__init__.py:1:14: E262 inline comment should start with '# '
The command "flake8 prediction" exited with 1.

$ pycodestyle prediction
prediction/src/tests/__init__.py:1:13: E261 at least two spaces before inline comment
prediction/src/tests/__init__.py:1:14: E262 inline comment should start with '# '
The command "pycodestyle prediction" exited with 1.

@dchansen
Contributor Author

The torch import is required simply because torch always needs to be imported before Keras. I am very open to other ways of doing that.

@reubano
Contributor

reubano commented Sep 27, 2017

Try adding this on the offending line: # noqa pylint: disable=unused-import

Edit: if that doesn't work, try this: # noqa: F401
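Applied to the offending line, prediction/src/tests/__init__.py would start something like this (the comment wording is illustrative, not the actual file):

```python
import torch  # noqa: F401  # torch must be imported before Keras
```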

@reubano
Contributor

reubano commented Sep 28, 2017

Looks like everything is a go now! Do you mind squashing to one commit? Or perhaps two, one for each issue (if it's possible to separate them).

@dchansen dchansen force-pushed the master branch 2 times, most recently from d888503 to 11b8146 Compare September 28, 2017 12:38
@dchansen
Contributor Author

I seem to have messed that up rather thoroughly. This will take me a while to sort out.

@reubano
Contributor

reubano commented Sep 28, 2017

No worries :). Perhaps I can help out. What seems to be the issue(s)?

@dchansen
Contributor Author

I'm just not very experienced with git rebase; I found myself having to redo every merge made on the project in the last ten days.
I have messed up my repository quite badly. I think my best approach is to try again from home, where I have a pristine copy of my repo from before my attempts at rebasing.

@WGierke
Contributor

WGierke commented Sep 28, 2017

Instead of rebasing, I prefer to squash the commits like so. It fulfills the purpose as well, namely having only 1 commit in the end :)
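The link behind "like so" is elided above; one common way to squash a branch to a single commit, assuming the upstream remote is named upstream and the target branch is master, is:

```shell
# Find where the branch diverged from upstream/master, keep all
# changes staged, and record them as one new commit.
git fetch upstream
git reset --soft "$(git merge-base upstream/master HEAD)"
git commit -m "Converted grt123 identification and classification algorithms"
git push --force origin master
```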

@reubano
Contributor

reubano commented Sep 28, 2017

You can also take a look at the advice I posted here.

@reubano
Contributor

reubano commented Sep 28, 2017

Also, git config --global rerere.enabled true will save conflict resolutions so that in the future you only have to resolve them once.

@dchansen
Contributor Author

@WGierke Thank you! Uploading the squashed commit now.

@dchansen dchansen force-pushed the master branch 2 times, most recently from 1ac544e to 524f843 Compare September 28, 2017 15:11
@reubano
Contributor

reubano commented Sep 28, 2017

$ flake8 interface
interface/backend/api/tests.py:44:1: W293 blank line contains whitespace
interface/backend/api/tests.py:48:57: E502 the backslash is redundant between brackets
interface/backend/api/tests.py:49:81: E502 the backslash is redundant between brackets
interface/backend/api/serializers.py:74:1: E302 expected 2 blank lines, found 1
interface/backend/api/serializers.py:120:17: E127 continuation line over-indented for visual indent
interface/backend/api/views.py:24:1: F811 redefinition of unused 'Response' from line 13
interface/backend/api/views.py:47:37: W291 trailing whitespace
The command "flake8 interface" exited with 1.

$ pycodestyle interface
interface/backend/api/serializers.py:74:1: E302 expected 2 blank lines, found 1
interface/backend/api/serializers.py:120:17: E127 continuation line over-indented for visual indent
interface/backend/api/tests.py:44:1: W293 blank line contains whitespace
interface/backend/api/tests.py:48:57: E502 the backslash is redundant between brackets
interface/backend/api/tests.py:49:81: E502 the backslash is redundant between brackets
interface/backend/api/views.py:47:37: W291 trailing whitespace
The command "pycodestyle interface" exited with 1.

@dchansen
Contributor Author

But the pull request doesn't include those files??

@isms
Contributor

isms commented Sep 28, 2017

@reubano @dchansen Those issues were introduced by recent commits unrelated to this PR. I pushed fixes which should get the code style back to passing. Will restart the build on this branch to update the status.

(cc @lamby so we can avoid in the future!)

@reubano
Contributor

reubano commented Sep 28, 2017

Ok... master looks good. Do you mind rebasing, @dchansen, to ensure everything is good to go?

@reubano reubano merged commit 45db7be into drivendataorg:master Sep 29, 2017
@reubano
Contributor

reubano commented Sep 29, 2017

closes #4 :shipit:
