Skip to content
This repository has been archived by the owner on Feb 22, 2020. It is now read-only.

Feature: Adapt the grt123 model #4

Closed
1 task
reubano opened this issue Aug 1, 2017 · 10 comments
Closed
1 task

Feature: Adapt the grt123 model #4

reubano opened this issue Aug 1, 2017 · 10 comments

Comments

@reubano
Copy link
Contributor

reubano commented Aug 1, 2017

Overview

Currently, there is a just a placeholder in the algorithm that classifies nodules in scans. Nodules are areas of interest that might be cancerous. We need to adapt the Data Science Bowl (DSB) algorithms to predict P(cancer) given an iterator of nodule centroids for an image.

The top DSB algorithm (grt123) was written to run on a GPU for Python2. It would be nice to integrate this algorithm into the current structure and update it to run on Python3 (potentially on a CPU as well).

Expected Behavior

Given the grt123 model trained to perform this task, a DICOM image, and an iterator of nodule centroids, yield the P(cancer) for each nodule.

Design doc reference: Detect and select

Technical details

  • The majority of the Python3 and CPU conversion has been completed and is available in the conversion branch.

  • The forked model is available here (reads source DICOM images from S3).

  • One area that definitely needs review is the Py2/Py3 floor/true division conversions. Some calculation explicitly converted numbers to floats, and in those cases it was apparent that true division was desired. However, the remaining floor division calculations should be checked to ensure that true is not the appropriate operation.

  • If you get a UnicodeDecodeError error while trying to load the serialized Torch model, use the torch_loader function in the utils module instead.

  • When running on the CPU, it isn't necessary to perform that much work. Just enough to obtain a plausible result in a reasonable amount of time.

  • This feature should be implemented in the prediction/classify/trained_model/predict method.

Acceptance criteria

NOTE: All PRs must follow the standard PR checklist.

@reubano reubano added this to the 1-mvp milestone Aug 1, 2017
@reubano reubano changed the title Feature: Integrate the algos-grt123 branch Feature: Adapt the grt123 model Aug 4, 2017
@Serhiy-Shekhovtsov
Copy link
Contributor

I'd like to work on this one.

P.S. btw, is it a good idea to tell when you are working on something?

@Serhiy-Shekhovtsov
Copy link
Contributor

Hi @reubano! I've fixed few issues on pre-processing part and managed to have it running successfully and giving the same result as original code. These changes have been pushed now.
Now I am looking into detection part, but encountered this issue:

Torch: not enough memory: you tried to allocate 21GB

Does it really take this much memory or is it an issue with environment \ code?

@reubano
Copy link
Contributor Author

reubano commented Sep 18, 2017

@Serhiy-Shekhovtsov

Hi @reubano! I've fixed few issues on pre-processing part and managed to have it running successfully and giving the same result as original code.

That's great! Nice job!

Does it really take this much memory or is it an issue with environment \ code?

I'm afraid i don't know the answer to that one. Do you get the same error when running the base repo?

@dchansen
Copy link
Contributor

I have the full model adapted for Python3 in the pull request at #122

The Nodule identification part is extremely memory hungry. Be sure to enable the volatile setting on the Variables, and reduce the chunk size in the SplitComb.

@Serhiy-Shekhovtsov
Copy link
Contributor

Serhiy-Shekhovtsov commented Sep 19, 2017

Do you get the same error when running the base repo?

Yes.

I have the full model adapted for Python3 in the pull request at #122

Great!

@reubano
Copy link
Contributor Author

reubano commented Sep 20, 2017

@Serhiy-Shekhovtsov did @dchansen recommendations help you out?

@Serhiy-Shekhovtsov
Copy link
Contributor

I didn't check it yet.

@Serhiy-Shekhovtsov
Copy link
Contributor

@dchansen I am getting this error when trying to checkout your repo:

$ git-lfs.exe smudge -- prediction/src/algorithms/classify/assets/gtr123_model.ckpt
Error downloading object: prediction/src/algorithms/classify/assets/gtr123_model.ckpt (eb2c037dd55e2f49da95657f7a7851cbcfe2f2b516848ed03f8c5c820f3e16b4)

Smudge error: Error downloading prediction/src/algorithms/classify/assets/gtr123_model.ckpt (eb2c037dd55e2f49da95657f7a7851cbcfe2f2b516848ed03f8c5c820f3e16b4): [eb2c037dd55e2f49da95657f7a7851cbcfe2f2b516848ed03f8c5c820f3e16b4] Object does not exist on the server: [404] Object does not exist on the server

Any ideas what is it and how to solve it?

@dchansen
Copy link
Contributor

It was an error on my side. It's fixed in the repo now.

@Serhiy-Shekhovtsov
Copy link
Contributor

Thanks, now I can pull it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants