
Feature: Implement segmentation algorithm #3

Closed
reubano opened this issue Aug 1, 2017 · 5 comments

@reubano
Contributor

reubano commented Aug 1, 2017

Overview

Currently, there is just a placeholder for the algorithm that segments nodules in scans. Nodules are areas of interest that might be cancerous. We need to adapt the Data Science Bowl algorithms to predict nodule boundaries and descriptive statistics from an iterator of nodule centroids for an image.

Expected Behavior

Given a model trained to perform this task, a DICOM image, and an iterator of nodule centroids, save a file to disk for each nodule containing its boundaries (a 3D boolean mask with true values for voxels associated with the nodule), widest width, and volume. Yield the path to the saved file for each nodule.

Design doc reference:
Jobs to be done > Segment > Prediction service

Technical details
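
As a rough sketch of the expected interface described above (all names here, e.g. `predict_nodules`, `model.segment`, and the `load_dicom` helper, are illustrative, not prescribed):

```python
# Illustrative only: a minimal sketch of the expected prediction service,
# assuming the trained model exposes a `segment(image, centroid)` method
# that returns a 3D boolean mask. None of these names are prescribed.
import os

import numpy as np


def predict_nodules(model, dicom_path, centroids, out_dir):
    """Yield the path to a saved result file for each nodule centroid."""
    image = load_dicom(dicom_path)  # hypothetical DICOM loading helper

    for i, centroid in enumerate(centroids):
        # 3D boolean mask: True for voxels belonging to this nodule.
        mask = model.segment(image, centroid)

        # Descriptive statistics derived from the mask.
        volume = int(mask.sum())  # voxel count; scale by voxel size for mm^3
        # Widest extent of the nodule along any axis, in voxels.
        widest_width = max(np.ptp(idx) + 1 for idx in np.nonzero(mask))

        out_path = os.path.join(out_dir, 'nodule_{}.npz'.format(i))
        np.savez(out_path, mask=mask, widest_width=widest_width, volume=volume)
        yield out_path
```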

Out of scope

This feature is a first-pass at getting a model that completes the task with the defined input and output. We are not yet judging the model based on its accuracy or computational performance.

Acceptance criteria

  • trained model
  • documentation for the trained model (e.g., cross validation performance, data used) and how to re-train it

NOTE: All PRs must follow the standard PR checklist.

@ghost

ghost commented Aug 25, 2017

Hi. I'd like to work on this issue. Might I ask what level of accuracy we should be working towards for this first pass? I see that you don't yet require a highly accurate model, but the acceptance criteria mention that it should be trained. To what extent are you looking for a "stand-in" versus the first step towards a model that might appear in the finished product? Similarly, if I work on this issue, should I submit a pull request only once I have a trained segmentation model up and running and all the pipelines built, or should I submit several incremental requests? Thanks!

@reubano
Contributor Author

reubano commented Aug 28, 2017

Hi @dssa56 thanks for your questions!

Might I ask what level of accuracy we should be working towards for this first pass? I see that you don't yet require a highly accurate model, but the acceptance criteria mention that it should be trained.

You shouldn't be too worried about accuracy at this point. We have referenced the top 10 implementations from the original Kaggle competition as starting points (#18 - #28), and any one of those should be accurate enough for the MVP. I would say just make sure the integrated solution isn't significantly less accurate than its original version.

To what extent are you looking for a "stand-in" versus the first step towards a model that might appear in the finished product?

This will be "stand in" in the sense that the accepted PR may not be the most efficient or accurate option out of the possible solutions. However, it should still be fully functional, and complete its stated goals reasonably well. To echo above, if you use any of the top 10 models as a base, you won't have anything to worry about.

Similarly, if I work on this issue, should I submit a pull request only once I have a trained segmentation model up and running and all the pipelines built, or should I submit several incremental requests?

Feel free to submit a PR as soon as you are ready for us to review it. GitHub lets you add commits to a PR, so you can make improvements or even restructure your code if necessary.

@ghost

ghost commented Aug 28, 2017

Hi reubano, thanks for your answer. Now I understand much better what we're shooting for at this point.

@ghost

ghost commented Sep 1, 2017

Hi again. Just a quick question about image scales and evaluation. In the LIDC dataset, the nodules are all on the order of 10×10 px per 2D slice. Since this is the only 'official' dataset available at the moment, I'm thinking of training on it exclusively for now. Given that the resulting model probably won't perform well on higher-res images (although one can always downsample, I suppose; see the sketch below), I'm wondering whether you'll accept good performance on LIDC for an MVP submission, or whether I should be leveraging other datasets and trying to train a model that can perform well on high-res images without downsampling? Thanks.
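
For concreteness, the kind of downsampling I have in mind is something like the following (just a sketch; `resample` and the `target_spacing` parameter in mm are my own illustrative names, not anything that exists in the codebase):

```python
# Sketch of resampling a higher-res scan to LIDC-like voxel spacing before
# inference. Purely illustrative; not part of any existing pipeline here.
import numpy as np
import scipy.ndimage


def resample(volume, spacing, target_spacing=(1.0, 1.0, 1.0)):
    """Resample a 3D volume (z, y, x) from `spacing` to `target_spacing`, in mm."""
    zoom_factors = np.array(spacing, dtype=float) / np.array(target_spacing)
    # order=1 -> linear interpolation, a reasonable default for CT volumes.
    return scipy.ndimage.zoom(volume, zoom_factors, order=1)
```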

@isms
Contributor

isms commented Sep 2, 2017

I'm wondering whether you'll accept good performance on LIDC for an MVP submission, or whether I should be leveraging other datasets and trying to train a model that can perform well on high-res images without downsampling?

@dssa56 Yes, good performance on LIDC is fine for an MVP submission. Thanks for the question.
