
Feature: Implement segmentation algorithm #3

Closed
reubano opened this issue Aug 1, 2017 · 5 comments

@reubano
Contributor

reubano commented Aug 1, 2017

Overview

Currently, there is just a placeholder for the algorithm that segments nodules in scans. Nodules are areas of interest that might be cancerous. We need to adapt the Data Science Bowl algorithms to predict nodule boundaries and descriptive statistics from an iterator of nodule centroids for an image.

Expected Behavior

Given a model trained to perform this task, a DICOM image, and an iterator of nodule centroids, save a file to disk for each nodule containing its boundaries (a 3D boolean mask with true values for voxels associated with the nodule), widest width, and volume. Yield the path to the saved file for each nodule.

Design doc reference:
Jobs to be done > Segment > Prediction service

Technical details
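
As a rough sketch of the expected interface described above (all names here, e.g. `predict_nodules`, `model.segment`, and the `load_dicom` helper, are illustrative, not prescribed):

```python
# Illustrative only: a minimal sketch of the expected prediction service,
# assuming the trained model exposes a `segment(image, centroid)` method
# that returns a 3D boolean mask. None of these names are prescribed.
import os

import numpy as np


def predict_nodules(model, dicom_path, centroids, out_dir):
    """Yield the path to a saved result file for each nodule centroid."""
    image = load_dicom(dicom_path)  # hypothetical DICOM loading helper

    for i, centroid in enumerate(centroids):
        # 3D boolean mask: True for voxels belonging to this nodule.
        mask = model.segment(image, centroid)

        # Descriptive statistics derived from the mask.
        volume = int(mask.sum())  # voxel count; scale by voxel size for mm^3
        # Widest extent of the nodule along any axis, in voxels.
        widest_width = max(np.ptp(idx) + 1 for idx in np.nonzero(mask))

        out_path = os.path.join(out_dir, 'nodule_{}.npz'.format(i))
        np.savez(out_path, mask=mask, widest_width=widest_width, volume=volume)
        yield out_path
```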

Out of scope

This feature is a first-pass at getting a model that completes the task with the defined input and output. We are not yet judging the model based on its accuracy or computational performance.

Acceptance criteria

  • trained model
  • documentation for the trained model (e.g., cross validation performance, data used) and how to re-train it

NOTE: All PRs must follow the standard PR checklist.

@ghost

ghost commented Aug 25, 2017

Hi. I'd like to work on this issue. Might I ask what level of accuracy we should be working towards for this first pass? I see that you don't yet require a highly accurate model, but the acceptance criteria mention that it should be trained. To what extent are you looking for a "stand-in" versus the first step towards a model that might appear in the finished product? Similarly, if I work on this issue, should I submit a pull request only once I have a trained segmentation model up and running and all the pipelines built, or should I submit several incremental requests? Thanks!

@reubano
Contributor Author

reubano commented Aug 28, 2017

Hi @dssa56 thanks for your questions!

Might I ask what level of accuracy we should be working towards for this first pass? I see that you don't yet require a highly accurate model, but the acceptance criteria mention that it should be trained.

You shouldn't be too worried about accuracy at this point. We have referenced the top 10 implementations from the original Kaggle competition as starting points (#18 - #28), and any one of those should be accurate enough for the MVP. I would say just make sure the integrated solution isn't significantly less accurate than its original version.

To what extent are you looking for a "stand-in" versus the first step towards a model that might appear in the finished product?

This will be "stand in" in the sense that the accepted PR may not be the most efficient or accurate option out of the possible solutions. However, it should still be fully functional, and complete its stated goals reasonably well. To echo above, if you use any of the top 10 models as a base, you won't have anything to worry about.

Similarly, if I work on this issue, should I submit a pull request only once I have a trained segmentation model up and running and all the pipelines built, or should I submit several incremental requests?

Feel free to submit a PR as soon as you are ready for us to review it. GitHub lets you add commits to a PR, so you can make improvements or even restructure your code if necessary.

@ghost

ghost commented Aug 28, 2017

Hi reubano, thanks for your answer. Now I understand much better what we're shooting for at this point.

@ghost

ghost commented Sep 1, 2017

Hi again. Just a quick question about image scales and evaluation. In the LIDC dataset, the nodules are all on the order of 10×10 px per 2D slice. Since this is the only 'official' dataset available at the moment, I'm thinking of training on it exclusively for now. Given that the resulting model probably won't perform well on higher-res images (although one can always downsample, I suppose; see the sketch below), I'm wondering whether you'll accept good performance on LIDC for an MVP submission, or whether I should be leveraging other datasets and trying to train a model that can perform well on high-res images without downsampling? Thanks.
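
For concreteness, the kind of downsampling I have in mind is something like the following (just a sketch; `resample` and the `target_spacing` parameter in mm are my own illustrative names, not anything that exists in the codebase):

```python
# Sketch of resampling a higher-res scan to LIDC-like voxel spacing before
# inference. Purely illustrative; not part of any existing pipeline here.
import numpy as np
import scipy.ndimage


def resample(volume, spacing, target_spacing=(1.0, 1.0, 1.0)):
    """Resample a 3D volume (z, y, x) from `spacing` to `target_spacing`, in mm."""
    zoom_factors = np.array(spacing, dtype=float) / np.array(target_spacing)
    # order=1 -> linear interpolation, a reasonable default for CT volumes.
    return scipy.ndimage.zoom(volume, zoom_factors, order=1)
```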

@isms
Contributor

isms commented Sep 2, 2017

I'm wondering whether you'll accept good performance on LIDC for an MVP submission, or whether I should be leveraging other datasets and trying to train a model that can perform well on high-res images without downsampling?

@dssa56 Yes, good performance on LIDC is fine for an MVP submission. Thanks for the question.
