Feature: Implement segmentation algorithm #3
Comments
Hi. I'd like to work on this issue. Might I ask what level of accuracy we should be working towards for this first pass? I see that you don't yet require a highly accurate model, but the acceptance criteria mention that it should be trained. To what extent are you looking for a "stand-in" versus the first step towards a model that might appear in the finished product? Similarly, if I work on this issue, should I submit a pull request only when I have a trained segmentation model up and running and all the pipelines built, or should I submit several incremental requests? Thanks!
Hi @dssa56 thanks for your questions!
You shouldn't be too worried about accuracy at this point. We have referenced the top 10 implementations from the original Kaggle competition as starting points (#18 - #28), and any one of those should be accurate enough for the MVP. I would say, just make sure the integrated solution isn't significantly less accurate than its original version.
This will be a "stand-in" in the sense that the accepted PR may not be the most efficient or accurate of the possible solutions. However, it should still be fully functional and complete its stated goals reasonably well. To echo the above, if you use any of the top 10 models as a base, you won't have anything to worry about.
Feel free to submit a PR as soon as you are ready for us to review it. GitHub lets you add commits to a PR, so you can make improvements or even restructure your code if necessary.
Hi reubano, thanks for your answer. Now I understand much better what we're shooting for at this point.
Hi again. Just a quick question about image scales and evaluation. In the LIDC dataset, all the nodules have a size of O(10x10) px per 2D slice. Since this is the only 'official' dataset available at the moment, I'm thinking of training on it exclusively for now. Given that the resulting model probably won't perform well on higher-res images (although one can always downsample, I suppose), I'm wondering whether you'll accept good performance on LIDC for an MVP submission, or should I be leveraging other datasets and trying to train a model that can perform well on high-res images without downsampling? Thanks.
@dssa56 Yes, good performance on LIDC is fine for an MVP submission. Thanks for the question. |
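As context for the downsampling mentioned above, a minimal sketch of block-average downsampling of a 2D slice using only NumPy (the function name and factor-based API are assumptions for illustration, not project code):

```python
import numpy as np

def downsample_slice(slice_2d, factor):
    """Downsample a 2D slice by averaging over factor x factor blocks.

    Hypothetical helper: reduces a high-res slice toward the O(10x10) px
    nodule scale seen in LIDC, at the cost of fine detail.
    """
    h, w = slice_2d.shape
    h, w = h - h % factor, w - w % factor  # trim so dimensions divide evenly
    trimmed = slice_2d[:h, :w]
    # Reshape into blocks, then average within each block.
    return trimmed.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
```

A 512x512 slice downsampled with `factor=4` becomes 128x128, which is closer to the resolution the LIDC-trained model would expect.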
Overview
Currently, there is just a placeholder for the algorithm that segments nodules in scans. Nodules are areas of interest that might be cancerous. We need to adapt the Data Science Bowl algorithms to predict nodule boundaries and descriptive statistics from an iterator of nodule centroids for an image.
Expected Behavior
Given a model trained to perform this task, a DICOM image, and an iterator of nodule centroids, save a file with boundaries (3D boolean mask with true values for voxels associated with that nodule), widest width, and volume to disk. Yield paths to the saved file for each nodule.
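The descriptive statistics above (widest width and volume) can be derived directly from the 3D boolean mask. A minimal NumPy sketch, assuming unit voxel spacing by default and axis-aligned width (function and key names are illustrative, not project code):

```python
import numpy as np

def nodule_stats(mask, spacing=(1.0, 1.0, 1.0)):
    """Compute volume and widest axis-aligned width of a nodule mask.

    mask: 3D boolean array, True for voxels belonging to the nodule.
    spacing: physical size of one voxel along each axis (assumed units, e.g. mm).
    """
    voxel_volume = float(np.prod(spacing))
    volume = float(mask.sum()) * voxel_volume
    coords = np.argwhere(mask)
    if coords.size == 0:
        return {"volume": 0.0, "widest_width": 0.0}
    # Extent of the bounding box along each axis, scaled to physical units.
    extents = (coords.max(axis=0) - coords.min(axis=0) + 1) * np.asarray(spacing)
    return {"volume": volume, "widest_width": float(extents.max())}
```

Real DICOM spacing would come from the image metadata; the bounding-box width here is a simple stand-in for a more careful diameter measurement.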
Design doc reference:
Jobs to be done > Segment > Prediction service
Technical details
- `prediction/src/algorithms/segment/trained_model/predict` method
- `prediction/src/algorithms/segment/src/` folder
- `prediction/src/algorithms/segment/assets/` folder using `git-lfs`
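The `predict` method above can be sketched as a generator that saves one file per nodule and yields its path, matching the expected behavior. The placeholder segmentation, file format, and names below are assumptions for illustration only, not the project's actual code:

```python
import os
import tempfile
import numpy as np

def predict(dicom_path, centroids, model=None):
    """Hypothetical sketch: for each nodule centroid, produce a 3D boolean
    mask plus descriptive stats, save them to disk, and yield the file path.

    dicom_path: path to the DICOM image (unused in this stand-in).
    centroids: iterator of (x, y, z) nodule centroids.
    model: trained segmentation model (unused in this stand-in).
    """
    for i, centroid in enumerate(centroids):
        # Stand-in for the real model: a fixed 4x4x4 cube of True voxels.
        mask = np.zeros((16, 16, 16), dtype=bool)
        mask[6:10, 6:10, 6:10] = True
        out_path = os.path.join(tempfile.gettempdir(), "nodule_%d.npz" % i)
        np.savez(out_path, mask=mask, widest_width=4.0,
                 volume=float(mask.sum()))
        yield out_path
```

A caller would consume it lazily, e.g. `for path in predict(scan, centroids): ...`, loading each saved mask only when needed.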
Out of scope
This feature is a first-pass at getting a model that completes the task with the defined input and output. We are not yet judging the model based on its accuracy or computational performance.
Acceptance criteria
NOTE: All PRs must follow the standard PR checklist.