Replies: 3 comments 14 replies
-
Hi, I started with nnU-Net for my segmentation task and then tried to replicate it with other libraries such as MONAI and PyTorch Lightning. Of course, I extracted all the useful information from nnU-Net, such as the patch size planned for my dataset and the fixed parameters that are independent of any dataset. I also tried to replicate the nnU-Net dataloader: the patch sampling, the handling of class imbalance, the data augmentation, and so on (a minimal sketch of this is at the end of this comment). So, regarding your interesting questions:
In general, I agree that nnU-Net sets up a baseline for any arbitrary dataset, so it is a good starting point for researchers to compare their performance against. In my case it is still a very good baseline, and one that is hard to beat in terms of performance.
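For anyone attempting the same replication: below is a minimal sketch (not the actual nnU-Net dataloader) of how its foreground-oversampled patch sampling can be approximated with MONAI transforms. The patch size, pos/neg ratio and file paths are placeholder assumptions you would replace with the values from your own nnU-Net plans.

```python
# Minimal sketch: approximating nnU-Net-style foreground oversampling in MONAI.
# Patch size, pos/neg ratio and file paths are placeholder assumptions.
from monai.data import DataLoader, Dataset
from monai.transforms import (
    Compose, EnsureChannelFirstd, LoadImaged, NormalizeIntensityd,
    RandCropByPosNegLabeld, RandFlipd,
)

transforms = Compose([
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    NormalizeIntensityd(keys=["image"], nonzero=True),
    # Foreground oversampling: with pos=1, neg=2 roughly one patch in three
    # is forced to contain foreground, similar in spirit to nnU-Net's 33%.
    RandCropByPosNegLabeld(
        keys=["image", "label"],
        label_key="label",
        spatial_size=(128, 128, 128),  # use the patch size nnU-Net planned
        pos=1, neg=2,
        num_samples=2,
    ),
    RandFlipd(keys=["image", "label"], prob=0.5, spatial_axis=0),
])

data = [{"image": "case000_img.nii.gz", "label": "case000_seg.nii.gz"}]
loader = DataLoader(Dataset(data, transform=transforms), batch_size=2)
```

This only covers the sampling and class-imbalance part; nnU-Net's full augmentation pipeline (rotations, scaling, elastic deformation, etc.) would still need to be added on top.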
-
Hey @SimJeg and @Joeycho,

So why is that the case? Well, your guess is pretty much as good as mine. All I can provide as an answer is a set of hypotheses that I have so far not been able to prove or disprove. Most of them are just gut feelings of mine, so make of them what you will.

Hypothesis 1: Things other than the network architecture matter more in medical imaging
There are probably a bunch more that I forgot by now. The essence of this is: there is a lot more engineering involved in doing medical image segmentation 'right', and all of these engineering steps come with their own set of design choices and possibilities to screw things up. This is why you see so many bad U-Net baselines in the literature. Remember: if it ain't nnU-Net, do not trust the 'U-Net' baseline!

Hypothesis 2: Segmentation problems in the medical domain are different

At this point I should note that there are of course some datasets that are not well solved, Task 10 of the Medical Segmentation Decathlon (colon cancer) for example. And there are some datasets with closely related labels (KiTS21: tumors vs cysts) that are difficult to get right just from the images, also for humans! Which brings me to my next hypothesis.

Hypothesis 3: Small dataset sizes are disadvantageous for complex architectures

So overall, taking these things together, I think that the benefits a good architecture can provide (and I am certain that better architectures DO produce better results) are drowned in a sea of other factors like noise, small dataset sizes and so on, making it really hard to measure them. Saturated as medical image segmentation is, you really gotta try hard to create an architecture more powerful than the U-Net and to prove its value (at least if you want to convince me).

So is it impossible to beat the U-Net? Absolutely not! If all the stars are aligned and you have a large, high-quality dataset with sufficiently difficult target structures, you can make it work. See for example our AMOS2022 winning contribution, where we could clearly see an improvement when switching to a residual encoder :-) This is just a situation that is not given for most datasets.
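To illustrate what switching to a residual encoder means in practice, here is a minimal sketch of a 3D residual block. This is not the exact AMOS2022 network; the instance norm and LeakyReLU choices are assumptions for the sake of the example.

```python
# Minimal sketch of a 3D residual block for a segmentation encoder.
# Normalization and activation choices here are illustrative assumptions.
import torch
import torch.nn as nn

class ResidualBlock3D(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv3d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.norm1 = nn.InstanceNorm3d(out_ch, affine=True)
        self.conv2 = nn.Conv3d(out_ch, out_ch, 3, padding=1, bias=False)
        self.norm2 = nn.InstanceNorm3d(out_ch, affine=True)
        self.act = nn.LeakyReLU(inplace=True)
        # 1x1x1 projection so the skip connection matches the main path
        # whenever the resolution or channel count changes
        self.skip = (
            nn.Identity() if stride == 1 and in_ch == out_ch
            else nn.Conv3d(in_ch, out_ch, 1, stride=stride, bias=False)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.act(self.norm1(self.conv1(x)))
        out = self.norm2(self.conv2(out))
        return self.act(out + self.skip(x))

# usage: a downsampling stage of an encoder
y = ResidualBlock3D(32, 64, stride=2)(torch.randn(1, 32, 64, 64, 64))
```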
Regarding some of the other discussion points (these are not hypotheses):

Why do we still not have general purpose pretrained encoders?
I think unsupervised/self-supervised pretraining is highly promising and will give fantastic results in the years to come.
Is nnU-Net hampering progress?

Not really, I hope. nnU-Net has the ability to catalyze progress because you can use it to verify new methodologies: you can drop in your architecture and test it in an environment where everything else is taken care of (a minimal sketch of this is at the end of this reply). You can do really comprehensive analyses on multiple datasets with minimal effort, for example for evaluating things like new loss functions. Quite neat really, and lots of people use it for that! There are also a lot of nnU-Net-independent works in segmentation (like in MONAI). Especially MONAI has a lot of traction due to the professional developers working on it (as opposed to random dudes like me doing nnU-Net). But as long as nnU-Net dominates the competitions, people will keep flocking to us ;-)

Phew. Enough rambling for today. I hope these somewhat incoherent thoughts contain the answers you were looking for!

Best,
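PS: to make the 'drop in your architecture' point concrete, here is a minimal sketch assuming the nnU-Net v1 trainer API, where initialize_network is the hook a subclass overrides. MyFancyNet is a hypothetical placeholder for your own model, not anything that ships with nnU-Net.

```python
# Minimal sketch: swapping only the network inside nnU-Net (v1-style trainer),
# keeping preprocessing, patch sampling, augmentation and schedule unchanged.
import torch
import torch.nn as nn
from nnunet.training.network_training.nnUNetTrainerV2 import nnUNetTrainerV2

class MyFancyNet(nn.Module):
    """Hypothetical placeholder for your own architecture."""
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.net = nn.Conv3d(in_channels, num_classes, kernel_size=1)

    def forward(self, x):
        return self.net(x)

class nnUNetTrainerMyFancyNet(nnUNetTrainerV2):
    def initialize_network(self):
        # Only the architecture changes; everything nnU-Net configured stays.
        # Note: nnUNetTrainerV2 trains with deep supervision, so a real
        # drop-in network should return one output per supervision scale.
        self.network = MyFancyNet(self.num_input_channels, self.num_classes)
        if torch.cuda.is_available():
            self.network.cuda()
```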
-
Hi @FabianIsensee, hi @Joeycho, I really appreciate the time you both took to write detailed answers!

Hypothesis 2: maybe several segmentation tasks are indeed to some extent "solved" (e.g. lung segmentation), and progress should be measured on harder tasks. Such tasks could include lesion detection, which is one of the main use cases for deep learning in medical imaging. As in my first post, it is very strange to me that the go-to method in medical imaging is still a two-step pipeline: segment, then reduce the false positives. On natural images, both steps have been done simultaneously since Mask R-CNN, at least 5 years ago (see this leaderboard). RetinaU-Net (and now nnDetection) are paving the way!

Hypothesis 1: probably in the end the bitter lesson will apply. More compute (e.g. a 500GB GPU ⚡🏭) will make engineering much easier (no patches anymore). I also think that self-supervised learning (and, if possible, vision-language models like CLIP) could be a game changer. It has been shown to work in 2D for X-ray and computational pathology (e.g. Table 3 in this paper I worked on), but I have not seen anything convincing so far in 3D (have you?). I don't agree that 3D medical datasets are too diverse and that we should have one model per modality/resolution: I would bet that the distribution of all the MRI/CT/PET scans out there, at any resolution, is much less diverse than that of natural images (e.g. YouTube or LAION-5B). CLIP ViT-H achieved remarkable success across downstream tasks using a single training set; similar results could apply in medical imaging. The hard part is finding an organization with both the time for research and the money for compute, data and talent to train such models 😅
Yes it did! I will leave the discussion open, however.
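On the self-supervised point above, here is a minimal sketch of the kind of contrastive pretraining being discussed (a SimCLR-style NT-Xent loss on two augmented views of the same volumes). The encoder, temperature and batch handling are assumptions, not taken from a specific paper.

```python
# Minimal sketch: SimCLR-style contrastive loss for self-supervised pretraining.
# z1 and z2 would come from a 3D encoder applied to two augmented views
# (e.g. two random crops) of the same scans; the encoder is assumed, not shown.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (N, D) embeddings of two views of the same N volumes."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit norm
    sim = z @ z.t() / temperature                       # pairwise cosine similarity
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    # the positive for view i is the other view of the same volume: i <-> i+N
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(sim.device)
    return F.cross_entropy(sim, targets)

# usage (hypothetical encoder producing (N, D) embeddings):
# loss = nt_xent_loss(encoder(view1), encoder(view2))
```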
-
Hello,
The U-Net paper came out in 2015, quickly followed in 2016 by V-Net to handle 3D inputs. Seven years later, these models have culminated in this amazing repository, which systematically sets the state of the art in medical image segmentation challenges.
In parallel, the progress of segmentation on "natural" 2D images never really stopped; see for instance the paperswithcode benchmarks ADE20K or Cityscapes.
Improvements came from both:

Why such a discrepancy?
Thanks for your inputs,
Simon