Issue 237 - Training with static set of single-cell crops #238
Preliminary results on BBBC021, using a ResNet50 model with 128x128 crops. Weakly supervised learning of treatments (compound + concentration), 104 classes in total, including DMSO.
The performance metric is Average Class Accuracy, which DeepProfiler does not report by default. This metric removes the bias toward classes with more images (class imbalance) and measures performance on all treatments equally. Dynamic cropping shows more overfitting to the training set; this was not clearly observed before because we used the generic accuracy metric, which is sensitive to class imbalance. In BBBC021 (as in many other datasets), negative controls are heavily over-represented, so classifying them correctly inflates accuracy. Dynamic cropping favors negative controls, potentially because many more examples of them can be obtained dynamically for training; this yields high accuracy in the validation set, but it does poorly on the other classes. The static cropping methodology breaks the imbalance more consistently: it does not classify negative controls as accurately, but it recognizes other phenotypes with higher accuracy than dynamic cropping. This can be improved even further, but the results show that we have potential for better performance, as well as a need to change the metrics we use for training.
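To make the imbalance argument concrete, here is a minimal sketch of Average Class Accuracy versus generic accuracy. This is not DeepProfiler's implementation, just an illustration with hypothetical labels: a classifier that predicts the majority class (negative controls) for everything looks good under generic accuracy but not under the per-class average.

```python
import numpy as np

def average_class_accuracy(y_true, y_pred):
    """Mean of per-class accuracies, so every class counts equally
    regardless of how many examples it has."""
    classes = np.unique(y_true)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(per_class))

# Toy illustration (hypothetical data): 90 negative controls (class 0)
# and 10 treated cells (class 1), all predicted as controls.
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)

print(np.mean(y_true == y_pred))               # generic accuracy: 0.9
print(average_class_accuracy(y_true, y_pred))  # average class accuracy: 0.5
```

The generic metric rewards the model for getting the over-represented negative controls right, while the class-balanced metric exposes that half of the classes are never recognized.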
Another benefit of static cropping: training completes approximately 30% faster. This comes at the cost of additional storage space for single cells, but the gains in speed and accuracy are worth it.
Just to complete the report of the experiment here: the performance of dynamic cropping and static cropping was recorded in comet.ml and can be compared here. Note that the performance metrics are based on accuracy, which is biased; other metrics will be considered in the future, as discussed in issue #239.
I need to revert the Control Normalization commit from this branch. I'll update this PR when this is done and ready for merging.
This reverts commit 267734d.
A new training dynamic based on statically selected single-cell crops. This requires creating the sample of single cells and using a crop generator that can read it. This PR implements both, and preliminary results indicate it is working well.
The way to use the sample selection command is as follows:
python deepprofiler --root=/path/to/project/ --config=config.json sample-sc
It generates a dataset of single cell images in the directory
$root/outputs/single-cell-sample/
together with an index file that contains labels for weakly supervised learning. The sample is created using the regular crop_generator for training, but without data augmentations. This ensures three things:
- The cell crops are stored as PNG images with channels unrolled along the horizontal axis (example below).
- Only one dataset of single cells is kept in the output directories.
- When the command is run again, the previous sample is removed.
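As a sketch of the unrolled-channel layout (assumptions, not DeepProfiler's exact code: an `(H, W, C)` crop array and horizontal concatenation of channels), the transformation before saving to PNG could look like this:

```python
import numpy as np

# Hypothetical 128x128 crop with 3 fluorescence channels.
crop = np.random.randint(0, 255, size=(128, 128, 3), dtype=np.uint8)

# Unroll channels along the horizontal axis: each channel becomes a
# 128x128 tile placed side by side, giving one 128x384 grayscale image
# that can be saved as a standard PNG.
flat = np.hstack([crop[..., k] for k in range(crop.shape[-1])])

print(flat.shape)  # (128, 384)
```

Storing cells this way keeps each crop in a single lossless grayscale PNG while preserving all channels, at the cost of re-splitting the image when it is read back for training.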
In addition to the sample of single cells, a crop generator has been implemented, the sampled_crop_generator, which can be used to read these samples. With this generator, the training algorithm goes through the list of cells, creating batches in order until all cells have been used; the list is then reshuffled and traversed all over again. The results indicate that this procedure works well, is faster, and has the potential to yield even better results. At the moment, no better results are observed, but the problem is the metric, which I will discuss in another issue.
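The shuffle-and-traverse behavior described above can be sketched as follows. This is not the actual sampled_crop_generator, just an illustration of the traversal logic with hypothetical file names: every epoch visits each cell exactly once, in a freshly shuffled order.

```python
import random

def sampled_batches(cell_paths, batch_size, epochs, seed=0):
    """Yield batches in order until every cell has been used once,
    then reshuffle the list and traverse it all over again."""
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(cell_paths)
        rng.shuffle(order)
        for i in range(0, len(order), batch_size):
            yield order[i:i + batch_size]

# Hypothetical sample of 10 single-cell crops.
cells = [f"cell_{i}.png" for i in range(10)]
batches = list(sampled_batches(cells, batch_size=4, epochs=2))
print(len(batches))  # 3 batches per epoch, 6 total
```

Unlike dynamic cropping, where the pool of crops changes between epochs, this scheme guarantees each stored cell contributes equally to every epoch.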