Issue 237 - Training with static set of single-cell crops #238
Preliminary results on BBBC021, using a ResNet50 model with 128x128 crops. Weakly supervised learning of treatments (compound + concentration), 104 classes in total, including DMSO.
The performance metric is Average Class Accuracy, which DeepProfiler does not report by default. This metric removes the bias toward classes with more images (class imbalance) and measures performance on all treatments equally. Dynamic cropping shows more overfitting to the training set; this was not clearly observed before because we used the generic accuracy metric, which is sensitive to class imbalance. In BBBC021 (as in many other datasets), negative controls are heavily over-represented, so classifying them correctly inflates accuracy. Dynamic cropping favors negative controls, potentially because many more examples of them can be obtained dynamically for training; this yields high accuracy in the validation set, but it does poorly on the other classes. The static cropping methodology breaks the imbalance more consistently: it does not classify negative controls as accurately, but it recognizes other phenotypes with higher accuracy than dynamic cropping. This can be improved even further, but the results show that we have potential for better performance, as well as a need to change the metrics we use for training.
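To make the imbalance argument concrete, here is a minimal sketch of Average Class Accuracy versus generic accuracy. This is not DeepProfiler's implementation, just an illustration with hypothetical labels: a classifier that predicts the majority class (negative controls) for everything looks good under generic accuracy but not under the per-class average.

```python
import numpy as np

def average_class_accuracy(y_true, y_pred):
    """Mean of per-class accuracies, so every class counts equally
    regardless of how many examples it has."""
    classes = np.unique(y_true)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(per_class))

# Toy illustration (hypothetical data): 90 negative controls (class 0)
# and 10 treated cells (class 1), all predicted as controls.
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)

print(np.mean(y_true == y_pred))               # generic accuracy: 0.9
print(average_class_accuracy(y_true, y_pred))  # average class accuracy: 0.5
```

The generic metric rewards the model for getting the over-represented negative controls right, while the class-balanced metric exposes that half of the classes are never recognized.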
Another benefit of static cropping: training completes approximately 30% faster. This comes at the cost of additional storage space for single cells, but the gains in speed and accuracy are worth it.
Just to complete the report of the experiment here: the performance of dynamic cropping and static cropping was recorded in comet.ml and can be compared here. Note that the performance metrics are based on accuracy, which is biased; other metrics will be considered in the future, as discussed in issue #239.
I need to revert the Control Normalization commit from this branch. I'll update this PR when this is done and ready for merging.
This reverts commit 267734d.
A new training dynamic based on statically selected single-cell crops. This requires creating the sample of single cells and using a crop generator that can read it. This PR implements both, and preliminary results indicate it is working well.
The way to use the sample selection command is as follows:
python deepprofiler --root=/path/to/project/ --config=config.json sample-sc
It generates a dataset of single cell images in the directory
$root/outputs/single-cell-sample/
together with an index file that contains labels for weakly supervised learning. The sample is created using the regular crop_generator for training, but without data augmentations. This ensures three things:
- The cell crops are stored as PNG images with channels unrolled along the horizontal axis (example below).
- Only one dataset of single cells is kept in the output directories.
- When the command is run again, the previous sample is removed.
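As a sketch of the unrolled-channel layout (assumptions, not DeepProfiler's exact code: an `(H, W, C)` crop array and horizontal concatenation of channels), the transformation before saving to PNG could look like this:

```python
import numpy as np

# Hypothetical 128x128 crop with 3 fluorescence channels.
crop = np.random.randint(0, 255, size=(128, 128, 3), dtype=np.uint8)

# Unroll channels along the horizontal axis: each channel becomes a
# 128x128 tile placed side by side, giving one 128x384 grayscale image
# that can be saved as a standard PNG.
flat = np.hstack([crop[..., k] for k in range(crop.shape[-1])])

print(flat.shape)  # (128, 384)
```

Storing cells this way keeps each crop in a single lossless grayscale PNG while preserving all channels, at the cost of re-splitting the image when it is read back for training.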
In addition to the sample of single cells, a crop generator has been implemented, the sampled_crop_generator, which can be used to read these samples. With this generator, the training algorithm goes through the list of cells, creating batches in order until all cells have been used; the list is then reshuffled and traversed all over again. The results indicate that this procedure works well, is faster, and has the potential to yield even better results. At the moment, no better results are observed, but the problem is the metric, which I will discuss in another issue.
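The shuffle-and-traverse behavior described above can be sketched as follows. This is not the actual sampled_crop_generator, just an illustration of the traversal logic with hypothetical file names: every epoch visits each cell exactly once, in a freshly shuffled order.

```python
import random

def sampled_batches(cell_paths, batch_size, epochs, seed=0):
    """Yield batches in order until every cell has been used once,
    then reshuffle the list and traverse it all over again."""
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(cell_paths)
        rng.shuffle(order)
        for i in range(0, len(order), batch_size):
            yield order[i:i + batch_size]

# Hypothetical sample of 10 single-cell crops.
cells = [f"cell_{i}.png" for i in range(10)]
batches = list(sampled_batches(cells, batch_size=4, epochs=2))
print(len(batches))  # 3 batches per epoch, 6 total
```

Unlike dynamic cropping, where the pool of crops changes between epochs, this scheme guarantees each stored cell contributes equally to every epoch.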