
Issue 237 - Training with static set of single-cell crops #238

Merged: 13 commits merged into master on Aug 27, 2020

Conversation

jccaicedo (Member)

This PR adds a new training dynamic based on a statically selected sample of single-cell crops. It requires creating the sample of single cells and using a crop generator that can read it; this PR implements both, and preliminary results indicate it works well.

The way to use the sample selection command is as follows:

python deepprofiler --root=/path/to/project/ --config=config.json sample-sc

It generates a dataset of single-cell images in the directory $root/outputs/single-cell-sample/, together with an index file that contains labels for weakly supervised learning. The sample is created using the regular crop_generator for training, but without data augmentations. This ensures three things:

  1. The crops are created following the configuration in the config.json file.
  2. The selection of cells follows the same data balancing used during regular training.
  3. The sample is created efficiently, because cropping happens on the GPUs.

The cell crops are stored as PNG images with the channels unrolled along the horizontal axis (example below). Only one dataset of single cells is kept in the output directory; running the command again removes the previous sample.

[Screenshot: example single-cell crop stored as a PNG with channels unrolled horizontally]
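
For reference, here is a minimal sketch of how such an unrolled-channel layout can be written and read back with NumPy and imageio; the helper names below are illustrative, not part of DeepProfiler's actual API:

```python
# Illustrative sketch of the unrolled-channel PNG layout (not DeepProfiler's code).
import numpy as np
import imageio

def save_unrolled(crop, path):
    """Save a (H, W, C) crop as a single (H, W*C) grayscale PNG, channels side by side."""
    channels = [crop[:, :, i] for i in range(crop.shape[-1])]
    imageio.imwrite(path, np.hstack(channels).astype(np.uint8))

def load_unrolled(path, num_channels):
    """Reassemble a (H, W, C) crop from an unrolled PNG."""
    flat = imageio.imread(path)  # shape (H, W * num_channels)
    return np.stack(np.split(flat, num_channels, axis=1), axis=-1)
```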

In addition to the single-cell sample, a new crop generator, the sampled_crop_generator, has been implemented to read these samples. With this generator, the training algorithm goes through the list of cells, creating batches in order until all cells have been used; the list is then reshuffled and traversed again. The results indicate that this procedure works well, is faster, and has the potential to yield even better results. No improvement is observed yet, but the problem is the metric, which I will discuss in another issue.
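
As a rough sketch of the iteration scheme described above (the index columns "file" and "label" and the load_crop callback are assumptions here, not the actual sampled_crop_generator interface):

```python
# Hypothetical sketch of epoch-based batching over a static single-cell index.
import numpy as np
import pandas as pd

def sampled_batches(index_csv, batch_size, load_crop):
    index = pd.read_csv(index_csv)
    while True:  # each full pass over the shuffled index is one epoch
        order = np.random.permutation(len(index))
        for start in range(0, len(order), batch_size):
            rows = index.iloc[order[start:start + batch_size]]
            crops = np.stack([load_crop(f) for f in rows["file"]])  # assumed "file" column
            yield crops, rows["label"].values                       # assumed "label" column
```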

@jccaicedo jccaicedo requested a review from Arkkienkeli August 19, 2020 23:23
@jccaicedo jccaicedo changed the title Issue 237 Issue 237 - Training with static set of single-cell crops Aug 19, 2020
@jccaicedo (Member, Author)

Preliminary results on BBBC021.

Using a ResNet50 model with 128x128 crops, and weakly supervised learning of treatments (compound + concentration): 104 classes in total, including DMSO.

| Training Method | Training Performance | Validation Performance |
| --- | --- | --- |
| Dynamic cropping | 0.9011 | 0.2304 |
| Static crop sample (this PR) | 0.8530 | 0.2753 |

The performance metric is Average Class Accuracy, which DeepProfiler does not report by default. Unlike plain accuracy, it removes the bias toward classes with a larger number of images (class imbalance) and measures performance on all treatments equally.
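
A minimal sketch of how this metric can be computed (per-class accuracy averaged over classes), assuming integer-encoded labels:

```python
# Average Class Accuracy: mean of per-class accuracies, so every treatment
# contributes equally regardless of how many images it has.
import numpy as np

def average_class_accuracy(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(per_class))
```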

Dynamic cropping shows more overfitting to the training set. This was not clearly visible before because we used the generic accuracy metric, which is biased and sensitive to class imbalance. In BBBC021 (as in many other datasets), negative controls dominate the class distribution, so classifying them correctly yields high overall accuracy. That is what happens with dynamic cropping: it favors negative controls, potentially because many more examples of them can be obtained dynamically for training, and so it reaches high accuracy in the validation set while doing poorly on the other classes.

The static cropping methodology counters the imbalance more consistently: it does not classify negative controls as accurately, but it recognizes the other phenotypes with higher accuracy than dynamic cropping. This can be improved even further, and the results show both the potential for better performance and the need to change the performance metrics we use during training.

@jccaicedo (Member, Author)

Another benefit of static cropping: training completes approximately 30% faster. This comes at the cost of additional storage space for the single-cell crops, but the gains in speed and accuracy are worth it.

@jccaicedo (Member, Author)

To complete the report of this experiment: the performance of dynamic and static cropping was recorded in comet.ml, where the two runs can be compared. Note that the logged metrics are based on plain accuracy, which is biased; other metrics will be considered in the future, as discussed in issue #239.

@jccaicedo (Member, Author)

I need to revert the Control Normalization commit from this branch. I'll update this PR when this is done and ready for merging.

@Arkkienkeli Arkkienkeli merged commit 4967041 into master Aug 27, 2020
@Arkkienkeli Arkkienkeli deleted the issue-237 branch November 4, 2020 12:17