Add single cell sample images module (#10)
* Use 2015 data & remove holdout set (#5)
* finish download module changes
* download notebook
* rerun split data module
* rerun download module
* rerun train_model
* rerun evaluation module
* rerun interpretation module
* combine datasets
* combine datasets
* split changes
* update format
* format update
* format
* finish split data
* combine datasets, remove holdout
* formatting
* rerun pipelines
* remove folded class
* rerun pipeline
* Update utils/download_utils.py
  Co-authored-by: Dave Bunten
* PR fixes
* module docstrings
  Co-authored-by: Dave Bunten
* create single cell images module
* rename_module
* finish module
* remove sample images from PR
* Co-authored-by: Jenna Tomkinson
* documentation
* documentation
* dave suggestions
* Update utils/single_cell_utils.py
  Co-authored-by: Dave Bunten

---------

Co-authored-by: Dave Bunten
1 parent 051553e, commit bbb9f88
Showing 7 changed files with 768 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
# 6. Single-Cell Images | ||
|
||
In this module, we apply the final model to sample single-cell images to demonstrate how it classifies individual cells. | ||
|
||
## Single-Cell Sample Image Dataset | ||
|
||
The [single-cell sample image data](mitocheck_single_cell_sample_images) have kindly been provided by Dr. Thomas Walter of the MitoCheck consortium. | ||
This dataset contains sample single-cell images in the following format: | ||
|
||
``` | ||
mitocheck_single_cell_sample_images | ||
│ | ||
└───phenotypic_class | ||
│ │ | ||
│ └───sample_image_path.png | ||
``` | ||
|
||
Because the features for these cells have already been extracted in [`mitocheck_data`](https://github.com/WayScience/mitocheck_data), we do not re-extract features from these images in this module. | ||
Instead, features are associated with a single-cell image based on the cell's location metadata (plate, well, frame, x, y). | ||
|
||
## Top 5 Performing Classes | ||
|
||
In [correct_15_images.ipynb](correct_15_images.ipynb), we show 15 sample single-cell images that the final model from [2.train_model](../2.train_model/) correctly classifies. | ||
Three single-cell images from each of the 5 top performing classes (as determined by F1 score from [compiled_F1_scores.tsv](../3.evaluate_model/evaluations/F1_scores/compiled_F1_scores.tsv)) are displayed and their paths are saved in [top_5_performing_classes.tsv](../6.single_cell_images/sample_image_paths/top_5_performing_classes.tsv). | ||
|
||
## Step 1: Extract Sample Image Data | ||
|
||
Use the commands below to run the Jupyter notebook and extract the sample image data: | ||
|
||
```sh | ||
# Make sure you are located in 6.single_cell_images | ||
cd 6.single_cell_images | ||
|
||
# Activate phenotypic_profiling conda environment | ||
conda activate phenotypic_profiling | ||
|
||
# Convert the correct_15_images notebook to a Python script and execute it | ||
bash single_cell_images.sh | ||
``` | ||
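The README above says that features are matched to each image through the cell's location metadata (plate, well, frame, x, y). The following is a minimal sketch of how that metadata might be read from a sample image file name. The `parse_location` helper, the regular expression, and the reading of the `P`, `T`, `X`, and `Y` fields as well, frame, and coordinates are assumptions inferred from the file names in this commit; they are not taken from `utils/single_cell_utils.py`.

```python
import re
from pathlib import PurePosixPath

# Assumed file name layout (inferred from the sample paths, not documented):
#   PL<plate>--...___P<well>_<n>___T<frame>___X<x>___Y<y>____img.png
_NAME_PATTERN = re.compile(
    r"^PL(?P<plate>LT\d+_\d+)--.*?"
    r"___P(?P<well>\d+)_\d+"
    r"___T(?P<frame>\d+)"
    r"___X(?P<x>\d+)"
    r"___Y(?P<y>\d+)"
    r"____img\.png$"
)


def parse_location(image_path: str) -> dict:
    """Return the location fields encoded in a sample image file name.

    Hypothetical helper: numeric fields are returned as plain integers,
    with no index offset applied.
    """
    name = PurePosixPath(image_path).name
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized sample image name: {name}")
    return {
        "plate": match["plate"],
        "well": int(match["well"]),
        "frame": int(match["frame"]),
        "x": int(match["x"]),
        "y": int(match["y"]),
    }


example = parse_location(
    "mitocheck_single_cell_sample_images/Elongated/"
    "PLLT0048_13--ex2005_06_24--sp2005_05_23--tt19--c3"
    "___P00069_01___T00013___X0660___Y0691____img.png"
)
```

Whether the frame and well numbers in the feature table use the same numbering as the file names is not stated in the README, so an offset may be needed before joining on these fields.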
16 changes: 16 additions & 0 deletions
6.single_cell_images/sample_image_paths/top_5_performing_classes.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
Phenotypic_Class Correctly_Labled_Image_Path | ||
0 Elongated mitocheck_single_cell_sample_images/Elongated/PLLT0048_13--ex2005_06_24--sp2005_05_23--tt19--c3___P00069_01___T00013___X0660___Y0691____img.png | ||
1 Elongated mitocheck_single_cell_sample_images/Elongated/PLLT0138_03--ex2005_10_19--sp2005_10_04--tt17--c4___P00127_01___T00043___X1138___Y0920____img.png | ||
2 Elongated mitocheck_single_cell_sample_images/Elongated/PLLT0030_18--ex2007_02_07--sp2005_05_02--tt17--c4___P00034_01___T00061___X0733___Y0675____img.png | ||
3 Grape mitocheck_single_cell_sample_images/Grape/PLLT0157_04--ex2006_01_20--sp2005_10_27--tt17--c5___P00005_01___T00060___X0373___Y0912____img.png | ||
4 Grape mitocheck_single_cell_sample_images/Grape/PLLT0066_19--ex2005_07_22--sp2005_06_07--tt173--c5___P00287_01___T00086___X0953___Y0186____img.png | ||
5 Grape mitocheck_single_cell_sample_images/Grape/PLLT0066_19--ex2005_07_22--sp2005_06_07--tt173--c5___P00287_01___T00086___X1105___Y0313____img.png | ||
6 Large mitocheck_single_cell_sample_images/Large/PLLT0043_48--ex2005_06_29--sp2005_05_19--tt163--c4___P00166_01___T00070___X1021___Y0160____img.png | ||
7 Large mitocheck_single_cell_sample_images/Large/PLLT0043_48--ex2005_06_29--sp2005_05_19--tt163--c4___P00166_01___T00070___X1122___Y0143____img.png | ||
8 Large mitocheck_single_cell_sample_images/Large/PLLT0013_38--ex2005_05_06--sp2005_04_11--tt163--c3___P00042_01___T00094___X0851___Y0101____img.png | ||
9 OutOfFocus mitocheck_single_cell_sample_images/OutOfFocus/PLLT0027_45--ex2005_06_01--sp2005_04_27--tt17--c5___P00048_01___T00012___X0738___Y0874____img.png | ||
10 OutOfFocus mitocheck_single_cell_sample_images/OutOfFocus/PLLT0027_45--ex2005_06_01--sp2005_04_27--tt17--c5___P00084_01___T00012___X0777___Y0702____img.png | ||
11 OutOfFocus mitocheck_single_cell_sample_images/OutOfFocus/PLLT0029_05--ex2005_06_08--sp2005_01_01--tt17--c3___P00060_01___T00032___X0752___Y0607____img.png | ||
12 Polylobed mitocheck_single_cell_sample_images/Polylobed/PLLT0027_44--ex2005_06_03--sp2005_04_27--tt17--c5___P00030_01___T00085___X0733___Y0690____img.png | ||
13 Polylobed mitocheck_single_cell_sample_images/Polylobed/PLLT0046_19--ex2005_06_08--sp2005_01_01--tt17--c4___P00356_01___T00056___X0284___Y0282____img.png | ||
14 Polylobed mitocheck_single_cell_sample_images/Polylobed/PLLT0084_46--ex2005_08_03--sp2005_07_07--tt17--c4___P00003_01___T00090___X1053___Y0534____img.png |
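A path table like the one above can be sanity-checked by counting rows per class and confirming that each image sits in the folder named after its class. The sketch below uses only the standard library and a small inline sample. The `count_images_per_class` helper and the shortened placeholder paths are hypothetical, and the tab-separated layout with an unnamed leading index column is assumed from the `to_csv(..., sep="\t")` call in the script.

```python
import csv
import io
from collections import Counter
from pathlib import PurePosixPath

# Placeholder rows in the assumed layout: index, class, image path.
SAMPLE_TSV = (
    "\tPhenotypic_Class\tCorrectly_Labled_Image_Path\n"
    "0\tElongated\tmitocheck_single_cell_sample_images/Elongated/a.png\n"
    "1\tElongated\tmitocheck_single_cell_sample_images/Elongated/b.png\n"
    "2\tGrape\tmitocheck_single_cell_sample_images/Grape/c.png\n"
)


def count_images_per_class(tsv_text: str) -> Counter:
    """Count rows per class, checking that the folder name matches the class."""
    counts = Counter()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        label = row["Phenotypic_Class"]
        folder = PurePosixPath(row["Correctly_Labled_Image_Path"]).parent.name
        if folder != label:
            raise ValueError(f"{label!r} row points into folder {folder!r}")
        counts[label] += 1
    return counts


counts = count_images_per_class(SAMPLE_TSV)
```

For the real file, the same function could be given the text of `top_5_performing_classes.tsv`, where each of the five classes would be expected to have a count of three.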
88 changes: 88 additions & 0 deletions
6.single_cell_images/scripts/nbconverted/correct_15_images.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
#!/usr/bin/env python | ||
# coding: utf-8 | ||
|
||
# ### Import Libraries | ||
|
||
# In[1]: | ||
|
||
|
||
import pathlib | ||
import pandas as pd | ||
from joblib import load | ||
|
||
import sys | ||
sys.path.append("../utils") | ||
from split_utils import get_features_data | ||
from train_utils import get_dataset | ||
from single_cell_utils import get_15_correct_sample_images, plot_15_correct_sample_images | ||
|
||
|
||
# ### Load Model and Test Dataset | ||
|
||
# In[2]: | ||
|
||
|
||
# load final logistic regression model | ||
model_dir = pathlib.Path("../2.train_model/models/") | ||
log_reg_model_path = model_dir / "log_reg_model.joblib" | ||
log_reg_model = load(log_reg_model_path) | ||
|
||
# load test data | ||
data_split_path = pathlib.Path("../1.split_data/indexes/data_split_indexes.tsv") | ||
data_split_indexes = pd.read_csv(data_split_path, sep="\t", index_col=0) | ||
features_dataframe_path = pathlib.Path("../0.download_data/data/training_data.csv.gz") | ||
features_dataframe = get_features_data(features_dataframe_path) | ||
test_data = get_dataset(features_dataframe, data_split_indexes, "test") | ||
|
||
|
||
# ### Get 5 best performing classes (final model, test dataset) | ||
|
||
# In[3]: | ||
|
||
|
||
# load compiled f1 scores | ||
compiled_f1_scores_path = pathlib.Path("../3.evaluate_model/evaluations/F1_scores/compiled_F1_scores.tsv") | ||
compiled_f1_scores = pd.read_csv(compiled_f1_scores_path, sep="\t", index_col=0) | ||
|
||
# only get f1 score data for final model on test data | ||
final_model_test_f1_scores = compiled_f1_scores.loc[(compiled_f1_scores['shuffled'] == False) & (compiled_f1_scores['data_split'] == "test")] | ||
# sort the F1 score data highest to lowest | ||
final_model_test_f1_scores = final_model_test_f1_scores.sort_values(["F1_Score"], ascending=False) | ||
# only use top 5 performing phenotypes | ||
final_model_test_f1_scores = final_model_test_f1_scores.head(5) | ||
# preview phenotypic classes and F1 scores | ||
final_model_test_f1_scores | ||
|
||
|
||
# ### Get 15 correct sample images (3 from each of the 5 top performing classes) and save their paths | ||
|
||
# In[4]: | ||
|
||
|
||
single_cell_images_dir_path = pathlib.Path("mitocheck_single_cell_sample_images/") | ||
phenotypic_classes = final_model_test_f1_scores["Phenotypic_Class"].tolist() | ||
correct_15_images = get_15_correct_sample_images( | ||
phenotypic_classes, test_data, log_reg_model, single_cell_images_dir_path | ||
) | ||
|
||
# save paths of 15 images in tidy format | ||
# use melt to convert pandas dataframe to tidy long format and drop/rename to achieve desired format | ||
tidy_15_images = pd.melt(correct_15_images, ["Phenotypic_Class"], ignore_index=True).sort_values(["Phenotypic_Class"]).reset_index(drop=True) | ||
tidy_15_images = tidy_15_images.drop(["variable"], axis=1) | ||
tidy_15_images = tidy_15_images.rename(columns={"value": "Correctly_Labled_Image_Path"}) | ||
|
||
tidy_15_images_save_path = pathlib.Path("sample_image_paths/top_5_performing_classes.tsv") | ||
tidy_15_images_save_path.parents[0].mkdir(parents=True, exist_ok=True) | ||
tidy_15_images.to_csv(tidy_15_images_save_path, sep="\t") | ||
|
||
# show the 15 image paths that are being used | ||
correct_15_images | ||
|
||
|
||
# ### Show 3 examples of correct predictions for each of top 5 performing phenotypic classes | ||
|
||
# In[5]: | ||
|
||
|
||
plot_15_correct_sample_images(correct_15_images) | ||
|
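The two pandas steps in the script above, selecting the best classes by F1 score and melting the wide image table into a tidy one, can be illustrated on toy data. This is a sketch under assumptions: the class names, scores, `Path_1`/`Path_2` column names, and file paths are invented for illustration, and the real `correct_15_images` frame returned by `get_15_correct_sample_images` may be shaped differently.

```python
import pandas as pd

# Toy F1 table with the columns the script filters on (values are made up).
scores = pd.DataFrame(
    {
        "Phenotypic_Class": ["A", "B", "C", "A", "B", "C"],
        "F1_Score": [0.9, 0.7, 0.8, 0.2, 0.1, 0.3],
        "shuffled": [False, False, False, True, True, True],
        "data_split": ["test"] * 6,
    }
)

# Keep the final (non-shuffled) model on the test split, then take the top 2.
final_test = scores.loc[(~scores["shuffled"]) & (scores["data_split"] == "test")]
top = final_test.sort_values(["F1_Score"], ascending=False).head(2)
top_classes = top["Phenotypic_Class"].tolist()

# Hypothetical wide table: one row per class, one column per sample image.
wide = pd.DataFrame(
    {
        "Phenotypic_Class": top_classes,
        "Path_1": ["A/1.png", "C/1.png"],
        "Path_2": ["A/2.png", "C/2.png"],
    }
)

# Melt to one row per (class, image), then drop the helper column melt adds.
tidy = (
    pd.melt(wide, id_vars=["Phenotypic_Class"], ignore_index=True)
    .sort_values(["Phenotypic_Class"])
    .reset_index(drop=True)
    .drop(columns=["variable"])
    .rename(columns={"value": "Correctly_Labled_Image_Path"})
)
```

The default `sort_values` algorithm is not guaranteed to be stable, so the order of images within a class may vary; passing `kind="stable"` is one way to keep the original column order if that matters.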
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
#!/bin/bash | ||
# Convert notebook to python file and execute | ||
jupyter nbconvert --to python \ | ||
--output-dir=scripts/nbconverted \ | ||
--execute correct_15_images.ipynb |