Add single cell sample images module (#10)

* Use 2015 data & remove holdout set (#5)
* finish download module changes
* download notebook
* rerun split data module
* rerun download module
* rerun train_model
* rerun evaluation module
* rerun interpretation module
* combine datasets
* combine datasets
* split changes
* update format
* format update
* format
* finish split data
* combine datasets, remove holdout
* formatting
* rerun pipelines
* remove folded class
* rerun pipeline
* Update utils/download_utils.py
Co-authored-by: Dave Bunten <[email protected]>
* PR fixes
* module docstrings
Co-authored-by: Dave Bunten <[email protected]>
* create single cell images module
* rename_module
* finish module
* remove sample images from PR
* Co-authored-by: Jenna Tomkinson <[email protected]>
* documentation
* documentation
* dave suggestions
* Update utils/single_cell_utils.py
Co-authored-by: Dave Bunten <[email protected]>

---------

Co-authored-by: Dave Bunten <[email protected]>
WayScience · Feb 10, 2023 · bbb9f88
1 parent 051553e
Showing 7 changed files with 768 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -6,3 +6,5 @@ __pycache__/
 .vscode
 # autosklearn
 3.ML_model/autosklearn.ipynb
+# single cell sample images
+6.single_cell_images/mitocheck_single_cell_sample_images
diff --git a/6.single_cell_images/README.md b/6.single_cell_images/README.md
@@ -0,0 +1,39 @@
+# 6. Single-Cell Images
+
+In this module, we apply the final model to sample single-cell images to demonstrate how it classifies individual cells.
+
+## Single-Cell Sample Image Dataset
+
+The [single-cell sample image data](mitocheck_single_cell_sample_images) have kindly been provided by Dr. Thomas Walter of the MitoCheck consortium.
+This dataset contains sample single-cell images in the following format:
+
+```
+mitocheck_single_cell_sample_images
+│
+└───phenotypic_class
+│   │
+│   └───sample_image_path.png
+
+```
+
+Because the features for these cells have already been extracted in [`mitocheck_data`](https://github.com/WayScience/mitocheck_data), we do not re-extract features from these images in this module.
+Instead, features are associated with a single-cell image based on the cell's location metadata (plate, well, frame, x, y).
+
+## Top 5 Performing Classes
+
+In [correct_15_images.ipynb](correct_15_images.ipynb), we show 15 sample single-cell images that the final model from [2.train_model](../2.train_model/) correctly classifies.
+Three single-cell images from each of the 5 top performing classes (as determined by F1 score from [compiled_F1_scores.tsv](../3.evaluate_model/evaluations/F1_scores/compiled_F1_scores.tsv)) are displayed and their paths are saved in [top_5_performing_classes.tsv](../6.single_cell_images/sample_image_paths/top_5_performing_classes.tsv).
+
+## Step 1: Extract Sample Image Data
+
+Use the commands below to run the Jupyter notebook, which selects the sample images, saves their paths, and displays them:
+
+```sh
+# Make sure you are located in 6.single_cell_images
+cd 6.single_cell_images
+
+# Activate phenotypic_profiling conda environment
+conda activate phenotypic_profiling
+
+# Run the single-cell images notebook
+bash single_cell_images.sh
diff --git a/6.single_cell_images/correct_15_images.ipynb b/6.single_cell_images/correct_15_images.ipynb
diff --git a/6.single_cell_images/sample_image_paths/top_5_performing_classes.tsv b/6.single_cell_images/sample_image_paths/top_5_performing_classes.tsv
@@ -0,0 +1,16 @@
+	Phenotypic_Class	Correctly_Labled_Image_Path
+0	Elongated	mitocheck_single_cell_sample_images/Elongated/PLLT0048_13--ex2005_06_24--sp2005_05_23--tt19--c3___P00069_01___T00013___X0660___Y0691____img.png
+1	Elongated	mitocheck_single_cell_sample_images/Elongated/PLLT0138_03--ex2005_10_19--sp2005_10_04--tt17--c4___P00127_01___T00043___X1138___Y0920____img.png
+2	Elongated	mitocheck_single_cell_sample_images/Elongated/PLLT0030_18--ex2007_02_07--sp2005_05_02--tt17--c4___P00034_01___T00061___X0733___Y0675____img.png
+3	Grape	mitocheck_single_cell_sample_images/Grape/PLLT0157_04--ex2006_01_20--sp2005_10_27--tt17--c5___P00005_01___T00060___X0373___Y0912____img.png
+4	Grape	mitocheck_single_cell_sample_images/Grape/PLLT0066_19--ex2005_07_22--sp2005_06_07--tt173--c5___P00287_01___T00086___X0953___Y0186____img.png
+5	Grape	mitocheck_single_cell_sample_images/Grape/PLLT0066_19--ex2005_07_22--sp2005_06_07--tt173--c5___P00287_01___T00086___X1105___Y0313____img.png
+6	Large	mitocheck_single_cell_sample_images/Large/PLLT0043_48--ex2005_06_29--sp2005_05_19--tt163--c4___P00166_01___T00070___X1021___Y0160____img.png
+7	Large	mitocheck_single_cell_sample_images/Large/PLLT0043_48--ex2005_06_29--sp2005_05_19--tt163--c4___P00166_01___T00070___X1122___Y0143____img.png
+8	Large	mitocheck_single_cell_sample_images/Large/PLLT0013_38--ex2005_05_06--sp2005_04_11--tt163--c3___P00042_01___T00094___X0851___Y0101____img.png
+9	OutOfFocus	mitocheck_single_cell_sample_images/OutOfFocus/PLLT0027_45--ex2005_06_01--sp2005_04_27--tt17--c5___P00048_01___T00012___X0738___Y0874____img.png
+10	OutOfFocus	mitocheck_single_cell_sample_images/OutOfFocus/PLLT0027_45--ex2005_06_01--sp2005_04_27--tt17--c5___P00084_01___T00012___X0777___Y0702____img.png
+11	OutOfFocus	mitocheck_single_cell_sample_images/OutOfFocus/PLLT0029_05--ex2005_06_08--sp2005_01_01--tt17--c3___P00060_01___T00032___X0752___Y0607____img.png
+12	Polylobed	mitocheck_single_cell_sample_images/Polylobed/PLLT0027_44--ex2005_06_03--sp2005_04_27--tt17--c5___P00030_01___T00085___X0733___Y0690____img.png
+13	Polylobed	mitocheck_single_cell_sample_images/Polylobed/PLLT0046_19--ex2005_06_08--sp2005_01_01--tt17--c4___P00356_01___T00056___X0284___Y0282____img.png
+14	Polylobed	mitocheck_single_cell_sample_images/Polylobed/PLLT0084_46--ex2005_08_03--sp2005_07_07--tt17--c4___P00003_01___T00090___X1053___Y0534____img.png
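The image file names listed above appear to encode the location metadata that the README says is used to match images to features (plate, well, frame, x, y). The sketch below parses one of those file names with a regular expression. The field interpretation (`PL` prefix before the plate, `P` for the well position, `T` for the frame, `X`/`Y` for the coordinates) is an assumption inferred from the names alone, and `parse_image_name` is a hypothetical helper, not a function from `utils/single_cell_utils.py`.

```python
import re
from pathlib import Path

# Assumed layout, inferred only from the file names in the TSV above:
# PL<plate>--ex<date>--sp<date>--tt<n>--c<n>___P<well>_<n>___T<frame>___X<x>___Y<y>____img.png
IMAGE_NAME_PATTERN = re.compile(
    r"^PL(?P<plate>[^-]+)--.*?"
    r"___P(?P<well>\d+)_\d+"
    r"___T(?P<frame>\d+)"
    r"___X(?P<x>\d+)"
    r"___Y(?P<y>\d+)"
    r"____img\.png$"
)


def parse_image_name(image_path):
    """Return the (assumed) location metadata encoded in a sample image name."""
    name = Path(image_path).name
    match = IMAGE_NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized image name: {name}")
    fields = match.groupdict()
    return {
        "plate": fields["plate"],
        # Numeric fields are zero-padded in the file name; convert to int
        "well": int(fields["well"]),
        "frame": int(fields["frame"]),
        "x": int(fields["x"]),
        "y": int(fields["y"]),
    }


example_path = (
    "mitocheck_single_cell_sample_images/Elongated/"
    "PLLT0048_13--ex2005_06_24--sp2005_05_23--tt19--c3"
    "___P00069_01___T00013___X0660___Y0691____img.png"
)
metadata = parse_image_name(example_path)
print(metadata)
```

A dictionary like this could then be compared against the metadata columns of the features dataframe to find the matching cell; the actual column names and matching logic live in `utils/single_cell_utils.py`, which is not shown here.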
diff --git a/6.single_cell_images/scripts/nbconverted/correct_15_images.py b/6.single_cell_images/scripts/nbconverted/correct_15_images.py
@@ -0,0 +1,88 @@
+#!/usr/bin/env python
+# coding: utf-8
+
+# ### Import Libraries
+
+# In[1]:
+
+
+import pathlib
+import pandas as pd
+from joblib import load
+
+import sys
+sys.path.append("../utils")
+from split_utils import get_features_data
+from train_utils import get_dataset
+from single_cell_utils import get_15_correct_sample_images, plot_15_correct_sample_images
+
+
+# ### Load Model and Test Dataset
+
+# In[2]:
+
+
+# load final logistic regression model
+model_dir = pathlib.Path("../2.train_model/models/")
+log_reg_model_path = model_dir / "log_reg_model.joblib"
+log_reg_model = load(log_reg_model_path)
+
+# load test data
+data_split_path = pathlib.Path("../1.split_data/indexes/data_split_indexes.tsv")
+data_split_indexes = pd.read_csv(data_split_path, sep="\t", index_col=0)
+features_dataframe_path = pathlib.Path("../0.download_data/data/training_data.csv.gz")
+features_dataframe = get_features_data(features_dataframe_path)
+test_data = get_dataset(features_dataframe, data_split_indexes, "test")
+
+
+# ### Get 5 best performing classes (final model, test dataset)
+
+# In[3]:
+
+
+# load compiled f1 scores
+compiled_f1_scores_path = pathlib.Path("../3.evaluate_model/evaluations/F1_scores/compiled_F1_scores.tsv")
+compiled_f1_scores = pd.read_csv(compiled_f1_scores_path, sep="\t", index_col=0)
+
+# only get f1 score data for final model on test data
+final_model_test_f1_scores = compiled_f1_scores.loc[(compiled_f1_scores['shuffled'] == False) & (compiled_f1_scores['data_split'] == "test")]
+# sort the F1 score data highest to lowest
+final_model_test_f1_scores = final_model_test_f1_scores.sort_values(["F1_Score"], ascending=False)
+# only use top 5 performing phenotypes
+final_model_test_f1_scores = final_model_test_f1_scores.head(5)
+# preview phenotypic classes and F1 scores
+final_model_test_f1_scores
+
+
+# ### Get 15 correct sample images (3 from each of the 5 top performing classes) and save their paths
+
+# In[4]:
+
+
+single_cell_images_dir_path = pathlib.Path("mitocheck_single_cell_sample_images/")
+phenotypic_classes = final_model_test_f1_scores["Phenotypic_Class"].tolist()
+correct_15_images = get_15_correct_sample_images(
+    phenotypic_classes, test_data, log_reg_model, single_cell_images_dir_path
+)
+
+# save paths of 15 images in tidy format
+# use melt to convert pandas dataframe to tidy long format and drop/rename to achieve desired format
+tidy_15_images = pd.melt(correct_15_images, ["Phenotypic_Class"], ignore_index=True).sort_values(["Phenotypic_Class"]).reset_index(drop=True)
+tidy_15_images = tidy_15_images.drop(["variable"], axis=1)
+tidy_15_images = tidy_15_images.rename(columns={"value": "Correctly_Labled_Image_Path"})
+
+tidy_15_images_save_path = pathlib.Path("sample_image_paths/top_5_performing_classes.tsv")
+tidy_15_images_save_path.parents[0].mkdir(parents=True, exist_ok=True)
+tidy_15_images.to_csv(tidy_15_images_save_path, sep="\t")
+
+# show the 15 image paths that are being used
+correct_15_images
+
+
+# ### Show 3 examples of correct predictions for each of top 5 performing phenotypic classes
+
+# In[5]:
+
+
+plot_15_correct_sample_images(correct_15_images)
+
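The two pandas steps in the script above (selecting the top classes by F1 score, then melting the image paths into tidy format) can be sketched on toy data. All values below are made up for illustration, and the wide shape of the paths table (one row per class, one column per image) is an assumption about what `get_15_correct_sample_images` returns, since that function is not shown here. The output column name is spelled as in the script.

```python
import pandas as pd

# Toy stand-in for compiled_F1_scores.tsv; scores are invented for illustration
f1_scores = pd.DataFrame(
    {
        "Phenotypic_Class": ["Grape", "Large", "Elongated"] * 2,
        "F1_Score": [0.90, 0.70, 0.95, 0.20, 0.10, 0.15],
        "shuffled": [False, False, False, True, True, True],
        "data_split": ["test"] * 6,
    }
)

# Keep only the final (non-shuffled) model on the test split,
# then take the two best classes by F1 score
final_test = f1_scores[~f1_scores["shuffled"] & (f1_scores["data_split"] == "test")]
top_classes = final_test.nlargest(2, "F1_Score")["Phenotypic_Class"].tolist()

# Assumed wide shape: one row per class, one column per sample image
wide_paths = pd.DataFrame(
    {
        "Phenotypic_Class": top_classes,
        "Path_1": [f"{name}/a.png" for name in top_classes],
        "Path_2": [f"{name}/b.png" for name in top_classes],
    }
)

# Melt to one row per (class, image path), then drop the helper column
tidy_paths = (
    pd.melt(wide_paths, id_vars=["Phenotypic_Class"], ignore_index=True)
    .sort_values(["Phenotypic_Class"], kind="stable")
    .reset_index(drop=True)
    .drop(columns=["variable"])
    .rename(columns={"value": "Correctly_Labled_Image_Path"})
)
print(tidy_paths)
```

The sketch passes `kind="stable"` so that the order of paths within each class is deterministic; the script itself relies on the default sort, which does not guarantee the order of rows that share a class.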
diff --git a/6.single_cell_images/single_cell_images.sh b/6.single_cell_images/single_cell_images.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+# Convert notebook to python file and execute
+jupyter nbconvert --to python \
+        --output-dir=scripts/nbconverted \
+        --execute correct_15_images.ipynb