Add single cell sample images module (#10)
* Use 2015 data & remove holdout set (#5)

* finish download module changes

* download notebook

* rerun split data module

* rerun download module

* rerun train_model

* rerun evaluation module

* rerun interpretation module

* combine datasets

* combine datasets

* split changes

* update format

* format update

* format

* finish split data

* combine datasets, remove holdout

* formatting

* rerun pipelines

* remove folded class

* rerun pipeline

* Update utils/download_utils.py

Co-authored-by: Dave Bunten <[email protected]>

* PR fixes

* module docstrings

Co-authored-by: Dave Bunten <[email protected]>

* create single cell images module

* rename_module

* finish module

* remove sample images from PR

* Co-authored-by: Jenna Tomkinson <[email protected]>

* documentation

* documentation

* dave suggestions

* Update utils/single_cell_utils.py

Co-authored-by: Dave Bunten <[email protected]>

---------

Co-authored-by: Dave Bunten <[email protected]>
roshankern and d33bs authored Feb 10, 2023
1 parent 051553e commit bbb9f88
Showing 7 changed files with 768 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -6,3 +6,5 @@ __pycache__/
.vscode
# autosklearn
3.ML_model/autosklearn.ipynb
# single cell sample images
6.single_cell_images/mitocheck_single_cell_sample_images
39 changes: 39 additions & 0 deletions 6.single_cell_images/README.md
@@ -0,0 +1,39 @@
# 6. Single-Cell Images

In this module, we apply the final model to sample single-cell images to demonstrate how it classifies individual cells.

## Single-Cell Sample Image Dataset

The [single-cell sample image data](mitocheck_single_cell_sample_images) have kindly been provided by Dr. Thomas Walter of the MitoCheck consortium.
This dataset contains sample single-cell images in the following format:

```
mitocheck_single_cell_sample_images
└───phenotypic_class
    │
    └───sample_image_path.png
```

Because the features for these cells have already been extracted in [`mitocheck_data`](https://github.com/WayScience/mitocheck_data), we do not re-extract features from these images in this module.
Instead, features are associated with a single-cell image based on the cell's location metadata (plate, well, frame, x, y).
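
As a rough illustration of this association, the sketch below pulls location-like fields out of a sample image file name. The field meanings (`plate_token`, `position`, `frame`, `x`, `y`) and the helper name `parse_image_metadata` are assumptions inferred from the file names listed in `top_5_performing_classes.tsv`; the actual matching logic in `utils/single_cell_utils.py` may differ.

```python
import re
from pathlib import Path

# Hypothetical pattern: the meaning of each token is an assumption based on
# the sample file names, not something confirmed by the repository.
FILENAME_PATTERN = re.compile(
    r"^(?P<plate_token>[^-]+)--.*?"
    r"___P(?P<position>\d+)_\d+"
    r"___T(?P<frame>\d+)"
    r"___X(?P<x>\d+)"
    r"___Y(?P<y>\d+)____img\.png$"
)


def parse_image_metadata(image_path):
    """Return the location-like fields encoded in a sample image file name."""
    match = FILENAME_PATTERN.match(Path(image_path).name)
    if match is None:
        raise ValueError(f"Unexpected file name format: {image_path}")
    fields = match.groupdict()
    return {
        "plate_token": fields["plate_token"],
        # Zero-padded numeric tokens are converted to plain integers
        "position": int(fields["position"]),
        "frame": int(fields["frame"]),
        "x": int(fields["x"]),
        "y": int(fields["y"]),
    }


example_path = (
    "mitocheck_single_cell_sample_images/Elongated/"
    "PLLT0048_13--ex2005_06_24--sp2005_05_23--tt19--c3"
    "___P00069_01___T00013___X0660___Y0691____img.png"
)
metadata = parse_image_metadata(example_path)
print(metadata)
```

A dictionary like this could then be compared against the metadata columns of the features dataframe to find the matching row; which columns correspond to which token is left open here.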

## Top 5 Performing Classes

In [correct_15_images.ipynb](correct_15_images.ipynb), we show 15 sample single-cell images that the final model from [2.train_model](../2.train_model/) correctly classifies.
Three single-cell images from each of the 5 top performing classes (as determined by F1 score from [compiled_F1_scores.tsv](../3.evaluate_model/evaluations/F1_scores/compiled_F1_scores.tsv)) are displayed and their paths are saved in [top_5_performing_classes.tsv](../6.single_cell_images/sample_image_paths/top_5_performing_classes.tsv).
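
The selection and saving steps can be sketched on toy data as follows. The F1 scores, class names, image paths, and the `Path_1`..`Path_3` column names are made up for illustration, and the wide one-row-per-class layout is an assumption about what `get_15_correct_sample_images` returns; only the `melt`/`drop`/`rename` chain mirrors the notebook.

```python
import pandas as pd

# Toy F1 scores (illustrative values only, not results from the model)
f1_scores = pd.DataFrame(
    {
        "Phenotypic_Class": ["A", "B", "C", "D", "E", "F"],
        "F1_Score": [0.90, 0.50, 0.80, 0.95, 0.70, 0.60],
    }
)

# Keep the 5 classes with the highest F1 score
top_classes = (
    f1_scores.sort_values("F1_Score", ascending=False)
    .head(5)["Phenotypic_Class"]
    .tolist()
)

# Assumed wide layout: one row per class, one column per sample image path
wide = pd.DataFrame(
    {
        "Phenotypic_Class": top_classes,
        "Path_1": [f"{name}/img_1.png" for name in top_classes],
        "Path_2": [f"{name}/img_2.png" for name in top_classes],
        "Path_3": [f"{name}/img_3.png" for name in top_classes],
    }
)

# Convert to tidy long format: one row per (class, image path) pair
tidy = (
    pd.melt(wide, id_vars=["Phenotypic_Class"], ignore_index=True)
    .sort_values(["Phenotypic_Class"])
    .reset_index(drop=True)
    .drop(columns=["variable"])
    .rename(columns={"value": "Correctly_Labled_Image_Path"})
)
print(tidy)
```

The tidy frame has 15 rows (3 per class) and the same two columns as `top_5_performing_classes.tsv`.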

## Step 1: Extract Sample Image Data

Use the commands below to run the Jupyter notebooks and extract the sample image data:

```sh
# Make sure you are located in 6.single_cell_images
cd 6.single_cell_images

# Activate phenotypic_profiling conda environment
conda activate phenotypic_profiling

# Convert the notebook to a Python script and execute it
bash single_cell_images.sh
```
350 changes: 350 additions & 0 deletions 6.single_cell_images/correct_15_images.ipynb

Large diffs are not rendered by default.

16 changes: 16 additions & 0 deletions 6.single_cell_images/sample_image_paths/top_5_performing_classes.tsv
@@ -0,0 +1,16 @@
Phenotypic_Class Correctly_Labled_Image_Path
0 Elongated mitocheck_single_cell_sample_images/Elongated/PLLT0048_13--ex2005_06_24--sp2005_05_23--tt19--c3___P00069_01___T00013___X0660___Y0691____img.png
1 Elongated mitocheck_single_cell_sample_images/Elongated/PLLT0138_03--ex2005_10_19--sp2005_10_04--tt17--c4___P00127_01___T00043___X1138___Y0920____img.png
2 Elongated mitocheck_single_cell_sample_images/Elongated/PLLT0030_18--ex2007_02_07--sp2005_05_02--tt17--c4___P00034_01___T00061___X0733___Y0675____img.png
3 Grape mitocheck_single_cell_sample_images/Grape/PLLT0157_04--ex2006_01_20--sp2005_10_27--tt17--c5___P00005_01___T00060___X0373___Y0912____img.png
4 Grape mitocheck_single_cell_sample_images/Grape/PLLT0066_19--ex2005_07_22--sp2005_06_07--tt173--c5___P00287_01___T00086___X0953___Y0186____img.png
5 Grape mitocheck_single_cell_sample_images/Grape/PLLT0066_19--ex2005_07_22--sp2005_06_07--tt173--c5___P00287_01___T00086___X1105___Y0313____img.png
6 Large mitocheck_single_cell_sample_images/Large/PLLT0043_48--ex2005_06_29--sp2005_05_19--tt163--c4___P00166_01___T00070___X1021___Y0160____img.png
7 Large mitocheck_single_cell_sample_images/Large/PLLT0043_48--ex2005_06_29--sp2005_05_19--tt163--c4___P00166_01___T00070___X1122___Y0143____img.png
8 Large mitocheck_single_cell_sample_images/Large/PLLT0013_38--ex2005_05_06--sp2005_04_11--tt163--c3___P00042_01___T00094___X0851___Y0101____img.png
9 OutOfFocus mitocheck_single_cell_sample_images/OutOfFocus/PLLT0027_45--ex2005_06_01--sp2005_04_27--tt17--c5___P00048_01___T00012___X0738___Y0874____img.png
10 OutOfFocus mitocheck_single_cell_sample_images/OutOfFocus/PLLT0027_45--ex2005_06_01--sp2005_04_27--tt17--c5___P00084_01___T00012___X0777___Y0702____img.png
11 OutOfFocus mitocheck_single_cell_sample_images/OutOfFocus/PLLT0029_05--ex2005_06_08--sp2005_01_01--tt17--c3___P00060_01___T00032___X0752___Y0607____img.png
12 Polylobed mitocheck_single_cell_sample_images/Polylobed/PLLT0027_44--ex2005_06_03--sp2005_04_27--tt17--c5___P00030_01___T00085___X0733___Y0690____img.png
13 Polylobed mitocheck_single_cell_sample_images/Polylobed/PLLT0046_19--ex2005_06_08--sp2005_01_01--tt17--c4___P00356_01___T00056___X0284___Y0282____img.png
14 Polylobed mitocheck_single_cell_sample_images/Polylobed/PLLT0084_46--ex2005_08_03--sp2005_07_07--tt17--c4___P00003_01___T00090___X1053___Y0534____img.png
88 changes: 88 additions & 0 deletions 6.single_cell_images/scripts/nbconverted/correct_15_images.py
@@ -0,0 +1,88 @@
#!/usr/bin/env python
# coding: utf-8

# ### Import Libraries

# In[1]:


import pathlib
import pandas as pd
from joblib import load

import sys
sys.path.append("../utils")
from split_utils import get_features_data
from train_utils import get_dataset
from single_cell_utils import get_15_correct_sample_images, plot_15_correct_sample_images


# ### Load Model and Test Dataset

# In[2]:


# load final logistic regression model
model_dir = pathlib.Path("../2.train_model/models/")
log_reg_model_path = pathlib.Path(f"{model_dir}/log_reg_model.joblib")
log_reg_model = load(log_reg_model_path)

# load test data
data_split_path = pathlib.Path("../1.split_data/indexes/data_split_indexes.tsv")
data_split_indexes = pd.read_csv(data_split_path, sep="\t", index_col=0)
features_dataframe_path = pathlib.Path("../0.download_data/data/training_data.csv.gz")
features_dataframe = get_features_data(features_dataframe_path)
test_data = get_dataset(features_dataframe, data_split_indexes, "test")


# ### Get 5 best performing classes (final model, test dataset)

# In[3]:


# load compiled f1 scores
compiled_f1_scores_path = pathlib.Path("../3.evaluate_model/evaluations/F1_scores/compiled_F1_scores.tsv")
compiled_f1_scores = pd.read_csv(compiled_f1_scores_path, sep="\t", index_col=0)

# only get f1 score data for final model on test data
final_model_test_f1_scores = compiled_f1_scores.loc[(compiled_f1_scores['shuffled'] == False) & (compiled_f1_scores['data_split'] == "test")]
# sort the F1 score data highest to lowest
final_model_test_f1_scores = final_model_test_f1_scores.sort_values(["F1_Score"], ascending=False)
# only use top 5 performing phenotypes
final_model_test_f1_scores = final_model_test_f1_scores.head(5)
# preview phenotypic classes and F1 scores
final_model_test_f1_scores


# ### Get 15 correct sample images (3 from each of the 5 top performing classes) and save their paths

# In[4]:


single_cell_images_dir_path = pathlib.Path("mitocheck_single_cell_sample_images/")
phenotypic_classes = final_model_test_f1_scores["Phenotypic_Class"].tolist()
correct_15_images = get_15_correct_sample_images(
    phenotypic_classes, test_data, log_reg_model, single_cell_images_dir_path
)

# save paths of 15 images in tidy format
# use melt to convert pandas dataframe to tidy long format and drop/rename to achieve desired format
tidy_15_images = pd.melt(correct_15_images, ["Phenotypic_Class"], ignore_index=True).sort_values(["Phenotypic_Class"]).reset_index(drop=True)
tidy_15_images = tidy_15_images.drop(["variable"], axis=1)
tidy_15_images = tidy_15_images.rename(columns={"value": "Correctly_Labled_Image_Path"})

tidy_15_images_save_path = pathlib.Path("sample_image_paths/top_5_performing_classes.tsv")
tidy_15_images_save_path.parents[0].mkdir(parents=True, exist_ok=True)
tidy_15_images.to_csv(tidy_15_images_save_path, sep="\t")

# show the 15 image paths that are being used
correct_15_images


# ### Show 3 examples of correct predictions for each of top 5 performing phenotypic classes

# In[5]:


plot_15_correct_sample_images(correct_15_images)

5 changes: 5 additions & 0 deletions 6.single_cell_images/single_cell_images.sh
@@ -0,0 +1,5 @@
#!/bin/bash
# Convert notebook to python file and execute
jupyter nbconvert --to python \
    --output-dir=scripts/nbconverted \
    --execute correct_15_images.ipynb