Adding CP features to ggplot visualization #24

gwaybio · 2023-03-30T23:06:02Z

Also adding CP+DP features and the F1 score notebook

gwaybio · 2023-03-30T23:09:17Z

Results summary

Feature space	Number of top scoring phenotypes
CP_and_DPtest	10
CPtest	4
DPtest	2
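Each count tallies how many phenotypes a feature space scored best on. The Python sketch below shows one way to compute that tally. It assumes that "top scoring" means the highest test-set F1 for each phenotype. The function name `count_top_feature_spaces`, the input layout, and the phenotype names and scores in the example are all hypothetical, not the project's data.

```python
from collections import Counter


def count_top_feature_spaces(f1_by_phenotype):
    """Tally, per feature space, how many phenotypes it scores best on.

    f1_by_phenotype maps phenotype -> {feature_space: test F1} (hypothetical layout).
    Ties go to the first feature space listed, because max() keeps the first maximum.
    """
    winners = Counter()
    for scores in f1_by_phenotype.values():
        best = max(scores, key=scores.get)  # feature space with the highest F1
        winners[best] += 1
    return winners


# Made-up scores for illustration only
toy_scores = {
    "PhenotypeA": {"CP_and_DPtest": 0.81, "CPtest": 0.74, "DPtest": 0.70},
    "PhenotypeB": {"CP_and_DPtest": 0.62, "CPtest": 0.66, "DPtest": 0.58},
    "PhenotypeC": {"CP_and_DPtest": 0.55, "CPtest": 0.49, "DPtest": 0.57},
    "PhenotypeD": {"CP_and_DPtest": 0.90, "CPtest": 0.85, "DPtest": 0.88},
}
print(dict(count_top_feature_spaces(toy_scores)))
```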

F1 Score results

PR Curve results

gwaybio · 2023-03-30T23:11:24Z

@roshankern - I also had a question: what is Weighted? Should I include this class in the F1 score bar chart?

roshankern · 2023-04-03T14:56:42Z

> @roshankern - I also had a question: what is Weighted? Should I include this class in the F1 score bar chart?

Weighted refers to the weighted F1 score: the mean of the per-class F1 scores, with each class weighted by its support (the number of true instances of that label). I think it is good to include because it summarizes the model's overall performance.
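The Python sketch below computes a weighted F1 score using only the standard library. It scores each class one-vs-rest, then averages the per-class scores with weights proportional to each label's count in `y_true`. The function names are hypothetical. For the same inputs, the result should match scikit-learn's `f1_score(y_true, y_pred, average="weighted")`.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, label):
    """One-vs-rest F1 = 2TP / (2TP + FP + FN) for a single label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores, each weighted by its support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        per_class_f1(y_true, y_pred, label) * n / total
        for label, n in support.items()
    )


# Class A: F1 = 0.8 (support 3); class B: F1 = 2/3 (support 1)
print(round(weighted_f1(["A", "A", "A", "B"], ["A", "A", "B", "B"]), 4))  # 0.7667
```

Because the weights follow support, frequent phenotypes dominate the weighted score. A macro average, which takes the unweighted mean of per-class F1, would instead treat every phenotype equally.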

roshankern

I made a couple of small comments but overall LGTM!

roshankern · 2023-04-03T15:04:50Z

7.figures/figure_themes.r

+feature_type_with_data_split_colors <- c(
+    "CP_and_DPtest" = "#C214CB",
+    "CP_and_DPtrain" = "#E88EED",
+    "CPtest" = "#EB4B4B",
+    "CPtrain" = "#F8B5B5",
+    "DPtest" = "#5158bb",
+    "DPtrain" = "#B5B9EA"
+)


What made you choose these colors? I'm wondering if the contrast could be increased at all with a different color scheme.

I picked purple (CP+DP) to indicate a combination of red (CP) and blue (DP), but I agree that I could choose colors with better contrast. I'll investigate.

roshankern · 2023-04-03T15:08:23Z

7.figures/nbconverted/f1_score_visualization.r

+    dplyr::filter(
+        data_split == "test",
+        shuffled == "False"
+    )


Consider adding visualizations of F1 scores on the training data and of the shuffled baseline on the test data. The F1 scores on the training data are particularly interesting because they show how much the different types of models overfit.
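The comparison suggested here reduces to two differences per phenotype. The Python sketch below assumes a long-format table with the hypothetical columns `phenotype`, `data_split`, `shuffled`, and `f1_score`. It also assumes that `shuffled` is stored as the strings "True"/"False", as the filter above suggests. The example rows are made up for illustration.

```python
def f1_gaps(rows, phenotype):
    """Summarize overfitting and the margin over the shuffled baseline for one phenotype.

    rows: iterable of dicts with keys phenotype, data_split, shuffled, f1_score
    (hypothetical schema); shuffled is the string "True" or "False".
    """
    lookup = {
        (row["data_split"], row["shuffled"]): row["f1_score"]
        for row in rows
        if row["phenotype"] == phenotype
    }
    train = lookup[("train", "False")]
    test = lookup[("test", "False")]
    shuffled_test = lookup[("test", "True")]
    return {
        # Positive gap: the model fits training data better than held-out data
        "overfit_gap": train - test,
        # How far the real model beats the shuffled-baseline model on test data
        "baseline_margin": test - shuffled_test,
    }


# Made-up values for illustration only
example_rows = [
    {"phenotype": "PhenotypeA", "data_split": "train", "shuffled": "False", "f1_score": 0.95},
    {"phenotype": "PhenotypeA", "data_split": "test", "shuffled": "False", "f1_score": 0.80},
    {"phenotype": "PhenotypeA", "data_split": "test", "shuffled": "True", "f1_score": 0.20},
]
print(f1_gaps(example_rows, "PhenotypeA"))
```

A large `overfit_gap` suggests overfitting. A `baseline_margin` near zero suggests the model does little better than the shuffled baseline.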

My aim for the F1 score figure is a simple plot with a single number per phenotype; in fact, I might make it even simpler!

The PR curves do the job of showing training vs. testing vs. shuffled baseline very well already :)

roshankern · 2023-04-04T21:57:58Z

Another small comment: consider changing the final PR curve figure to 3 rows x 5 columns instead of 4x4 format. I think this would look better for a final figure, especially since there are 15 models. This is nitpicky, so feel free to ignore.

roshankern · 2023-04-06T20:21:49Z

There might be something to address in this PR now that the threshold values changed for multiclass models in #26.

gwaybio · 2023-05-16T20:02:07Z

> Another small comment: consider changing the final PR curve figure to 3 rows x 5 columns instead of 4x4 format.

This is a very good suggestion.

I've also implemented all the other suggestions, so I am going to merge!

gwaybio · 2023-05-16T20:06:59Z

> Weighted refers to the weighted F1 score

One last thing: I changed this label to increase clarity.

* Refactor Download Module (#18)
  * refactor module
  * remove training data file
  * Update 0.download_data/scripts/nbconverted/download_data.py (Co-authored-by: Erik Serrano)
  * eric suggestions
  Co-authored-by: Erik Serrano
* Refactor Split Data Module (#19)
  * refactor module
  * greg suggestions
* Train module refactor (#20)
  * refactor format module
  * use straify function
  * rerun train module
  * black formatting
  * docs, nbconvert
  * nbconvert
  * rerun pipeline, rename model
  * fix typo
  * Update 2.train_model/README.md (Co-authored-by: Gregory Way)
  * Update 2.train_model/README.md (Co-authored-by: Gregory Way)
  * Update 2.train_model/README.md (Co-authored-by: Gregory Way)
  * notebook run
  Co-authored-by: Gregory Way
* Refactor evaluate module (#21)
  * refactor clas pr curves
  * refactor confusion matrix
  * refactor F1 scores
  * refactor model predictions
  * documentation
  * dave suggestions
  * erik suggestions, reconvert
* Refactor interpret module (#22)
  * refactor interpret notebook
  * docs, reconvert script
  * greg suggestions
* Get Leave One Image Out Probabilities (#23)
  * add LOIO notebook
  * LOIO notebook
  * update notebook
  * download and split data with cell UUIDs
  * move LOIO
  * finish LOIO
  * black formatting
  * rerun notebook
  * rerun notebook, dave suggestions
  * greg comment
* Train single class models (#25)
  * move multiclass models
  * rename files, fix sh
  * single class models notebook
  * run notebook
  * binarize labels
  * train single class models
  * reconvert notebooks
  * update readme
  * rename sh file
  * remove models
  * eric readme suggestions
  * rerun notebook, eric suggestions
* Add Single Class Model PR Curves (#26)
  * get SCM PR curves
  * shuffled baseline
  * retrain single class models with correct kernel
  * rerun pr curves notebook
  * remove nones
  * rerun multiclass model
  * rerun notebook
  * move file
  * docs, black formatting
  * format notebook
  * Update 3.evaluate_model/README.md (Co-authored-by: Dave Bunten)
  * dave suggestions
  * reconvert notebook
  Co-authored-by: Dave Bunten
* Add SCM confusion matrices and F1 scores (#27)
  * get SCM PR curves
  * shuffled baseline
  * retrain single class models with correct kernel
  * rerun pr curves notebook
  * remove nones
  * rerun multiclass model
  * rerun notebook
  * move file
  * create SCM confusion matrix
  * rerun notebook
  * add changes from last PR
  * rerun notebook
  * add SCM F1, update SCM confusion matrices
  * documentation
  * rerun notebook
  * Update utils/evaluate_utils.py (Co-authored-by: Dave Bunten)
  * Update utils/evaluate_utils.py (Co-authored-by: Dave Bunten)
  * Update 3.evaluate_model/scripts/nbconverted/F1_scores.py (Co-authored-by: Dave Bunten)
  * dave suggestions
  Co-authored-by: Dave Bunten
* Get SCM Predictions and LOIO Probabilities (#29)
  * get SCM LOIO probas
  * reconvert notebook
  * get model predictions
  * rerun LOIO
  * reconvert notebook
  * save and reconvert notebook
  * eric suggestions
* Add SCM Interpretations (#30)
  * add scm coefficients
  * rerun interpret multi-class model
  * compare model coefficients
  * nbconvert
  * readme
  * make all correlations negative
  * rerun training
  * rerun evaluate
  * rerun interpret
  * docs
  * newline
  * rerun LOIO
* Remove unused cp features (#31)
  * rerun download/split modules
  * rerun multicalss models
  * rerun single class model
  * rerun evaluate module
  * get LOIO probas
  * rerun interpret module
  * rerun download data
* Adding CP features to ggplot visualization (#24)
  * set colors for model types
  * visualize precision recall with CP and DP+CP
  * add F1 score barchart visualization
  * minor tweak of f1 score print
  * ignore mac files
  * merge main and rerun viz
  * change color scheme for increased contrast
  * add f1 score of the top model, and rerun with updated colors
  * nrow = 3 in facet
  * change name of weighted f1 score
* update single cell images module (#32)
* Refactor validate module (#33)
  * update validate module
  * refactor validation
  * get correlations
  * convert notebook
  * update readme
  * formatting, documentation
  * reset index
  * vadd view notebook
  * docs, black formatting
  * ccc credit
  * show all correlations
  * add notebook
  * remove preview notebook
  * convert notebook
  * add differences heatmaps
  * preview correlation differences
  * add docs
  * black formatting

Co-authored-by: Erik Serrano
Co-authored-by: Gregory Way
Co-authored-by: Dave Bunten

gwaybio added 3 commits March 30, 2023 17:03

set colors for model types

0c055c5

visualize precision recall with CP and DP+CP

0f21564

add F1 score barchart visualization

893eeb0

gwaybio requested a review from roshankern March 30, 2023 23:06

gwaybio added 2 commits March 30, 2023 17:10

minor tweak of f1 score print

23e8db1

ignore mac files

f2b9445

roshankern approved these changes Apr 3, 2023

roshankern mentioned this pull request Apr 6, 2023

Add Single Class Model PR Curves #26

Merged

gwaybio added 5 commits May 16, 2023 13:16

Merge remote-tracking branch 'upstream/cp-feature-refactor' into cp_feature_viz

41d8b29

merge main and rerun viz

d6a77bb

change color scheme for increased contrast

ca1e0ff

add f1 score of the top model, and rerun with updated colors

f1e6a30

nrow = 3 in facet

43e2c9c

change name of weighted f1 score

a87b38c

gwaybio merged commit 1ab736b into WayScience:cp-feature-refactor May 16, 2023

gwaybio deleted the cp_feature_viz branch May 16, 2023 20:09
