Adding several publication ready figures [Figure 2, Supplementary Figs 2 and 3] #38

gwaybio · 2023-09-21T20:54:54Z

Sorry for the long PR - it generates all figures (main and supplementary) for the first results subsection. I also needed to calculate all pairwise correlations between cells in order to plot pairwise correlation density curves.

Main figure

Supplementary Figures

review-notebook-app · 2023-09-21T20:54:59Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

jenna-tomkinson

LGTM! I left a bunch of questions for you to address prior to merging. This is mainly for my understanding and for the understanding of the audience viewing the code.

Let me know if I need to clarify any questions!

jenna-tomkinson · 2023-09-25T14:18:48Z

1.split_data/scripts/nbconverted/explore_data.py

+
+    # Output to file
+    output_file = f"{output_basename}_{feature_space}.tsv.gz"
+    cp_tidy_corr_df.to_csv(output_file, sep="\t", index=False)


I don't see any of these outputs. Do you recommend not putting these in a GitHub repo? I think I have some PRs where I include the intermediate CSV files like these (which make them look huge). I am wondering what the best practice would be here.

This is a great discussion starter. Thank you!

I tend to think of including data based on three variables:

Size

Importance

Reproducibility

For Size, there are some strict limits and thresholds that determine when to move from git to git-lfs to figshare/other. Another variable is Importance. Super important data need to be somewhere no matter the size. The last variable is Reproducibility: is my analysis going to fail if I don't have this data?

There are also tradeoffs between these variables. For example, unimportant data don't belong anywhere, unless it is critical to reproducibility and it's small-ish.

I view this data as medium-ish size (~150MB) of relatively low importance that is not super critical to reproducibility because we have a notebook that can generate this data.

I should probably add a note to the figure generation notebook telling users to run this notebook before generating the figure!
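Beyond a written note, the dependency could also be enforced with a small guard. The following is a minimal sketch in Python, not code from this PR: the helper name `require_intermediate_file` and its arguments are hypothetical, and the figure notebook itself is written in R, where `file.exists()` would play the same role.

```python
from pathlib import Path


def require_intermediate_file(path: Path, generating_notebook: str) -> Path:
    """Return `path` if it exists; otherwise fail with a pointer to the notebook that creates it.

    Both arguments are illustrative: `path` is the intermediate file a figure
    script expects, and `generating_notebook` is the name of the notebook that
    writes it.
    """
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found. Run {generating_notebook} first to generate it."
        )
    return path
```

A figure script would call this once per expected input before reading it, so a missing file produces an actionable message instead of a generic read error.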

Ahh okay! I will need to be better about this practice then. When generating figures for Durbin lab I put the small CSV intermediate files in the PR. I agree with everything you stated here, so I will make sure that the most important files are added to the repo.

jenna-tomkinson · 2023-09-25T14:20:24Z

7.figures/figures/main_figure_2_umap_and_correlation.png

What did you use to decide the phenotypes to display in these plots? Were these the phenotypes with the highest number of labels or was there a different metric you used?

As well for section B, what does that plot tell us if cell pairs are not the same phenotype? Was the pairwise correlation between single cells so we do expect that there is a large density of cells that would be both, for example, Apoptosis and then also cells that are not both the same?

If so, it is interesting how interphase is the only phenotype with all models that doesn't have that much separation between distributions.

> What did you use to decide the phenotypes to display in these plots? Were these the phenotypes with the highest number of labels or was there a different metric you used?

Great question! I picked ones that were interesting to me :) We can always go back and refine this later easily. Do you have any phenotypes in mind you think we should focus on?

> what does that plot tell us if cell pairs are not the same phenotype?

It tells us that on average, for most phenotypes, cells of different phenotypes have lower correlation.

> Was the pairwise correlation between single cells so we do expect that there is a large density of cells that would be both, for example, Apoptosis and then also cells that are not both the same?

I'm not sure I understand this question. The curves are showing pairwise comparisons (so two cells; every cell compared to every other cell). A cell comparison can be either Apoptosis Cell vs. Different Apoptosis Cell or Apoptosis Cell vs. Not Apoptosis Cell.
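To make the comparison concrete, here is a minimal sketch of how every cell can be compared to every other cell and each pair flagged as same-phenotype or different-phenotype. It is an illustration under assumptions, not the notebook's actual code: the function name, column names, and toy values are all made up, and Pearson correlation is assumed.

```python
import numpy as np
import pandas as pd


def tidy_pairwise_correlations(features: pd.DataFrame, phenotypes: pd.Series) -> pd.DataFrame:
    """Return one row per unordered cell pair with its Pearson correlation.

    `features` has one row per cell; `phenotypes` maps the same index to a label.
    """
    # DataFrame.corr correlates columns, so transpose to correlate cells (rows)
    corr = features.T.corr(method="pearson")

    # Strict upper triangle: each pair appears once and self-pairs are dropped
    rows, cols = np.triu_indices(len(corr), k=1)
    cell_ids = corr.index.to_numpy()
    labels = phenotypes.loc[cell_ids].to_numpy()

    return pd.DataFrame(
        {
            "cell_a": cell_ids[rows],
            "cell_b": cell_ids[cols],
            "correlation": corr.to_numpy()[rows, cols],
            "same_phenotype": labels[rows] == labels[cols],
        }
    )


# Toy data: 4 cells x 3 features (values are invented for illustration)
features = pd.DataFrame(
    [[1.0, 2.0, 3.0], [2.0, 4.0, 6.5], [3.0, 1.0, 0.5], [6.0, 2.5, 1.0]],
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
    columns=["f1", "f2", "f3"],
)
phenotypes = pd.Series(
    ["Apoptosis", "Apoptosis", "Interphase", "Interphase"], index=features.index
)

pairs = tidy_pairwise_correlations(features, phenotypes)
```

Plotting the density of `correlation` separately for `same_phenotype == True` and `same_phenotype == False` gives the two kinds of curves discussed here; `pairs.groupby("same_phenotype")["correlation"].mean()` summarizes the same contrast numerically.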

> If so, it is interesting how interphase is the only phenotype with all models that doesn't have that much separation between distributions.

Yes, this is interesting! What do you think it means?

jenna-tomkinson · 2023-09-25T14:26:14Z

7.figures/nbconverted/Figure2_UMAP_and_Correlation.r

+        Embedding_Value = "d"
+    )
+) %>%
+    dplyr::select(!...1) %>%#7570b3


I am a bit confused about what is going on in lines 30-36. Are you able to explain it? Also, I see that line 30 has a code comment that looks like a hex code. Is this being used?

> Are you able to explain what is going on?

I sure hope so! :)

I will add more specific comments to a new commit, but essentially I am wrangling the data in a certain way to prepare for plotting.

> hex code, is this being used?

Oops! Great catch. Will remove

Fixed in d015e8b

jenna-tomkinson · 2023-09-25T14:27:53Z

7.figures/nbconverted/Figure2_UMAP_and_Correlation.r

+    dplyr::select(!...1) %>%#7570b3
+    tidyr::pivot_wider(names_from = UMAP_Embedding, values_from = Embedding_Value) %>%
+    dplyr::mutate(Mitocheck_Plot_Label = if_else(
+        Mitocheck_Phenotypic_Class %in% focus_phenotypes,


Where is "focus_phenotypes" defined in this code? I do not see it and it could be because I am missing it somewhere.

Ahh I see now, it comes from source("themes.r"). I would suggest adding a code comment to clarify what this file does/contains. It is probably standard practice for someone more familiar with R, but it might be good for the general audience (and myself haha).

It's standard practice, but it's not great standard practice! I will add a note near the library() load function.

Fixed in d015e8b

jenna-tomkinson · 2023-09-25T14:32:24Z

7.figures/themes.r

+library(ggplot2)
+library(dplyr)
+
+focus_phenotypes <- c(


Kinda goes along with my question for the main figure, but does it make sense to include in this file a short code comment on why these phenotypes are focused on?

Yes, this would be ideal. However, given our current manuscript state, I recommend that we skip adding a comment here for now, since our focus may shift slightly. We will definitely describe our rationale in the manuscript.

jenna-tomkinson · 2023-09-25T14:34:04Z

7.figures/themes.r

+    "Other" = "grey"
+)
+
+focus_phenotype_labels <- c(


From my perspective, this seems a bit unnecessary since you aren't changing the labels for any of the phenotype names except for adding "Other". Is this the only way that you can add the "Other" label? If so, then this makes sense; it just seems like extra lines of code.

This is annoying extra code, I agree, but it is required for customizing colors in plots the way that I am.

+ scale_color_manual("Phenotype", values = focus_phenotype_colors, labels = focus_phenotype_labels)

gwaybio · 2023-09-26T20:59:25Z

Please feel free to continue discussions in each thread, but I will merge for now!

gwaybio added 3 commits September 21, 2023 14:50

add figure 2 umap and pairwise correlation

ce230a0

add notebook for calculating pairwise correlations

61403fe

add notebook to generate supplementary figures

8db9a32

gwaybio added 3 commits September 21, 2023 14:56

remove pycytominer import

66181ca

add ggplot themes.r

fa3ae1c

add new line

adfe7ca

gwaybio requested a review from jenna-tomkinson September 21, 2023 20:58

jenna-tomkinson approved these changes Sep 25, 2023

respond to Jenna PR comments

d015e8b

gwaybio merged commit 0918a7b into WayScience:main Sep 26, 2023

gwaybio deleted the add-fig2 branch September 26, 2023 21:01
