Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Response to Reviewers] Update dimensionality reduction figures #67

Merged
merged 8 commits into from
Sep 16, 2024

Conversation

gwaybio
Copy link
Member

@gwaybio gwaybio commented Sep 14, 2024

This PR accomplishes two, interrelated items, that I perform in response to two separate reviewer comments.

1) Move supplementary UMAP to main figure 2

Reviewer comment

Fig 2A – From this figure it looks like “elongated”, “large” and “interphase” (rather than “metaphase”) appear to be most distinct for CellProfiler features; however, looking at the individual maps in Fig S2, the split of “interphase” is easier to see. I feel it would be worth replacing Fig 2A with S2A as these show the distributions more clearly.

In response, I moved S2A to Fig 2A, and I updated Figure S2

Updated Figure 2

main_figure_2_umap_and_correlation

Updated Supplementary UMAP figure

supplementary_umap

2) Test t-SNE

Reviewer comment

The poor single-cell predictions feels inevitable based on the limited phenotype distinctions in Fig 2A. Were alternatives to UMAP, such as tSNE, tested to see if they yielded more clear distinctions?

In response, I fit and trained several t-SNE models (with different perplexities). Here is one example with perplexity=40

tsne_figure_perplexity_40

The results are interesting, and we will decide how to proceed in a future conversation

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Member

@jenna-tomkinson jenna-tomkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I added a few comments for you to address.

I think before we merge, should we not pick a perplexity we want to include in the manuscript? Maybe it is okay to have them all, but from my perspective, they all look pretty similar (FYI some plots could not render in GitHub).

I think these results are similar to the UMAP where CellProfiler features look to have more heterogeneity compared to DP and CP_CP. This will be a really nice supplemental figure!

# In[2]:


np.random.seed(1234)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not random seed 0? I have found that through previous PR reviews that Way Lab convention is to use 0 as random seed specificaly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not have a convention on random seeds - for whatever reason, i always use 1234

# In[3]:


output_file = pathlib.Path("evaluations", "tsne_embeddings.csv.gz")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason to use compressed CSV over a parquet file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

habit! IMO it's ok to keep compressed csv

#
# > The perplexity is related to the number of nearest neighbors that is used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. Different values can result in significantly different results. The perplexity must be less than the number of samples.
#
# We do not know what the appropriate value of perplexity is for our dataset, so we will test several
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# We do not know what the appropriate value of perplexity is for our dataset, so we will test several
# We do not know what the appropriate value of perplexity is for our dataset, so we will test several.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will address! Thanks for the careful eye 👁️



tsne_embedding_df = []
list_of_perplexities = [2, 10, 15, 30, 40, 60, 80, 100, 150, 300]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you choose these integers specifically? I could see using list comprehension to make a much larger list. I recommend including a code comment explaining why you have a list of these integers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a comment!

I chose to span a range of values. I tested a bunch by eye - I wanted to see the clustering go from not working to working to not working



# Output file
tsne_embedding_df.to_csv(output_file, sep="\t", index=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your output file extension above has a csv.gz extension but in this code you have sep="\t" which would mean you are saving a tsv.gz. Recommend updating the extension in output_file to reflect this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a great catch! I will fix

Comment on lines 16 to 23
focus_corr_colors = c(
"TRUE" = "blue",
"FALSE" = "orange"
)
focus_corr_labels = c(
"TRUE" = "Yes",
"FALSE" = "No"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this since these figures don't have correlation plots?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep! Good catch

@gwaybio
Copy link
Member Author

gwaybio commented Sep 16, 2024

I think before we merge, should we not pick a perplexity we want to include in the manuscript?

I agree - we will not include all. You will find in the response to reviewers document that I have selected 40.

Thanks for the review! I am merging now

@gwaybio gwaybio merged commit 7aa47ba into WayScience:main Sep 16, 2024
@gwaybio gwaybio deleted the tsne branch September 16, 2024 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants