[Response to Reviewers] Update dimensionality reduction figures #67
Conversation
LGTM! I added a few comments for you to address.
I think before we merge, should we not pick a perplexity we want to include in the manuscript? Maybe it is okay to have them all, but from my perspective they all look pretty similar (FYI, some plots could not render in GitHub).
I think these results are similar to the UMAP, where CellProfiler features look to have more heterogeneity compared to DP and CP_CP. This will be a really nice supplemental figure!
# In[2]:

np.random.seed(1234)
Why not random seed 0? I have found through previous PR reviews that the Way Lab convention is to use 0 as the random seed specifically.
We do not have a convention on random seeds - for whatever reason, I always use 1234.
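For reproducibility, the seed can also be passed directly to the estimator rather than relying only on the global NumPy state. A minimal sketch, assuming scikit-learn's t-SNE is used as elsewhere in this PR; the perplexity value here is illustrative:

```python
import numpy as np
from sklearn.manifold import TSNE

seed = 1234  # value used in the notebook; any fixed integer works
np.random.seed(seed)  # seeds NumPy's global generator

# Passing random_state makes the estimator reproducible even if other
# code touches the global generator in between.
tsne = TSNE(n_components=2, perplexity=30, random_state=seed)
```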
# In[3]:

output_file = pathlib.Path("evaluations", "tsne_embeddings.csv.gz")
Is there a specific reason to use compressed CSV over a parquet file?
habit! IMO it's ok to keep compressed csv
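For reference, both formats are a single pandas call. A minimal sketch, assuming `tsne_embedding_df` is a pandas DataFrame (parquet additionally requires pyarrow or fastparquet to be installed):

```python
import pathlib

# Compressed CSV: pandas infers gzip compression from the .gz suffix
tsne_embedding_df.to_csv(
    pathlib.Path("evaluations", "tsne_embeddings.csv.gz"), index=False
)

# Parquet alternative: typically smaller on disk and preserves dtypes
tsne_embedding_df.to_parquet(
    pathlib.Path("evaluations", "tsne_embeddings.parquet"), index=False
)
```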
#
# > The perplexity is related to the number of nearest neighbors that is used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. Different values can result in significantly different results. The perplexity must be less than the number of samples.
#
# We do not know what the appropriate value of perplexity is for our dataset, so we will test several
Suggested change:
- # We do not know what the appropriate value of perplexity is for our dataset, so we will test several
+ # We do not know what the appropriate value of perplexity is for our dataset, so we will test several.
will address! Thanks for the careful eye 👁️
tsne_embedding_df = []
list_of_perplexities = [2, 10, 15, 30, 40, 60, 80, 100, 150, 300]
Why did you choose these integers specifically? I could see using list comprehension to make a much larger list. I recommend including a code comment explaining why you have a list of these integers.
I will add a comment!
I chose to span a range of values. I tested a bunch by eye - I wanted to see the clustering go from not working, to working, to not working.
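To make the sweep concrete, here is a minimal sketch of fitting one t-SNE model per perplexity and stacking the embeddings into one long DataFrame. The feature matrix name `feature_df` and the output column names are assumptions for illustration, not the notebook's exact code:

```python
import pandas as pd
from sklearn.manifold import TSNE

list_of_perplexities = [2, 10, 15, 30, 40, 60, 80, 100, 150, 300]

tsne_embedding_df = []
for perplexity in list_of_perplexities:
    # Fit a 2D embedding for this perplexity (feature_df is assumed to be
    # a samples-by-features matrix; perplexity must be < number of samples)
    embedding = TSNE(
        n_components=2, perplexity=perplexity, random_state=1234
    ).fit_transform(feature_df)

    embedding_df = pd.DataFrame(embedding, columns=["tsne_0", "tsne_1"])
    embedding_df = embedding_df.assign(perplexity=perplexity)
    tsne_embedding_df.append(embedding_df)

# Combine the per-perplexity results into one DataFrame for plotting
tsne_embedding_df = pd.concat(tsne_embedding_df).reset_index(drop=True)
```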
# Output file
tsne_embedding_df.to_csv(output_file, sep="\t", index=False)
Your output_file above has a csv.gz extension, but in this code you have sep="\t", which means you are saving a tsv.gz. Recommend updating the extension in output_file to reflect this.
this is a great catch! I will fix
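One way the mismatch could be resolved, sketched here under the assumption that `tsne_embedding_df` and `output_file` are as defined above; either drop the tab separator or rename the output to match it:

```python
import pathlib

# Option 1: keep the .csv.gz extension and write a comma-separated file
tsne_embedding_df.to_csv(output_file, index=False)

# Option 2: keep the tab separator and rename the output accordingly
output_file = pathlib.Path("evaluations", "tsne_embeddings.tsv.gz")
tsne_embedding_df.to_csv(output_file, sep="\t", index=False)
```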
focus_corr_colors = c(
    "TRUE" = "blue",
    "FALSE" = "orange"
)
focus_corr_labels = c(
    "TRUE" = "Yes",
    "FALSE" = "No"
)
Do you need this since these figures don't have correlation plots?
yep! Good catch
I agree - we will not include all of them. You will find in the response to reviewers document that I have selected a perplexity of 40. Thanks for the review! I am merging now.
This PR accomplishes two interrelated items, which I perform in response to two separate reviewer comments.
1) Move supplementary UMAP to main figure 2
Reviewer comment
In response, I moved S2A to Fig 2A, and I updated Figure S2
Updated Figure 2
Updated Supplementary UMAP figure
2) Test t-SNE
Reviewer comment
In response, I fit several t-SNE models (with different perplexities). Here is one example with perplexity=40.
The results are interesting, and we will decide how to proceed in a future conversation.
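For context, a scatter of the perplexity = 40 embedding could be drawn roughly as below. This is a sketch only; the column names `tsne_0`, `tsne_1`, and `perplexity` match the assumed long-format output above, not the repository's actual plotting code:

```python
import matplotlib.pyplot as plt

# Subset the combined embedding to the perplexity shown in the figure
example_df = tsne_embedding_df.query("perplexity == 40")

fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(example_df["tsne_0"], example_df["tsne_1"], s=5, alpha=0.5)
ax.set_xlabel("t-SNE 0")
ax.set_ylabel("t-SNE 1")
ax.set_title("t-SNE embedding (perplexity = 40)")
fig.savefig("tsne_perplexity_40.png", dpi=300)
```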