[Response to Reviewers] Add Silhouette analysis #68

Merged: 4 commits merged on Sep 16, 2024

@gwaybio (Member) commented Sep 15, 2024

This PR is in response to the following reviewer comment:

You state “By eye, CellProfiler features demonstrated the most heterogeneity…”, is there perhaps a way of quantifying this? For example, some kind of neighbourhood analysis.

We think this is a good idea, and therefore performed the following analysis:

  1. Per feature space, apply PCA to make the dimensionality of the input features consistent (n_components=50)
  2. Calculate the Silhouette score per phenotype in a one-vs.-all comparison (e.g., all anaphase cells vs. all other cells)
  3. Calculate the average Silhouette width (per one-vs.-all phenotype and per feature space)
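The steps above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the notebook code from this PR: the helper name `one_vs_all_silhouette`, the input layout (one row per cell for a single feature space, one phenotype label per cell), and averaging the per-cell scores over the cells of the phenotype only are all assumptions. Averaging over every cell in the split (as `silhouette_score` does) is an equally plausible reading of step 3.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_samples


def one_vs_all_silhouette(features, phenotypes, n_components=50, seed=1234):
    """Average silhouette width per phenotype, each phenotype vs. all other cells.

    Hypothetical helper: `features` is a (cells x features) array for ONE
    feature space and `phenotypes` holds one phenotype label per cell.
    """
    # Step 1: project onto a fixed number of principal components so that
    # every feature space is scored at the same dimensionality
    pcs = PCA(n_components=n_components, random_state=seed).fit_transform(features)

    phenotypes = np.asarray(phenotypes)
    rows = []
    for phenotype in np.unique(phenotypes):
        # Step 2: binary labels, this phenotype (1) vs. all other cells (0)
        in_phenotype = phenotypes == phenotype
        per_cell = silhouette_samples(pcs, in_phenotype.astype(int))
        # Step 3 (assumption): average over the cells of this phenotype only
        rows.append(
            {"phenotype": phenotype, "silhouette_score": per_cell[in_phenotype].mean()}
        )
    return pd.DataFrame(rows)
```

Running this once per feature space and concatenating the outputs gives one score per phenotype and feature space. Note that `n_components` cannot exceed the number of cells or the number of features in a given feature space.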

We interpret the Silhouette score as how well cells of a given phenotype cluster together compared to all other cells. A positive score means cells of a phenotype are, on average, more similar to other cells of the same phenotype than to all other cells; a negative score means the opposite. A score approaching 1 indicates that the phenotype is tightly clustered and completely separated from all other phenotypes.
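To make the interpretation concrete, the per-cell silhouette is s(i) = (b - a) / max(a, b), where a is the mean distance from cell i to the other cells in its own group and b is the mean distance to the cells in the other group. The from-scratch sketch below computes this for a two-group (one-vs.-all) split with Euclidean distance; the function name is hypothetical, and it is meant only to illustrate the formula, which it checks against scikit-learn's `silhouette_samples`.

```python
import numpy as np
from sklearn.metrics import silhouette_samples


def silhouette_one_vs_all(X, in_group):
    """Per-cell silhouette s(i) = (b - a) / max(a, b) for a two-group split.

    a: mean distance from cell i to the other cells in its own group
    b: mean distance from cell i to the cells in the other group
    Assumes each group contains at least two cells.
    """
    X = np.asarray(X, dtype=float)
    in_group = np.asarray(in_group, dtype=bool)
    # Pairwise Euclidean distances between all cells
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = np.empty(len(X))
    for i in range(len(X)):
        same = in_group == in_group[i]
        same[i] = False  # exclude the cell itself from its own group
        a = dists[i, same].mean()
        b = dists[i, in_group != in_group[i]].mean()
        scores[i] = (b - a) / max(a, b)
    return scores
```

s(i) ranges from -1 to 1 and is positive exactly when a < b, i.e., when the cell is, on average, closer to its own phenotype than to all other cells.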

New supplementary figure:

[Figure: supplementary_silhouette_scores]

@jenna-tomkinson (Member) left a comment

LGTM! Really nice, clean, and simple PR!

Looks like this analysis does show that CellProfiler has the most heterogeneity, since it has the top positive silhouette score for 6 phenotypes. Very interesting results!

# In[2]:


np.random.seed(1234)

Member

Recommend using random seed 0 to be consistent with the Way Lab standard in other projects.

Member Author

I will stick with 1234.

Comment on lines +30 to +31
# For consistent Silhouette input space dimensionality
n_pca_components = 50

Member

Where does 50 come from? I know in UMAP we use 2 components; what is the difference when it comes to PCA?

Member

Also, heads up: in the PR comment you say the number of components is 40, but here it is 50. Recommend confirming which one is correct/most appropriate.

Member Author

Ah! Great catch.

In my experience, 50 is more than enough to capture the majority of the variance in the dataset, which is what we're aiming for. It's more or less an arbitrary number.
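One way to check the "50 is more than enough" intuition for a given feature space is to sum the explained variance ratio of a fitted PCA. A minimal scikit-learn sketch; the helper name `variance_captured` is hypothetical, and no variance figures from this PR are implied.

```python
import numpy as np
from sklearn.decomposition import PCA


def variance_captured(features, n_components=50, seed=1234):
    """Fraction of total variance retained by the first `n_components` PCs."""
    pca = PCA(n_components=n_components, random_state=seed).fit(features)
    # explained_variance_ratio_ holds each component's share of total variance
    return float(np.sum(pca.explained_variance_ratio_))
```

Unlike UMAP, which is typically run with 2 components for visualization, PCA here only needs to retain most of the variance, so a value close to 1 for each feature space would support the choice of 50.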

Comment on lines 39 to 44
output_silhouette_results = pathlib.Path(
    eval_path, "silhouette_score_results.tsv"
)
output_silhouette_samples_results = pathlib.Path(
    eval_path, "silhouette_score_results_per_sample.tsv"
)

Member

Recommend saving these as compressed TSVs to save space; that might also be the standard convention in this repo.

Member Author

They are super tiny; I will stick with TSV so they can be rendered on GitHub.
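For reference, pandas can write either format from the same call, so switching later would be a one-line change. A minimal sketch; `write_results` is a hypothetical helper, not code from this PR. pandas infers compression from the file suffix by default, so a plain `.tsv` stays human-readable (and renders on GitHub), while `.tsv.gz` is gzip-compressed.

```python
import pathlib

import pandas as pd


def write_results(df: pd.DataFrame, path: pathlib.Path) -> None:
    """Write a results table as tab-separated values.

    pandas infers compression from the suffix (compression="infer" by default),
    so "results.tsv" is plain text and "results.tsv.gz" is gzip-compressed.
    """
    df.to_csv(path, sep="\t", index=False)
```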

@gwaybio (Member Author) commented Sep 16, 2024

Thanks for the review, @jenna-tomkinson! I caught a couple of things too, which I addressed in the recent commits. Merging now!

@gwaybio merged commit 92e4025 into WayScience:main on Sep 16, 2024
@gwaybio deleted the silhouette branch on September 16, 2024 at 18:11