[Response to Reviewers] Add Silhouette analysis #68

gwaybio · 2024-09-15T13:39:33Z

This PR is in response to the following reviewer comment:

You state “By eye, CellProfiler features demonstrated the most heterogeneity…”, is there perhaps a way of quantifying this? For example, some kind of neighbourhood analysis.

We think this is a good idea, and therefore performed the following analysis:

Per feature space, calculate the Silhouette score per phenotype in an all-vs.-one comparison (e.g., all anaphase cells vs. all other cells).
Apply PCA so that the input dimensionality is consistent across feature spaces (n_components=50).
Calculate the average Silhouette width (per all-vs.-one phenotype and per feature space).
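
The steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code in this PR: the function name `one_vs_all_silhouette`, the output column names, the toy feature matrix, and the choice to average only over cells of the target phenotype are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_samples


def one_vs_all_silhouette(features, phenotypes, n_components=50, seed=1234):
    """Mean Silhouette width per phenotype, each scored against all other cells.

    Hypothetical helper for illustration only.
    """
    features = np.asarray(features, dtype=float)
    phenotypes = np.asarray(phenotypes)

    # PCA cannot return more components than min(n_samples, n_features)
    n_components = min(n_components, *features.shape)
    embedding = PCA(n_components=n_components, random_state=seed).fit_transform(features)

    rows = []
    for phenotype in np.unique(phenotypes):
        is_phenotype = phenotypes == phenotype
        # Binary labels: this phenotype (1) vs. every other cell (0)
        widths = silhouette_samples(embedding, is_phenotype.astype(int))
        # Assumption: average only over the cells of the target phenotype
        rows.append(
            {
                "phenotype": phenotype,
                "mean_silhouette_width": widths[is_phenotype].mean(),
            }
        )
    return pd.DataFrame(rows)


# Synthetic stand-in for one feature space: 3 well-separated groups of 30 cells
rng = np.random.default_rng(0)
toy_phenotypes = np.repeat(["anaphase", "interphase", "metaphase"], 30)
toy_centers = np.repeat([0.0, 10.0, 20.0], 30)[:, None]
toy_features = toy_centers + rng.normal(size=(90, 10))

result = one_vs_all_silhouette(toy_features, toy_phenotypes)
print(result)
```

Running the same function once per feature space and concatenating the outputs would give one row per phenotype and feature space.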

We interpret the Silhouette score as a measure of how well cells of a given phenotype cluster together relative to all other cells. A positive score means that cells of a phenotype are, on average, more similar to other cells of the same phenotype than to all other cells. A score approaching 1 indicates that cells of the phenotype are tightly grouped and well separated from all other cells.
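
For reference, the standard definition of the Silhouette width for a single cell $i$ is given below. Here $a(i)$ is the mean distance from cell $i$ to the other cells with the same label, and $b(i)$ is the mean distance from cell $i$ to the cells of the nearest other label. In an all-vs.-one comparison there are only two labels, so $b(i)$ is simply the mean distance to the cells in the other group. The symbols are the conventional ones and are not taken from the notebook.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
```

The average Silhouette width is the mean of $s(i)$ over the cells of interest.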

New supplementary figure

review-notebook-app · 2024-09-15T13:39:38Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

jenna-tomkinson

LGTM! Really nice, clean, and simple PR!

Looks like this analysis does show that CellProfiler has the most heterogeneity, since it has 6 phenotypes with the top positive Silhouette score. Very interesting results!

jenna-tomkinson · 2024-09-16T14:50:05Z

3.evaluate_model/scripts/nbconverted/calculate_silhouette_metrics.py

+# In[2]:
+
+
+np.random.seed(1234)


Recommend random seed 0 to be consistent with Way Lab standard in other projects.

I will stick with 1234.

jenna-tomkinson · 2024-09-16T14:50:48Z

3.evaluate_model/scripts/nbconverted/calculate_silhouette_metrics.py

+# For consistent Silhouette input space dimensionality
+n_pca_components = 50


Where does 50 come from? I know in UMAP we do 2 components, what is the difference when it comes to PCA?

Also heads up, in the PR comment you say the number of components is 40 but in here it is 50, recommend confirming which one is correct/most appropriate.

ah! Great catch.

In my experience, 50 is more than enough to capture the majority of the variance in the dataset, which is what we're aiming for. It's more or less an arbitrary number.
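
One way to sanity-check a component count like this is to look at the cumulative explained variance after fitting PCA. The sketch below uses a synthetic matrix as a stand-in for a real feature space, so the numbers it prints say nothing about this dataset; `toy_features` and its shape are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical stand-in: 200 cells x 120 features driven by 5 latent factors plus noise
latent = rng.normal(size=(200, 5))
loadings = rng.normal(size=(5, 120))
toy_features = latent @ loadings + 0.1 * rng.normal(size=(200, 120))

n_pca_components = 50
pca = PCA(n_components=n_pca_components, random_state=1234).fit(toy_features)

# Fraction of total variance retained by the first k components, for k = 1..50
cumulative_variance = np.cumsum(pca.explained_variance_ratio_)
print(f"Variance explained by {n_pca_components} components: {cumulative_variance[-1]:.3f}")
```

Applying the same check to each real feature space would show whether 50 components retain a comparable share of variance in all of them.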

jenna-tomkinson · 2024-09-16T14:53:04Z

3.evaluate_model/scripts/nbconverted/calculate_silhouette_metrics.py

+output_silhouette_results = pathlib.Path(
+    eval_path, "silhouette_score_results.tsv"
+)
+output_silhouette_samples_results = pathlib.Path(
+    eval_path, "silhouette_score_results_per_sample.tsv"
+)


Recommend making these compressed TSVs for saving space plus that might be the standard convention in this repo.

They are super tiny, so I will stick with TSV so they can be rendered on GitHub.
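
For comparison, here is a small sketch of the two options with pandas, which infers gzip compression from a `.gz` suffix in `to_csv`. The table contents and column names are placeholders invented for the example, and a temporary directory stands in for `eval_path`. For files this small, the gzip overhead may mean little or no space is saved.

```python
import pathlib
import tempfile

import pandas as pd

# Placeholder rows, not real results
toy_results = pd.DataFrame(
    {
        "feature_space": ["A", "A", "B"],
        "phenotype": ["anaphase", "metaphase", "anaphase"],
        "silhouette_score": [0.25, -0.5, 0.75],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    eval_path = pathlib.Path(tmp_dir)  # stand-in for the real output directory
    plain_file = pathlib.Path(eval_path, "silhouette_score_results.tsv")
    gzip_file = pathlib.Path(eval_path, "silhouette_score_results.tsv.gz")

    # Plain TSV: rendered as a table in the GitHub file viewer
    toy_results.to_csv(plain_file, sep="\t", index=False)
    # Compressed TSV: compression is inferred from the .gz suffix
    toy_results.to_csv(gzip_file, sep="\t", index=False)

    plain_size = plain_file.stat().st_size
    gzip_size = gzip_file.stat().st_size
    round_trip = pd.read_csv(gzip_file, sep="\t")

print(f"plain: {plain_size} bytes, gzip: {gzip_size} bytes")
```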

gwaybio · 2024-09-16T18:11:00Z

Thanks for the review @jenna-tomkinson - I caught a couple things too, which I addressed in the recent commits. Merging now!

gwaybio added 2 commits September 15, 2024 07:32

add notebook and results for silhouette scores

9788264

add silhouette visualization

c2efecf

jenna-tomkinson approved these changes Sep 16, 2024

gwaybio added 2 commits September 16, 2024 12:09

calculate total silhouette sum per feature space

13419d1

remove unused file name reference

acceeac

gwaybio merged commit 92e4025 into WayScience:main Sep 16, 2024

gwaybio deleted the silhouette branch September 16, 2024 18:11
