Issue on page /machine_learning.html #107

Closed
jasperhyp opened this issue Dec 20, 2024 · 5 comments

Comments

@jasperhyp

The path cpg0016-jump/SOURCE/workspace/segmentation/cellpose/objects/BATCH/ given on the /machine_learning.html page for cpg0016-jump does not exist in the AWS S3 bucket.
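
One way to see which prefixes actually exist is to list the bucket one folder level at a time. The following is a minimal sketch, assuming the bucket is named cellpainting-gallery and permits anonymous listing through the standard S3 ListObjectsV2 REST endpoint. The helper names (build_list_url, parse_listing, prefix_exists) are illustrative and not part of any project tooling.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 list responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def build_list_url(bucket, prefix):
    """Build a ListObjectsV2 URL that lists one 'folder' level under prefix."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "delimiter": "/"}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_data):
    """Return (sub-prefixes, object keys) from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_data)
    prefixes = [e.text for e in root.findall(f"{S3_NS}CommonPrefixes/{S3_NS}Prefix")]
    keys = [e.text for e in root.findall(f"{S3_NS}Contents/{S3_NS}Key")]
    return prefixes, keys


def prefix_exists(bucket, prefix):
    """Network call: True if anything is stored under prefix.

    Assumes anonymous access is allowed; only the first page of results
    is inspected, which is enough to tell whether the prefix is empty.
    """
    with urllib.request.urlopen(build_list_url(bucket, prefix), timeout=30) as resp:
        prefixes, keys = parse_listing(resp.read())
    return bool(prefixes or keys)
```

Calling prefix_exists("cellpainting-gallery", "cpg0016-jump/") and then descending through the sub-prefixes it reports shows the folder names that are really present at each level, which may differ from the template on the page.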

@ErinWeisbart
Member

The files have been uploaded for source_8, and the uploads for the other sources are in progress.
(Granular details are available in #73, though I will also note in this issue when all of the uploads have completed.)
Thanks for your patience.

@jasperhyp
Author

jasperhyp commented Dec 20, 2024

Thank you for the clarification, Erin! I went into source_8 and was able to find single-cell images in, e.g., https://open.quiltdata.com/b/cellpainting-gallery/tree/cpg0016-jump/source_8/workspace/segmentation/cellpose_202404/objects/J1/A1170384/A1170384.zarr/source_8__J1__A1170384__A01__3/single_cell_data/. However, the files seem to be in a different format from those in, e.g., cpg0019-moshkov-deepprofiler/broad/training_images/BBBC036/, each of which is typically a 960x160 TIFF image displaying 6 channels followed by a segmentation mask. Could you please point me to any available documentation as well? Thanks in advance!
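
For reference, a 960x160 strip can be cut into side-by-side square tiles as sketched below. This assumes 160x160 tiles laid out left to right, which gives six tiles in total; which tiles are image channels and which is the mask is an assumption to confirm against the cpg0019 documentation. The function names are illustrative.

```python
def tile_bounds(strip_width, tile_width):
    """Column ranges (start, stop) of equal-width tiles in a horizontal strip."""
    if strip_width % tile_width != 0:
        raise ValueError("strip width is not a multiple of the tile width")
    return [(x, x + tile_width) for x in range(0, strip_width, tile_width)]


def split_strip(rows, tile_width):
    """Split a 2D image, given as a list of pixel rows, into side-by-side tiles."""
    bounds = tile_bounds(len(rows[0]), tile_width)
    # Each tile keeps every row but only its own slice of columns.
    return [[row[start:stop] for row in rows] for start, stop in bounds]


# A 960-pixel-wide strip of 160-pixel tiles yields six column ranges.
bounds = tile_bounds(960, 160)
```

With an image loaded as a list of rows (or a NumPy array sliced the same way), split_strip(image, 160) returns the tiles in left-to-right order.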

@ErinWeisbart
Member

The cpg0019 files are described in Moshkov et al. 2024: https://www.nature.com/articles/s41467-024-45999-1

The cpg0016 files have not yet been described in a publication, but the workflow being used to create the image crops is here: https://github.com/theislab/jump-cpg0016-segmentation

@jasperhyp
Author

jasperhyp commented Jan 1, 2025

Thank you, Erin, for the clarifications! As a follow-up, I've posted another question about these processed datasets in another repo specific to cpg0019 (Moshkov et al., 2024), but wanted to also cross-post it here just in case you might know the answer as well:

I was wondering if you could clarify whether the processed dataset contains only the training split (and not the validation split, so that it is not the full processed dataset, e.g. cpg0012), since the parent folder is named training_images (e.g. cpg0019-moshkov-deepprofiler/broad/training_images/BBBC036/). Also, there seem to be far fewer folders in the processed BBBC036 than in the original cpg0012 images. For example, 24277 is not in the processed version of BBBC036/CDRP, and even in 24278, many subfolders are missing from the processed dataset compared with the raw dataset. Could you please clarify? I would also appreciate it if you could suggest possible ways to acquire the full processed datasets. If they are not readily available, could you please point me towards the script/notebook that would generate the processed images from the raw ones?
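
The comparison I did by hand can be sketched as a set difference over folder listings. This is a minimal sketch: the plate IDs are the ones mentioned above, while the subfolder names ("a01", "a02", "a03") are hypothetical placeholders rather than real listing contents, and compare_listings is an illustrative helper.

```python
def compare_listings(raw, processed):
    """Report plates and per-plate subfolders present in raw but not in processed.

    Both arguments map a plate ID to a collection of subfolder names.
    """
    missing_plates = sorted(set(raw) - set(processed))
    missing_subfolders = {}
    for plate in sorted(set(raw) & set(processed)):
        diff = sorted(set(raw[plate]) - set(processed[plate]))
        if diff:
            missing_subfolders[plate] = diff
    return missing_plates, missing_subfolders


# Hypothetical listings; only the plate IDs come from the thread.
raw = {"24277": ["a01", "a02"], "24278": ["a01", "a02", "a03"]}
processed = {"24278": ["a01"]}

missing_plates, missing_subfolders = compare_listings(raw, processed)
```

Here missing_plates holds the plates absent from the processed set and missing_subfolders holds, for each shared plate, the subfolders that only the raw set contains.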

Happy New Year!

Edit 1/5: Oops, I realized that the training data is composed of subsets of those datasets, as stated in the paper:

We selected 348 treatments from BBBC022 (strongest 35%), 354 from BBBC036 (strongest 23%) and 47 treatments from BBBC037 (strongest 23%). We complemented these treatments with the corresponding replicates in the LINCS and BBBC043 datasets, and added 7 new compounds and 32 new gene overexpression perturbations, resulting in 488 treatments in total (Fig. 4).

Still, I would appreciate suggestions on acquiring the full processed datasets, especially if they are readily available somewhere!

@ErinWeisbart
Member

Nikita responded to the cpg0019 follow-up question in the appropriate location, so I am closing this thread.
