Question: Does DeepProfiler allow single-cell dataset generation? #326
Comments
Hello @vasudev-sharma, you can use DeepProfiler for exporting images of single cells; unfortunately, it is not covered in the current wiki. To do so, location files are needed (X-Y coordinates of the center of each cell crop). In the config file for export, the example is the following:
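A minimal sketch, reconstructed from the config posted later in this thread (the box size value is illustrative); this goes under "dataset" in the config:

```json
"locations": {
    "mode": "single_cells",
    "box_size": 96,
    "mask_objects": false
}
```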
Do you plan to use masks of the cells, or would you like to just crop the regions? If you plan to use masks, set the following in the config:
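A hedged sketch of the same "locations" section with masks enabled (an outlines/mask source, mentioned later in this thread, would also need to be available to the export step):

```json
"locations": {
    "mode": "single_cells",
    "box_size": 96,
    "mask_objects": true
}
```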
Thanks @Arkkienkeli for your prompt response, it is really appreciated. As you said, I ran the script but encountered a key error.

CLI command and logs:

(deepprofiler_env) vasudev.sharma@bh-login001:~/projects/DeepProfiler$ python deepprofiler --root=/mnt/bh1/scratch/vasudev.sharma/data/bbbc021 --config=config-resnet.json --metadata=index.csv --single-cells=sample --gpu 0 export-sc
2022-06-21 11:23:24.196506: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:From /mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2022-06-21 11:23:28,147 - WARNING - From /mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/tensorflow_addons/utils/ensure_tf_install.py:53: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.7.0 and strictly below 2.10.0 (nightly versions are not supported).
The versions of TensorFlow you are currently using is 2.5.3 and is not supported.
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version.
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
warnings.warn(
Reading metadata form /mnt/bh1/scratch/vasudev.sharma/data/single-cell-bbbc021/inputs/metadata/index.csv
/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/dataset/metadata.py:38: FutureWarning: In a future version of pandas all arguments of read_csv except for the argument 'filepath_or_buffer' will be keyword-only.
self.loadSingle(filename, delimiter, dtype)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3843 entries, 0 to 3842
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 3843 non-null int64
1 Metadata_Plate 3843 non-null object
2 Metadata_Well 3843 non-null object
3 Metadata_Site 3843 non-null object
4 Plate_Map_Name 3843 non-null object
5 DNA 3843 non-null object
6 Tubulin 3843 non-null object
7 Actin 3843 non-null object
8 Replicate 3843 non-null int64
9 Compound_Concentration 3843 non-null object
10 compound 3843 non-null object
11 concentration 3843 non-null float64
12 moa 3843 non-null object
13 replicate_use 3843 non-null object
dtypes: float64(1), int64(2), object(11)
memory usage: 420.5+ KB
None
Traceback (most recent call last):
File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/3.9.7/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/3.9.7/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/__main__.py", line 216, in <module>
cli(obj={})
File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/__main__.py", line 156, in export_sc
dset = deepprofiler.dataset.image_dataset.read_dataset(context.obj["config"])
File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/dataset/image_dataset.py", line 219, in read_dataset
metadata.splitMetadata(trainingFilter, validationFilter)
File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/dataset/metadata.py", line 68, in splitMetadata
self.train = self.data[trainingRule(self.data)].copy()
File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/dataset/image_dataset.py", line 217, in <lambda>
trainingFilter = lambda df: df[split_field].isin(config["train"]["partition"]["training"])
KeyError: 'training'
Config file:
{
"dataset": {
"metadata": {
"label_field": "Compound_Concentration",
"control_value": "DMSO_0.0"
},
"images": {
"channels": [
"DNA",
"Tubulin",
"Actin"
],
"file_format": "tif",
"bits": 16,
"width": 1280,
"height": 1024
}
},
"prepare": {
"illumination_correction": {
"down_scale_factor": 4,
"median_filter_size": 24
},
"compression": {
"implement": false,
"scaling_factor": 1.0
}
},
"train": {
"partition": {
"targets": [
"Compound_Concentration"
],
"split_field": "replicate_use",
"training_values": ["Training"],
"validation_values": ["Validation", "None"]
},
"model": {
"name": "resnet50",
"crop_generator": "crop_generator",
"metrics": ["accuracy", "top_k"],
"epochs": 200,
"params": {
"learning_rate": 0.01,
"batch_size": 32,
"conv_blocks": 50,
"feature_dim": 256,
"pooling": "None"
},
"lr_schedule": "cosine",
"backup_schedule":{
"epoch":[40,80],
"lr":[0.0005, 0.0001]
}
},
"sampling": {
"factor": 1.0,
"box_size": 96,
"mask_objects": false,
"queue_size": 10000,
"workers": 4
},
"validation": {
"top_k": 5,
"batch_size": 32,
"frame": "val",
"sample_first_crops": true
},
"comet_ml": {
"track": true,
"api_key": "guPdvYt7wQSIQHBE7szQxTgrd",
"project_name": "bbbc021"
}
},
"profile": {
"pretrained": false,
"feature_layer": "res4b1_relu",
"checkpoint": "checkpoint_0100.hdf5",
"batch_size": 8
}
}
The JSON file doesn't have the key "training", as pointed out by the error log. Any clues/ideas about what I might be doing wrong? Thanks @Arkkienkeli for the help.
@vasudev-sharma See the section The bug you encountered
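For reference, the traceback above reads config["train"]["partition"]["training"], while the config posted above uses "training_values"/"validation_values"; the working config posted later in this thread names the partition keys as follows:

```json
"partition": {
    "targets": ["Compound_Concentration"],
    "split_field": "replicate_use",
    "training": ["Training"],
    "validation": ["Validation", "None"]
}
```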
Hey @Arkkienkeli, I came across this issue while searching for how to determine what box_size to use.
Thanks for the help @infominer!!
Hi @infominer, it depends on the size of the cells in your dataset. We usually recommend a size of 128 for the BBBC021 dataset, but for a custom dataset it might differ. I would recommend choosing …
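One rough heuristic (not part of DeepProfiler; a sketch assuming you already have a labelled segmentation mask, e.g. from CellProfiler or Cellpose) is to take a high percentile of the per-cell bounding-box sides and round up:

```python
import numpy as np
from skimage.measure import regionprops

def suggest_box_size(label_image, percentile=95, multiple=32):
    """Suggest a crop box size that covers most cells in a labelled mask."""
    sides = []
    for region in regionprops(label_image):
        min_row, min_col, max_row, max_col = region.bbox
        sides.append(max(max_row - min_row, max_col - min_col))
    size = np.percentile(sides, percentile)
    # Round up to a convenient multiple (e.g. 96 or 128).
    return int(np.ceil(size / multiple) * multiple)
```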
I am able to generate single-cell images with a bounding box size of 96. To be sure I am doing it correctly, would it be possible for you to have a second look at my config file?
I am just interested in cropping the regions, so I believe I have to set "mask_objects": false and, I guess, not add "outlines" to my config, correct? Here is my config file 👇
{
"dataset": {
"metadata": {
"label_field": "Compound_Concentration",
"control_value": "DMSO_0.0"
},
"images": {
"channels": [
"DNA",
"Tubulin",
"Actin"
],
"file_format": "tif",
"bits": 16,
"width": 1280,
"height": 1024
},
"locations": {
"mode": "single_cells",
"box_size": 96,
"mask_objects": false
}
},
"prepare": {
"illumination_correction": {
"down_scale_factor": 4,
"median_filter_size": 24
},
"compression": {
"implement": false,
"scaling_factor": 1.0
}
},
"train": {
"partition": {
"targets": [
"Compound_Concentration"
],
"split_field": "replicate_use",
"training": ["Training"],
"validation": ["Validation", "None"]
},
"model": {
"name": "resnet50",
"crop_generator": "crop_generator",
"metrics": ["accuracy", "top_k"],
"epochs": 200,
"params": {
"learning_rate": 0.01,
"batch_size": 32,
"conv_blocks": 50,
"feature_dim": 256,
"pooling": "None"
},
"lr_schedule": "cosine",
"backup_schedule":{
"epoch":[40,80],
"lr":[0.0005, 0.0001]
}
},
"sampling": {
"factor": 1.0,
"box_size": 96,
"mask_objects": false,
"queue_size": 10000,
"workers": 4,
"cache_size": 64
},
"validation": {
"top_k": 5,
"batch_size": 32,
"frame": "val",
"sample_first_crops": true
},
"comet_ml": {
"track": true,
"api_key": "guPdvYt7wQSIQHBE7szQxTgrd",
"project_name": "bbbc021"
}
},
"profile": {
"pretrained": false,
"feature_layer": "res4b1_relu",
"checkpoint": "checkpoint_0100.hdf5",
"batch_size": 8
}
}
Thanks! I am new to this field and am using images from a cell painting assay. We did a check on cells in which the conditions of the assay were slightly different and used Columbus; the Star Radial Mean (Nuclei Selected) measure was higher, and we have XY locations from Columbus as well. So, going by your suggestion, I should use a larger box size. What's the best metric to check whether I am on the right path?
@infominer For some datasets that we use, the locations are derived with CellProfiler. As an alternative, I have also used the Cellpose segmentation method and then used the centers of the obtained segmentation masks. Please note that the XY coordinates in …
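As an illustration only (the directory layout and column names below are assumptions based on this thread, not a verified DeepProfiler spec; please check the wiki for the exact convention), centroids from Cellpose or Columbus can be written out as one CSV per plate/well/site:

```python
import os
import pandas as pd

def write_location_file(centers, locations_dir, plate, well, site):
    """Write X-Y cell centers (pixel coordinates of the full image) to a per-site CSV."""
    out_dir = os.path.join(locations_dir, str(plate))
    os.makedirs(out_dir, exist_ok=True)
    df = pd.DataFrame(centers, columns=["Nuclei_Location_Center_X",
                                        "Nuclei_Location_Center_Y"])
    # Hypothetical naming pattern: <Well>-<Site>-Nuclei.csv under the plate folder.
    df.to_csv(os.path.join(out_dir, f"{well}-{site}-Nuclei.csv"), index=False)
```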
Is it normal for the last channel of the single-cell images to look like this (almost black pixels)? For instance, see the generated single-cell image directory (…). This behaviour is common to all the generated single-cell images and is not particular to this specific site/well/experiment. Also, I am noticing some weird artifacts in some of the generated images. I guess my config file (shared above) might be causing these results. Any suggestions on how I can rectify it?
@Arkkienkeli Thanks for those pointers! It looks like Columbus will also give me those measures, and its average bounding box size is around 56 for one of the datasets, so I think I will start with 96 and evaluate the results.
@vasudev-sharma I could reproduce it (also on another dataset); it is a bug. If the masks are available and used for export, everything works as expected; if not, the last channel is empty (as you observe). Thanks for reporting!
Thanks @Arkkienkeli for investigating the issue. Is there any quick fix for this (which I can apply on my side), or an expected turnaround for this issue? Unfortunately, I don't have access to masks for the BBBC021 dataset.
The easiest workaround could be to use blank images as masks; it is OK if you are not going to use them (it will just result in a slightly increased image size). I am going to fix it ASAP.
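A possible way to create such blank masks (a sketch, assuming 16-bit TIFFs with the 1280x1024 dimensions from the config above; the output path is a placeholder):

```python
import numpy as np
import tifffile

def make_blank_mask(path, width=1280, height=1024):
    """Write an all-zero 16-bit TIFF to act as a placeholder mask."""
    tifffile.imwrite(path, np.zeros((height, width), dtype=np.uint16))

# Example: a single blank mask that can be referenced for every image.
make_blank_mask("blank_mask.tif")
```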
Thanks a lot for suggesting an alternative quick fix. For now, I will wait for a fix of this issue.
It was fixed by #327.
I am new to DeepProfiler and am interested in single-cell image dataset generation. In particular, for single-cell BBBC021 dataset generation: given that I have the BBBC021 dataset organized as described in the project structure section of the wiki, is there a script/workflow in DeepProfiler that generates single-cell images?
I went through the wiki and the readme, but was unable to find any useful information in that regard.
Any help would be really appreciated.