
Question: Does DeepProfiler allow single-cell dataset generation? #326

Closed
vasudev-sharma opened this issue Jun 20, 2022 · 16 comments

@vasudev-sharma

vasudev-sharma commented Jun 20, 2022

I am new to DeepProfiler and am interested in single-cell image dataset generation. In particular, for generating a single-cell BBBC021 dataset: given that I have the BBBC021 dataset laid out as described in the project structure section of the wiki, is there a script/workflow in DeepProfiler that generates single-cell images?

I went through the wiki and the readme, but was unable to find any useful information in that regard.

Any help would be really appreciated.

@Arkkienkeli
Member

Hello @vasudev-sharma, you can use DeepProfiler to export single-cell images; unfortunately this is not covered in the current wiki. To do so, location files are needed (the X-Y coordinates of the center of each cell crop). In the export config, the box_size parameter defines the size of the crop.

An example command:

python deepprofiler --root=/path/deepprofiler_project/ --config=config.json --metadata=metadata.csv --single-cells=sample --gpu 0 export-sc

Do you plan to use masks of the cells, or would you like to just crop the regions? If you want masks, set the following in the config:
"mask_objects": true (in the locations section of the config) and "outlines": "available" (in the prepare section of the config).

@vasudev-sharma
Author

Thanks @Arkkienkeli for your prompt response, it is really appreciated.

As you suggested, I ran the command but encountered a key error (KeyError: 'training') related to the config file (config-resnet.json). Here are the logs for your information.

➡️ CLI command: python deepprofiler --root=/mnt/bh1/scratch/vasudev.sharma/data/single-cell-bbbc021 --config=config-resnet.json --metadata=index.csv --single-cells=sample --gpu 0 export-sc

➡️ BBBC021 is the public dataset available at: aws s3 ls --no-sign-request s3://cellpainting-gallery/cpg0010-caie-drugresponse/broad-az/workspace/deep_learning/

➡️ LOGS:

(deepprofiler_env) vasudev.sharma@bh-login001:~/projects/DeepProfiler$ python deepprofiler --root=/mnt/bh1/scratch/vasudev.sharma/data/bbbc021 --config=config-resnet.json --metadata=index.csv --single-cells=sample --gpu 0 export-sc
2022-06-21 11:23:24.196506: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:From /mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2022-06-21 11:23:28,147 - WARNING - From /mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/tensorflow_addons/utils/ensure_tf_install.py:53: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.7.0 and strictly below 2.10.0 (nightly versions are not supported). 
 The versions of TensorFlow you are currently using is 2.5.3 and is not supported. 
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version. 
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
  warnings.warn(
Reading metadata form /mnt/bh1/scratch/vasudev.sharma/data/single-cell-bbbc021/inputs/metadata/index.csv
/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/dataset/metadata.py:38: FutureWarning: In a future version of pandas all arguments of read_csv except for the argument 'filepath_or_buffer' will be keyword-only.
  self.loadSingle(filename, delimiter, dtype)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3843 entries, 0 to 3842
Data columns (total 14 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   ID                      3843 non-null   int64  
 1   Metadata_Plate          3843 non-null   object 
 2   Metadata_Well           3843 non-null   object 
 3   Metadata_Site           3843 non-null   object 
 4   Plate_Map_Name          3843 non-null   object 
 5   DNA                     3843 non-null   object 
 6   Tubulin                 3843 non-null   object 
 7   Actin                   3843 non-null   object 
 8   Replicate               3843 non-null   int64  
 9   Compound_Concentration  3843 non-null   object 
 10  compound                3843 non-null   object 
 11  concentration           3843 non-null   float64
 12  moa                     3843 non-null   object 
 13  replicate_use           3843 non-null   object 
dtypes: float64(1), int64(2), object(11)
memory usage: 420.5+ KB
None
Traceback (most recent call last):
  File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/3.9.7/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/3.9.7/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/__main__.py", line 216, in <module>
    cli(obj={})
  File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/__main__.py", line 156, in export_sc
    dset = deepprofiler.dataset.image_dataset.read_dataset(context.obj["config"])
  File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/dataset/image_dataset.py", line 219, in read_dataset
    metadata.splitMetadata(trainingFilter, validationFilter)
  File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/dataset/metadata.py", line 68, in splitMetadata
    self.train = self.data[trainingRule(self.data)].copy()
  File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/dataset/image_dataset.py", line 217, in <lambda>
    trainingFilter = lambda df: df[split_field].isin(config["train"]["partition"]["training"])
KeyError: 'training'

The config-resnet.json file I am using:

{
    "dataset": {
        "metadata": {
            "label_field": "Compound_Concentration",
            "control_value": "DMSO_0.0"
        },
        "images": {
            "channels": [
                "DNA",
                "Tubulin",
                "Actin"
              ],
            "file_format": "tif",
            "bits": 16,
            "width": 1280,
            "height": 1024
        }
    },
    "prepare": {
        "illumination_correction": {
            "down_scale_factor": 4,
            "median_filter_size": 24
        },
        "compression": {
            "implement": false,
            "scaling_factor": 1.0
        }
    },
    "train": {           
        "partition": {
            "targets": [
                "Compound_Concentration"
            ],
            "split_field": "replicate_use",
            "training_values": ["Training"],
            "validation_values": ["Validation", "None"]
        },
        "model": {
            "name": "resnet50",
            "crop_generator": "crop_generator",
            "metrics": ["accuracy", "top_k"],
            "epochs": 200,
            "params": {
                "learning_rate": 0.01,
                "batch_size": 32,
                "conv_blocks": 50,
                "feature_dim": 256,
                "pooling": "None"
            },
            "lr_schedule": "cosine",
            "backup_schedule":{
                "epoch":[40,80],
                "lr":[0.0005, 0.0001]
            }
        },
        "sampling": {
            "factor": 1.0,
            "box_size": 96,
            "mask_objects": false,
            "queue_size": 10000,
            "workers": 4
        },
        "validation": {
            "top_k": 5,
            "batch_size": 32,
            "frame": "val",
            "sample_first_crops": true
        },
        "comet_ml": {
            "track": true,
            "api_key": "guPdvYt7wQSIQHBE7szQxTgrd",
            "project_name": "bbbc021"
        }

    },
    "profile": {
      "pretrained": false,
      "feature_layer": "res4b1_relu",
      "checkpoint": "checkpoint_0100.hdf5",
      "batch_size": 8
    }
}
  

The JSON file doesn't have the key "training", as pointed out by the error log. Any clues/ideas about what I might be doing wrong?

Thanks @Arkkienkeli for the help.

@infominer

@vasudev-sharma See the "train" section in the config file, which has the following parameters:
"train": { "partition": { "targets": [ "Compound_Concentration" ], "split_field": "replicate_use", "training_values": ["Training"], "validation_values": ["Validation", "None"]

The error you encountered, File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/dataset/image_dataset.py", line 217, in <lambda> trainingFilter = lambda df: df[split_field].isin(config["train"]["partition"]["training"]),
means the code is looking for the key training, but your config file has training_values. Change training_values to training, and you may also want to change validation_values to validation. Check the code in image_dataset.py to see exactly what it is looking for.
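
If you prefer to patch the file programmatically instead of editing it by hand, a minimal sketch along those lines (it assumes the config lives at config-resnet.json in the current directory):

# Rename the partition keys to the names that image_dataset.py actually reads
# (config["train"]["partition"]["training"] and ["validation"]).
import json

with open("config-resnet.json") as f:
    config = json.load(f)

partition = config["train"]["partition"]
if "training_values" in partition:
    partition["training"] = partition.pop("training_values")
if "validation_values" in partition:
    partition["validation"] = partition.pop("validation_values")

with open("config-resnet.json", "w") as f:
    json.dump(config, f, indent=4)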

@infominer

Hello @vasudev-sharma, you can use DeepProfiler to export single-cell images; unfortunately this is not covered in the current wiki. To do so, location files are needed (the X-Y coordinates of the center of each cell crop). In the export config, the box_size parameter defines the size of the crop.

An example command:

python deepprofiler --root=/path/deepprofiler_project/ --config=config.json --metadata=metadata.csv --single-cells=sample --gpu 0 export-sc

Do you plan to use masks of the cells, or would you like to just crop the regions? If you want masks, set the following in the config: "mask_objects": true (in the locations section of the config) and "outlines": "available" (in the prepare section of the config).

Hey @Arkkienkeli I came across this issue while searching for how to determine what box_size to use in the config. Is there a reference for this?

@vasudev-sharma
Author

@vasudev-sharma See the "train" section in the config file, which has the following parameters: "train": { "partition": { "targets": [ "Compound_Concentration" ], "split_field": "replicate_use", "training_values": ["Training"], "validation_values": ["Validation", "None"]

The error you encountered, File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/dataset/image_dataset.py", line 217, in <lambda> trainingFilter = lambda df: df[split_field].isin(config["train"]["partition"]["training"]), means the code is looking for the key training, but your config file has training_values. Change training_values to training, and you may also want to change validation_values to validation. Check the code in image_dataset.py to see exactly what it is looking for.

Thanks for the help @infominer!!

@Arkkienkeli
Member

Hello @vasudev-sharma, you can use DeepProfiler to export single-cell images; unfortunately this is not covered in the current wiki. To do so, location files are needed (the X-Y coordinates of the center of each cell crop). In the export config, the box_size parameter defines the size of the crop.
An example command:

python deepprofiler --root=/path/deepprofiler_project/ --config=config.json --metadata=metadata.csv --single-cells=sample --gpu 0 export-sc

Do you plan to use masks of the cells, or would you like to just crop the regions? If you want masks, set the following in the config: "mask_objects": true (in the locations section of the config) and "outlines": "available" (in the prepare section of the config).

Hey @Arkkienkeli I came across this issue while searching for how to determine what box_size to use in the config. Is there a reference for this?

Hi @infominer, it depends on the size of the cells in your dataset. We usually recommend a size of 128 for the BBBC021 dataset, but for a custom dataset it might differ. I would recommend choosing box_size from 96, 128, 160, or 192. If the cells are small, you could resize the crops on your own after export (or resize the images before export, but then correct the location coordinates accordingly).
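
As an illustration of the post-export resizing option, a minimal sketch (the output path pattern, PNG format, and target size of 128 below are assumptions; adjust them to your project layout):

# Minimal sketch: rescale every exported single-cell crop to a common size,
# keeping the channel count. Paths and target size are assumptions.
import glob
from skimage import io
from skimage.transform import resize

target = 128  # desired crop side length in pixels

for path in glob.glob("outputs/sample/*/*/*/*.png"):
    crop = io.imread(path)
    resized = resize(crop, (target, target) + crop.shape[2:],
                     preserve_range=True, anti_aliasing=True)
    io.imsave(path, resized.astype(crop.dtype), check_contrast=False)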

@vasudev-sharma
Author

vasudev-sharma commented Jun 21, 2022

I am able to generate single-cell images with a bounding box size of 96. To be sure I am doing it correctly, would it be possible for you to have a second look at my config.json file? Thanks in advance for the help!

Do you plan to use masks of the cells, or would you like to just crop the regions? If you want masks, set the following in the config:
"mask_objects": true (in the locations section of the config) and "outlines": "available" (in the prepare section of the config).

I am just interested in cropping the regions, so I believe I have to set "mask_objects": false and not add "outlines" to my config, correct?

Here is my config file 👇

{
    "dataset": {
        "metadata": {
            "label_field": "Compound_Concentration",
            "control_value": "DMSO_0.0"
        },
        "images": {
            "channels": [
                "DNA",
                "Tubulin",
                "Actin"
              ],
            "file_format": "tif",
            "bits": 16,
            "width": 1280,
            "height": 1024
        },
        "locations":   {
            "mode": "single_cells",
            "box_size": 96,
            "mask_objects": false
        }
    },
    "prepare": {
        "illumination_correction": {
            "down_scale_factor": 4,
            "median_filter_size": 24
        },
        "compression": {
            "implement": false,
            "scaling_factor": 1.0
        }
    },
    "train": {           
        "partition": {
            "targets": [
                "Compound_Concentration"
            ],
            "split_field": "replicate_use",
            "training": ["Training"],
            "validation": ["Validation", "None"]
        },
        "model": {
            "name": "resnet50",
            "crop_generator": "crop_generator",
            "metrics": ["accuracy", "top_k"],
            "epochs": 200,
            "params": {
                "learning_rate": 0.01,
                "batch_size": 32,
                "conv_blocks": 50,
                "feature_dim": 256,
                "pooling": "None"
            },
            "lr_schedule": "cosine",
            "backup_schedule":{
                "epoch":[40,80],
                "lr":[0.0005, 0.0001]
            }
        },
        "sampling": {
            "factor": 1.0,
            "box_size": 96,
            "mask_objects": false,
            "queue_size": 10000,
            "workers": 4,
            "cache_size": 64
        },
        "validation": {
            "top_k": 5,
            "batch_size": 32,
            "frame": "val",
            "sample_first_crops": true
        },
        "comet_ml": {
            "track": true,
            "api_key": "guPdvYt7wQSIQHBE7szQxTgrd",
            "project_name": "bbbc021"
        }

    },
    "profile": {
      "pretrained": false,
      "feature_layer": "res4b1_relu",
      "checkpoint": "checkpoint_0100.hdf5",
      "batch_size": 8
    }
}

@infominer

Hello @vasudev-sharma, you can use DeepProfiler to export single-cell images; unfortunately this is not covered in the current wiki. To do so, location files are needed (the X-Y coordinates of the center of each cell crop). In the export config, the box_size parameter defines the size of the crop.
An example command:

python deepprofiler --root=/path/deepprofiler_project/ --config=config.json --metadata=metadata.csv --single-cells=sample --gpu 0 export-sc

Do you plan to use masks of the cells, or would you like to just crop the regions? If you want masks, set the following in the config: "mask_objects": true (in the locations section of the config) and "outlines": "available" (in the prepare section of the config).

Hey @Arkkienkeli I came across this issue while searching for how to determine what box_size to use in the config. Is there a reference for this?

Hi @infominer, it depends on the size of the cells in your dataset. We usually recommend a size of 128 for the BBBC021 dataset, but for a custom dataset it might differ. I would recommend choosing box_size from 96, 128, 160, or 192. If the cells are small, you could resize the crops on your own after export (or resize the images before export, but then correct the location coordinates accordingly).

Thanks! I am new to this field and am using images from a Cell Painting assay. We did a check on cells in which the assay conditions were slightly different, using Columbus. The Star Radial Mean (Nuclei Selected) measure was higher, and we have XY locations from Columbus as well. So, going by your suggestion, I should use a larger box size. What's the best metric to check whether I am on the right path?

@Arkkienkeli
Member

@infominer For some datasets that we use, the locations are derived with CellProfiler. As an alternative, I have also used the Cellpose segmentation method and then taken the centers of the resulting segmentation masks. Please note that the XY coordinates in the DeepProfiler location files are the centers of the crops.
To measure the sizes of the cells, I would personally go with the scikit-image (skimage) Python library, which has methods for measuring region properties. Then get a distribution of axis_major_length and decide on the box_size.
Maybe something similar can be done in Columbus?
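
A minimal sketch of that measurement, assuming you already have labeled segmentation masks (e.g., from Cellpose) saved as integer-label images; the masks/*.tif path is a placeholder:

# Minimal sketch: measure cell sizes from labeled masks to pick a box_size,
# and extract object centers that could be used as crop locations.
import glob
import numpy as np
from skimage import io
from skimage.measure import regionprops

major_axis_lengths = []
centers = []

for mask_path in glob.glob("masks/*.tif"):
    labels = io.imread(mask_path)           # labeled mask: 0 = background, 1..N = cells
    for region in regionprops(labels):
        major_axis_lengths.append(region.major_axis_length)
        centers.append(region.centroid)     # (row, col) = (Y, X) center of the cell

lengths = np.array(major_axis_lengths)
print("median:", np.median(lengths), "95th percentile:", np.percentile(lengths, 95))
# Pick a box_size (96, 128, 160, 192, ...) that covers most of the distribution.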

@vasudev-sharma
Author

vasudev-sharma commented Jun 21, 2022

Is it normal for the last channel of the single-cell images to look like this (almost all black pixels)?

For instance, from the generated single-cell image directory (outputs/sample/Week1_22141/B02/s1/), here are some reference images showing the described behaviour.

(screenshots attached)

This behaviour is common to all the generated single-cell images and is not particular to this specific site/well/experiment.

Also, I am noticing some weird artifacts in some of the generated images, e.g. outputs/sample/Week1_22141/B02/s1/[email protected] (screenshot attached).

I guess my config file shared here might be leading to these single-cell images. Any suggestions on how I can rectify this?

@infominer

@Arkkienkeli Thanks for those pointers! It looks like Columbus will also give me those measures, and the average bounding size is around 56 for one of the datasets, so I think I will start with 96 and evaluate the results.

@Arkkienkeli
Member

Is it normal for the last channel of the single-cell images to look like this (almost all black pixels)?

For instance, from the generated single-cell image directory (outputs/sample/Week1_22141/B02/s1/), here are some reference images showing the described behaviour.

(screenshots attached)

This behaviour is common to all the generated single-cell images and is not particular to this specific site/well/experiment.

Also, I am noticing some weird artifacts in some of the generated images, e.g. outputs/sample/Week1_22141/B02/s1/[email protected] (screenshot attached).

I guess my config file shared here might be leading to these single-cell images. Any suggestions on how I can rectify this?

@vasudev-sharma I could reproduce it (also on another dataset); it is a bug. If masks are available and used for export, everything works as expected; if not, the last channel is empty (as you observe). Thanks for reporting!

@vasudev-sharma
Author

vasudev-sharma commented Jun 22, 2022

@vasudev-sharma I could reproduce it (also on another dataset); it is a bug. If masks are available and used for export, everything works as expected; if not, the last channel is empty (as you observe). Thanks for reporting!

Thanks @Arkkienkeli for investigating the issue. Is there any quick fix for this (which I can do on my side), or an expected turnaround for this issue? Unfortunately, I don't have access to masks for the BBBC021 dataset.

@Arkkienkeli
Member

@vasudev-sharma I could reproduce it (also on another dataset); it is a bug. If masks are available and used for export, everything works as expected; if not, the last channel is empty (as you observe). Thanks for reporting!

Thanks @Arkkienkeli for investigating the issue. Is there any quick fix for this (which I can do on my side), or an expected turnaround for this issue? Unfortunately, I don't have access to masks for the BBBC021 dataset.

The easiest workaround could be to use blank images as masks; this is fine if you are not going to use them (it will just result in a slightly increased image size). I am going to fix it ASAP.
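
For completeness, a minimal sketch of that workaround: writing all-zero masks at the image size declared in the config. The inputs/outlines/<plate>/<well>-<site>.png layout and file naming below are assumptions; match them to whatever your outlines configuration actually expects:

# Minimal sketch: create blank (all-zero) mask images, one per field of view,
# using the 1280x1024 image size from the config. Directory layout and file
# naming are assumptions; adapt them to your project's outlines convention.
import os
import numpy as np
import pandas as pd
from skimage import io

index = pd.read_csv("inputs/metadata/index.csv")   # run from the project root
height, width = 1024, 1280

for _, row in index.iterrows():
    out_dir = os.path.join("inputs", "outlines", str(row["Metadata_Plate"]))
    os.makedirs(out_dir, exist_ok=True)
    blank = np.zeros((height, width), dtype=np.uint8)
    name = f'{row["Metadata_Well"]}-{row["Metadata_Site"]}.png'
    io.imsave(os.path.join(out_dir, name), blank, check_contrast=False)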

@vasudev-sharma
Author

@vasudev-sharma I could reproduce it (also on another dataset); it is a bug. If masks are available and used for export, everything works as expected; if not, the last channel is empty (as you observe). Thanks for reporting!

Thanks @Arkkienkeli for investigating the issue. Is there any quick fix for this (which I can do on my side), or an expected turnaround for this issue? Unfortunately, I don't have access to masks for the BBBC021 dataset.

The easiest workaround could be to use blank images as masks; this is fine if you are not going to use them (it will just result in a slightly increased image size). I am going to fix it ASAP.

Thanks a lot for suggesting an alternative quick fix. For now, I will wait for a fix of this issue.

@Arkkienkeli
Member

It was fixed by #327.
