
Question: Does DeepProfiler allow single-cell dataset generation? #326

Closed
vasudev-sharma opened this issue Jun 20, 2022 · 16 comments

@vasudev-sharma

vasudev-sharma commented Jun 20, 2022

I am new to DeepProfiler and am interested in single-cell image dataset generation. In particular, for generating a single-cell BBBC021 dataset: given that I have the BBBC021 dataset laid out as described in the project structure section of the wiki, is there a script/workflow in DeepProfiler that generates single-cell images?

I went through the wiki and the readme, but was unable to find any useful information in that regard.

Any help would be really appreciated.

@Arkkienkeli
Member

Hello @vasudev-sharma, you can use DeepProfiler to export single-cell images; unfortunately this is not covered in the current wiki. To do so, location files are needed (the X-Y coordinates of the center of each cell crop). In the export config, the box_size parameter defines the size of the crop.

An example command:

python deepprofiler --root=/path/deepprofiler_project/ --config=config.json --metadata=metadata.csv --single-cells=sample --gpu 0 export-sc

Do you plan to use masks of the cells, or would you like to just crop the regions? If you want masks, set the following in the config:
"mask_objects": true (in the locations section of the config) and "outlines": "available" (in the prepare section of the config).

@vasudev-sharma
Author

Thanks @Arkkienkeli for your prompt response, it is really appreciated.

As you suggested, I ran the command but encountered a key error (KeyError: 'training') related to the config file (config-resnet.json). Here are the logs for your information.

➡️ CLI command: python deepprofiler --root=/mnt/bh1/scratch/vasudev.sharma/data/single-cell-bbbc021 --config=config-resnet.json --metadata=index.csv --single-cells=sample --gpu 0 export-sc

➡️ BBBC021 is the public dataset available at: aws s3 ls --no-sign-request s3://cellpainting-gallery/cpg0010-caie-drugresponse/broad-az/workspace/deep_learning/

➡️ LOGS:

(deepprofiler_env) vasudev.sharma@bh-login001:~/projects/DeepProfiler$ python deepprofiler --root=/mnt/bh1/scratch/vasudev.sharma/data/bbbc021 --config=config-resnet.json --metadata=index.csv --single-cells=sample --gpu 0 export-sc
2022-06-21 11:23:24.196506: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:From /mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2022-06-21 11:23:28,147 - WARNING - From /mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/tensorflow_addons/utils/ensure_tf_install.py:53: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.7.0 and strictly below 2.10.0 (nightly versions are not supported). 
 The versions of TensorFlow you are currently using is 2.5.3 and is not supported. 
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version. 
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
  warnings.warn(
Reading metadata form /mnt/bh1/scratch/vasudev.sharma/data/single-cell-bbbc021/inputs/metadata/index.csv
/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/dataset/metadata.py:38: FutureWarning: In a future version of pandas all arguments of read_csv except for the argument 'filepath_or_buffer' will be keyword-only.
  self.loadSingle(filename, delimiter, dtype)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3843 entries, 0 to 3842
Data columns (total 14 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   ID                      3843 non-null   int64  
 1   Metadata_Plate          3843 non-null   object 
 2   Metadata_Well           3843 non-null   object 
 3   Metadata_Site           3843 non-null   object 
 4   Plate_Map_Name          3843 non-null   object 
 5   DNA                     3843 non-null   object 
 6   Tubulin                 3843 non-null   object 
 7   Actin                   3843 non-null   object 
 8   Replicate               3843 non-null   int64  
 9   Compound_Concentration  3843 non-null   object 
 10  compound                3843 non-null   object 
 11  concentration           3843 non-null   float64
 12  moa                     3843 non-null   object 
 13  replicate_use           3843 non-null   object 
dtypes: float64(1), int64(2), object(11)
memory usage: 420.5+ KB
None
Traceback (most recent call last):
  File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/3.9.7/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/3.9.7/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/__main__.py", line 216, in <module>
    cli(obj={})
  File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/mnt/ps/home/CORP/vasudev.sharma/.pyenv/versions/deepprofiler_env/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/__main__.py", line 156, in export_sc
    dset = deepprofiler.dataset.image_dataset.read_dataset(context.obj["config"])
  File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/dataset/image_dataset.py", line 219, in read_dataset
    metadata.splitMetadata(trainingFilter, validationFilter)
  File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/dataset/metadata.py", line 68, in splitMetadata
    self.train = self.data[trainingRule(self.data)].copy()
  File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/dataset/image_dataset.py", line 217, in <lambda>
    trainingFilter = lambda df: df[split_field].isin(config["train"]["partition"]["training"])
KeyError: 'training'

The config-resnet.json file I am using:

{
    "dataset": {
        "metadata": {
            "label_field": "Compound_Concentration",
            "control_value": "DMSO_0.0"
        },
        "images": {
            "channels": [
                "DNA",
                "Tubulin",
                "Actin"
              ],
            "file_format": "tif",
            "bits": 16,
            "width": 1280,
            "height": 1024
        }
    },
    "prepare": {
        "illumination_correction": {
            "down_scale_factor": 4,
            "median_filter_size": 24
        },
        "compression": {
            "implement": false,
            "scaling_factor": 1.0
        }
    },
    "train": {           
        "partition": {
            "targets": [
                "Compound_Concentration"
            ],
            "split_field": "replicate_use",
            "training_values": ["Training"],
            "validation_values": ["Validation", "None"]
        },
        "model": {
            "name": "resnet50",
            "crop_generator": "crop_generator",
            "metrics": ["accuracy", "top_k"],
            "epochs": 200,
            "params": {
                "learning_rate": 0.01,
                "batch_size": 32,
                "conv_blocks": 50,
                "feature_dim": 256,
                "pooling": "None"
            },
            "lr_schedule": "cosine",
            "backup_schedule":{
                "epoch":[40,80],
                "lr":[0.0005, 0.0001]
            }
        },
        "sampling": {
            "factor": 1.0,
            "box_size": 96,
            "mask_objects": false,
            "queue_size": 10000,
            "workers": 4
        },
        "validation": {
            "top_k": 5,
            "batch_size": 32,
            "frame": "val",
            "sample_first_crops": true
        },
        "comet_ml": {
            "track": true,
            "api_key": "guPdvYt7wQSIQHBE7szQxTgrd",
            "project_name": "bbbc021"
        }

    },
    "profile": {
      "pretrained": false,
      "feature_layer": "res4b1_relu",
      "checkpoint": "checkpoint_0100.hdf5",
      "batch_size": 8
    }
}
  

The JSON file doesn't have the key "training", as pointed out by the error log. Any clues/ideas about what I might be doing wrong?

Thanks @Arkkienkeli for the help.

@infominer

@vasudev-sharma See the "train" section in the config file, which has the following parameters:
"train": { "partition": { "targets": [ "Compound_Concentration" ], "split_field": "replicate_use", "training_values": ["Training"], "validation_values": ["Validation", "None"]

The error you encountered, File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/dataset/image_dataset.py", line 217, in <lambda> trainingFilter = lambda df: df[split_field].isin(config["train"]["partition"]["training"]),
means the code is looking for the key training, but your config file has training_values. Change training_values to training, and you may also want to change validation_values to validation. Check the code in image_dataset.py to see exactly what it is looking for.
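
If you prefer to patch the file programmatically instead of editing it by hand, a minimal sketch along those lines (it assumes the config lives at config-resnet.json in the current directory):

# Rename the partition keys to the names that image_dataset.py actually reads
# (config["train"]["partition"]["training"] and ["validation"]).
import json

with open("config-resnet.json") as f:
    config = json.load(f)

partition = config["train"]["partition"]
if "training_values" in partition:
    partition["training"] = partition.pop("training_values")
if "validation_values" in partition:
    partition["validation"] = partition.pop("validation_values")

with open("config-resnet.json", "w") as f:
    json.dump(config, f, indent=4)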

@infominer

Hello @vasudev-sharma, you can use DeepProfiler to export single-cell images; unfortunately this is not covered in the current wiki. To do so, location files are needed (the X-Y coordinates of the center of each cell crop). In the export config, the box_size parameter defines the size of the crop.

An example command:

python deepprofiler --root=/path/deepprofiler_project/ --config=config.json --metadata=metadata.csv --single-cells=sample --gpu 0 export-sc

Do you plan to use masks of the cells, or would you like to just crop the regions? If you want masks, set the following in the config: "mask_objects": true (in the locations section of the config) and "outlines": "available" (in the prepare section of the config).

Hey @Arkkienkeli I came across this issue while searching for how to determine what box_size to use in the config. Is there a reference for this?

@vasudev-sharma
Author

@vasudev-sharma See the "train" section in the config file, which has the following parameters: "train": { "partition": { "targets": [ "Compound_Concentration" ], "split_field": "replicate_use", "training_values": ["Training"], "validation_values": ["Validation", "None"]

The error you encountered, File "/mnt/ps/home/CORP/vasudev.sharma/projects/DeepProfiler/deepprofiler/dataset/image_dataset.py", line 217, in <lambda> trainingFilter = lambda df: df[split_field].isin(config["train"]["partition"]["training"]), means the code is looking for the key training, but your config file has training_values. Change training_values to training, and you may also want to change validation_values to validation. Check the code in image_dataset.py to see exactly what it is looking for.

Thanks for the help @infominer!!

@Arkkienkeli
Member

Hello @vasudev-sharma, you can use DeepProfiler to export single-cell images; unfortunately this is not covered in the current wiki. To do so, location files are needed (the X-Y coordinates of the center of each cell crop). In the export config, the box_size parameter defines the size of the crop.
An example command:

python deepprofiler --root=/path/deepprofiler_project/ --config=config.json --metadata=metadata.csv --single-cells=sample --gpu 0 export-sc

Do you plan to use masks of the cells, or would you like to just crop the regions? If you want masks, set the following in the config: "mask_objects": true (in the locations section of the config) and "outlines": "available" (in the prepare section of the config).

Hey @Arkkienkeli I came across this issue while searching for how to determine what box_size to use in the config. Is there a reference for this?

Hi @infominer, it depends on the size of the cells in your dataset. We usually recommend a size of 128 for the BBBC021 dataset, but for a custom dataset it might differ. I would recommend choosing box_size from 96, 128, 160, or 192. If the cells are small, you could resize the crops on your own after export (or resize the images before export, but then correct the location coordinates accordingly).
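
As an illustration of the post-export resizing option, a minimal sketch (the output path pattern, PNG format, and target size of 128 below are assumptions; adjust them to your project layout):

# Minimal sketch: rescale every exported single-cell crop to a common size,
# keeping the channel count. Paths and target size are assumptions.
import glob
from skimage import io
from skimage.transform import resize

target = 128  # desired crop side length in pixels

for path in glob.glob("outputs/sample/*/*/*/*.png"):
    crop = io.imread(path)
    resized = resize(crop, (target, target) + crop.shape[2:],
                     preserve_range=True, anti_aliasing=True)
    io.imsave(path, resized.astype(crop.dtype), check_contrast=False)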

@vasudev-sharma
Author

vasudev-sharma commented Jun 21, 2022

I am able to generate single-cell images with a bounding box size of 96. To be sure I am doing it correctly, would it be possible for you to have a second look at my config.json file? Thanks in advance for the help!

Do you plan to use masks of the cells, or would you like to just crop the regions? If you want masks, set the following in the config:
"mask_objects": true (in the locations section of the config) and "outlines": "available" (in the prepare section of the config).

I am just interested in cropping the regions, so I believe I have to set "mask_objects": false and not add "outlines" to my config, correct?

Here is my config file 👇

{
    "dataset": {
        "metadata": {
            "label_field": "Compound_Concentration",
            "control_value": "DMSO_0.0"
        },
        "images": {
            "channels": [
                "DNA",
                "Tubulin",
                "Actin"
              ],
            "file_format": "tif",
            "bits": 16,
            "width": 1280,
            "height": 1024
        },
        "locations":   {
            "mode": "single_cells",
            "box_size": 96,
            "mask_objects": false
        }
    },
    "prepare": {
        "illumination_correction": {
            "down_scale_factor": 4,
            "median_filter_size": 24
        },
        "compression": {
            "implement": false,
            "scaling_factor": 1.0
        }
    },
    "train": {           
        "partition": {
            "targets": [
                "Compound_Concentration"
            ],
            "split_field": "replicate_use",
            "training": ["Training"],
            "validation": ["Validation", "None"]
        },
        "model": {
            "name": "resnet50",
            "crop_generator": "crop_generator",
            "metrics": ["accuracy", "top_k"],
            "epochs": 200,
            "params": {
                "learning_rate": 0.01,
                "batch_size": 32,
                "conv_blocks": 50,
                "feature_dim": 256,
                "pooling": "None"
            },
            "lr_schedule": "cosine",
            "backup_schedule":{
                "epoch":[40,80],
                "lr":[0.0005, 0.0001]
            }
        },
        "sampling": {
            "factor": 1.0,
            "box_size": 96,
            "mask_objects": false,
            "queue_size": 10000,
            "workers": 4,
            "cache_size": 64
        },
        "validation": {
            "top_k": 5,
            "batch_size": 32,
            "frame": "val",
            "sample_first_crops": true
        },
        "comet_ml": {
            "track": true,
            "api_key": "guPdvYt7wQSIQHBE7szQxTgrd",
            "project_name": "bbbc021"
        }

    },
    "profile": {
      "pretrained": false,
      "feature_layer": "res4b1_relu",
      "checkpoint": "checkpoint_0100.hdf5",
      "batch_size": 8
    }
}

@infominer

Hello @vasudev-sharma, you can use DeepProfiler to export single-cell images; unfortunately this is not covered in the current wiki. To do so, location files are needed (the X-Y coordinates of the center of each cell crop). In the export config, the box_size parameter defines the size of the crop.
An example command:

python deepprofiler --root=/path/deepprofiler_project/ --config=config.json --metadata=metadata.csv --single-cells=sample --gpu 0 export-sc

Do you plan to use masks of the cells, or would you like to just crop the regions? If you want masks, set the following in the config: "mask_objects": true (in the locations section of the config) and "outlines": "available" (in the prepare section of the config).

Hey @Arkkienkeli I came across this issue while searching for how to determine what box_size to use in the config. Is there a reference for this?

Hi @infominer, it depends on the size of the cells in your dataset. We usually recommend a size of 128 for the BBBC021 dataset, but for a custom dataset it might differ. I would recommend choosing box_size from 96, 128, 160, or 192. If the cells are small, you could resize the crops on your own after export (or resize the images before export, but then correct the location coordinates accordingly).

Thanks! I am new to this field and am using images from a Cell Painting assay. We did a check on cells in which the assay conditions were slightly different, using Columbus. The Star Radial Mean (Nuclei Selected) measure was higher, and we have XY locations from Columbus as well. So, going by your suggestion, I should use a larger box size. What's the best metric to check whether I am on the right path?

@Arkkienkeli
Member

@infominer For some datasets that we use, the locations are derived with CellProfiler. As an alternative, I have also used the Cellpose segmentation method and then taken the centers of the resulting segmentation masks. Please note that the XY coordinates in the DeepProfiler location files are the centers of the crops.
To measure the sizes of the cells, I would personally go with the scikit-image (skimage) Python library, which has methods for measuring region properties. Then get a distribution of axis_major_length and decide on the box_size.
Maybe something similar can be done in Columbus?
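
A minimal sketch of that measurement, assuming you already have labeled segmentation masks (e.g., from Cellpose) saved as integer-label images; the masks/*.tif path is a placeholder:

# Minimal sketch: measure cell sizes from labeled masks to pick a box_size,
# and extract object centers that could be used as crop locations.
import glob
import numpy as np
from skimage import io
from skimage.measure import regionprops

major_axis_lengths = []
centers = []

for mask_path in glob.glob("masks/*.tif"):
    labels = io.imread(mask_path)           # labeled mask: 0 = background, 1..N = cells
    for region in regionprops(labels):
        major_axis_lengths.append(region.major_axis_length)
        centers.append(region.centroid)     # (row, col) = (Y, X) center of the cell

lengths = np.array(major_axis_lengths)
print("median:", np.median(lengths), "95th percentile:", np.percentile(lengths, 95))
# Pick a box_size (96, 128, 160, 192, ...) that covers most of the distribution.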

@vasudev-sharma
Author

vasudev-sharma commented Jun 21, 2022

Is it normal for the last channel of the single-cell images to look like this (almost all black pixels)?

For instance, from the generated single-cell image directory (outputs/sample/Week1_22141/B02/s1/), here are some reference images showing the described behaviour.

(screenshots attached)

This behaviour is common to all the generated single-cell images and is not particular to this specific site/well/experiment.

Also, I am noticing some weird artifacts in some of the generated images, e.g. outputs/sample/Week1_22141/B02/s1/[email protected] (screenshot attached).

I guess my config file shared here might be leading to these single-cell images. Any suggestions on how I can rectify this?

@infominer

@Arkkienkeli Thanks for those pointers! It looks like Columbus will also give me those measures, and the average bounding size is around 56 for one of the datasets, so I think I will start with 96 and evaluate the results.

@Arkkienkeli
Member

Is it normal for the last channel of the single-cell images to look like this (almost all black pixels)?

For instance, from the generated single-cell image directory (outputs/sample/Week1_22141/B02/s1/), here are some reference images showing the described behaviour.

(screenshots attached)

This behaviour is common to all the generated single-cell images and is not particular to this specific site/well/experiment.

Also, I am noticing some weird artifacts in some of the generated images, e.g. outputs/sample/Week1_22141/B02/s1/[email protected] (screenshot attached).

I guess my config file shared here might be leading to these single-cell images. Any suggestions on how I can rectify this?

@vasudev-sharma I could reproduce it (also on another dataset); it is a bug. If masks are available and used for export, everything works as expected; if not, the last channel is empty (as you observe). Thanks for reporting!

@vasudev-sharma
Author

vasudev-sharma commented Jun 22, 2022

@vasudev-sharma I could reproduce it (also on another dataset); it is a bug. If masks are available and used for export, everything works as expected; if not, the last channel is empty (as you observe). Thanks for reporting!

Thanks @Arkkienkeli for investigating the issue. Is there any quick fix for this (which I can do on my side), or an expected turnaround for this issue? Unfortunately, I don't have access to masks for the BBBC021 dataset.

@Arkkienkeli
Member

@vasudev-sharma I could reproduce it (also on another dataset); it is a bug. If masks are available and used for export, everything works as expected; if not, the last channel is empty (as you observe). Thanks for reporting!

Thanks @Arkkienkeli for investigating the issue. Is there any quick fix for this (which I can do on my side), or an expected turnaround for this issue? Unfortunately, I don't have access to masks for the BBBC021 dataset.

The easiest workaround could be to use blank images as masks; this is fine if you are not going to use them (it will just result in a slightly increased image size). I am going to fix it ASAP.
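
For completeness, a minimal sketch of that workaround: writing all-zero masks at the image size declared in the config. The inputs/outlines/<plate>/<well>-<site>.png layout and file naming below are assumptions; match them to whatever your outlines configuration actually expects:

# Minimal sketch: create blank (all-zero) mask images, one per field of view,
# using the 1280x1024 image size from the config. Directory layout and file
# naming are assumptions; adapt them to your project's outlines convention.
import os
import numpy as np
import pandas as pd
from skimage import io

index = pd.read_csv("inputs/metadata/index.csv")   # run from the project root
height, width = 1024, 1280

for _, row in index.iterrows():
    out_dir = os.path.join("inputs", "outlines", str(row["Metadata_Plate"]))
    os.makedirs(out_dir, exist_ok=True)
    blank = np.zeros((height, width), dtype=np.uint8)
    name = f'{row["Metadata_Well"]}-{row["Metadata_Site"]}.png'
    io.imsave(os.path.join(out_dir, name), blank, check_contrast=False)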

@vasudev-sharma
Author

@vasudev-sharma I could reproduce it (also on another dataset); it is a bug. If masks are available and used for export, everything works as expected; if not, the last channel is empty (as you observe). Thanks for reporting!

Thanks @Arkkienkeli for investigating the issue. Is there any quick fix for this (which I can do on my side), or an expected turnaround for this issue? Unfortunately, I don't have access to masks for the BBBC021 dataset.

The easiest workaround could be to use blank images as masks; this is fine if you are not going to use them (it will just result in a slightly increased image size). I am going to fix it ASAP.

Thanks a lot for suggesting an alternative quick fix. For now, I will wait for a fix of this issue.

@Arkkienkeli
Member

It was fixed by #327.
