Gregory Way, 2020
In this module, we present our pipeline for generating image-based profiles from Cell Painting data. We also process Cell Health readouts.
We primarily use the pycytominer tool for data processing.
See generate_profiles.py for a complete description of our data processing pipeline.
Briefly, our pipeline is as follows:
Step | Notes |
---|---|
Aggregate single cells | Operation: median |
Annotate profiles | Merge platemaps with metadata |
Normalize profiles | Operation: mad_robustize; using only EMPTY control wells |
Feature select profiles | Operations: drop_na_columns, blacklist, variance_threshold, drop_outliers |
Audit profiles | Determine quality of the data by pairwise replicate correlations |
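
As a rough illustration of the aggregation and normalization steps in the table above, the following standard-library sketch computes per-well medians and then applies a MAD-based robust scaling fit on EMPTY wells only. It is not the pycytominer code: the function names, the `well`, `treatment`, and `area` fields, the 1.4826 consistency constant, and the small `epsilon` are all assumptions made for this example.

```python
from statistics import median


def aggregate_median(cells, feature_names):
    """Collapse single-cell rows into one median profile per well."""
    by_well = {}
    for cell in cells:
        by_well.setdefault(cell["well"], []).append(cell)

    profiles = []
    for well, rows in by_well.items():
        profile = {"well": well, "treatment": rows[0]["treatment"]}
        for name in feature_names:
            profile[name] = median(row[name] for row in rows)
        profiles.append(profile)
    return profiles


def mad_robustize(profiles, feature_names, control="EMPTY", epsilon=1e-18):
    """Scale each feature as (x - median) / (1.4826 * MAD + epsilon).

    The median and MAD are estimated from control wells only and then
    applied to every well. The constant and epsilon are assumptions.
    """
    controls = [p for p in profiles if p["treatment"] == control]
    normalized = [dict(p) for p in profiles]
    for name in feature_names:
        values = [p[name] for p in controls]
        center = median(values)
        mad = median(abs(v - center) for v in values)
        scale = 1.4826 * mad + epsilon
        for p in normalized:
            p[name] = (p[name] - center) / scale
    return normalized


# Toy single-cell table with hypothetical wells and one hypothetical feature.
cells = [
    {"well": "A01", "treatment": "EMPTY", "area": 10.0},
    {"well": "A01", "treatment": "EMPTY", "area": 12.0},
    {"well": "A01", "treatment": "EMPTY", "area": 11.0},
    {"well": "A02", "treatment": "EMPTY", "area": 14.0},
    {"well": "A02", "treatment": "EMPTY", "area": 16.0},
    {"well": "A03", "treatment": "EMPTY", "area": 13.0},
    {"well": "B01", "treatment": "gene_x", "area": 20.0},
    {"well": "B01", "treatment": "gene_x", "area": 22.0},
    {"well": "B01", "treatment": "gene_x", "area": 30.0},
]

profiles = aggregate_median(cells, ["area"])
normalized = mad_robustize(profiles, ["area"])
by_well = {p["well"]: p["area"] for p in normalized}
```

Fitting the scaling on control wells only means every treated well is expressed as a deviation from the untreated population, rather than from a plate-wide center that the perturbations themselves could shift.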
We also normalize the readouts from the Cell Health assay by taking the z-score across features.
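
For readers unfamiliar with z-scoring, here is a minimal sketch under one reading of the sentence above: each readout column is centered and scaled across samples. The readout names (`viability`, `ros`), the sample identifiers, and the use of the population standard deviation are assumptions for illustration, not details taken from the pipeline.

```python
from statistics import mean, pstdev


def zscore_columns(rows, readout_names):
    """Standardize each readout to zero mean and unit variance across rows."""
    scaled = [dict(r) for r in rows]
    for name in readout_names:
        values = [r[name] for r in rows]
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation (assumption)
        for r in scaled:
            # A constant readout carries no signal; map it to zero.
            r[name] = (r[name] - mu) / sigma if sigma > 0 else 0.0
    return scaled


# Hypothetical readouts for three samples.
readouts = [
    {"sample": "g1", "viability": 1.0, "ros": 10.0},
    {"sample": "g2", "viability": 2.0, "ros": 10.0},
    {"sample": "g3", "viability": 3.0, "ros": 40.0},
]

scaled = zscore_columns(readouts, ["viability", "ros"])
```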
We acquire consensus signatures for both Cell Painting and Cell Health assay readouts. We generate two different types of consensus signatures: moderated z-score (MODZ) and median consensus.
We use the MODZ operation in all downstream applications and interpretations. MODZ was first introduced in Subramanian et al., 2017; we use the pycytominer implementation.
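
To make the difference between the two consensus types concrete, the sketch below implements a simplified MODZ alongside a plain median consensus. It follows the general idea of weighting each replicate by its mean Spearman correlation with the other replicates, but it is an illustration only and not the pycytominer implementation: the function names, the clipping of negative correlations to zero, and the `min_weight` floor of 0.01 are assumptions, and profiles are assumed to be non-constant.

```python
from statistics import median


def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def modz(replicates, min_weight=0.01):
    """Weighted average of replicate profiles, down-weighting outliers."""
    n = len(replicates)
    if n == 1:
        return list(replicates[0])
    raw = []
    for i in range(n):
        # Mean correlation with the other replicates, negatives clipped to 0.
        others = [
            max(spearman(replicates[i], replicates[j]), 0.0)
            for j in range(n)
            if j != i
        ]
        raw.append(max(sum(others) / len(others), min_weight))
    total = sum(raw)
    weights = [w / total for w in raw]
    return [
        sum(w * rep[f] for w, rep in zip(weights, replicates))
        for f in range(len(replicates[0]))
    ]


def median_consensus(replicates):
    """Feature-wise median across replicate profiles."""
    return [median(column) for column in zip(*replicates)]


# Three hypothetical replicates; the third disagrees with the other two.
replicates = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [4.0, 3.0, 2.0, 1.0],
]

modz_profile = modz(replicates)
median_profile = median_consensus(replicates)
```

In this toy case the discordant third replicate receives only the floor weight, so the MODZ profile stays close to the average of the two concordant replicates.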
This procedure results in a total of 357 profiles with matched Cell Painting and Cell Health data.
To reprocess the profiles, execute the following commands:
# Activate environment
conda activate cell-health
# Perform full profiling pipeline
# Note that step 6 of this pipeline is not currently executed,
# since raw images are required and not included in this repo.
bash profile-pipeline.sh