Cellos: High-throughput deconvolution of 3D organoid dynamics at cellular resolution for cancer pharmacology
Cellos (Cell and Organoid Segmentation) is a pipeline developed to perform high-throughput volumetric 3D segmentation and morphological quantification of organoids and their cells. Cellos segments organoids using classical algorithms and segments nuclei using our trained model based on Stardist-3D (https://github.com/stardist/stardist).
The image data used here were exported from the PerkinElmer Opera Phenix high-content screening confocal microscope. The resulting folder contains subfolders with tiff files (Images) and xml files (metadata). Each tiff file is a single image from one well, one field, one plane, and one channel. We developed an automatic protocol that organizes all tiff files from the same well and saves them as zarr arrays to minimize RAM and storage. All information about the images is extracted from the respective metadata files.
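As a rough illustration of this organization step (a minimal sketch, not the pipeline's actual code: the Opera Phenix filename pattern and the array layout below are assumptions), the per-plane tiffs of one well can be stacked into a single zarr array one plane at a time:

```python
# Minimal sketch of the per-well organization step (NOT the pipeline's exact code).
# Assumptions: Opera Phenix filenames look like "r03c07f01p01-ch1...tiff" and the
# stacked array uses a (field, channel, z, y, x) layout.
import glob
import re

import numpy as np
import tifffile
import zarr

well_files = sorted(glob.glob("cellos_data/r03c07f*p*-ch*.tiff"))
pattern = re.compile(r"f(\d+)p(\d+)-ch(\d+)")

# First pass: work out how many fields, z-planes and channels this well has.
index = {path: tuple(map(int, pattern.search(path).groups())) for path in well_files}
fields = sorted({f for f, _, _ in index.values()})
zs = sorted({p for _, p, _ in index.values()})
channels = sorted({c for _, _, c in index.values()})
ny, nx = tifffile.imread(well_files[0]).shape

# One chunk per 2D plane: each tiff is read and written one at a time to keep RAM low.
well = zarr.open(
    "output/r03c07.zarr", mode="w",
    shape=(len(fields), len(channels), len(zs), ny, nx),
    chunks=(1, 1, 1, ny, nx), dtype=np.uint16,
)
for path, (f, p, c) in index.items():
    well[fields.index(f), channels.index(c), zs.index(p)] = tifffile.imread(path)
```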
Currently, the pipeline uses a Python 3.7 environment. We provide a pinned `requirements.txt` to install all packages and dependencies for a working environment on Rocky 9 Linux. We recommend creating a virtual environment for running the pipeline, for example using `conda`.

Installing the pipeline using `conda` to manage the Python version:
git clone https://github.com/TheJacksonLaboratory/Cellos.git
cd Cellos #(make sure you are in the correct directory)
conda env create -f environment.yml
This will use `conda` to create a Python 3.7 environment and then install all packages from PyPI using `pip` and the `requirements.txt` file.
If you prefer to install the pipeline dependencies into a pre-existing Python 3.7 environment (e.g. `venv`), you can use:
pip install --require-hashes --no-deps -r requirements.txt
This will ensure you install the exact packages that we've tested.
Note
- At present we've tested the pipeline only on CentOS 7 and Rocky 9 Linux, using Python 3.7.
- The provided environment does not include additional packages required for specific GPU support, e.g. CUDA.
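Before launching jobs, you can optionally confirm that the environment resolved correctly. The small check below only assumes that TensorFlow and StarDist were pulled in via `requirements.txt`:

```python
# Optional sanity check of the installed environment.
# Assumes TensorFlow and StarDist were installed via requirements.txt.
import sys

import tensorflow as tf
from stardist.models import StarDist3D  # noqa: F401  (import check only)

print("Python:", sys.version.split()[0])  # expected: 3.7.x
print("TensorFlow:", tf.__version__)
# The provided environment is CPU-only; expect False here unless you add
# CUDA-enabled packages yourself.
print("TensorFlow built with CUDA:", tf.test.is_built_with_cuda())
```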
There are two main steps to run the pipeline:
- Organizing images and organoid segmentation.
- Nuclei segmentation.
Each of these can be run on an individual well using a plain `bash` script or as an `sbatch` script. To run on a whole plate, the script uses `sbatch` to launch jobs on a SLURM HPC cluster. The `sbatch` settings have been optimized using the sample data set and the JAX Sumner2 cluster.
Important
If you are running this pipeline on Sumner2, be aware that the scheduler is merciless and will kill your job if it exceeds the requested memory.
The two `sbatch` scripts, `scripts/process_organoids/stitch_well.sh` and `scripts/process_cells/cells_seg_well.sh`, have ~25% memory headroom based on the sample data, but if your jobs are killed you will want to edit them to increase the requested memory.
Important
If you are using a virtual environment, ensure you have it activated!
For example, using `conda` as recommended, do:
conda activate organoid
Otherwise, provide the path to your Python 3.7 interpreter in the `PYTHONPATH` variable.
You may also need to ensure the scripts are executable using:
chmod u+x <script name>
Organizing images and organoid segmentation:

- For a single well (this takes ~2 hours wall-time and uses ~128G of memory).

  From an interactive session, using `bash`:

  cd scripts/process_organoids/
  PYTHONPATH=$(which python) bash stitch_well.sh -r <row number> -c <column number> -f ../../config.example.cfg

  As a SLURM job using `sbatch` (requests: 2 cores, 160G memory):

  cd scripts/process_organoids/
  PYTHONPATH=$(which python) sbatch stitch_well.sh -r <row number> -c <column number> -f ../../config.example.cfg

- For a whole plate (this submits a series of the above as SLURM jobs using `sbatch`):

  cd scripts/process_organoids/
  PYTHONPATH=$(which python) bash process_plate.sh -f ../../config.example.cfg
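Once the organoid step finishes for a well, you can spot-check the stitched data from Python. The zarr path below is a placeholder; the actual name and layout under `output_path` depend on your configuration and the pipeline's naming:

```python
# Spot-check a stitched well. The path is a placeholder: look under your
# configured output_path for the zarr array(s) actually produced.
import zarr

store = zarr.open("output/r03c07.zarr", mode="r")
if isinstance(store, zarr.Array):
    # Stored as a single array: report its dimensions directly.
    print("shape:", store.shape, "dtype:", store.dtype, "chunks:", store.chunks)
else:
    # Stored as a group: list the member arrays instead.
    print("members:", list(store.keys()))
```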
Nuclei segmentation:

- For a single well (this takes <20 min wall-time with 8 cores and uses ~6G of memory).

  From an interactive session, using `bash`:

  cd scripts/process_cells/
  PYTHONPATH=$(which python) bash cells_seg_well.sh -r <row number> -c <column number> -f ../../config.example.cfg

  As a SLURM job using `sbatch` (requests: 8 cores, 10G of memory):

  cd scripts/process_cells/
  PYTHONPATH=$(which python) sbatch cells_seg_well.sh -r <row number> -c <column number> -f ../../config.example.cfg

- For a whole plate (this submits a series of the above as SLURM jobs using `sbatch`):

  PYTHONPATH=$(which python) bash cells_process_plate.sh -f ../../config.example.cfg
Note
All of the above commands use `../../config.example.cfg` as the location of the config file, because of the layout of this repository. You can provide an absolute path to another location.
The pipeline requires certain key parameters to be provided. For this we use a simple INI-style plain-text file that can be parsed with the `configparser` module (a reading sketch follows the table below).
In the repository we provide an example configuration file, `config.example.cfg`.
| Parameter | Description |
|---|---|
| `[pipeline]` | |
| `plate_path` | path to where your raw images are |
| `output_path` | path to where the csv files and zarr arrays will be saved |
| `well_targets` | row and column numbers (`row1,col1\|row2,col2`) of the wells to analyze |
| `[stitch_well]` | |
| `plane_size` | size of the image for one field, one z-slice and one channel |
| `overlap_x` | overlapping pixels between two adjacent fields in x |
| `overlap_y` | overlapping pixels between two adjacent fields in y |
| `[cells_seg_well]` | |
| `output_path` | path to where the csv files will be saved |
| `stardist_path` | path to the trained model for nuclei segmentation |
Note
The paths can be relative to the scripts, as in the example provided here, which assumes the layout of the repository is fixed. Otherwise, the paths should be provided as absolute paths.
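For reference, here is a minimal sketch of reading these parameters with `configparser`; only the section and key names from the table above are assumed, and this is not the pipeline's exact code:

```python
# Minimal sketch: read the key parameters from the INI-style config file.
# Section/key names follow the table above; this is not the pipeline's exact code.
from configparser import ConfigParser

cfg = ConfigParser()
cfg.read("config.example.cfg")

plate_path = cfg["pipeline"]["plate_path"]
output_path = cfg["pipeline"]["output_path"]
# well_targets has the form "row1,col1|row2,col2|..."
wells = [tuple(map(int, w.split(","))) for w in cfg["pipeline"]["well_targets"].split("|")]

overlap_x = cfg.getint("stitch_well", "overlap_x")
overlap_y = cfg.getint("stitch_well", "overlap_y")
stardist_path = cfg["cells_seg_well"]["stardist_path"]

print(wells, overlap_x, overlap_y, plate_path, output_path, stardist_path)
```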
We have made an example dataset containing data from one well publicly available. The well is at row number 3, column number 7. The image has 3 channels: channel 1 = EGFP, channel 2 = mCherry, and channel 3 = brightfield.
It can be downloaded from: https://figshare.com/articles/dataset/cellos_data_zip/21992234
On Linux, you can download it as follows:
wget https://figshare.com/ndownloader/files/39032216
Warning
This is a ~11 GB zip file.
On Linux, it needs to be unzipped using 7z:
7z x 39032216
This will extract a `cellos_data` folder consisting of images (`.tiff`) and an `Index.idx.xml` (metadata) file.
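As a quick check that the archive extracted cleanly (optional; only the `r03c07` filename prefix and the `Index.idx.xml` name mentioned above are assumed):

```python
# Optional check that the sample data extracted correctly.
import glob
import os

data_dir = "cellos_data"
print("metadata present:", os.path.exists(os.path.join(data_dir, "Index.idx.xml")))
print("r03c07 tiff files found:", len(glob.glob(os.path.join(data_dir, "r03c07*.tiff"))))
```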
To use the provided `config.example.cfg` and the script commands from above, we recommend you place the `cellos_data` folder in the root of this repository.
You should obtain the following layout for the `Cellos` directory, where `...` indicates abridged files:
├── config.example.cfg
├── ...
├── cellos_data
│   ├── Index.idx.xml
│   └── r03c07 ... .tiff
├── models
│   └── stardist
│       ├── ...
├── output
│   ├── ...
└── scripts
    ├── process_cells
    │   ├── cells_process_plate.sh
    │   ├── cells_seg_well.py
    │   └── cells_seg_well.sh
    └── process_organoids
        ├── process_plate.sh
        ├── stitch_well.py
        └── stitch_well.sh
Important
We provide the expected results from running the pipeline on the sample data in the `output` folder in the root of the repository.
If you plan on running the pipeline on the sample data, we recommend you back up or rename this folder so that you can compare your results with ours.
Alternately, you can change the `output_path` parameters in the `.cfg` file.
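If pandas is available in your environment, a comparison against the provided expected results can be as simple as the sketch below; the folder and csv filenames are placeholders, so match them to the files you actually find in the two locations:

```python
# Sketch: compare one of your result csv files with the corresponding file from
# the provided expected results. Folder and filenames below are placeholders.
import pandas as pd

expected = pd.read_csv("output_expected/r03c07_organoids.csv")  # your backup of output/
observed = pd.read_csv("output/r03c07_organoids.csv")           # freshly produced

print("same shape:   ", expected.shape == observed.shape)
print("columns match:", list(expected.columns) == list(observed.columns))
```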
Assuming the above layout, you can use the provided `config.example.cfg` and run the pipeline in two steps:
Important
- If you are using an interactive session, ensure you have enough memory!
- Ensure you have activated your virtual environment, e.g.:
conda activate organoid
- Organize images and segment organoids (this takes ~2 hours using the `sbatch` script).

  From the Cellos directory (root of the repository), `cd` into the proper scripts directory:

  cd scripts/process_organoids

  Run the first step using `bash` (an interactive session):

  PYTHONPATH=$(which python) bash stitch_well.sh -r 3 -c 7 -f ../../config.example.cfg

  Alternately, run the first step as a SLURM job using `sbatch` (requests: 2 cores, 160G memory):

  PYTHONPATH=$(which python) sbatch stitch_well.sh -r 3 -c 7 -f ../../config.example.cfg
- Segment cells (this takes <20 min using the `sbatch` script).

  From the Cellos directory (root of the repository), `cd` into the proper scripts directory:

  cd scripts/process_cells

  Run the second step using `bash` (an interactive session):

  PYTHONPATH=$(which python) bash cells_seg_well.sh -r 3 -c 7 -f ../../config.example.cfg

  Alternately, run the second step as a SLURM job using `sbatch` (requests: 8 cores, 10G of memory):

  PYTHONPATH=$(which python) sbatch cells_seg_well.sh -r 3 -c 7 -f ../../config.example.cfg