Cellos: High-throughput deconvolution of 3D organoid dynamics at cellular resolution for cancer pharmacology

Overview

Cellos (Cell and Organoid Segmentation) is a pipeline developed to perform high-throughput volumetric 3D segmentation and morphological quantification of organoids and their cells. Cellos segments organoids using classical algorithms and segments nuclei using our trained model based on Stardist-3D (https://github.com/stardist/stardist).

Data description

The image data used here were exported from the PerkinElmer Opera Phenix high content screening confocal microscope. The resulting folder contains subfolders with tiff files (Images) and xml files (metadata). Each tiff file is a single image from one well, one field, one plane and one channel. We developed an automatic protocol that organizes all tiff files from the same well and saves them as zarr arrays to minimize RAM and storage usage. All information about the images is extracted from the respective metadata files.

Installing the pipeline

Currently, the pipeline uses a Python 3.7 environment. We provide a pinned requirements.txt to install all packages and dependencies for a working environment on Rocky 9 Linux.
We recommend creating a virtual environment for running the pipeline, for example using conda.

Installing the pipeline using conda to manage the Python version:

git clone https://github.com/TheJacksonLaboratory/Cellos.git
cd Cellos #(make sure you are in the correct directory)
conda env create -f environment.yml

This will use conda to create a Python 3.7 environment and then install all packages from PyPI using pip and the requirements.txt file.
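
Once the environment has been created, activate it before running any of the commands below (the environment name organoid matches the activation commands used later in this README):

conda activate organoid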

If you prefer to install the pipeline dependencies into a pre-existing Python 3.7 environment (e.g. venv), you can use:

pip install --require-hashes --no-deps -r requirements.txt

This will ensure you install the exact packages that we've tested.
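
If you take this route, a minimal sketch of the full sequence, assuming a python3.7 interpreter is already on your PATH and using a hypothetical environment name cellos-env:

python3.7 -m venv cellos-env #(create an empty Python 3.7 environment)
source cellos-env/bin/activate #(activate it)
pip install --require-hashes --no-deps -r requirements.txt #(install the exact, tested packages)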

Note

  • At present we've tested the pipeline only on CentOS 7 and Rocky 9 Linux, using Python 3.7.
  • The provided environment does not include the additional packages required for GPU support (e.g. CUDA).

Running the pipeline

There are two main steps to run the pipeline:

  1. Organizing images and organoid segmentation.
  2. Nuclei segmentation.

Each of these can be run on an individual well, either with a plain bash script or as an sbatch job. To run on a whole plate, the corresponding script uses sbatch to launch one job per well on a SLURM HPC cluster. The sbatch settings have been tuned using the sample data set on the JAX Sumner2 cluster.

Important

If you are running this pipeline on Sumner2, be aware that the scheduler is merciless and will kill your job if it exceeds the requested memory.
The two sbatch scripts, scripts/process_organoids/stitch_well.sh and scripts/process_cells/cells_seg_well.sh, have ~25% memory headroom, based on the sample data, but if your jobs are killed you will want to edit them to increase the requested memory.
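
A minimal sketch of what that edit and a follow-up check could look like, assuming the scripts use standard #SBATCH directives and that SLURM accounting is available on your cluster (verify against the actual script headers; the value shown mirrors the request quoted later in this README):

# In scripts/process_organoids/stitch_well.sh (illustrative; check the real header),
# raise this value if your job is killed for exceeding memory:
#SBATCH --mem=160G

# After a job finishes (or is killed), SLURM accounting reports its peak memory usage:
sacct -j <job id> --format=JobID,State,MaxRSS,Elapsed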

The process for running the image organization and organoid segmentation steps

Important

If you are using a virtual environment, ensure you have it activated!
For example, using conda as recommended, do:

conda activate organoid

Otherwise, provide the path to your Python 3.7 interpreter in the PYTHONPATH variable. You may also need to ensure the scripts are executable using:

chmod u+x <script name>

  • For a single well--this takes ~2 hours wall-time and uses ~128G of memory.
    From an interactive session, using bash:

    cd scripts/process_organoids/
    PYTHONPATH=$(which python) bash stitch_well.sh -r <row number> -c <column number> -f ../../config.example.cfg

    As a SLURM job using sbatch (requests: 2 cores, 160G memory):

    cd scripts/process_organoids/
    PYTHONPATH=$(which python) sbatch stitch_well.sh -r <row number> -c <column number> -f ../../config.example.cfg
  • For a whole plate--this submits a series of the above as SLURM jobs using sbatch:

    cd scripts/process_organoids/
    PYTHONPATH=$(which python) bash process_plate.sh -f ../../config.example.cfg 

The process for running nuclei segmentation steps:

  • For a single well--this takes <20 min wall-time with 8 cores and uses ~6G of memory.
    From an interactive session, using bash:

    cd scripts/process_cells/
    PYTHONPATH=$(which python) bash cells_seg_well.sh -r <row number> -c <column number> -f ../../config.example.cfg

    As a SLURM job using sbatch (requests: 8 cores, 10G of memory):

    cd scripts/process_cells/
    PYTHONPATH=$(which python) sbatch cells_seg_well.sh -r <row number> -c <column number> -f ../../config.example.cfg

  • For a whole plate--this submits a series of the above as SLURM jobs using sbatch:

    cd scripts/process_cells/
    PYTHONPATH=$(which python) bash cells_process_plate.sh -f ../../config.example.cfg

Note

All of the above commands use ../../config.example.cfg as the location of the config file because they are run from inside the scripts subdirectories of this repository. You can instead provide an absolute path to a config file in another location.
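
For example, a hypothetical invocation with an absolute config path (the path shown is only a placeholder):

PYTHONPATH=$(which python) bash stitch_well.sh -r <row number> -c <column number> -f /absolute/path/to/my_config.cfg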

The configuration file

The pipeline requires certain key parameters to be provided. For this we use a simple INI-style plain-text file that can be parsed with Python's configparser module.
In the repository we provide an example configuration file, config.example.cfg.

Parameter       Description

[pipeline]
plate_path      path to the folder containing your raw images
output_path     path to where the csv files and zarr arrays will be saved
well_targets    row,column pairs (row1,col1|row2,col2) of the wells to analyze

[stitch_well]
plane_size      size of the image for one field, one z-slice and one channel
overlap_x       number of overlapping pixels between two adjacent fields along x
overlap_y       number of overlapping pixels between two adjacent fields along y

[cells_seg_well]
output_path     path to where the csv files will be saved
stardist_path   path to the trained model for nuclei segmentation

Note

The paths can be relative to the scripts directory, as they are in the example provided here, which assumes the layout of the repository is unchanged. Otherwise, the paths should be provided as absolute paths.
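
As an illustration, here is a minimal configuration sketch following the structure described above. Every value is a placeholder and should be replaced with your own settings; consult the provided config.example.cfg for the actual defaults.

[pipeline]
# folder containing the raw tiff images and their metadata
plate_path = /path/to/cellos_data
# where the csv files and zarr arrays will be written
output_path = /path/to/output
# row,column pairs of the wells to analyze, separated by |
well_targets = 3,7

[stitch_well]
# placeholder values: the size of a single field/z-slice/channel image and the
# number of overlapping pixels between adjacent fields along x and y
plane_size = 1080
overlap_x = 120
overlap_y = 120

[cells_seg_well]
# where the per-cell csv files will be saved
output_path = /path/to/output
# trained StarDist-3D model for nuclei segmentation (a stardist folder ships under models/)
stardist_path = /path/to/models/stardist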

Usage

We have made an example dataset containing data for one well publicly available. The well is at row number 3 and column number 7. The images have 3 channels: channel 1 = EGFP, channel 2 = mCherry and channel 3 = brightfield.

It can be downloaded from: https://figshare.com/articles/dataset/cellos_data_zip/21992234

On Linux, you can download it as follows:

wget https://figshare.com/ndownloader/files/39032216

Warning

This is an ~11 GB zip file.

On Linux, it can be unzipped using 7z:

7z x 39032216

This will extract a cellos_data folder containing the image files (.tiff) and the Index.idx.xml metadata file.

To use the provided config.example.cfg and the script commands from above, we recommend placing the cellos_data folder in the root of this repository.

You should obtain the following layout for the Cellos directory, where ... indicates abridged files:

├── config.example.cfg
├ ...
├── cellos_data
│   ├── Index.idx.xml
│   └── r03c07 ... .tiff
├── models
│   └── stardist
│       ├ ...
├── output
│   ├ ...
└── scripts
    ├── process_cells
    │   ├── cells_process_plate.sh
    │   ├── cells_seg_well.py
    │   └── cells_seg_well.sh
    └── process_organoids
        ├── process_plate.sh
        ├── stitch_well.py
        └── stitch_well.sh
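
A quick, optional sanity check of this layout, run from the root of the repository (the file names are those of the sample data set):

ls cellos_data/Index.idx.xml #(the metadata file should be present)
ls cellos_data | grep -c '\.tiff$' #(count the extracted tiff images)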

Important

We provide the expected results of running the pipeline on the sample data in the output folder in the root of the repository. If you plan on running the pipeline on the sample data, we recommend you back up or rename this folder so that you can compare your results with ours. Alternatively, you can change the output_path parameters in the .cfg file.
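
For example, a minimal way to keep the provided reference results, run from the root of the repository and using a suggested folder name output_expected:

mv output output_expected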

Assuming the above layout, you can use the provided config.example.cfg and run the pipeline in two steps:

Important

  • If you are using an interactive session, ensure you have enough memory!
  • Ensure you have activated your virtual environment, e.g.:

    conda activate organoid

  1. Organize images and segment organoids (this takes ~2 hours using the sbatch script)
    From the Cellos directory (root of the repository) cd into the proper scripts directory:

    cd scripts/process_organoids

    Run the first step using bash (an interactive session):

    PYTHONPATH=$(which python) bash stitch_well.sh -r 3 -c 7 -f ../../config.example.cfg

    Alternatively, run the first step as a SLURM job using sbatch (requests: 2 cores, 160G memory):

    PYTHONPATH=$(which python) sbatch stitch_well.sh -r 3 -c 7 -f ../../config.example.cfg
  2. Segment cells (this takes <20 min using the sbatch script)
    From the Cellos directory (root of the repository) cd into the proper scripts directory:

    cd scripts/process_cells

    Run the second step using bash (an interactive session):

    PYTHONPATH=$(which python) bash cells_seg_well.sh -r 3 -c 7 -f ../../config.example.cfg

    Alternatively, run the second step as a SLURM job using sbatch (requests: 8 cores, 10G of memory):

    PYTHONPATH=$(which python) sbatch cells_seg_well.sh -r 3 -c 7 -f ../../config.example.cfg
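
Once both steps have finished, you can compare your results with the provided reference results. A rough sketch, assuming you renamed the reference folder to output_expected as suggested above (the exact file names under output depend on your configuration):

cd ../.. #(back to the root of the repository)
diff -rq output_expected output #(list files that differ between your run and the reference)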