
ECOLE: Learning to call copy number variants on whole exome sequencing data


ECOLE is a deep learning-based tool that performs CNV call predictions on WES data using read depth sequences.

The manuscript can be found here: ECOLE: Learning to call copy number variants on whole exome sequencing data

The repository with the necessary data and scripts to reproduce the results in the paper can be found here: ECOLE results reproduction

Keywords: Deep Learning, Copy Number Variation, Whole Exome Sequencing


Authors

Berk Mandiracioglu, Furkan Ozden, Gun Kaynar, M. Alper Yilmaz, Can Alkan, A. Ercument Cicek


Questions & comments

[firstauthorname].[firstauthorsurname]@gmail.com [lastauthorsurname]@cs.bilkent.edu.tr



Warning: Please note that the ECOLE software is completely free for academic usage. However, it is licensed for commercial usage. Please first refer to the License section for more information.


Installation

  • ECOLE is a Python 3 script and is easy to run once the required packages are installed.

Requirements

For easy requirement handling, you can use the ECOLE_environment.yml file to initialize a conda environment with the requirements installed:

$ conda env create --name ecole_env -f ECOLE_environment.yml
$ conda activate ecole_env

Note that the provided environment .yml file is for Linux systems. For macOS users, the corresponding versions of the packages might need to be changed.
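After activating the environment, you can sanity-check the installation. The snippet below assumes PyTorch is among the pinned dependencies (ECOLE is a deep learning model); adjust the import if your environment differs:

$ conda activate ecole_env
$ python -c "import torch; print(torch.__version__)"   # prints the installed version if the environment is set up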

Features

  • ECOLE provides GPU support optionally. See GPU Support section.

Instructions Manual for ECOLE

Important notice: Please call the ECOLE_call.py script from the scripts directory.

Required Arguments

-m, --model

  • One of the pretrained models from the paper: (1) ecole, (2) ecole-ft-expert, (3) ecole-ft-somatic.

-bs, --batch_size

  • Batch size to be used when performing CNV calls on the samples.

-i, --input

  • Relative or absolute path to the processed WES samples, including read depth data.

-o, --output

  • Relative or absolute path to the output directory where the ECOLE output file is written.

-c, --cnv

  • Desired level of resolution; choose one of the following: (1) exonlevel, (2) merged.

-n, --normalize

  • Relative or absolute path to the mean and standard deviation statistics of the read depth values, used for normalization. These statistics are precalculated on the training dataset before pretraining.
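Putting the required arguments together, a minimal hypothetical invocation is sketched below; the batch size and all paths are placeholders for illustration, not defaults shipped with the repository:

$ python ECOLE_call.py -m ecole -bs 16 -i ./preprocessed_samples -o ./ecole_calls_output -c exonlevel -n ./depth_stats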

Optional Arguments

-g, --gpu

  • Set to the PCI bus ID of the GPU in your system.
  • You can check the PCI bus IDs of the GPUs in your system in various ways; for example, the gpustat tool lists them as shown below:
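A minimal sketch of listing GPUs with gpustat; the device names, temperatures, and memory figures are illustrative, and the exact layout varies across gpustat versions:

$ gpustat
[0] GeForce GTX 1080 | 53'C, 29 % | 1795 / 8113 MB
[1] GeForce GTX 1080 | 41'C,  0 % |   12 / 8113 MB

The bracketed index is the device ID to pass with -g (e.g., -g 0).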

-conf, --confidenceThreshold

  • Confidence threshold for calling CNV labels.
  • Select higher values for more confident calls.
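For instance, extending the hypothetical invocation above to run on GPU 0 and keep only high-confidence calls (the 0.95 threshold is an arbitrary illustration, not a recommended value):

$ python ECOLE_call.py -m ecole -bs 16 -i ./preprocessed_samples -o ./ecole_calls_output -c exonlevel -n ./depth_stats -g 0 -conf 0.95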

-v, --version

  • Check the version of ECOLE.

-h, --help

  • See the help page.

Usage Example

Usage of ECOLE is very simple!

Step-0: Install conda package management

  • This project uses the conda package management software to create a virtual environment and facilitate reproducibility.

  • For Linux users:

  • Please take a look at the Anaconda repo archive page and select an appropriate version that you'd like to install.

  • Replace Anaconda3-version.num-Linux-x86_64.sh below with your chosen version:

$ wget -c https://repo.continuum.io/archive/Anaconda3-version.num-Linux-x86_64.sh
$ bash Anaconda3-version.num-Linux-x86_64.sh
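After the installer finishes, you can confirm that conda is available on your PATH:

$ conda --version   # e.g., prints "conda 23.1.0"; your version will differ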

Step-1: Set Up your environment.

  • It is important to set up the conda environment which includes the necessary dependencies.
  • Please run the following lines to create and activate the environment:
$ conda env create --name ecole_env -f ECOLE_environment.yml
$ conda activate ecole_env

Step-2: Run the preprocessing script.

  • It is necessary to preprocess the WES data samples to obtain read depth and other metadata and make them ready for CNV calling.
  • Please run the following line:
$ source preprocess_samples.sh
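For intuition only: per-region read depth of the kind ECOLE consumes can be computed from a BAM file with samtools, as in the sketch below with placeholder file names; preprocess_samples.sh remains the authoritative pipeline and may take a different approach:

# Append the summed per-base read depth of each exon region to the BED columns (illustrative only)
$ samtools bedcov exons.bed sample.bam > sample_read_depths.txt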

Step-3: Run ECOLE on data obtained in Step-2

  • Here, we demonstrate an example that runs ECOLE on GPU device 0 and obtains exon-level CNV calls.
  • Please run the following script:
$ source ecole_call.sh

You can change the argument parameters within the script to run it on the CPU and/or to obtain merged CNV calls, as sketched below.
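For example, a hypothetical CPU-only, merged-resolution variant of the earlier invocation (omitting -g presumably falls back to the CPU; all paths remain placeholders):

$ python ECOLE_call.py -m ecole -bs 16 -i ./preprocessed_samples -o ./ecole_calls_output -c merged -n ./depth_stats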

Output file of ECOLE

  • At the end of the CNV calling procedure, ECOLE writes its output file to the directory given with the -o option; in this tutorial it is ./ecole_calls_output.
  • The output file of ECOLE is in a tab-delimited, .bed-like format.
  • The columns of the output file are, in order: 1. Sample Name, 2. Chromosome, 3. CNV Start Index, 4. CNV End Index, 5. ECOLE Prediction.
  • The repository README includes a figure showing an example ECOLE output file; a sketch of the format follows.
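A hypothetical illustration of the format (the sample name, coordinates, and label vocabulary are invented here; consult the repository figure for the authoritative output):

NA12878   chr1   861321   861393   DUP
NA12878   chr1   865534   865716   NO-CALL
NA12878   chr2   264140   264188   DEL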

Instructions Manual for Finetuning ECOLE

Important notice: Please call the ECOLE_finetune.py script from the scripts directory.

Required Arguments

-bs, --batch_size

  • Batch size to be used when performing CNV calls on the samples.

-i, --input

  • Relative or absolute path to the processed WES samples, including read depth data.

-o, --output

  • Relative or absolute path to the output directory where the ECOLE output file is written.

-n, --normalize

  • Relative or absolute path to the mean and standard deviation statistics of the read depth values, used for normalization. These statistics are precalculated on the training dataset before pretraining.

-e, --epochs

  • The number of epochs for which finetuning will be performed.

-lr, --learning_rate

  • The learning rate to be used in finetuning.

-lmp, --load_model_path

  • The path to the pretrained model weights to be loaded for finetuning.
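Putting the required arguments together, a minimal hypothetical finetuning invocation is sketched below; the batch size, epoch count, learning rate, and every path are placeholders, not recommended settings:

$ python ECOLE_finetune.py -bs 16 -i ./finetune_preprocessed_samples -o ./ecole_finetuned_model_weights -n ./depth_stats -e 10 -lr 1e-5 -lmp ./pretrained_ecole_weights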

Optional Arguments

-g, --gpu

  • Set to the PCI bus ID of the GPU in your system.
  • You can check the PCI bus IDs of the GPUs in your system in various ways; for example, with the gpustat tool (see the example in the calling manual above).

-v, --version

  • Check the version of ECOLE.

-h, --help

  • See the help page.

Finetune Example

We provide an ECOLE finetuning example with the WES sample NA12891, using only chromosome 21. Step-0 and Step-1 are the same as in the ECOLE calling example.

Step-0: Install conda package management

  • This project uses the conda package management software to create a virtual environment and facilitate reproducibility.

  • For Linux users:

  • Please take a look at the Anaconda repo archive page and select an appropriate version that you'd like to install.

  • Replace Anaconda3-version.num-Linux-x86_64.sh below with your chosen version:

$ wget -c https://repo.continuum.io/archive/Anaconda3-version.num-Linux-x86_64.sh
$ bash Anaconda3-version.num-Linux-x86_64.sh

Step-1: Set Up your environment.

  • It is important to set up the conda environment which includes the necessary dependencies.
  • Please run the following lines to create and activate the environment:
$ conda env create --name ecole_env -f ECOLE_environment.yml
$ conda activate ecole_env

Step-2: Run the preprocessing script for preparing the samples for finetuning.

  • It is necessary to preprocess the WES data samples to obtain read depth and other metadata and make them ready for ECOLE finetuning.

  • ECOLE finetuning requires .bam files and ground truth calls, as provided under /finetune_example_data. The repository README includes an image of the sample ground truth format; a hypothetical sketch follows the command below.

  • Please run the following line:

$ source finetune_preprocess_samples.sh
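A hypothetical sketch of a ground truth call file (the column layout, coordinates, and labels are invented here; the files under /finetune_example_data are the authoritative reference):

NA12891   chr21   9411193    9411794    DEL
NA12891   chr21   10521778   10522980   DUP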

Step-3: Start ECOLE Finetuning on data obtained in Step-2

  • Here, we demonstrate an example that runs ECOLE finetuning on GPU device 0.
  • Please run the following script:
$ source ecole_finetune.sh

You can change the argument parameters within the script to run it on the CPU.

Output of ECOLE Finetuning

  • At the end of ECOLE finetuning, the script saves its model weights file to the directory given with the -o option; in this tutorial it is ./ecole_finetuned_model_weights.


Citations


License

  • CC BY-NC-SA 2.0
  • Copyright 2022 © ECOLE.
  • For commercial usage, please contact the authors (see Questions & comments above).