Download this repository to get the files describing the pipeline rules:
wget https://github.com/arthurvinx/Medusa/archive/refs/heads/main.zip
unzip main.zip
cd Medusa-main/
Go to the folder containing the Snakefile (via command line) and create the expected folder structure with:
mkdir -p ./Pipeline/{result,data/{merged,assembled,collapsed,removal/{index,reference},raw,trimmed},alignment/{db,index},taxonomic/db,functional/db}
An Anaconda environment was created to ease the installation of the software required by this pipeline. The environment recipe is available at the Anaconda cloud: https://anaconda.org/arthurvinx/medusapipeline.
A conda package manager, such as miniconda, must be installed to get this environment and to install the Snakemake workflow management system.
Download and install the latest miniconda release for Linux (adapt the commands if needed):
cd ~/Downloads
wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.9.2-Linux-x86_64.sh
chmod u+x Miniconda3-py39_4.9.2-Linux-x86_64.sh
./Miniconda3-py39_4.9.2-Linux-x86_64.sh
Close the terminal after the installation, open it again, and then check if the installation was successful with:
conda -V
Get the pipeline environment from the Anaconda cloud:
conda activate base
conda install anaconda-client -y
conda env create arthurvinx/medusaPipeline
conda activate medusaPipeline
pip3 install -U plyvel --no-cache-dir --no-deps --force-reinstall
The recommended way to install Snakemake is via the conda package manager. The following commands will create a conda environment with the full version of Snakemake. More details can be found at the Snakemake installation guide.
conda activate base
conda install -c conda-forge mamba -y
mamba create -c bioconda -c conda-forge -n snakemake snakemake
Check whether your installation succeeded by typing:
conda activate snakemake
snakemake --help
Move your raw FASTQ files to the inputDIR specified in the Snakefile. By default, the inputDIR is "Pipeline/data/raw" and paired-end filenames are expected to present the suffixes "_1.fastq" and "_2.fastq". Alternatively, you may change the inputDIR editing the Snakefile. It is worth to mention that all paths in the Snakefile are interpreted relative to the directory Snakemake is executed in.
To start this pipeline, go to the folder containing the Snakefile (via command line) and run:
snakemake --cores
This will use all available cores whenever possible. Alternatively, you may define the number of available cores as seen in the Snakemake command line interface guide.