This pipeline uses publicly available modules from nf-core along with some locally created modules. Its primary function is to run a workflow on tens to thousands of samples in parallel on the Seattle Children's Cybertron HPC using the PBS job scheduler and containerized scientific software.
First, follow the steps on this page to make a personal copy of this repository. Then follow the step-by-step instructions to run the workflow in workflow_docs/workflow_run.md.
This workflow is designed to output gene expression counts from the STAR aligner using --quantMode. It also computes general QC statistics on the fastqs with fastqc and on the alignments with rseqc. Finally, the QC reports are collected into a single file using multiQC.
A DAG (directed acyclic graph) of the workflow is shown below:
First, fork the repository from Children’s bitbucket. Do this by clicking the “create fork” symbol from the bitbucket web interface and fork it to your personal bitbucket account, as illustrated below.
Next, you will need to clone your personal repository to your home in Cybertron. See the image below for where you can find the correct URL on your forked bitbucket repo.
Copy that URL to replace https://childrens-atlassian/bitbucket/scm/~jsmi26/rnaseq_count_nf.git below.
# on a terminal on the Cybertron login nodes
cd ~
# your fork should have your own userID (rather than jsmi26)
git clone https://childrens-atlassian/bitbucket/scm/~MY_USERID/rnaseq_count_nf.git
cd ~/rnaseq_count_nf
Once inside the code repository directory, use git to check out the latest release branch, or the same release that was used in a prior analysis.
git fetch
git branch -a
The git branch -a command will show all available branches, including remote branches, like:
* main
remotes/origin/HEAD -> origin/main
remotes/origin/dev
remotes/origin/main
remotes/origin/release/1.1.2
Check out the most current release branch, which will have the highest version number (e.g. use release/1.2.0 if available). You can check out a release branch using a command like this:
git checkout release/1.0.0
The output will state that you are now on the release/1.0.0 branch and that it is tracking the release branch in your personal repository.
Checking out files: 100% (55/55), done.
Branch release/1.0.0 set up to track remote branch release/1.0.0 from origin.
Switched to a new branch 'release/1.0.0'
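Picking the highest release by eye gets error-prone once versions reach two digits (1.10.0 sorts after 1.2.0 numerically but not alphabetically). The sketch below is one way to select it automatically. It assumes release branches are named `remotes/origin/release/X.Y.Z` as in the listing above and that GNU `sort -V` (version sort) is available. `latest_release` is a hypothetical helper name, not part of this repository.

```shell
# Hypothetical helper: read `git branch -a`-style lines on stdin and print
# the release branch with the highest version number.
latest_release() {
    # keep only remote release branches, stripping the leading marker/spaces
    sed -n 's|^[* ]*remotes/origin/\(release/.*\)$|\1|p' |
        sort -V |      # GNU version sort: 1.2.0 sorts before 1.10.0
        tail -n 1      # the last line is the highest version
}

# Possible usage from inside the cloned repository:
# git fetch
# git checkout "$(git branch -a | latest_release)"
```

If no release branches exist, the helper prints nothing, so it is worth checking that the result is non-empty before passing it to `git checkout`.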
Find your project code by listing all your projects on the Cybertron terminal.
# lists all HPC project names that you have access to use
project info
Grab an interactive session on a compute node and activate the conda environment. It is also best practice to use tmux or screen so that, if the session is disconnected, your nextflow workflow (if running) won’t end with a SIGKILL error.
Change the QUEUE and NAME variables in the code chunk below to be accurate for your Cybertron projects.
tmux new-session -s nextflow
# the variable 'NAME' will be an HPC project that you have access to
NAME="RSC_adhoc"
QUEUE="paidq"
qsub -I -q $QUEUE -P $(project code $NAME) -l select=1:ncpus=1:mem=8g -l walltime=8:00:00
cd ~/rnaseq_count_nf
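Before requesting the interactive node, it can help to confirm that the shell really is running under tmux or screen. The sketch below relies on the environment variables those tools export for the shells they start (`TMUX` for tmux, `STY` for GNU screen). `in_multiplexer` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: succeeds when the current shell was started by
# tmux (which exports TMUX) or GNU screen (which exports STY).
in_multiplexer() {
    [ -n "${TMUX:-}" ] || [ -n "${STY:-}" ]
}

if in_multiplexer; then
    echo "Inside tmux/screen: a dropped connection will not end the session."
else
    echo "Not inside tmux/screen: consider 'tmux new-session -s nextflow' first." >&2
fi
```

After a disconnect, the session started above can typically be resumed with `tmux attach-session -t nextflow`.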
If you don’t have conda installed yet, please follow these directions. You may stop following the directions after the conda deactivate step.
Next, for the conda environment to be solved, you will need to set channel_priority to flexible in your conda configs. To read more about conda environments and their configurations, check out the documentation.
# check config settings
conda config --show channel_priority # print your current channel_priority setting
conda config --set channel_priority flexible # set to flexible if not already done
# Create the environment only once. Skip this step if you've already created the environment
conda env create -f env/nextflow.yaml
# Activate the conda environment.
conda activate nextflow
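Since the environment should be created only once, a small guard can make the setup commands safe to re-run. The sketch below checks the output of `conda env list` for an environment named `nextflow` (the name used in the activate step; it is assumed to match the name declared in env/nextflow.yaml). `env_exists` is a hypothetical helper, not something provided by conda.

```shell
# Hypothetical helper: succeeds if an environment named "$1" appears in
# `conda env list`-style text on stdin (environment name in the first column).
env_exists() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Possible usage: create the environment only when it is missing.
# if conda env list | env_exists nextflow; then
#     echo "nextflow environment already exists"
# else
#     conda env create -f env/nextflow.yaml
# fi
```

Matching on the whole first column avoids false positives from environments whose names merely contain "nextflow".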
SCRI uses a TLS and/or SSL certificate to inspect web traffic, and it is specific to SCRI. Nextflow itself orchestrates many types of downloads, such as genomic references, scientific software images from public repositories, and conda packages.
If you are running into SSL errors, you will need to configure your conda installation to use SCRI certificates.
Please see Research Scientific Computing for more help in getting set-up and this bitbucket repo for the current certificates.
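As a rough illustration of what configuring conda for a site certificate can look like, conda's `ssl_verify` setting accepts a path to a CA bundle. The sketch below wraps that call in a check that the file exists. The certificate path shown is purely hypothetical, and `configure_conda_cert` is an illustrative helper name. The actual file and any additional steps should come from Research Scientific Computing.

```shell
# Hypothetical helper: point conda at a CA bundle after checking it is readable.
configure_conda_cert() {
    cert="$1"
    if [ ! -r "$cert" ]; then
        echo "Certificate not found or unreadable: $cert" >&2
        return 1
    fi
    # ssl_verify accepts a path to a certificate bundle
    conda config --set ssl_verify "$cert"
}

# Possible usage (the path is an assumption, substitute the real file):
# configure_conda_cert "$HOME/certs/scri-ca-bundle.pem"
```

Other tools that Nextflow launches may need their own certificate settings, so this addresses only the conda side.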
Open the step-by-step instructions to run the workflow in workflow_docs/workflow_run.md.
This pipeline was generated using the nf-core tools CLI suite and publicly available modules from nf-core.
The nf-core project came about at the start of 2018. Phil Ewels (@ewels) was the head of the development facility at NGI Stockholm (National Genomics Infrastructure), part of SciLifeLab in Sweden.
The NGI had been developing analysis pipelines for use with its genomics data for several years and started using a set of standards for each pipeline created. This helped other people run the pipelines on their own systems; typically Swedish research groups at first, but later on other groups and core genomics facilities too, such as QBIC in Tübingen.
As the number of users and contributors grew, the pipelines began to outgrow the SciLifeLab and NGI branding. To try to open up the effort into a truly collaborative project, nf-core was created and all relevant pipelines moved to this new GitHub Organisation.
The early days of nf-core were greatly shaped by Alex Peltzer (@apeltzer), Sven Fillinger (@sven1103) and Andreas Wilm (@andreas-wilm). Without them, the project would not exist.