The `automated_nextstrain.sh` script automates the daily execution of TGen's Nextstrain analysis pipeline. It is configured to run as a cron job, processing GISAID data, sampling genomes, and building a Nextstrain tree for Arizona-specific sequences.
The script begins with SLURM job specifications:
#SBATCH --cpus-per-task=14
#SBATCH --mem=140G
#SBATCH --partition=data-mover
- CPUs: Allocates 14 CPUs for the task.
- Memory: Allocates 140 GB of RAM.
- Partition: Runs on the `data-mover` partition.
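For manual testing outside the cron schedule, a script with these directives can be submitted to SLURM directly (a sketch; note that the cron entry shown later invokes the script with `sh` instead of `sbatch`):

```bash
# Submit the script as a SLURM batch job and check that it is queued.
sbatch automated_nextstrain.sh
squeue -u "$USER"
```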
source /home/bvan-tassel/miniforge3/etc/profile.d/conda.sh
conda activate R
The script activates the `R` Conda environment, which is used later for genome sampling and other R-based tasks.
rm -r running_NS
cp -R ${builder} running_NS
cp -R ${config_alt}/${style}/config.json running_NS/config/config.json
- Removes any existing `running_NS` directory.
- Copies the `nextstrain_weekly` builder (containing the Snakemake workflow) into the `running_NS` directory.
- Updates the Nextstrain configuration file with Arizona-specific settings. (The `${builder}`, `${config_alt}`, and `${style}` variables used above are illustrated below.)
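The fragment above references shell variables that the script is assumed to define earlier in the file. Hypothetical definitions, purely for illustration (the actual paths are not shown in this section):

```bash
# Hypothetical values -- adjust to the actual installation.
builder=/tnorth_labs/COVIDseq/nextstrain/nextstrain_weekly   # Snakemake workflow template
config_alt=/tnorth_labs/COVIDseq/nextstrain/configs          # per-build config.json files
style=AZ                                                     # selects the Arizona-specific build
```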
Rscript sample_gisaid.R ${style}
This command uses an R script to sample genomes from the GISAID dataset according to a likelihood model that weights newer genomes more heavily, so recent sequences are more likely to be included in the analysis.
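The sampler's interface is not documented here; based on how the script uses it, it takes the build style as its only argument and writes the subset files that are linked into the workflow later on (a sketch, with `AZ` as a hypothetical style):

```bash
# Hypothetical standalone run; the argument corresponds to ${style}.
Rscript sample_gisaid.R AZ
# Expected outputs, consumed by the symlink step below:
ls -lh subset_sequences.fasta subset_metadata.tsv
```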
conda run -n nextstrain python auto_color_assign.py
Assigns colors to new lineages for tree visualization in Nextstrain. The color information is added to the configuration folder.
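The exact file produced by `auto_color_assign.py` is not shown here. One common Nextstrain convention is a tab-separated colors file (trait, value, hex color); an illustrative entry, assuming the script appends to a file such as `running_NS/config/colors.tsv`:

```bash
# Illustrative only -- file name and trait column are assumptions.
printf 'pango_lineage\tB.1.617.2\t#4477AA\n' >> running_NS/config/colors.tsv
```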
rooter=$(cat root.name)
sed -i -e "s/VttLJ2gz1X4B/${rooter:-oldest}/" running_NS/Snakefile
- Retrieves the root name from the `root.name` file.
- Updates the Snakefile to use the correct root for the phylogenetic tree (the fallback in `${rooter:-oldest}` is demonstrated below).
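The `${rooter:-oldest}` expansion supplies a default: if `root.name` is empty or missing, the placeholder in the Snakefile is replaced with `oldest` instead. A quick demonstration:

```bash
rooter=""
echo "${rooter:-oldest}"              # prints "oldest" -- empty value falls back
rooter="hCoV-19/USA/AZ-TG1234/2020"   # hypothetical root name
echo "${rooter:-oldest}"              # prints the root name itself
```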
ln -s $( readlink -e subset_sequences.fasta ) running_NS/data/sequences.fasta
ln -s $( readlink -e subset_metadata.tsv ) running_NS/data/metadata.tsv
Creates symbolic links for the sampled sequences and metadata files, making them accessible to the Snakemake workflow.
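`readlink -e` resolves each file to its absolute, canonical path, which matters because a relative symlink target would break once the workflow runs from inside `running_NS/`. A quick way to verify the links (paths shown are illustrative):

```bash
readlink -e subset_sequences.fasta   # e.g. /tnorth_labs/COVIDseq/nextstrain/subset_sequences.fasta
ls -l running_NS/data/               # confirm both links resolve to real files
```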
conda activate nextstrain
export AUGUR_RECURSION_LIMIT=10000
cd running_NS
snakemake --cores 14
- Activates the `nextstrain` environment.
- Sets `AUGUR_RECURSION_LIMIT` to handle large datasets.
- Executes the Snakemake workflow using 14 cores (a dry-run variant for debugging is shown below).
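When debugging a build, Snakemake's dry-run mode lists the jobs it would schedule without executing anything:

```bash
cd running_NS
snakemake --cores 14 --dry-run   # add -p to also print the shell commands
```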
python3 github_upload.py
Uploads the results to a GitHub repository. This includes metadata and color files used in the Nextstrain visualization.
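`github_upload.py` itself is not shown here; a minimal sketch of an equivalent upload step using git directly, assuming the results repository is already cloned and authenticated (all paths are assumptions):

```bash
# Hypothetical equivalent of github_upload.py.
cp running_NS/auspice/*.json /path/to/results-repo/   # auspice output dir assumed
cd /path/to/results-repo
git add .
git commit -m "Automated Nextstrain update $(date +%F)"
git push origin main
```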
The script is scheduled to run daily using a cron job. Example cron entry:
40 1 * * * source /home/bvan-tassel/miniforge3/etc/profile.d/conda.sh && conda activate nextstrain; cd /tnorth_labs/COVIDseq/nextstrain/ && sh automated_nextstrain.sh | mail -s "Nextstrain update" [email protected]
This schedules the script to run at 1:40 AM every day.
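To install or inspect the schedule for the current user:

```bash
crontab -e   # open the crontab and add the entry above
crontab -l   # confirm the installed schedule
```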
Ensure the script is stored in a directory accessible to the SLURM environment and cron job.
The workflow generates:
- A phylogenetic tree in Nextstrain format.
- Updated metadata and lineage colors.
- Results pushed to a designated GitHub repository.
- Ensure SLURM, Conda, and Python configurations are properly set up on your system.
- Verify the paths to the GISAID data files (`seq` and `meta`) and configuration files; a pre-flight check is sketched below.
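A simple check along these lines can catch missing inputs before the job starts (a sketch; `seq` and `meta` are assumed to be shell variables holding the GISAID FASTA and metadata paths):

```bash
# Hypothetical pre-flight check -- adjust variable names to match the script.
for f in "$seq" "$meta" "$config_alt/$style/config.json"; do
    [ -e "$f" ] || { echo "missing input: $f" >&2; exit 1; }
done
```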