BacWGSpipe

A Snakemake workflow for a complete analysis of bacterial whole-genome sequencing data.

Data Sources

BacWGSpipe supports three types of WGS data sources:

Short: Illumina short reads
Long: Long reads (either PacBio or Nanopore)
Hybrid: Illumina short reads + Long reads (either PacBio or Nanopore) from the same isolate
- This hybrid mode makes it possible to obtain complete assemblies of both the chromosome and the plasmids.

Workflow Structure

Dependencies

To keep installation as simple as possible, most dependencies, including tools and databases, are pre-installed in several Docker images. The two databases below are the exception because of their large file size: download them manually and set the corresponding paths in the config file.

Kraken2 database:
- Download the PlusPF database from https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_20230314.tar.gz and decompress it
- Set "kraken2_db" in the config file (default: config/config.yaml) to its location
EggNOG-mapper database:
- Follow the "Installation - Setup" section of https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.10#user-content-Overview
- Set "eggnog_mapper_database" in the config file (default: config/config.yaml) to its location
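A minimal shell sketch of the Kraken2 step, using the URL above. The function names download_kraken2_db and check_kraken2_db and the target directory are hypothetical. The check assumes the archive unpacks the standard Kraken2 index files (hash.k2d, opts.k2d, taxo.k2d) directly into the target directory; if it creates a subdirectory instead, point the check at that.

```shell
#!/bin/sh
# Sketch only: fetch the Kraken2 PlusPF database and sanity-check the result.
# Function names and the target directory are hypothetical placeholders.

KRAKEN2_URL="https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_20230314.tar.gz"

# Download the archive into DB_DIR and unpack it there.
# Not called here: the archive is large, so run it deliberately.
download_kraken2_db() {
    db_dir=$1
    mkdir -p "$db_dir" || return 1
    # -c resumes a partial download if the file already exists
    wget -c -O "$db_dir/k2_pluspf.tar.gz" "$KRAKEN2_URL" || return 1
    tar -xzf "$db_dir/k2_pluspf.tar.gz" -C "$db_dir" || return 1
}

# A built Kraken2 database contains hash.k2d, opts.k2d and taxo.k2d.
# Return non-zero if any of them is missing or empty in DB_DIR.
check_kraken2_db() {
    db_dir=$1
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [ ! -s "$db_dir/$f" ]; then
            echo "missing or empty: $db_dir/$f" >&2
            return 1
        fi
    done
    return 0
}
```

For example, run download_kraken2_db /path/to/kraken2_pluspf and then check_kraken2_db /path/to/kraken2_pluspf; if the check passes, that directory is the likely value for "kraken2_db".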

Workflow Usage

Clone the repository:

git clone git@github.com:KevinLYW366/BacWGSpipe.git

Put all samples' sequencing data in a data directory (keep long-read and short-read data in separate directories) and create a list file of sample names.
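One way to create the sample list, sketched in shell. It assumes (hypothetically) that each short-read sample has a read-1 file named <sample>_1.fastq.gz and that the list file holds one sample name per line; check the file naming and list format expected by the "Data input" section of the config before relying on it. The helper name make_sample_list is also hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: derive sample names from sequencing file names.
# ASSUMPTION: read-1 files are named <sample><suffix>, e.g. S01_1.fastq.gz
# with suffix "_1.fastq.gz"; adjust to the naming your config expects.
make_sample_list() {
    data_dir=$1
    suffix=$2
    for path in "$data_dir"/*"$suffix"; do
        [ -e "$path" ] || continue          # glob matched no files
        name=$(basename "$path" "$suffix")  # strip directory and suffix
        echo "$name"
    done | sort -u                          # one unique name per line
}
```

For example: make_sample_list /path/to/short_reads _1.fastq.gz > samples.txt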
Before each run, update the following items in the config file (default: config/config.yaml) to match your project:
- everything in the "Data input" section;
  - Note: "seqdata_source" should match the type of data you provide to the workflow. In Illumina short-read only mode, nothing under "Long reads" needs to be changed, and vice versa.
- "threads" in the "Analysis - Global" section (the number of threads each tool uses per sample);
- optionally, analysis thresholds in the "Analysis" section, such as "plsdb_max_pvalue" and "plasmidfinder_mincov"; change these only if you understand their effect.
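A wrong data or database path in the config file usually only shows up once jobs start, so a quick pre-run check can save time. This sketch reads single-line key: value entries with sed; the helper names config_get and check_config_paths are hypothetical, and nested or multi-line YAML values are not handled.

```shell
#!/bin/sh
# Sketch of a pre-run check for config/config.yaml.
# ASSUMPTION: each checked key sits on its own line as `key: value`,
# optionally quoted and optionally followed by a `# comment`.

# Print the value of KEY from FILE (first match only).
config_get() {
    key=$1
    file=$2
    sed -n "s/^[[:space:]]*$key:[[:space:]]*//p" "$file" \
        | head -n 1 \
        | sed -e 's/[[:space:]]*#.*$//' -e 's/^["'\'']//' -e 's/["'\'']$//'
}

# Return non-zero if any listed key is unset or points to a missing path.
check_config_paths() {
    file=$1
    shift
    status=0
    for key in "$@"; do
        value=$(config_get "$key" "$file")
        if [ -z "$value" ] || [ ! -e "$value" ]; then
            echo "config key '$key' -> '$value' does not exist" >&2
            status=1
        fi
    done
    return $status
}
```

For example: check_config_paths config/config.yaml kraken2_db eggnog_mapper_database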

Set up the snakemake environment:

# Add conda channels
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
  
# Install mamba first (mamba provides a faster and more robust way to install conda packages)
conda install -n base -c conda-forge mamba
  
# Install snakemake using mamba
mamba create -c conda-forge -c bioconda -n snakemake snakemake=7.25.3

Set up Singularity (see https://docs.sylabs.io/guides/4.0/user-guide/quick_start.html#quick-installation-steps). Snakemake uses Singularity to pull and run the Docker images that contain the workflow dependencies (defined in the "Images" section of the config file).
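Before launching, it can be worth confirming that the tools set up above are visible in the current shell. A minimal sketch; the helper name require_commands is hypothetical.

```shell
#!/bin/sh
# Report every command that is not on PATH; return non-zero if any is missing.
require_commands() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "not found on PATH: $cmd" >&2
            missing=1
        fi
    done
    return $missing
}
```

For example, after conda activate snakemake, require_commands snakemake singularity should succeed, and snakemake --version should print 7.25.3.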

Run the whole pipeline with a single bash script (BacWGSpipe.sh):

# Move to the directory of BacWGSpipe
cd /path/to/BacWGSpipe

# Activate the conda environment of Snakemake
conda activate snakemake

# Run BacWGSpipe
# Notes: 1. Bind every host directory whose files BacWGSpipe uses at run time,
#           including the BacWGSpipe workflow, the sequencing data and the two manually downloaded databases.
#        2. Bind paths with the "--singularity-args" parameter, e.g. --singularity-args "-B /lvdata/lyw:/lvdata/lyw"
#           (all files under /lvdata/lyw then become available inside the Singularity container). Multiple paths
#           can be bound with several "-B" options.
#        3. Assuming these options are passed through to Snakemake, -n is a dry run that only lists the planned jobs;
#           remove it to start the actual analysis.
bash BacWGSpipe.sh --configfile config/config.yaml -r -p --cores 32 --singularity-args "-B /lvdata/lyw:/lvdata/lyw" -n
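When several host directories must be bound, the "-B" options can be assembled from a list of paths. This sketch (the helper name build_bind_args is hypothetical) binds each directory to the same path inside the container, as in the -B /lvdata/lyw:/lvdata/lyw example; it does not handle paths containing spaces.

```shell
#!/bin/sh
# Build a --singularity-args value that binds each host directory
# to the same path inside the container (-B src:src).
build_bind_args() {
    args=""
    for dir in "$@"; do
        args="$args -B $dir:$dir"
    done
    # Drop the leading space added by the first iteration.
    printf '%s\n' "${args# }"
}
```

For example: bash BacWGSpipe.sh --configfile config/config.yaml -r -p --cores 32 --singularity-args "$(build_bind_args /path/to/BacWGSpipe /path/to/seqdata /path/to/databases)" -n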

Test Dataset

V. Murigneux et al., “MicroPIPE: Validating an end-to-end workflow for high-quality complete bacterial genome construction,” BMC Genomics, vol. 22, no. 1, p. 474, 2021.

The dataset consists of 12 ST131 Escherichia coli strains with both Nanopore long-read sequencing data (SRA accession: SRP293329) and Illumina short-read sequencing data (SRA accession: ERP001354).

Author

Yewei Lu ([email protected])

Xiangchen Li ([email protected])
