Dev (#14)
* Updated ReadMe, minor update to wdl header

* Set docker variables with options, updated docker to GATK4.1

* added note to keep index files in the same directory as source file

*  Yossi Updated README.md

* added version licensing info
bshifaw authored Feb 22, 2019
1 parent 92b1157 commit 1b01988
Showing 4 changed files with 72 additions and 52 deletions.
37 changes: 26 additions & 11 deletions README.md
@@ -4,29 +4,44 @@
Workflows for processing high-throughput sequencing data for variant discovery with GATK4 and related tools.

### processing-for-variant-discovery-gatk4 :
The processing-for-variant-discovery-gatk4 WDL pipeline implements data pre-processing according to the GATK Best Practices
(June 2016).
The processing-for-variant-discovery-gatk4 WDL pipeline implements data pre-processing according to the GATK Best Practices.

#### Requirements/expectations
#### Requirements/expectations:
- Pair-end sequencing data in unmapped BAM (uBAM) format
- One or more read groups, one per uBAM file, all belonging to a single sample (SM)
- Input uBAM files must additionally comply with the following requirements:
- filenames all have the same suffix (we use ".unmapped.bam")
- files must pass validation by ValidateSamFile
- reads are provided in query-sorted order
- all reads must have an RG tag
- Reference index files must be in the same directory as source (e.g. reference.fasta.fai in the same directory as reference.fasta)

#### Outputs
#### Outputs:
- A clean BAM file and its index, suitable for variant discovery analyses.

### Software version requirements :
- GATK 4 or later
- Picard 2.x
- Samtools (see gotc docker)
- BWA 0.7.15-r1140
- Picard 2.16.0-SNAPSHOT
- Samtools 1.3.1 (using htslib 1.3.1)
- Python 2.7
- Cromwell version support
- Successfully tested on v37
- Does not work on versions < v23 due to output syntax

### Important Note :
- The provided JSON is meant to be a ready to use example JSON template of the workflow. It is the user’s responsibility to correctly set the reference and resource input variables using the [GATK Tool and Tutorial Documentations](https://software.broadinstitute.org/gatk/documentation/).
- Relevant reference and resources bundles can be accessed in [Resource Bundle](https://software.broadinstitute.org/gatk/download/bundle).
- Runtime parameters are optimized for Broad's Google Cloud Platform implementation.
- For help running workflows on the Google Cloud Platform or locally please
view the following tutorial [(How to) Execute Workflows from the gatk-workflows Git Organization](https://software.broadinstitute.org/gatk/documentation/article?id=12521).
- The following material is provided by the GATK Team. Please post any questions or concerns to one of our forum sites : [GATK](https://gatkforums.broadinstitute.org/gatk/categories/ask-the-team/) , [FireCloud](https://gatkforums.broadinstitute.org/firecloud/categories/ask-the-firecloud-team) or [Terra](https://broadinstitute.zendesk.com/hc/en-us/community/topics/360000500432-General-Discussion) , [WDL/Cromwell](https://gatkforums.broadinstitute.org/wdl/categories/ask-the-wdl-team).
- Please visit the [User Guide](https://software.broadinstitute.org/gatk/documentation/) site for further documentation on our workflows and tools.

Cromwell version support
- Successfully tested on v32
- Does not work on versions < v23 due to output syntax

Runtime parameters are optimized for Broad's Google Cloud Platform implementation.
### LICENSING :
Copyright Broad Institute, 2019 | BSD-3
This script is released under the WDL open source code license (BSD-3) (full license text at https://github.com/openwdl/wdl/blob/master/LICENSE). Note however that the programs it calls may be subject to different licenses. Users are responsible for checking that they are authorized to run all programs before running this script.
- [GATK](https://software.broadinstitute.org/gatk/download/licensing.php)
- [BWA](http://bio-bwa.sourceforge.net/bwa.shtml#13)
- [Picard](https://broadinstitute.github.io/picard/)
- [Samtools](http://www.htslib.org/terms/)
17 changes: 8 additions & 9 deletions processing-for-variant-discovery-gatk4.b37.wgs.inputs.json
@@ -26,18 +26,18 @@
],

"##_COMMENT4": "MISC PROGRAM PARAMETERS",
"PreProcessingForVariantDiscovery_GATK4.bwa_commandline": "bwa mem -K 100000000 -p -v 3 -t 16 -Y $bash_ref_fasta",
"#PreProcessingForVariantDiscovery_GATK4.bwa_commandline_override": "String? (optional)",
"PreProcessingForVariantDiscovery_GATK4.compression_level": 5,
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.num_cpu": "16",

"##_COMMENT5": "DOCKERS",
"PreProcessingForVariantDiscovery_GATK4.gotc_docker": "broadinstitute/genomes-in-the-cloud:2.3.1-1512499786",
"PreProcessingForVariantDiscovery_GATK4.gatk_docker": "broadinstitute/gatk:4.0.4.0",
"PreProcessingForVariantDiscovery_GATK4.python_docker": "python:2.7",
"#PreProcessingForVariantDiscovery_GATK4.gotc_docker_override": "String? (optional)",
"#PreProcessingForVariantDiscovery_GATK4.gatk_docker_override": "String? (optional)",
"#PreProcessingForVariantDiscovery_GATK4.python_docker_override": "String? (optional)",

"##_COMMENT6": "PATHS",
"PreProcessingForVariantDiscovery_GATK4.gotc_path": "/usr/gitc/",
"PreProcessingForVariantDiscovery_GATK4.gatk_path": "/gatk/gatk",
"##_COMMENT6": "PATHS",
"#PreProcessingForVariantDiscovery_GATK4.gotc_path_override": "String? (optional)",
"#PreProcessingForVariantDiscovery_GATK4.gatk_path_override": "String? (optional)",

"##_COMMENT7": "JAVA OPTIONS",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.java_opt": "-Xms3000m",
@@ -70,6 +70,5 @@
"PreProcessingForVariantDiscovery_GATK4.flowcell_medium_disk": 200,

"##_COMMENT10": "PREEMPTIBLES",
"PreProcessingForVariantDiscovery_GATK4.preemptible_tries": 3,
"PreProcessingForVariantDiscovery_GATK4.agg_preemptible_tries": 3
"#PreProcessingForVariantDiscovery_GATK4.preemptible_tries_override": "Int? (optional)"
}
15 changes: 7 additions & 8 deletions processing-for-variant-discovery-gatk4.hg38.wgs.inputs.json
@@ -29,18 +29,18 @@
],

"##_COMMENT4": "MISC PARAMETERS",
"PreProcessingForVariantDiscovery_GATK4.bwa_commandline": "bwa mem -K 100000000 -p -v 3 -t 16 -Y $bash_ref_fasta",
"#PreProcessingForVariantDiscovery_GATK4.bwa_commandline_override": "String? (optional)",
"PreProcessingForVariantDiscovery_GATK4.compression_level": 5,
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.num_cpu": "16",

"##_COMMENT5": "DOCKERS",
"PreProcessingForVariantDiscovery_GATK4.gotc_docker": "broadinstitute/genomes-in-the-cloud:2.3.1-1512499786",
"PreProcessingForVariantDiscovery_GATK4.gatk_docker": "broadinstitute/gatk:4.0.4.0",
"PreProcessingForVariantDiscovery_GATK4.python_docker": "python:2.7",
"#PreProcessingForVariantDiscovery_GATK4.gotc_docker_override": "String? (optional)",
"#PreProcessingForVariantDiscovery_GATK4.gatk_docker_override": "String? (optional)",
"#PreProcessingForVariantDiscovery_GATK4.python_docker_override": "String? (optional)",

"##_COMMENT6": "PATHS",
"PreProcessingForVariantDiscovery_GATK4.gotc_path": "/usr/gitc/",
"PreProcessingForVariantDiscovery_GATK4.gatk_path": "/gatk/gatk",
"#PreProcessingForVariantDiscovery_GATK4.gotc_path_override": "String? (optional)",
"#PreProcessingForVariantDiscovery_GATK4.gatk_path_override": "String? (optional)",

"##_COMMENT7": "JAVA OPTIONS",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.java_opt": "-Xms3000m",
@@ -73,6 +73,5 @@
"PreProcessingForVariantDiscovery_GATK4.flowcell_medium_disk": 200,

"##_COMMENT10": "PREEMPTIBLES",
"PreProcessingForVariantDiscovery_GATK4.preemptible_tries": 3,
"PreProcessingForVariantDiscovery_GATK4.agg_preemptible_tries": 3
"#PreProcessingForVariantDiscovery_GATK4.preemptible_tries_override": "Int? (optional)"
}
55 changes: 31 additions & 24 deletions processing-for-variant-discovery-gatk4.wdl
@@ -1,7 +1,6 @@
## Copyright Broad Institute, 2018
## Copyright Broad Institute, 2019
##
## This WDL pipeline implements data pre-processing according to the GATK Best Practices
## (June 2016).
## This WDL pipeline implements data pre-processing according to the GATK Best Practices.
##
## Requirements/expectations :
## - Pair-end sequencing data in unmapped BAM (uBAM) format
@@ -15,14 +14,15 @@
## Output :
## - A clean BAM file and its index, suitable for variant discovery analyses.
##
## Software version requirements (see recommended dockers in inputs JSON)
## Software version requirements
## - GATK 4 or later
## - Picard (see gotc docker)
## - Samtools (see gotc docker)
## - BWA 0.7.15-r1140
## - Picard 2.16.0-SNAPSHOT
## - Samtools 1.3.1 (using htslib 1.3.1)
## - Python 2.7
##
## Cromwell version support
## - Successfully tested on v32
## - Successfully tested on v37
## - Does not work on versions < v23 due to output syntax
##
## Runtime parameters are optimized for Broad's Google Cloud Platform implementation.
@@ -47,29 +47,36 @@ workflow PreProcessingForVariantDiscovery_GATK4 {
File ref_fasta_index
File ref_dict

String bwa_commandline
Int compression_level

File dbSNP_vcf
File dbSNP_vcf_index
Array[File] known_indels_sites_VCFs
Array[File] known_indels_sites_indices

String gotc_docker
String gatk_docker
String python_docker

String gotc_path
String gatk_path
String? bwa_commandline_override
String bwa_commandline = select_first([bwa_commandline_override, "bwa mem -K 100000000 -p -v 3 -t 16 -Y $bash_ref_fasta"])
Int compression_level

String? gatk_docker_override
String gatk_docker = select_first([gatk_docker_override, "broadinstitute/gatk:4.1.0.0"])
String? gatk_path_override
String gatk_path = select_first([gatk_path_override, "/gatk/gatk"])

String? gotc_docker_override
String gotc_docker = select_first([gotc_docker_override, "broadinstitute/genomes-in-the-cloud:2.3.1-1512499786"])
String? gotc_path_override
String gotc_path = select_first([gotc_path_override, "/usr/gitc/"])

String? python_docker_override
String python_docker = select_first([python_docker_override, "python:2.7"])

Int flowcell_small_disk
Int flowcell_medium_disk
Int agg_small_disk
Int agg_medium_disk
Int agg_large_disk

Int preemptible_tries
Int agg_preemptible_tries
String? preemptible_tries_override
Int preemptible_tries = select_first([preemptible_tries_override, "3"])

String base_file_name = sample_name + "." + ref_name

@@ -138,7 +145,7 @@ workflow PreProcessingForVariantDiscovery_GATK4 {
gatk_path = gatk_path,
disk_size = agg_large_disk,
compression_level = compression_level,
preemptible_tries = agg_preemptible_tries
preemptible_tries = preemptible_tries
}

# Sort aggregated+deduped BAM file and fix tags
@@ -183,7 +190,7 @@ workflow PreProcessingForVariantDiscovery_GATK4 {
docker_image = gatk_docker,
gatk_path = gatk_path,
disk_size = agg_small_disk,
preemptible_tries = agg_preemptible_tries
preemptible_tries = preemptible_tries
}
}

@@ -214,7 +221,7 @@ workflow PreProcessingForVariantDiscovery_GATK4 {
docker_image = gatk_docker,
gatk_path = gatk_path,
disk_size = agg_small_disk,
preemptible_tries = agg_preemptible_tries
preemptible_tries = preemptible_tries
}
}

@@ -226,7 +233,7 @@ workflow PreProcessingForVariantDiscovery_GATK4 {
docker_image = gatk_docker,
gatk_path = gatk_path,
disk_size = agg_large_disk,
preemptible_tries = agg_preemptible_tries,
preemptible_tries = preemptible_tries,
compression_level = compression_level
}

@@ -603,7 +610,7 @@ task GatherBqsrReports {
GatherBQSRReports \
-I ${sep=' -I ' input_bqsr_reports} \
-O ${output_report_filename}
}
}
runtime {
preemptible: preemptible_tries
docker: docker_image
@@ -679,7 +686,7 @@ task GatherBamFiles {
--OUTPUT ${output_bam_basename}.bam \
--CREATE_INDEX true \
--CREATE_MD5_FILE true
}
}
runtime {
preemptible: preemptible_tries
docker: docker_image
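With this change the docker images, tool paths, and bwa command line take their defaults from `select_first` inside the WDL, and a value is replaced only when the matching `*_override` input is supplied. A minimal sketch of an inputs fragment that pins GATK back to the image the old template used, assuming that dropping the leading `#` from a template key is what activates it (the `##_COMMENT` entry is illustrative only and follows the template's own comment convention):

```json
{
  "##_COMMENT": "Illustrative fragment only; merge these keys into a full inputs JSON",
  "PreProcessingForVariantDiscovery_GATK4.gatk_docker_override": "broadinstitute/gatk:4.0.4.0",
  "PreProcessingForVariantDiscovery_GATK4.gatk_path_override": "/gatk/gatk"
}
```

Any override left out (or left commented with `#`) should fall back to the default written in the workflow, for example `broadinstitute/gatk:4.1.0.0` for `gatk_docker`.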
