This document describes the output produced by the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
The pipeline is built using Nextflow and processes data using the following steps:
- Group reads - Group reads in specific regions according to the genotypes at the selected markers
- Call variants - Jointly call variants in the meta-samples for each variant of interest
- Predict variant effects - Predict variant effects using ENSEMBL VEP
Group sample reads around a region of interest according to user-defined grouping criteria and the genotypes at a selected marker.
Output files
- `reads/group_<GROUP_ID>_variant_<VARIANT_ID>_gt_<GT>/`
  - `*.cram`: sequencing reads in CRAM format
  - `*.crai`: CRAM file index
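The per-group directory name above is assembled from the group, variant and genotype identifiers. As a minimal sketch (the identifier values are hypothetical, and replacing `/` in the genotype with `-` is an assumption about how `<GT>` is encoded in the path):

```python
def group_dir(group_id: str, variant_id: str, gt: str) -> str:
    """Build the reads/ subdirectory name for one meta-sample.

    Assumes '/' in a diploid genotype string (e.g. '0/1') is replaced
    with '-' so the value is safe to use in a directory name.
    """
    safe_gt = gt.replace("/", "-")
    return f"reads/group_{group_id}_variant_{variant_id}_gt_{safe_gt}"

print(group_dir("controls", "rs123", "0/1"))
# reads/group_controls_variant_rs123_gt_0-1
```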
Use GATK4 joint germline variant calling to detect variants in the grouped sequencing reads, per group and genotype. Grouping by genotype makes it possible to detect variants in linkage disequilibrium with the marker of interest.
Output files
- `variants/variant_<VARIANT_ID>/`
  - `*.vcf.gz`: variant calls for all the meta-samples in VCF format
  - `*.tbi`: VCF file index
Use ENSEMBL VEP on the variant calls obtained in the previous step to determine variant consequence.
Output files
- `variants/variant_<VARIANT_ID>/`
  - `*.vep.tsv.gz`: variant consequence predictions
  - `*.mut.gz`: variant consequences formatted so that they can be loaded directly into IGV
  - `*.vep.summary.html`: HTML report from VEP
Output files
- `pipeline_info/`
  - Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`.
  - Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email`/`--email_on_fail` parameters are used when running the pipeline.
  - Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`.
Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.