documentation
RuthEberhardt committed May 15, 2024
1 parent 3cdc97a commit 3ca754c
Showing 2 changed files with 21 additions and 33 deletions.
22 changes: 21 additions & 1 deletion README.md
@@ -1 +1,21 @@
# glimpse-nextflow
# glimpse-nextflow

Nextflow pipeline to run GLIMPSE2 on a multisample VCF.

This pipeline first lists the samples in the input multisample VCF, divides them into batches, and splits the multisample VCF into one VCF per batch. GLIMPSE2 phase is then run on each batch for each file in the reference directory. The outputs of GLIMPSE2 phase are ligated using GLIMPSE2 ligate to produce one phased VCF per batch. These per-batch phased VCFs are then merged using bcftools merge, resulting in a VCF containing phased variants for all samples. Finally, bcftools +impute-info is used to recalculate INFO scores.
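
To make the shape of the workflow concrete, here is a minimal Python sketch of how the tasks fan out and back in. It is only an illustration of the description above, not code from the pipeline: the function name `plan_tasks` and the batch and chunk labels are hypothetical.

```python
from itertools import product


def plan_tasks(batch_ids, ref_chunks):
    """Return (phase_tasks, ligate_inputs, merge_inputs) for one run.

    batch_ids and ref_chunks are hypothetical labels, used only to
    illustrate the stage structure described in this README.
    """
    # One GLIMPSE2 phase task per (batch, reference file) pair.
    phase_tasks = list(product(batch_ids, ref_chunks))
    # GLIMPSE2 ligate joins all phased outputs that belong to one batch.
    ligate_inputs = {b: [(b, c) for c in ref_chunks] for b in batch_ids}
    # bcftools merge then takes one ligated VCF per batch.
    merge_inputs = list(batch_ids)
    return phase_tasks, ligate_inputs, merge_inputs


phase, ligate, merge = plan_tasks(
    ["batch_0", "batch_1"], ["chunk_a", "chunk_b", "chunk_c"]
)
print(len(phase), len(ligate), len(merge))  # prints: 6 2 2
```

The number of phase tasks grows as batches times reference files, which is worth keeping in mind when choosing batch_size.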

To run, modify the following parameters in nextflow.config:

batch_size = 10
vcf_in = "/lustre/scratch126/humgen/teams/hgi/users/re3/blended_genomes_exomes/glimpse_pipe_test/test_vcfs/test_for_glimpse.vcf.gz"
refdir = "/lustre/scratch126/humgen/teams/hgi/users/re3/blended_genomes_exomes/glimpse_pipe_test/ref_mini/"
workdir = "/lustre/scratch126/humgen/teams/hgi/users/re3/blended_genomes_exomes/glimpse_pipe_test/work"
fasta = "/lustre/scratch125/humgen/resources/ref/Homo_sapiens/GRCh38_15/Homo_sapiens.GRCh38_15.fa"
fai = "/lustre/scratch125/humgen/resources/ref/Homo_sapiens/GRCh38_15/Homo_sapiens.GRCh38_15.fa.fai"
ref_bed = "/lustre/scratch125/humgen/resources/ref/Homo_sapiens/GRCh38_15/Homo_sapiens.GRCh38_15.bed"
publishdir = "/lustre/scratch126/humgen/teams/hgi/users/re3/blended_genomes_exomes/glimpse_pipe_test/output/"

batch_size gives a target batch size; 100-200 is recommended. The multisample VCF is split into batches of approximately the target batch size, which is adjusted so that the final batch is not too small.
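
One plausible way to do this kind of adjustment is sketched below in Python. This is an assumption for illustration, not necessarily the rule the pipeline applies: it picks the number of batches closest to the target size and spreads the samples evenly, so no batch ends up much smaller than the rest. The function name `plan_batches` is hypothetical.

```python
def plan_batches(samples, target_size):
    """Split samples into near-equal batches close to target_size.

    Illustrative only: the pipeline's own adjustment may differ.
    """
    if target_size < 1:
        raise ValueError("target_size must be at least 1")
    n = len(samples)
    if n == 0:
        return []
    # Number of batches whose even size is closest to the target.
    n_batches = max(1, round(n / target_size))
    # The first `extra` batches receive one additional sample.
    base, extra = divmod(n, n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        size = base + (1 if i < extra else 0)
        batches.append(samples[start:start + size])
        start += size
    return batches


samples = [f"sample_{i}" for i in range(23)]
print([len(b) for b in plan_batches(samples, target_size=10)])  # prints: [12, 11]
```

With 23 samples and a target of 10, a naive split would give batches of 10, 10 and 3; the even split gives 12 and 11 instead.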

Submit to the farm as follows:
bsub -J nextflow -R "select[mem>4000] rusage[mem=4000]" -M 4000 -o out -e err -qlong "nextflow run /path/to/glimpse-nextflow/main.nf -with-trace -profile sanger"
32 changes: 0 additions & 32 deletions nextflow.config
@@ -5,7 +5,6 @@ params {
// number of samples to be processed in each batch. Recommended 100-200
batch_size = 10
vcf_in = "/lustre/scratch126/humgen/teams/hgi/users/re3/blended_genomes_exomes/glimpse_pipe_test/test_vcfs/test_for_glimpse.vcf.gz"
// refdir = "/lustre/scratch125/humgen/resources/GLIMPSE/1000g_chunked_for_GLIMPSE/defaults_snp_biallelic/split/"
refdir = "/lustre/scratch126/humgen/teams/hgi/users/re3/blended_genomes_exomes/glimpse_pipe_test/ref_mini/"
workdir = "/lustre/scratch126/humgen/teams/hgi/users/re3/blended_genomes_exomes/glimpse_pipe_test/work"
fasta = "/lustre/scratch125/humgen/resources/ref/Homo_sapiens/GRCh38_15/Homo_sapiens.GRCh38_15.fa"
@@ -57,37 +56,6 @@ def check_max(obj, type) {
}
}


//process{
// executor = 'lsf'
// queue = { task.time < 20.m ? 'small' : task.time < 12.h ? 'normal' : task.time < 48.h ? 'long' : task.time < 168.h ? 'week' : 'basement' }
// queue = 'basement'

// withLabel:process_small{
// cpus = 6
// queue = 'small'
// time = {0.1h * task.attempt}
// errorStrategy = 'retry'
// memory = 50.MB
// }

// withLabel:process_medium{
// cpus = 6
// queue = 'normal'
// time = {2h * task.attempt}
// errorStrategy = 'retry'
// memory = 200.MB
// }
//}

//executor{
// name = 'lsf'
// perJobMemLimit = true
// poolSize = 4
// submitRateLimit = '5 sec'
// killBatchSize = 50
//}

singularity {
enabled = true
cacheDir = '/nfs/hgi/singularityContainers/'