Performance of methylation pipeline #3304

naumenko-sa · 2020-07-15T13:01:37Z

Users reported slowdowns of the methylation pipeline at the trim_galore and bismark steps.

trim_galore uses a non-straightforward threading scheme:

With bcbio_nextgen.py -n 4, trim_galore runs cutadapt with 1 thread, which is very slow. Solution: increasing the number of bcbio threads speeds up trim_galore.

We tested the bismark step with two threading parameters:
https://bcbio-nextgen.readthedocs.io/en/latest/contents/methylation.html#benchmarking
The currently recommended combination is 16/2/100G.

#3303
#3301


naumenko-sa · 2020-07-15T13:41:24Z

Some samples failed at the extractor step:

bismark_methylation_extractor \
--no_overlap \
--comprehensive \
--cytosine_report \
--genome_folder /genomes/Hsapiens/hg38/bismark/ \
--merge_non_CpG \
--multicore 1 \
--buffer_size 5G \
--bedGraph \
--gzip \
/path/work/dedup/sample/sample.nsorted.deduplicated.bam

[FATAL ERROR:] The IDs of Read 1 and Read 2 are not the same.
This might be the result of sorting the BAM files by chromosomal position or merging several files with Samtools sort, and this is not compatible with correct methylation extraction. Please use an unsorted file instead or sort the file by name using the command 'samtools sort -n'. Paired-end files may be merged properly (without risking this error) using either 'samtools merge -n' or 'samtools cat'.
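
To check whether a given BAM would trip this error before launching the extractor, one option is to compare the read IDs of consecutive records. A minimal sketch, assuming one read ID per line on stdin and that mates may carry a trailing /1 or /2 (both are assumptions, as is the helper name check_paired_names; this is not how Bismark itself performs the check):

```shell
# check_paired_names (hypothetical helper): read one read ID per line from
# stdin and verify that records 1+2, 3+4, ... carry the same ID.
# A trailing /1 or /2 is stripped before comparing (assumption about ID format).
check_paired_names() {
  awk '
    { id = $1; sub(/\/[12]$/, "", id) }
    # Odd records: remember the ID of the first mate.
    NR % 2 == 1 { first = id; next }
    # Even records: the second mate must match the remembered ID.
    id != first {
      printf "mismatch at record %d: %s vs %s\n", NR, first, id
      bad = 1
      exit
    }
    END {
      if (bad) exit 1
      if (NR % 2 == 1) { print "odd number of records"; exit 1 }
      print "ok"
    }
  '
}
```

For example, `samtools view sample.bam | cut -f1 | check_paired_names` prints `ok` when every two consecutive records share an ID, and otherwise reports the first record where they differ and returns a non-zero status.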

naumenko-sa · 2020-07-15T17:26:46Z

It seems to happen in some samples because we do:

1. bismark alignment
2. sorting
3. deduplication
4. extraction

Step 4 fails with sorted reads, but step 3 requires sorting. The solution is to skip sorting and deduplication, or to run step 4 in single-end mode (-s).
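
The single-end workaround can be sketched as a small dry-run helper that only prints the extractor command for a given mode. This is an illustration, not the bcbio implementation: the function name extractor_cmd and its mode names are hypothetical, the option list is trimmed from the failing command above for brevity, and dropping --no_overlap in single-end mode is based on the assumption that it only applies to paired-end extraction.

```shell
# extractor_cmd (hypothetical helper): print, but do not run, an extractor
# command for the given mode.
#   paired: keeps --no_overlap, as in the failing command above
#   single: swaps in -s and drops --no_overlap (assumed paired-end only)
extractor_cmd() {
  mode=$1
  bam=$2
  case "$mode" in
    paired) mode_opts="--no_overlap" ;;
    single) mode_opts="-s" ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  echo "bismark_methylation_extractor $mode_opts --comprehensive --merge_non_CpG --bedGraph --gzip $bam"
}
```

Calling `extractor_cmd single sample.bam` prints the single-end variant, which can be reviewed before it is run by hand.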

naumenko-sa · 2020-07-15T18:02:10Z

FelixKrueger/Bismark#360

naumenko-sa · 2020-07-21T14:34:32Z

Lambda phage discussion: FelixKrueger/Bismark#361

naumenko-sa · 2020-07-27T13:40:27Z

Finished the test cohort, including the run against the Lambda genome.
4/2/100G (-n 16) passes without errors.
For the fastest processing, use 16/2/192G (-n 32).

naumenko-sa self-assigned this Jul 15, 2020

naumenko-sa closed this as completed Jul 27, 2020
