Performance of methylation pipeline #3304

Closed
naumenko-sa opened this issue Jul 15, 2020 · 5 comments

@naumenko-sa
Contributor

Users reported slowdowns of the methylation pipeline at the trim_galore and bismark steps.

  1. trim_galore uses a non-straightforward threading scheme:

solution: with bcbio_nextgen.py -n 4, trim_galore runs cutadapt with 1 thread, which is very slow. Increasing the number of bcbio threads speeds up trim_galore.

  2. We tested the bismark step with two threading parameters:
    https://bcbio-nextgen.readthedocs.io/en/latest/contents/methylation.html#benchmarking
    The currently recommended combination is 16/2/100G.

#3303
#3301

@naumenko-sa naumenko-sa self-assigned this Jul 15, 2020
@naumenko-sa
Contributor Author

naumenko-sa commented Jul 15, 2020

Some samples failed at the extractor step:

bismark_methylation_extractor \
--no_overlap \
--comprehensive \
--cytosine_report \
--genome_folder /genomes/Hsapiens/hg38/bismark/ \
--merge_non_CpG \
--multicore 1 \
--buffer_size 5G \
--bedGraph \
--gzip \
/path/work/dedup/sample/sample.nsorted.deduplicated.bam

[FATAL ERROR:] The IDs of Read 1 and Read 2 are not the same.
This might be the result of sorting the BAM files by chromosomal position or merging several files with Samtools sort, and this is not compatible with correct methylation extraction. Please use an unsorted file instead or sort the file by name using the command 'samtools sort -n'. Paired-end files may be merged properly (without risking this error) using either 'samtools merge -n' or 'samtools cat'.
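
A quick way to see whether a BAM file would trip this check is to look at adjacent read IDs before running the extractor. The following is only a minimal sketch, not Bismark's own logic: it assumes the input holds exactly two records per pair, that mates may carry a trailing /1 or /2 marker, and the function names are hypothetical.

```python
from itertools import islice
from typing import Iterable, Optional, Tuple


def strip_mate_suffix(read_id: str) -> str:
    """Drop a trailing /1 or /2 mate marker, if present (assumed convention)."""
    if read_id.endswith(("/1", "/2")):
        return read_id[:-2]
    return read_id


def first_mismatched_pair(read_ids: Iterable[str]) -> Optional[Tuple[int, str, str]]:
    """Walk read IDs two at a time, in file order.

    Returns (pair_index, id1, id2) for the first adjacent pair whose IDs
    differ, or None if every pair matches. A dangling last read is reported
    with an empty string as its mate.
    """
    it = iter(read_ids)
    index = 0
    while True:
        pair = list(islice(it, 2))
        if not pair:
            return None
        if len(pair) == 1:
            return (index, pair[0], "")
        if strip_mate_suffix(pair[0]) != strip_mate_suffix(pair[1]):
            return (index, pair[0], pair[1])
        index += 1
```

The read IDs could come from something like `samtools view sample.bam | cut -f1`, fed to the function line by line. A non-None result points at the first place where Read 1 and Read 2 are no longer adjacent.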

@naumenko-sa
Contributor Author

It seems to happen in some samples because the pipeline runs these steps in order:

1. bismark alignment
2. sorting
3. deduplication
4. extraction

Step 4 fails on sorted reads, but step 3 requires sorting. The solution is either to skip sorting and deduplication, or to run step 4 in single-end mode (-s).
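
The single-end workaround can be sketched as a small command builder. This is a hypothetical helper, not bcbio's actual code; the flags are the ones from the failing command above plus -s, and it assumes that --no_overlap only applies to paired-end input and should be dropped in single-end mode.

```python
from typing import List


def extractor_command(
    bam_path: str,
    genome_folder: str,
    single_end: bool = False,
    multicore: int = 1,
    buffer_size: str = "5G",
) -> List[str]:
    """Build a bismark_methylation_extractor argument list (illustrative only)."""
    cmd = ["bismark_methylation_extractor"]
    if single_end:
        # Treat every read independently, so mate adjacency no longer matters.
        cmd.append("-s")
    else:
        # Assumption: --no_overlap is meaningful only for paired-end input.
        cmd.append("--no_overlap")
    cmd += [
        "--comprehensive",
        "--cytosine_report",
        "--genome_folder", genome_folder,
        "--merge_non_CpG",
        "--multicore", str(multicore),
        "--buffer_size", buffer_size,
        "--bedGraph",
        "--gzip",
        bam_path,
    ]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a single string avoids shell quoting problems with paths.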

@naumenko-sa
Contributor Author

FelixKrueger/Bismark#360

@naumenko-sa
Contributor Author

Lambda phage discussion: FelixKrueger/Bismark#361

@naumenko-sa
Contributor Author

Finished the test cohort, plus the run against the Lambda genome.
4/2/100G (-n 16) passes without errors.
For the fastest processing, use 16/2/192G (-n 32).
