WGBS sample: alignment - sorting - de-duplication - extraction = Fatal Error Read1 != Read2 #360
Hi Sergey, The entire Bismark pipeline does not require any sorting by chromosomal position, in fact But yes, for WGBS we would definitely recommend de-duplication, and you should proceed in paired-end mode as per usual. Do let me know if something was unclear.
Hi Felix! Thanks for such a prompt response! Sorry, I was not precise enough - after double-checking I see that we are not sorting the bam by chromosomal position,
For some samples, all 6 steps run just fine. I ran the extractor step for each bam file from steps 1-4: Then I ran the deduplication step for each bam file: Thus,
I'm just removing steps 2-3 from the workflow and leaving this description here in case somebody hits the same issue. Sergey
Hi Felix! Removing bam processing steps helped to finish many samples.
I've extracted SAM records with READ_ID_1 and READ_ID_2 from the bismark output bam (sample_bismark_bt2_PE.bam):
So READ_ID_1 is unpaired but has Another issue - some samples fail the deduplication step with:
How could we debug this issue? Thanks!
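One way to narrow this down is to scan the alignment records in file order and report the first read that is not immediately followed by its mate. The sketch below is only an illustration under stated assumptions: it works on plain SAM text lines (for example the output of `samtools view`), it treats two consecutive records as a pair when their read names match, and it strips a trailing `/1` or `/2` from names in case they are present. The function names are hypothetical and are not part of Bismark.

```python
from typing import Iterable, Optional, Tuple


def read_name(sam_line: str) -> str:
    """Return the QNAME (first SAM column) of one alignment line.

    Assumption: a trailing /1 or /2 mate suffix, if present, is not part
    of the read identity and can be dropped before comparing names.
    """
    name = sam_line.rstrip("\n").split("\t", 1)[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def first_unpaired(sam_lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (record_index, read_name) of the first record whose next
    record has a different name, or None if every record pairs up.

    record_index is 0-based and counts alignment records only; header
    lines (starting with '@') and empty lines are skipped, not counted.
    """
    pending = None  # (index, name) of a record still waiting for its mate
    index = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        name = read_name(line)
        if pending is None:
            pending = (index, name)      # first record of a candidate pair
        elif pending[1] == name:
            pending = None               # mate found directly after it
        else:
            return pending               # neighbour has a different name
        index += 1
    return pending                       # non-None if the last record is alone
```

Fed with the lines of `samtools view sample_bismark_bt2_PE.bam`, a non-`None` result points at the first record where Read 1 and Read 2 stop following each other, which is the spot worth inspecting by hand. Note that the sketch silently skips empty lines, so it will not flag those on its own.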
Hi Sergey, This doesn't sound right. Without having seen it with my own eyes, I would say Bismark would never ever produce a paired-end output file where Read 1 and Read 2 do not follow each other, like here:
Thus I also don't really have a 'best solution' for this, as it should never come up in the first place.... I am afraid I also don't have an out of the box answer to the last issue without taking a look at the actual file in question. There seem to be some empty lines in the file, and some programs seem to be reporting a non-zero exit status (which do not come from A few considerations:
Let's assume that a single thread of Bismark uses, say, 12GB of RAM (not exactly sure which human genome you are using, but 9-12GB is the minimum for directional alignments), and 3-5 cores of CPU. With
I think my suggestion would probably be to upgrade Bismark to the latest version, and be a bit more humble with the command you are using, e.g.:
If you are still getting such weird looking BAM files we could arrange a file transfer and I could take a look myself...? Cheers, Felix
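The resource reasoning above can be turned into a quick back-of-the-envelope check. This is a rough sketch, not a sizing tool: it takes the approximate figures quoted in this comment (about 12GB of RAM and 3-5 CPU cores per single Bismark thread) as defaults, and both function names are hypothetical. The real per-instance footprint depends on the genome, the alignment mode, and the aligner thread settings, so the defaults should be treated as assumptions to adjust.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Estimate:
    ram_gb: float    # total RAM the parallel instances are expected to need
    cores_min: int   # optimistic total core usage
    cores_max: int   # pessimistic total core usage


def estimate_load(instances: int,
                  ram_per_instance_gb: float = 12.0,
                  cores_per_instance: Tuple[int, int] = (3, 5)) -> Estimate:
    """Scale the per-instance figures linearly with the instance count.

    Defaults are the rough numbers quoted in the thread, not measurements.
    """
    low, high = cores_per_instance
    return Estimate(instances * ram_per_instance_gb,
                    instances * low,
                    instances * high)


def max_instances(total_ram_gb: float, total_cores: int,
                  ram_per_instance_gb: float = 12.0,
                  cores_per_instance: Tuple[int, int] = (3, 5)) -> int:
    """Largest instance count that fits both the RAM and the core budget,
    using the pessimistic (upper) core figure per instance."""
    by_ram = int(total_ram_gb // ram_per_instance_gb)
    by_cores = total_cores // cores_per_instance[1]
    return max(0, min(by_ram, by_cores))
```

For example, `estimate_load(4)` gives 48GB of RAM and 12-20 cores under these assumptions, and `max_instances(100, 16)` comes out at 3 because the core budget, not the memory, is the limiting factor there.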
Thanks, Felix! I've updated trim-galore and bismark to the latest versions in bcbio and am re-running the failed samples with fewer threads. Bcbio users reported that our bismark wrapper is very slow, The results are here: https://bcbio-nextgen.readthedocs.io/en/latest/contents/methylation.html#benchmarking Of course, such a test is not a perfect measurement; it depends on the current cluster load/architecture. And, with that 16/2/100G combination, I was able to process >50% of the samples in 2-3 days. By the way, Dragen's page references Bismark: Are you aware of any publicly available benchmarks comparing Dragen's and Bismark's output? Is it a hardware-accelerated port of a specific version of Bismark code, or a pipeline developed along the lines established in Bismark? Sorry - those are not exactly the questions to address in the bismark GitHub, just wondering if the answer is known to the methylation research community - it would save us the effort of finding the answer the hard way. Thanks!
Excellent, hope it'll work this time. Regarding Dragen's output: I am afraid I am not exactly aware of what they have done, the last meeting we had with Illumina was many years ago...
I see, thanks, Felix! Updating bismark and trim-galore and running with bismark_threads=4 and bowtie_threads=2 helped to finish all failed samples. Typically, we run on 48 cores / 192G RAM instances, so even 8/4/192G makes sense.
I had the same problem, and none of the recommendations worked.
Hello @FelixKrueger !
Thanks for maintaining the Bismark pipeline!
We are running WGBS analysis using Bismark wrapped in bcbio
using the following steps:
De-duplication is not recommended for RRBS samples, but our samples are WGBS, so we need de-duplication, right?
For some samples step 4 fails with:
Step 4 requires an unsorted bam file, but step 3 requires a sorted one.
Could you please comment on how we should organize the workflow?
samtools sort -n before step 4?
Thanks!
Sergey
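When it is unclear whether a BAM file has been position-sorted somewhere in the workflow, the sort order declared in its header is a quick first check. The sketch below is an illustration only: it parses header text (for example the output of `samtools view -H`) and returns the value of the `SO` tag on the `@HD` line, which the SAM specification defines as `unknown`, `unsorted`, `queryname`, or `coordinate`. The function name is hypothetical, and the tag only reflects what the last tool declared, so it is a hint rather than proof of the actual record order.

```python
from typing import Iterable


def declared_sort_order(header_lines: Iterable[str]) -> str:
    """Return the SO value from the @HD header line.

    Falls back to "unknown" when there is no @HD line or no SO tag,
    which matches the default stated in the SAM specification.
    """
    for line in header_lines:
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            return "unknown"
    return "unknown"
```

A result of `coordinate` on a paired-end file headed for de-duplication suggests a position sort was applied upstream, which is the situation described in this post.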