bismark_methylation_extractor difference between pair end or single end #579
If you use the deduplication command:
and then use that file as the input for the methylation extractor it should all work fine (in paired-end mode):
If you extract the methylation information from this file as a single-end file, it will just extract all calls from each file and not take the paired-end nature of your data into consideration. This means that it will not do an overlap detection between R1 and R2, so the parts of a read that are present in both R1 and R2 are extracted twice (which leads to an undue coverage bias). I hope this is a good enough explanation, and you won't use
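To make the coverage bias concrete, here is a minimal sketch in Python of what happens to one read pair in the two modes. It is not Bismark's implementation: the `coverage` function and the positions are made up for illustration, and overlap is modelled simply as "the same position was called in both reads", with the Read 1 call being the one that is kept.

```python
from collections import Counter

def coverage(r1_positions, r2_positions, paired_end):
    """Count methylation calls per position for a single read pair.

    Hypothetical helper: positions are genomic coordinates of cytosines
    called in Read 1 and Read 2 of the same fragment.
    """
    counts = Counter(r1_positions)
    r1_seen = set(r1_positions)
    for pos in r2_positions:
        if paired_end and pos in r1_seen:
            # Overlap detection: this position was already called in
            # Read 1, so the Read 2 call is skipped.
            continue
        counts[pos] += 1
    return counts

# Made-up example: positions 112 and 120 are covered by both reads.
r1 = [100, 105, 112, 120]
r2 = [112, 120, 131]

single = coverage(r1, r2, paired_end=False)  # every call counted
paired = coverage(r1, r2, paired_end=True)   # overlap counted once

print("single-end mode:", dict(single))
print("paired-end mode:", dict(paired))
```

In the single-end case the two overlapping positions are each counted twice although they come from one fragment, which is the double extraction described above.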
Yeah, that pretty much makes sense! But "parts of the read that are present in both R1 and R2 extracted twice" means extracted sites in
My gut feeling is that your last statement is correct, but since this is the wrong thing to do on at least two levels, I don't really want to invest more energy into this, to be perfectly honest. Maybe you want to follow this up some more?
After I checked the BAM file, I found that
Thanks for persisting. If I am not mistaken, single-end mode won't actually discard these calls; instead they will get attributed to a different strand (in your case probably the OB strand), so the calls should be in a different file altogether. Can you just grep for that read ID in the other file? But yeah, don't do it :)
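The check suggested here can be sketched as a small Python helper that looks for one read ID in several extractor output files and reports which files contain it. This assumes the tab-separated extractor output with the read ID in the first column; the file names, the read ID, and the demo lines below are made up for illustration.

```python
import tempfile
from pathlib import Path

def find_read(read_id, paths):
    """Return {file name: [matching lines]} for lines whose first
    tab-separated column equals read_id."""
    hits = {}
    for path in paths:
        with open(path) as handle:
            matches = [line.rstrip("\n") for line in handle
                       if line.rstrip("\n").split("\t", 1)[0] == read_id]
        if matches:
            hits[Path(path).name] = matches
    return hits

# Demo with made-up strand-specific output files.
with tempfile.TemporaryDirectory() as tmp:
    ot = Path(tmp) / "CpG_OT_sample.txt"
    ob = Path(tmp) / "CpG_OB_sample.txt"
    ot.write_text("header line (made up)\n"
                  "read_0001\t+\tchr1\t100\tZ\n")
    ob.write_text("header line (made up)\n"
                  "read_0002\t-\tchr1\t112\tz\n"
                  "read_0002\t+\tchr1\t120\tZ\n")
    hits = find_read("read_0002", [ot, ob])

for name, lines in hits.items():
    print(name, len(lines), "call(s)")
```

If a read's calls seem to be missing from one strand file, running this over all strand-specific files shows whether they were written to a different one.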
Glad we found an explanation in the end! All the best going forward!
Hi Felix!
I'm using Bismark for WGBS data. I'm confused about the difference between paired-end and single-end data in the `bismark_methylation_extractor` function. I can't work out the difference, since both just extract methylation data from the BAM file. In issue #360 you said that Read 1 and Read 2 need to follow each other on consecutive lines, so this command will fail
However, when I added `--se` to the `bismark_methylation_extractor` command, it worked. Then I compared this with the output generated by the RIGHT pipeline (without `samtools sort`). It seemed that the RIGHT file kept a few more records. What's the difference between the two modes and their output files?
Many thanks!