-
Notifications
You must be signed in to change notification settings - Fork 372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
problem of SamToFastq when handling unpaired reads #715
Comments
Have you try with |
Hi magicDGS, Thanks for quick reply. I tried the VALIDATION_STRINGENCY=SILENT option just now. This time no error message was reported but the "output.unpaired.fq.gz" file remains empty. Also, I tried multiple input bam files, so this should not be a problem of my input files. My reads mapping was done by bwa mem by the way. |
Are the reads actually (flagged as) unpaired, or are their mates missing from the file? Given that you got an error, I think that your problem is that reads that claim to be paired are missing their mate, rather than actually being unpaired which means that they were not sequenced as a pair. How did you get this bam file? Did you downsample a valid bam to get it? cheers, Yossi. |
Hi Yossi,
The bam file was generated by mapping all the reads to the reference genome
without any manipulation. I only specified "bwa mem -t 4 -M" for my
mapping. Also, those paired reads were successfully generated by SamToFastq
as shown in the "output.R1.fq.gz" and "output.R2.fq.gz" file. So the
problem is really about those unpaired reads.
Best,
Jia-Xing
|
Hello, Any further updates about this issue? Thanks! Best, |
Please post usage questions in the GATK forum as instructed in the "getting started" doc. Github Issues are reserved for discussions about the code and confirmed bug reports. Thank you. |
Hi vdauwera, I don't think this is a usage question. I think this is a bug. Please double check. Thanks! Best, |
It's very unlikely to be a bug, much more likely that there is something wrong with the data (along the lines of what Yossi suggested) because this functionality is covered by tests and is used by many other people. So we don't consider anything a bug until we have solid evidence. To get that evidence, we typically need a back and forth conversation between you and the support team (which I lead). That conversation belongs on the user forum. When it takes place on Github instead, it clutters the tool developers' github feed and wastes their time and attention that would be better spent on actual development discussions and code improvements. That is why we ask you to post in the forum. Please help us help you by following our process. |
I get the same error, but suspect it is my data. I suspect it is something to do with the BAM file (which was given to me) being previously filtered for Mapped reads only. It is a paired end read data set, but in a few cases only one of the pair mapped, then the unmapped pair has been removed from the BAM. When trying to recreate the FASTQ reads from the BAM, these unpaired reads don't go into FASTQ or SECOND_END_FASTQ, and if I provide UNPAIRED (which I hoped the singletons would go to) I get the error: Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Found 52 unpaired mates So think it is a data problem rather than a bug, the unpaired reads in the BAM are actually paired but the pair has been removed from the BAM (as the pair didnt map). |
But there is a use case to extract fastq reads only mapped in some region only. |
Are the secondary alignments that BWA's |
In my case, the secondary alignments are removed by 'samtools view -F 0x900'. |
@wangyugui, you are removing
then the |
@sooheelee samtool view -F 0x900 will remove 0x100 flag too. |
Bug Report
[Thu Dec 15 15:04:32 CET 2016] picard.sam.SamToFastq done. Elapsed time: 48.16 minutes.
Runtime.totalMemory()=12157714432
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Found 327 unpaired mates
at htsjdk.samtools.SAMUtils.processValidationError(SAMUtils.java:460)
at picard.sam.SamToFastq.doWork(SamToFastq.java:217)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
Affected tool(s)
SamToFastq
Affected version(s)
picard-2.8.0 and picard-2.7.1, I didn't test other versions
Description
When using SamToFastq to extract reads from bam file, I used the following command:
java -jar picard.jar SamToFastq INPUT=input.bam FASTQ=output.R1.fq.gz SECOND_END_FASTQ=output.R2.fq.gz UNPAIRED_FASTQ=output.unpaired.fq.gz
Ideally, those unpaired reads should be written into the "output.unpaired.fq.gz" file. However, this file is empty and picard raised the error complaining about unpaired reads.
The text was updated successfully, but these errors were encountered: