Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

problem of SamToFastq when handling unpaired reads #715

Closed
yjx1217 opened this issue Dec 15, 2016 · 14 comments
Closed

problem of SamToFastq when handling unpaired reads #715

yjx1217 opened this issue Dec 15, 2016 · 14 comments

Comments

@yjx1217
Copy link

yjx1217 commented Dec 15, 2016

Bug Report

[Thu Dec 15 15:04:32 CET 2016] picard.sam.SamToFastq done. Elapsed time: 48.16 minutes.
Runtime.totalMemory()=12157714432
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Found 327 unpaired mates
at htsjdk.samtools.SAMUtils.processValidationError(SAMUtils.java:460)
at picard.sam.SamToFastq.doWork(SamToFastq.java:217)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

Affected tool(s)

SamToFastq

Affected version(s)

picard-2.8.0 and picard-2.7.1, I didn't test other versions

Description

When using SamToFastq to extract reads from bam file, I used the following command:
java -jar picard.jar SamToFastq INPUT=input.bam FASTQ=output.R1.fq.gz SECOND_END_FASTQ=output.R2.fq.gz UNPAIRED_FASTQ=output.unpaired.fq.gz

Ideally, those unpaired reads should be written into the "output.unpaired.fq.gz" file. However, this file is empty and picard raised the error complaining about unpaired reads.

@magicDGS
Copy link
Contributor

Have you try with VALIDATION_STRINGENCY=SILENT?

@yjx1217
Copy link
Author

yjx1217 commented Dec 15, 2016

Hi magicDGS,

Thanks for quick reply. I tried the VALIDATION_STRINGENCY=SILENT option just now. This time no error message was reported but the "output.unpaired.fq.gz" file remains empty. Also, I tried multiple input bam files, so this should not be a problem of my input files. My reads mapping was done by bwa mem by the way.

@yfarjoun
Copy link
Contributor

Are the reads actually (flagged as) unpaired, or are their mates missing from the file? Given that you got an error, I think that your problem is that reads that claim to be paired are missing their mate, rather than actually being unpaired which means that they were not sequenced as a pair.

How did you get this bam file? Did you downsample a valid bam to get it?

cheers,

Yossi.

@yjx1217
Copy link
Author

yjx1217 commented Dec 15, 2016 via email

@yjx1217
Copy link
Author

yjx1217 commented Dec 22, 2016

Hello,

Any further updates about this issue? Thanks!

Best,
Jia-Xing

@vdauwera
Copy link
Contributor

Please post usage questions in the GATK forum as instructed in the "getting started" doc. Github Issues are reserved for discussions about the code and confirmed bug reports. Thank you.

@yjx1217
Copy link
Author

yjx1217 commented Dec 22, 2016

Hi vdauwera,

I don't think this is a usage question. I think this is a bug. Please double check. Thanks!

Best,
Jia-Xing

@vdauwera
Copy link
Contributor

It's very unlikely to be a bug, much more likely that there is something wrong with the data (along the lines of what Yossi suggested) because this functionality is covered by tests and is used by many other people. So we don't consider anything a bug until we have solid evidence. To get that evidence, we typically need a back and forth conversation between you and the support team (which I lead). That conversation belongs on the user forum. When it takes place on Github instead, it clutters the tool developers' github feed and wastes their time and attention that would be better spent on actual development discussions and code improvements. That is why we ask you to post in the forum. Please help us help you by following our process.

@rjorton
Copy link

rjorton commented Feb 17, 2017

I get the same error, but suspect it is my data.

I suspect it is something to do with the BAM file (which was given to me) being previously filtered for Mapped reads only. It is a paired end read data set, but in a few cases only one of the pair mapped, then the unmapped pair has been removed from the BAM. When trying to recreate the FASTQ reads from the BAM, these unpaired reads don't go into FASTQ or SECOND_END_FASTQ, and if I provide UNPAIRED (which I hoped the singletons would go to) I get the error:

Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Found 52 unpaired mates
at htsjdk.samtools.SAMUtils.processValidationError(SAMUtils.java:460)
at picard.sam.SamToFastq.doWork(SamToFastq.java:217)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)

So think it is a data problem rather than a bug, the unpaired reads in the BAM are actually paired but the pair has been removed from the BAM (as the pair didnt map).

@wangyugui
Copy link

But there is a use case to extract fastq reads only mapped in some region only.
so we need to filter the bam, and then SamToFastq it.

@sooheelee
Copy link
Contributor

Are the secondary alignments that BWA's -M option generates removed from the file? These are duplicate of a primary alignment record. Also, are you included the unmapped mates of mapped reads? Take a look at https://software.broadinstitute.org/gatk/guide/article?id=6976.

@wangyugui
Copy link

In my case, the secondary alignments are removed by 'samtools view -F 0x900'.

@sooheelee
Copy link
Contributor

sooheelee commented Mar 16, 2017

@wangyugui, you are removing not primary and supplementary alignments with -F 0x900. If you are specifying the following (as yjx1217 does above):

bwa mem -t 4 -M

then the -M option asks the tool to mark as secondary all supplementary alignments. So you will have no supplementary alignments. These are instead flagged with the 0x100 flag. So your file will still contain multiple records for the same read.

@wangyugui
Copy link

@sooheelee samtool view -F 0x900 will remove 0x100 flag too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants