Wrong Supplementary Alignments #282
I am not sure what is wrong with this example.
The supplementary (not secondary) alignment fully overlaps with the primary alignment. The SAM specifications definition states The issue with these alignments is that they break downstream programs that expect each alignment in a split read alignment to have at least one base that is not aligned to another location in the chimeric alignment.
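The expectation described above (each record of a split read covers at least one read base that no other record covers) can be sketched as a check on read-coordinate intervals. This is only an illustration, not how bwa or any particular SV caller does it: `read_interval` and `has_unique_base` are hypothetical helper names, and the sketch assumes clipping appears only at the ends of the CIGAR string and that clips on reverse-strand records are counted from the opposite end of the original read.

```python
import re

# One CIGAR operation: a length followed by an operator from the SAM spec.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_interval(cigar, reverse):
    """Return the half-open [start, end) span of aligned read bases,
    expressed in the orientation of the original (unreversed) read."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    lead = 0
    for n, op in ops:  # soft/hard clips at the start of the CIGAR
        if op not in "SH":
            break
        lead += n
    trail = 0
    for n, op in reversed(ops):  # soft/hard clips at the end of the CIGAR
        if op not in "SH":
            break
        trail += n
    # M, I, = and X consume read bases; D, N and P do not.
    aligned = sum(n for n, op in ops if op in "MI=X")
    # On the reverse strand the trailing clip precedes the aligned bases
    # in original-read coordinates.
    start = trail if reverse else lead
    return (start, start + aligned)


def has_unique_base(intervals, index):
    """True if intervals[index] covers a read base no other interval covers."""
    covered = set()
    for i, (s, e) in enumerate(intervals):
        if i != index:
            covered.update(range(s, e))
    start, end = intervals[index]
    return any(pos not in covered for pos in range(start, end))


# The case reported in this issue: primary and supplementary are both 150M.
intervals = [read_interval("150M", False), read_interval("150M", False)]
supplementary_is_redundant = not has_unique_base(intervals, 1)
```

For two `150M` records on a 150-base read, both intervals are `(0, 150)`, so the supplementary record contributes no unique read base and `supplementary_is_redundant` is `True`.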
The SAM spec defines chimeric alignment and says a chimeric alignment can be represented by a supplementary alignment. It doesn't say supplementary alignments shall not have large overlaps. Nothing is wrong here.
Section 1.2 of the SAM specifications states the following:
"Chimeric alignment: An alignment of a read that cannot be represented as a linear alignment. A chimeric alignment is represented as a set of linear alignments that do not have large overlaps. Typically, one of the linear alignments in a chimeric alignment is considered the “representative” alignment, and the others are called “supplementary” and are distinguished by the supplementary alignment flag. All the SAM records in a chimeric alignment have the same QNAME and the same values for 0x40 and 0x80 flags (see Section 1.4). The decision regarding which linear alignment is representative is arbitrary."
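To make the flag wording in that definition concrete, here is a minimal sketch of how a downstream tool might collect the records that belong to one read: same QNAME and same 0x40/0x80 bits, with 0x800 marking the supplementary records. The flag values come from the SAM specification; the helper names (`segment_key`, `group_by_segment`) and the example records are made up for illustration.

```python
# Flag bits as defined in the SAM specification.
FLAG_FIRST_SEGMENT = 0x40
FLAG_LAST_SEGMENT = 0x80
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def is_supplementary(flag):
    """True if the supplementary alignment bit (0x800) is set."""
    return bool(flag & FLAG_SUPPLEMENTARY)


def is_secondary(flag):
    """True if the secondary alignment bit (0x100) is set."""
    return bool(flag & FLAG_SECONDARY)


def segment_key(qname, flag):
    """Key shared by all records of one read: QNAME plus the 0x40/0x80 bits."""
    return (qname, flag & (FLAG_FIRST_SEGMENT | FLAG_LAST_SEGMENT))


def group_by_segment(records):
    """Group (qname, flag) pairs by read; returns {key: [flags, ...]}."""
    groups = {}
    for qname, flag in records:
        groups.setdefault(segment_key(qname, flag), []).append(flag)
    return groups


# Hypothetical records: a primary and a supplementary record for read 1
# of a pair, plus the mate.
records = [("read1", 65), ("read1", 2113), ("read1", 129)]
groups = group_by_segment(records)
```

Note that grouping this way only says which records belong to the same read. It does not by itself say whether a supplementary record is part of a chimeric alignment, which is the point under discussion in this thread.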
These alignments typically involve a primary alignment with high NM. Does bwa produce such alignments because it fails to find a primary-contig seed in part of the read (due to the high error rate), so it looks for split alignments (in ALT contigs?), and then when it does S/W it over-aligns w.r.t. the seeding? If you're not planning to actually change bwa, could you please update the documentation so downstream tools have an idea of the circumstances in which bwa will write alignments that violate the specifications?
Read again. I don't see that bwa is violating the SAM spec. I was careful when writing the initial version of these sentences.
Ok, I see why you said the output is wrong. Bwa outputs supplementary alignments. It doesn't say they are chimeric alignments. A chimeric alignment is represented by supplementary alignments, but supplementary alignments may have other meanings. I intentionally called 0x800 a "supplementary" flag, not a "chimeric" flag.
So you're saying the alignment isn't a chimeric alignment? Just to be clear, is your argument that
a) The alignment is a chimeric alignment and you consider it a valid alignment.
b) The alignment is not a chimeric alignment, but a different kind of supplementary alignment (an ALT alignment?).
The problem with a) is that it violates the no-large-overlap requirement of Section 1.2.
The problems with b) are:
So I'm failing to see a spec-compliant interpretation of these records. They look very much like a split read alignment. For a hg38 with decoy reference, I'm getting around 1 in 500 reads exhibiting this behaviour, which has forced me to throttle the log messages in my SV caller to prevent multiple gigabytes of warning messages about split reads with multiple alignments starting at the same read base offset. If it was a 1 in a million edge case I wouldn't be so concerned, but it's a non-trivial subset of reads when aligning against hg38. If these are essentially ALT contig secondary alignments, then I can adjust my code to handle these. What's the intended interpretation of such records? Are they a bug, or is it intentional ALT contig behaviour?
No, it is not. And a supplementary alignment is not intended to represent chimeric alignments only. The bwa behavior is intentional and has been documented in README-alt.md since 2014.
It looks to me that the wording of the specifications means that the only supplementary alignment you can represent in a spec-compliant manner is a chimeric alignment.
Thanks for the pointer to the existing documentation. If you could just add a few points of clarification on how a downstream tool is to determine what type of supplementary record bwa is reporting, it would be much appreciated:
@lh3 any documentation on how a downstream tool can definitively identify whether an alignment should be considered a chimeric alignment, or a secondary alignment to an ALT contig would be much appreciated.
I traced errors from a structural variant caller back to bwa mem. My colleague, who is a bit of an expert with the SAM format, explained to me
An example of my data set which looks wrong is
Both the primary alignment and supplementary one are 150M for a 150 base read.