You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
At some positions (such as 20:10001430 in src/test/resources/org/broadinstitute/hellbender/tools/haplotypecaller/expected.testGVCFMode.gatk4.g.vcf) reads with mismatches or indels in the BWA-aligned pileup get realigned during local assembly (where they will get evaluated as variants in the active region). However, the flanking regions adjacent to the active region are evaluated using the original alignments. This can lead to het calls outside the active region that get capped to GQ0 homRef calls.
The solution is not entirely trivial since we hard clip reads after we trim the active region and compute the likelihoods used for their realignments based on this clipped sequence. One relatively easy hack that would be a partial improvement would be to omit reads that had been realigned from the pileup in the flanking regions.
The text was updated successfully, but these errors were encountered:
With bwa mem and 150 bp reads, there appear to be lots of soft-clipped reads triggering this situation. As a result, there are many '0/0' and GQ=0 GT calls being generated in a joint-genotyped VCF. Could the priority of issue be re-considered given that it is not such a rare case?
@jjfarrell in the case of soft-clips the solution is even more complicated. During assembly the reads get un-soft-clipped. If the soft-clipped sequence can't merge back to the reference (as in a scenario when the read is a split read at the edge of a big structural variant that we don't expect HaplotypeCaller to call), then the sequence that was originally soft-clipped doesn't get incorporated into the discovered haplotypes. When the likelihoods of the events in the discovered haplotypes are calculated, reads that have low likelihoods against all the haplotypes are dropped. So without counting the soft-clips and NON_REF elements in the pileup genotyping, that evidence isn't taken into account at all. In that scenario, as an analyst, I would want the GQ0 call because if there are enough reads for the site to get genotyped as a het, then it shows that something was going on there, even if we don't know what.
How many reads are "lots"? We've seen cases where the DNA extraction wasn't done properly and there's a high rate of chimeric reads that throw a lot of false positive insertions. If the soft-clipped reads really are supporting a deletion that HaplotypeCaller calls properly, but BWA couldn't align, then they're likely from deletions between 20 and maybe 100bp or so, which falls into that gray area of large indel or small SV. (I've seen cases where we've assembled an insertion of ~200bp, but I don't expect our accuracy in that size range is particularly high.) It's likely that HaplotypeCaller won't call the event correctly because this is at the border of it's size range, in which case, again, I would want those GQ0 calls to show that something is amiss. There's a possibility that the soft-clips were correctly reassembled, but in my experience that's almost never the case.
At some positions (such as 20:10001430 in src/test/resources/org/broadinstitute/hellbender/tools/haplotypecaller/expected.testGVCFMode.gatk4.g.vcf) reads with mismatches or indels in the BWA-aligned pileup get realigned during local assembly (where they will get evaluated as variants in the active region). However, the flanking regions adjacent to the active region are evaluated using the original alignments. This can lead to het calls outside the active region that get capped to GQ0 homRef calls.
The solution is not entirely trivial since we hard clip reads after we trim the active region and compute the likelihoods used for their realignments based on this clipped sequence. One relatively easy hack that would be a partial improvement would be to omit reads that had been realigned from the pileup in the flanking regions.
The text was updated successfully, but these errors were encountered: