Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New qual doesn't count spanning deletions as support for variant qual #4801

Merged
merged 2 commits into from
May 30, 2018

Conversation

davidbenjamin
Copy link
Contributor

@ldgauthier The new qual was not doing the right thing for spanning deletions. This fixes it. Would you mind reviewing? And would you like me to re-run those GVCFs just in case before going ahead with #4614?

Copy link
Contributor

@ldgauthier ldgauthier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you confirm that both those tests fail in master? I pasted them into my branch and testPresenceOfUnlikelySpanningDeletionDoesntAffectResults passes, but it's possible some changes I made in my branch are affecting the results.

@davidbenjamin
Copy link
Contributor Author

testSpanningDeletionIsNotConsideredVariant fails in master. testPresenceOfUnlikelySpanningDeletionDoesntAffectResults passes here and in master, as it should. Its purpose is to check that the spanning deletion logic doesn't break anything.

@ldgauthier
Copy link
Contributor

If the second test passed in master then I don't consider it to be a good test of your fix. Can you make a test where the PLs with and without span del would lead to different QUALs in the old version? Maybe a case with a confident deletion in sample1 and a low quality SNP in sample2 that causes a spanning deletion in sample1. You could probably do this by modifying your first test by adding a sample that has A or B called with low GQ with respect to the reference.

@davidbenjamin
Copy link
Contributor Author

The point of the second test is to make sure that the new code path involving the spanning deletion has no effect if the spanning deletion has low likelihood. You could imagine that I introduced a bug where you lose sensitivity whenever a spanning deletion appears in the alleles list even if there's very little evidence for it, just because its potential existence sends the code down some new, incorrect, logic.

I think testSpanningDeletionIsNotConsideredVariant is already doing what you want because before this fix the example VC is definitely considered to be variant even though the only non-ref evidence is for the symbolic * allele. The PLs in this example are 50, 100, 100, 0, 100, 100, which means that it is overwhelmingly likely that this sample's genotype is REF/SPAN_DEL. In the old code you get a qual of about 50 (the hom ref PL), whereas in the new code you get that the chances of a variant are about 1 in 10^10 since the only truly variant PLs are 100.

Now, if you think a multi-sample case would be useful that sounds good to me. However, a "confident deletion in sample1 and a low quality SNP in sample2" would not look very different since the deletion would only exist in an upstream VC, not the VC being tested, so basically we would have sample1 = REF/SPAN_DEL, sample2 = REF/SNP with low quality. The current test is the same thing with just sample1. Is this what you had in mind?

@ldgauthier
Copy link
Contributor

Let's be rigorous and use two samples so that you're testing a case that could actually occur in the wild. It should be easy enough to add sample2 with PLs like [10,0,40,100,70,300]. Then can you add a comment about where the "magic number" in the assert statement comes from? Or maybe to be super duper rigorous you could compare the QUAL from that variant to the QUAL for sample3 and sample4 where 3 and 4 have the same PLs as 1 and 2, but without the spanning deletion.

@davidbenjamin
Copy link
Contributor Author

@ldgauthier I did all of those things. Note that to be precise, when you remove the spanning deletion the "het" REF / SPAN_DEL must be treated as haploid, which the new tests do.

Copy link
Contributor

@ldgauthier ldgauthier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful new tests. I completely agree with the logic about the span-del het being like a haploid ref.

@codecov-io
Copy link

Codecov Report

Merging #4801 into master will increase coverage by 0.348%.
The diff coverage is 90.476%.

@@               Coverage Diff               @@
##              master     #4801       +/-   ##
===============================================
+ Coverage     80.131%   80.479%   +0.348%     
- Complexity     17488     18032      +544     
===============================================
  Files           1085      1089        +4     
  Lines          63245     65070     +1825     
  Branches       10200     10630      +430     
===============================================
+ Hits           50679     52368     +1689     
- Misses          8579      8632       +53     
- Partials        3987      4070       +83
Impacted Files Coverage Δ Complexity Δ
...org/broadinstitute/hellbender/utils/MathUtils.java 77.162% <100%> (+0.308%) 194 <2> (+2) ⬆️
...rs/genotyper/afcalc/AlleleFrequencyCalculator.java 85.227% <86.667%> (-0.299%) 28 <4> (+5)
...forms/markduplicates/MarkDuplicatesSparkUtils.java 87.64% <0%> (-1.857%) 69% <0%> (+7%)
...s/spark/sv/discovery/SvDiscoveryInputMetaData.java 100% <0%> (ø) 9% <0%> (+7%) ⬆️
.../utils/read/markduplicates/DuplicationMetrics.java 85.366% <0%> (ø) 13% <0%> (?)
...covery/inference/CpxVariantReInterpreterSpark.java 100% <0%> (ø) 5% <0%> (?)
...ils/test/testers/AbstractMarkDuplicatesTester.java 79.487% <0%> (ø) 17% <0%> (?)
...nce/SegmentedCpxVariantSimpleVariantExtractor.java 93.96% <0%> (ø) 71% <0%> (?)
.../sv/StructuralVariationDiscoveryPipelineSpark.java 88.652% <0%> (+0.081%) 13% <0%> (ø) ⬇️
...iscoverFromLocalAssemblyContigAlignmentsSpark.java 77.019% <0%> (+0.177%) 3% <0%> (+1%) ⬆️
... and 30 more

@ldgauthier
Copy link
Contributor

@davidbenjamin I kicked Travis and now tests are green so you can merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants