-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HOLD: Bring in more duplication expansion calls #3668
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3668 +/- ##
==============================================
+ Coverage 78.463% 79.444% +0.98%
- Complexity 16421 17539 +1118
==============================================
Files 1041 1137 +96
Lines 59099 63304 +4205
Branches 9673 9668 -5
==============================================
+ Hits 46371 50291 +3920
- Misses 8985 9157 +172
- Partials 3743 3856 +113
|
dcfd714
to
1cd756b
Compare
1cd756b
to
7d20771
Compare
…e. dup); refactored some related test code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After #3475, conflicts are expected in the test classes, because they are now extending GATKBaseTest
- a rebase should solve most of the problems without need of change of the code (except comments below)
import static org.broadinstitute.hellbender.tools.spark.sv.utils.GATKSVVCFConstants.*; | ||
|
||
|
||
public class SimpleSvTypeInferenceAndConstructionUnitTest extends BaseTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After #3475, this should extend GATKBaseTest
7d20771
to
90c133b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self added
@@ -131,8 +133,8 @@ SimpleInterval getInvertedTransInsertionRefSpan() { | |||
return invertedTransInsertionRefSpan; | |||
} | |||
|
|||
boolean hasDupSeqButNoStrandSwitch() { | |||
return hasDuplicationAnnotation && dupSeqStrandOnCtg.stream().noneMatch(s -> s.equals(Strand.NEGATIVE)); | |||
boolean indicatesInvDup() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be picked up 1.
private void resolveComplicationForSimpleTandupExpansion(final SimpleInterval leftReferenceInterval, | ||
final SimpleInterval rightReferenceInterval, | ||
final AlignmentInterval firstContigRegion, | ||
private void resolveComplicationForSimpleTandupExpansion(final AlignmentInterval firstContigRegion, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be picked up 3
@@ -303,6 +305,7 @@ private void initForInvDup(final ChimericAlignment chimericAlignment, final byte | |||
*/ | |||
@VisibleForTesting | |||
public static boolean isLikelyInvertedDuplication(final AlignmentInterval one, final AlignmentInterval two) { | |||
if (one.forwardStrand==two.forwardStrand) return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be picked up 2.
@@ -328,15 +331,15 @@ private void initForInsDel(final ChimericAlignment chimericAlignment, final byte | |||
|
|||
final boolean oneContainedInTheOther = leftReferenceSpan.contains(rightReferenceSpan) || rightReferenceSpan.contains(leftReferenceSpan); | |||
if (oneContainedInTheOther) { | |||
resolveComplicationForSimpleTandupExpansion(leftReferenceSpan, rightReferenceSpan, firstContigRegion, secondContigRegion, r1e, r2b, distBetweenAlignRegionsOnCtg, contigSeq, true); | |||
} else if ( distBetweenAlignRegionsOnRef > 0 ) { // Deletion: | |||
resolveComplicationForSimpleTandupExpansion(firstContigRegion, secondContigRegion, r1e, r2b, contigSeq); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be picked up 4.1
resolveComplicationForSimpleDel(firstContigRegion, secondContigRegion, distBetweenAlignRegionsOnCtg, contigSeq); | ||
} else if (distBetweenAlignRegionsOnRef == 0 && distBetweenAlignRegionsOnCtg > 0) { // Insertion: simple insertion, inserted sequence is the sequence [c1e+1, c2b-1] on the contig | ||
insertedSequenceForwardStrandRep = getInsertedSequence(firstContigRegion, secondContigRegion, contigSeq); | ||
} else if (distBetweenAlignRegionsOnRef == 0 && distBetweenAlignRegionsOnCtg < 0) { // Tandem repeat contraction: reference has two copies but one copy was deleted on the contig; duplicated sequence on reference are [r1e-|d2|+1, r1e] and [r2b, r2b+|d2|-1] | ||
resolveComplicationForSimpleTandupContraction(leftReferenceSpan, firstContigRegion, secondContigRegion, r1e, c1e, c2b, contigSeq); | ||
} else if (distBetweenAlignRegionsOnRef < 0 && distBetweenAlignRegionsOnCtg >= 0) { // Tandem repeat expansion: reference bases [r1e-|d1|+1, r1e] to contig bases [c1e-|d1|+1, c1e] and [c2b, c2b+|d1|-1] with optional inserted sequence [c1e+1, c2b-1] in between the two intervals on contig | ||
resolveComplicationForSimpleTandupExpansion(leftReferenceSpan, rightReferenceSpan, firstContigRegion, secondContigRegion, r1e, r2b, distBetweenAlignRegionsOnCtg, contigSeq, false); | ||
resolveComplicationForSimpleTandupExpansion(firstContigRegion, secondContigRegion, r1e, r2b, contigSeq); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be picked up 4.2
@@ -386,6 +425,85 @@ private void resolveComplicationForSimpleTandupExpansion(final SimpleInterval le | |||
} | |||
} | |||
|
|||
static byte[] extractAltHaplotypeForTandupExpansionWithContainment(final AlignmentInterval firstContigRegion, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be picked up 6
determineStrandSwitch(current, next), involvesRefPositionSwitch(current, next)); | ||
if (isNotSimpleTranslocation) | ||
results.add(new ChimericAlignment(current, next, insertionMappings, alignedContig.contigName)); | ||
final ChimericAlignment chimericAlignment = new ChimericAlignment(current, next, insertionMappings, alignedContig.contigName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
method to be phased out, but it makes sense to let down stream users to check what type of novel adjacency this CA could point to, which requests the state checkers to be more comprehensive
public static boolean nextAlignmentMayBeInsertion(final AlignmentInterval current, final AlignmentInterval next, | ||
final Integer minAlignLength, final int mapQThresholdInclusive, | ||
final boolean filterWhollyContained) { | ||
static boolean nextAlignmentMayBeInsertion(final AlignmentInterval current, final AlignmentInterval next, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check how this is different from the new method splitPairStrongEnoughEvidenceForCA()
if (involvesRefPositionSwitch(regionWithLowerCoordOnContig, regionWithHigherCoordOnContig)) | ||
return new Tuple2<>(regionWithHigherCoordOnContig.referenceSpan, regionWithLowerCoordOnContig.referenceSpan); | ||
else | ||
if (isForwardStrandRepresentation) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used in two places: BC.iniForInsDel()
and NARL.leftJustifyBreakpoints()
.
The first use is trivial as it is type specific, whereas for the 2nd use in NARL, we need to be careful since we are considering expanding the NARL to cover cases of ref order switch.
@@ -61,60 +61,57 @@ protected NovelAdjacencyReferenceLocations(final Kryo kryo, final Input input) { | |||
@VisibleForTesting | |||
static Tuple2<SimpleInterval, SimpleInterval> leftJustifyBreakpoints(final ChimericAlignment ca, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be picked up.
closing since the changes are already in master, and in a branch about to generate a new PR |
Dealing with case when two alignment blocks contain each other in their ref span, indicating duplication.
Instead of outputting CIGARs for the duplicated units like we did for duplication records now in master pipeline, we output alt haplotype sequence from the evidence contigs we assembled, since we did an experiment work with inverted duplications.
Some performance evaluation based on the logs
No variants that were dropped are simple variants, and they are expected to be brought back with the correct interpretation once complex sv PR series are fully coded.
Two known issues: