Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HOLD: Bring in more duplication expansion calls #3668

Closed
wants to merge 4 commits into from

Conversation

SHuang-Broad
Copy link
Contributor

Dealing with case when two alignment blocks contain each other in their ref span, indicating duplication.
Instead of outputting CIGARs for the duplicated units like we did for duplication records now in master pipeline, we output alt haplotype sequence from the evidence contigs we assembled, since we did an experiment work with inverted duplications.

Some performance evaluation based on the logs

master.vcf

10:28:46.262 INFO  DiscoverVariantsFromContigAlignmentsSAMSpark - Discovered 6324 variants.
10:28:46.277 INFO  DiscoverVariantsFromContigAlignmentsSAMSpark - INV: 237
10:28:46.277 INFO  DiscoverVariantsFromContigAlignmentsSAMSpark - DEL: 3680
10:28:46.277 INFO  DiscoverVariantsFromContigAlignmentsSAMSpark - DUP: 1141
10:28:46.277 INFO  DiscoverVariantsFromContigAlignmentsSAMSpark - INS: 1266
10:28:46.483 INFO  DiscoverVariantsFromContigAlignmentsSAMSpark - Shutting down engine
[October 4, 2017 10:28:46 AM EDT] org.broadinstitute.hellbender.tools.spark.sv.discovery.DiscoverVariantsFromContigAlignmentsSAMSpark done. Elapsed time: 0.53 minutes.
Runtime.totalMemory()=3954180096

==============

feature.vcf

13:51:48.490 INFO  DiscoverVariantsFromContigAlignmentsSAMSpark - Discovered 6543 variants.
13:51:48.502 INFO  DiscoverVariantsFromContigAlignmentsSAMSpark - INV: 229
13:51:48.502 INFO  DiscoverVariantsFromContigAlignmentsSAMSpark - DEL: 3679
13:51:48.502 INFO  DiscoverVariantsFromContigAlignmentsSAMSpark - DUP: 1365
13:51:48.502 INFO  DiscoverVariantsFromContigAlignmentsSAMSpark - INS: 1270
13:51:48.770 INFO  DiscoverVariantsFromContigAlignmentsSAMSpark - Shutting down engine
[October 5, 2017 1:51:48 PM EDT] org.broadinstitute.hellbender.tools.spark.sv.discovery.DiscoverVariantsFromContigAlignmentsSAMSpark done. Elapsed time: 0.49 minutes.
Runtime.totalMemory()=4026531840

No variants that were dropped are simple variants, and they are expected to be brought back with the correct interpretation once complex sv PR series are fully coded.

Two known issues:

  1. Arguably, these calls may not have high confidence since we are likely NOT having the duplicated region fully assembled. But we could develop a filter later and be less stringent in the discovery stage.
  2. The inserted sequence mapping annotation is still an issue we need to iron out, in the sense that when one ref span is a completely enclosed in the other with some bases in the larger ref span uncovered by the the smaller ref span (i.e. a true containment from both boundaries instead of a one-boundary coincidence), there's actually insert sequence between the two copies of the duplicated sequence (this is why alt haplotype sequence is generated rather than CIGARs).

@codecov-io
Copy link

codecov-io commented Oct 5, 2017

Codecov Report

Merging #3668 into master will increase coverage by 0.98%.
The diff coverage is 46.352%.

@@              Coverage Diff               @@
##              master     #3668      +/-   ##
==============================================
+ Coverage     78.463%   79.444%   +0.98%     
- Complexity     16421     17539    +1118     
==============================================
  Files           1041      1137      +96     
  Lines          59099     63304    +4205     
  Branches        9673      9668       -5     
==============================================
+ Hits           46371     50291    +3920     
- Misses          8985      9157     +172     
- Partials        3743      3856     +113
Impacted Files Coverage Δ Complexity Δ
...ender/tools/spark/sv/utils/GATKSVVCFConstants.java 75% <ø> (ø) 1 <0> (ø) ⬇️
...tute/hellbender/tools/spark/sv/utils/RDDUtils.java 0% <ø> (-80%) 0 <0> (-4)
.../sv/discovery/prototype/InsDelVariantDetector.java 0% <0%> (ø) 0 <0> (?)
...y/prototype/SimpleStrandSwitchVariantDetector.java 28.571% <0%> (ø) 13 <0> (?)
...ry/prototype/FilterLongReadAlignmentsSAMSpark.java 55.652% <0%> (ø) 29 <0> (?)
...nder/tools/spark/sv/discovery/SvTypeInference.java 79.31% <100%> (ø) 7 <0> (?)
...s/spark/sv/discovery/AnnotatedVariantProducer.java 74.79% <100%> (-2.988%) 23 <0> (+1)
...der/tools/spark/sv/utils/GATKSVVCFHeaderLines.java 81.356% <100%> (-0.311%) 7 <0> (ø)
...ls/spark/sv/discovery/BreakpointComplications.java 55.234% <24.074%> (ø) 69 <5> (?)
...er/tools/spark/sv/discovery/ChimericAlignment.java 69.608% <45.161%> (ø) 30 <15> (?)
... and 785 more

@SHuang-Broad SHuang-Broad force-pushed the sh_insdel_containment_bug branch 2 times, most recently from dcfd714 to 1cd756b Compare October 12, 2017 18:05
@SHuang-Broad SHuang-Broad force-pushed the sh_insdel_containment_bug branch from 1cd756b to 7d20771 Compare October 17, 2017 15:58
Copy link
Contributor

@magicDGS magicDGS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After #3475, conflicts are expected in the test classes, because they are now extending GATKBaseTest - a rebase should solve most of the problems without need of change of the code (except comments below)

import static org.broadinstitute.hellbender.tools.spark.sv.utils.GATKSVVCFConstants.*;


public class SimpleSvTypeInferenceAndConstructionUnitTest extends BaseTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After #3475, this should extend GATKBaseTest

@SHuang-Broad SHuang-Broad changed the title Bring in more duplication expansion calls HOLD: Bring in more duplication expansion calls Nov 9, 2017
@SHuang-Broad SHuang-Broad force-pushed the sh_insdel_containment_bug branch from 7d20771 to 90c133b Compare January 18, 2018 20:34
Copy link
Contributor Author

@SHuang-Broad SHuang-Broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self added

@@ -131,8 +133,8 @@ SimpleInterval getInvertedTransInsertionRefSpan() {
return invertedTransInsertionRefSpan;
}

boolean hasDupSeqButNoStrandSwitch() {
return hasDuplicationAnnotation && dupSeqStrandOnCtg.stream().noneMatch(s -> s.equals(Strand.NEGATIVE));
boolean indicatesInvDup() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be picked up 1.

private void resolveComplicationForSimpleTandupExpansion(final SimpleInterval leftReferenceInterval,
final SimpleInterval rightReferenceInterval,
final AlignmentInterval firstContigRegion,
private void resolveComplicationForSimpleTandupExpansion(final AlignmentInterval firstContigRegion,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be picked up 3

@@ -303,6 +305,7 @@ private void initForInvDup(final ChimericAlignment chimericAlignment, final byte
*/
@VisibleForTesting
public static boolean isLikelyInvertedDuplication(final AlignmentInterval one, final AlignmentInterval two) {
if (one.forwardStrand==two.forwardStrand) return false;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be picked up 2.

@@ -328,15 +331,15 @@ private void initForInsDel(final ChimericAlignment chimericAlignment, final byte

final boolean oneContainedInTheOther = leftReferenceSpan.contains(rightReferenceSpan) || rightReferenceSpan.contains(leftReferenceSpan);
if (oneContainedInTheOther) {
resolveComplicationForSimpleTandupExpansion(leftReferenceSpan, rightReferenceSpan, firstContigRegion, secondContigRegion, r1e, r2b, distBetweenAlignRegionsOnCtg, contigSeq, true);
} else if ( distBetweenAlignRegionsOnRef > 0 ) { // Deletion:
resolveComplicationForSimpleTandupExpansion(firstContigRegion, secondContigRegion, r1e, r2b, contigSeq);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be picked up 4.1

resolveComplicationForSimpleDel(firstContigRegion, secondContigRegion, distBetweenAlignRegionsOnCtg, contigSeq);
} else if (distBetweenAlignRegionsOnRef == 0 && distBetweenAlignRegionsOnCtg > 0) { // Insertion: simple insertion, inserted sequence is the sequence [c1e+1, c2b-1] on the contig
insertedSequenceForwardStrandRep = getInsertedSequence(firstContigRegion, secondContigRegion, contigSeq);
} else if (distBetweenAlignRegionsOnRef == 0 && distBetweenAlignRegionsOnCtg < 0) { // Tandem repeat contraction: reference has two copies but one copy was deleted on the contig; duplicated sequence on reference are [r1e-|d2|+1, r1e] and [r2b, r2b+|d2|-1]
resolveComplicationForSimpleTandupContraction(leftReferenceSpan, firstContigRegion, secondContigRegion, r1e, c1e, c2b, contigSeq);
} else if (distBetweenAlignRegionsOnRef < 0 && distBetweenAlignRegionsOnCtg >= 0) { // Tandem repeat expansion: reference bases [r1e-|d1|+1, r1e] to contig bases [c1e-|d1|+1, c1e] and [c2b, c2b+|d1|-1] with optional inserted sequence [c1e+1, c2b-1] in between the two intervals on contig
resolveComplicationForSimpleTandupExpansion(leftReferenceSpan, rightReferenceSpan, firstContigRegion, secondContigRegion, r1e, r2b, distBetweenAlignRegionsOnCtg, contigSeq, false);
resolveComplicationForSimpleTandupExpansion(firstContigRegion, secondContigRegion, r1e, r2b, contigSeq);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be picked up 4.2

@@ -386,6 +425,85 @@ private void resolveComplicationForSimpleTandupExpansion(final SimpleInterval le
}
}

static byte[] extractAltHaplotypeForTandupExpansionWithContainment(final AlignmentInterval firstContigRegion,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be picked up 6

determineStrandSwitch(current, next), involvesRefPositionSwitch(current, next));
if (isNotSimpleTranslocation)
results.add(new ChimericAlignment(current, next, insertionMappings, alignedContig.contigName));
final ChimericAlignment chimericAlignment = new ChimericAlignment(current, next, insertionMappings, alignedContig.contigName);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method to be phased out, but it makes sense to let down stream users to check what type of novel adjacency this CA could point to, which requests the state checkers to be more comprehensive

public static boolean nextAlignmentMayBeInsertion(final AlignmentInterval current, final AlignmentInterval next,
final Integer minAlignLength, final int mapQThresholdInclusive,
final boolean filterWhollyContained) {
static boolean nextAlignmentMayBeInsertion(final AlignmentInterval current, final AlignmentInterval next,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check how this is different from the new method splitPairStrongEnoughEvidenceForCA()

if (involvesRefPositionSwitch(regionWithLowerCoordOnContig, regionWithHigherCoordOnContig))
return new Tuple2<>(regionWithHigherCoordOnContig.referenceSpan, regionWithLowerCoordOnContig.referenceSpan);
else
if (isForwardStrandRepresentation) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used in two places: BC.iniForInsDel() and NARL.leftJustifyBreakpoints().
The first use is trivial as it is type specific, whereas for the 2nd use in NARL, we need to be careful since we are considering expanding the NARL to cover cases of ref order switch.

@@ -61,60 +61,57 @@ protected NovelAdjacencyReferenceLocations(final Kryo kryo, final Input input) {
@VisibleForTesting
static Tuple2<SimpleInterval, SimpleInterval> leftJustifyBreakpoints(final ChimericAlignment ca,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be picked up.

@SHuang-Broad
Copy link
Contributor Author

closing since the changes are already in master, and in a branch about to generate a new PR

@SHuang-Broad SHuang-Broad deleted the sh_insdel_containment_bug branch March 25, 2018 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants