(SV) consolidate logic in simple chimera inference, update how variants are represented in VCF emitted by new code path #4663
Conversation
(force-pushed from c96209a to 5b8e571)
Codecov Report
@@ Coverage Diff @@
## master #4663 +/- ##
===============================================
+ Coverage 79.973% 80.055% +0.082%
- Complexity 17421 17476 +55
===============================================
Files 1081 1080 -1
Lines 63140 63456 +316
Branches 10200 10271 +71
===============================================
+ Hits 50495 50800 +305
Misses 8656 8656
- Partials 3989 4000 +11
(force-pushed from 5b8e571 to 954cf6d)
(force-pushed from 954cf6d to 8bb042b)
Feature (this) branch run log
Master branch run log
As can be seen, the stable version of the interpretation tool is not affected, whereas the experimental version is affected as expected, in the number of …
I have a bunch of questions about some of the refactoring here.
final Broadcast<ReferenceMultiSource> referenceBroadcast = svDiscoveryInputData.referenceBroadcast;
final Broadcast<SVIntervalTree<VariantContext>> cnvCallsBroadcast = svDiscoveryInputData.cnvCallsBroadcast;
final String updatedOutputPath = svDiscoveryInputData.outputPath + "experimentalInterpretation_";
final SvDiscoveryInputData.SampleSpecificData updatedSampleSpecificData =
I find this whole process very confusing. Why do we need to make a new SvDiscoveryInputData.SampleSpecificData that has the contigRawAlignments updated? It looks like it runs through the old pipeline once with contigRawAlignments set to getReads(), which are actually the aligned input reads, and then through the new pipeline with contigRawAlignments set to the aligned contigs (which makes more sense with the name of the field). Why not just have two different fields, named appropriately for what's going to be in them? Then you wouldn't have to create a new input object for the second run to begin with.
Yes, it is confusing, and it is a bit difficult to explain, so I drew the following chart describing what's happening:

> It looks like it runs through the old pipeline once with contigRawAlignments set to getReads(),

those are actually not used, as the old pipeline takes (see diagram above) a JavaRDD<AlignedContig>.
So I took the contigRawAlignments out of SvDiscoveryInputData, and renamed SvDiscoveryInputData to SvDiscoveryInputMetaData.
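The shape of that refactor can be sketched as follows. This is a minimal stand-in, not the actual GATK classes: the point is that once the contig alignments are no longer a field of the (renamed) metadata object, the same immutable metadata can be shared by both code paths, and each path receives its own alignments explicitly.

```java
import java.util.List;

// Illustrative stand-in for the renamed SvDiscoveryInputMetaData: holds only
// sample-level metadata, with no contigRawAlignments field.
final class SvDiscoveryInputMetaData {
    final String sampleId;
    final String outputPath;
    SvDiscoveryInputMetaData(final String sampleId, final String outputPath) {
        this.sampleId = sampleId;
        this.outputPath = outputPath;
    }
}

final class Pipelines {
    // Old code path: takes its contig alignments as an explicit argument
    // (here just counted, standing in for real interpretation work).
    static int oldPath(final SvDiscoveryInputMetaData meta, final List<String> alignedContigs) {
        return alignedContigs.size();
    }
    // New code path: also takes alignments explicitly, so no mutated copy
    // of the input object is needed for the second run.
    static int newPath(final SvDiscoveryInputMetaData meta, final List<String> alignedContigs) {
        return alignedContigs.size();
    }
}
```

Both paths now share one metadata instance, which addresses the reviewer's objection to constructing a second SampleSpecificData just to swap the alignments field.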
 * A simple struct holding information of simple chimera of a particular contig and
 * its good mapping to a non-canonical chromosome, if it has one.
 */
@DefaultSerializer(SimpleChimeraAndNCAMstring.Serializer.class)
What does NCAM stand for? I'd prefer not to add a new acronym if possible.
Why not just add the NCAM string to ChimericAlignment? It would save a lot of lines of code and having to reason about a new object type.
NCAM stands for "Non-CAnonical Mapping".
Yes, that is a better idea!
 * @param inferredType      inferred type of variant
 * @param altHaplotypeSeq   alt haplotype sequence (could be null)
 * @param contigAlignments  chimeric alignments from contigs used for generating this novel adjacency,
I think you're missing documentation for contigEvidence now.
updated
/**
 * Write SAM file, if requested, for original alignments of contigs recognized as "Ambiguous", "Incomplete", and "MisAssemblySuspect"
 * TODO: 11/17/17 salvation on assembly contigs that 1) has ambiguous "best" configuration, and 2) has incomplete picture; and flag accordingly
Is this todo still applicable?
Yes, it is; the ticket (#4229) was created a while ago.
Of the two, the "incomplete" ones cost us more variants, in the sense that even though the contig currently doesn't cover the full event, extending it is unlikely to change what can currently be extracted. I'm thinking about how to make such checks and get these variants back.
else
    return hasIncompletePictureFromMultipleAlignments();

public enum AlignmentSignatureBasicTypes {
    Suspicious, Simple, Complex;
I'd suggest renaming Suspicious to Unknown. Also I like to UPCASE all enum values.
done
@@ -195,6 +160,137 @@ static StrandSwitch determineStrandSwitch(final AlignmentInterval first, final A
    }
}

// =================================================================================================================
// state query block
What does this mean?
a block of methods for querying the state of a SimpleChimera, now removed
@@ -317,25 +319,38 @@ private static void forNonComplexVariants(final JavaRDD<AssemblyContigWithFineTu
        throws IOException {
    final SimpleNovelAdjacencyAndChimericAlignmentEvidence simpleNovelAdjacencyAndChimericAlignmentEvidence = pair._1;
    final List<SvType> svTypes = pair._2;
I think you're assuming here that there are either 1 or 2 elements of this list. Can you add a validation of that?
Added.
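The requested validation can be sketched as below. The class and method names are illustrative, not the actual GATK code; the point is simply to fail fast when the list of inferred SV types is not of size 1 or 2, which is the assumption the downstream code relies on.

```java
import java.util.List;

final class SvTypeListValidation {
    // Sketch: enforce the assumption that each novel adjacency yields
    // either one or two inferred SV types; anything else is a bug upstream.
    static void validateSvTypes(final List<String> svTypes) {
        if (svTypes == null || svTypes.isEmpty() || svTypes.size() > 2) {
            throw new IllegalArgumentException(
                    "Expected 1 or 2 inferred SV types, got: "
                            + (svTypes == null ? "null" : svTypes.size()));
        }
    }
}
```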
return Collections.singletonList( new SimpleSVType.Deletion(this, svLength) );
final int deletedLength = leftJustifiedRightRefLoc.getStart() - leftJustifiedLeftRefLoc.getEnd();
final int insertionLength = complication.getInsertedSequenceForwardStrandRep().length();
if ( deletedLength < 50 ) { // "fat" insertion
Make a constant for 50 and put it somewhere central?
added StructuralVariationDiscoveryArgumentCollection.STRUCTURAL_VARIANT_SIZE_LOWER_BOUND
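The effect of that change can be sketched as follows; `SvConstants` and `isFatInsertion` are illustrative names (the field actually added in the PR lives on StructuralVariationDiscoveryArgumentCollection), but the comparison against the named constant is the point.

```java
final class SvConstants {
    // Named constant replacing the magic number 50 used at the call sites;
    // 50 bp is the conventional lower bound for calling something an SV.
    static final int STRUCTURAL_VARIANT_SIZE_LOWER_BOUND = 50;

    // A deletion shorter than the SV size lower bound, paired with an
    // insertion, is treated as a "fat" insertion rather than a deletion.
    static boolean isFatInsertion(final int deletedLength) {
        return deletedLength < STRUCTURAL_VARIANT_SIZE_LOWER_BOUND;
    }
}
```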
RPL,                  // replacement
SMALL_DUP_EXPANSION,  // small duplication tandem expansion
SMALL_DUP_CPX,        // small duplication, either expansion or contraction, with complex structure
IntraChrStrandSwitch, // intra-chromosome strand-switch novel adjacency
Upcase these.
done
return Collections.singletonList( new SimpleSVType.DuplicationTandem(this, svLength) );
final BreakpointComplications.SmallDuplicationWithPreciseDupRangeBreakpointComplications duplicationComplication =
        (BreakpointComplications.SmallDuplicationWithPreciseDupRangeBreakpointComplications) this.getComplication();
if (duplicationComplication.getDupSeqRepeatUnitRefSpan().size() < 50) {
Use constant.
done
(force-pushed from 1aebf69 to fa389b0)
Hi Chris, I've updated the code following the requests.
For one of your major concerns, the performance cost of splitting the RDDs into smaller ones and then working on them separately, I replied with a comment; I'm not sure yet how to optimize it better.
public static EnumMap<RawTypes, JavaRDD<AssemblyContigWithFineTunedAlignments>> preprocess(final SvDiscoveryInputData svDiscoveryInputData,
                                                                                           final boolean writeSAMFiles) {
public static AssemblyContigsClassifiedByAlignmentSignatures preprocess(final SvDiscoveryInputData svDiscoveryInputData,
                                                                        final boolean writeSAMFiles) {

final Broadcast<SAMSequenceDictionary> referenceSequenceDictionaryBroadcast = svDiscoveryInputData.referenceData.referenceSequenceDictionaryBroadcast;
removed
 * Write SAM file, if requested, for original alignments of contigs recognized as "Ambiguous", "Incomplete", and "MisAssemblySuspect"
 * TODO: 11/17/17 salvation on assembly contigs that 1) has ambiguous "best" configuration, and 2) has incomplete picture; and flag accordingly
 *
 * TODO: 3/4/18 a bug is present here that even though only one alignment has not-bad MQ, it could contain a large gap,
No. This particular bug was fixed, but I've found another edge case that's on my radar to be fixed, example as follows (notice the 58I gap "flanked" by the short 16M):
asm005554:tig00000 0 chr3 96003552 60 1137M58I16M438S * 0 0 TATTAAAACCTAGCATTGATACTGATAGACAATGAGACTGGAGGATTTTTCTCTAGTGAAATAAATGAAGCAAGTAAAAAAATTGTCAGTTATGAAAATAAATCCTTGACCTTCACCATTTTATACTTCAATCACACAAACCCAAAATAAAACACTACTCATGATCACACAGATTCCATTCAACTTTTTAGTACTTTGATTTTAAATATTACTAGAAAATTTATGATTAGTGAATATTAGAAAAAGAAATCAGTTTCTACCATAAAAATACAGAAATACAAATAAGAAGAAAAAAAGAAAATTAGAGAAAACAGAGGCATTATCAAAGGTGATAGAACTTAAAAAAAAACCTTAATTTAATATACCAAGGTATATATGGTAATACATTACATCCATTAATATACCACAAGTTTAGATATCCCTTCTGTTCATATCCCATTGATAAATTTTTATGTAAATTGCCACATCTAAAATCAAGGAAAACAGGTAACATAGTTTGGCTGAATAGCAAATTATTTGAGAGGGGGAAAAGCAAATATACGTATCCACTAAAAAAAAACAAACAAAAACTAATATCATTATAAAAAGAGTTTTAAAAAAAGGAGAACAAAAAATATAGAGATCACAACATCAAATAACTTTTTCAATAAATATGTTCTGGAGAAAGGGATGACAACAGTTGTCTTGTGTTCATTTGCCCCACAAAAAGAGACCACAACGTCAATAAACAAGTTGACTTCAACGCCAGTGACTGAGGAGGTATGGTGGAGAGAACCAGGGGAGCAGCAAAATCTCTGTAGAACATAGAAGCCCAGGATAACACCATAGAGAGGGGAATAAGACATCCCACCCCTGCCACACTGTCTCCCCTGATAGGATTGGCTCAGTGGTGGGAGGACTTATTTTAGGAAAAAGATAAGCTGGAGATCCCTCTTGGTCCCCATTTCTACCATGGACACAAGACAACCTTTCTACAGGAGAGACCCACTGTCCTCACAGCCCCTAAATCCAGTTTGGAGAGTTCTTAAAAGTTTACACCATGCATCCCCCACTCTCATGATCTAAAAATTATACTTTTTTATATATGTATGCATACATATATAATTTTTATATATGTATGCATACATATATAATTTTTATATATGTATGCATACATATATAATTTTTATATATGTATGCATACATATAATTTTTATATATGTATGCATACATATATAATTTTTATATGTATGCATACATATATAATTTTTTATATGTATGCATACATATATAATTTTTATATGTATGCATACAATATAATTTTTTATATATTGTATGCATACATATACTGTTTTATATATGTATGCATACATATATACTTTTTTGTATGTACACATACACATATAATTTTTTATATATGCATGCATACATATATGATTTTTATATAGGCATGCATACATATATAATTTTTTATATATGCATGCATATATAAATGCATATAACTTTTTATATATGTATGCATACATAACTTTTTATATATGTATGCATACATAACTTTTTATATATGTATGCATACATAACTTTTATATATGTATGCATACATAACTTTTTATATATGTATGCATACATATATAACTTTTTATATATGTATGCATACATATATAACTTTT * SA:Z:chr3,96004627,+,1189S35M26I399M,60,26; MD:Z:1153 RG:Z:GATKSVContigAlignments NM:i:58 AS:i:1079 XS:i:58
//==================================================================================================================

public static final int LAST_ROUND_TUNING_ALIGNMENT_MAPQUAL_THREHOLD = 20; |
These parameters (name now updated) and the following methods were moved here from the now-gone class AssemblyContigAlignmentSignatureClassifier;
the intention is documented in the method removeNonUniqueMappings.
 * See {@link #removeNonUniqueMappings(List, int, int)}
 */
@VisibleForTesting
static AssemblyContigWithFineTunedAlignments lastRoundAlignmentFineTuning(final AssemblyContigWithFineTunedAlignments tig,
this middle-man method is now removed; the intention is documented in removeNonUniqueMappings
 * </li>
 * <li>
 *     there's strand switch,
 *     todo: this obsoletes the actual use of SimpleChimera#isCandidateInvertedDuplication()
I've updated the comment and moved the todo to SimpleChimera.isCandidateInvertedDuplication().
The problem at hand, essentially, is that this way of classifying the contig as incomplete makes that function "useless" in the new pipeline, because such contigs won't be sent down for analysis.
        ||
        (firstAndLastAlignmentMappedToSameChr() && hasIncompletePictureDueToOverlappingRefOrderSwitch());
}

public static boolean hasIncompletePictureFromTwoAlignments(final AlignmentInterval headAlignment,
Yes, because some of the logic is different. One example: for 2-alignment contigs, alignments to different chromosomes are considered OK, but for multiple-alignment contigs, the head and tail alignments must be mapped to the same chromosome for the contig NOT to be considered "incomplete". The motivation is that the 2-alignment case could produce a BND record (or two, if both mates count), whereas the multiple-alignment contigs could emit several, and I'm not sure yet how to interpret them (this is tied to the more general question of how to extract variants from the "incomplete" contigs).
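The asymmetry described above can be sketched with deliberately simplified stand-in predicates (only the chromosome check is modeled; the real methods also inspect overlap and reference-order-switch signatures):

```java
// Illustrative contrast only, not the actual GATK predicates:
// a 2-alignment contig mapping to two chromosomes can still yield a BND
// record, so a chromosome mismatch alone does not make it "incomplete";
// a multi-alignment contig whose head and tail land on different
// chromosomes is classified as incomplete.
final class IncompletePictureSketch {
    static boolean incompleteForTwoAlignments(final String chrHead, final String chrTail) {
        return false; // chromosome mismatch alone is acceptable here
    }
    static boolean incompleteForMultipleAlignments(final String chrHead, final String chrTail) {
        return !chrHead.equals(chrTail); // head and tail must share a chromosome
    }
}
```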
}

@Override
public int hashCode() {
    int result = sourceContigName.hashCode();
    result = 31 * result + regionWithLowerCoordOnContig.hashCode();
    result = 31 * result + regionWithHigherCoordOnContig.hashCode();
    result = 31 * result + strandSwitch.ordinal();
    result = 31 * result + strandSwitch.hashCode();
Looks like you regenerated this hashCode method and reverted from using ordinal, which will potentially give rise to bugs as we learned before. Please change back.
Oops! Thanks for catching that! Auto-generated by IntelliJ...
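For context on why `ordinal()` matters here: `Enum.hashCode()` is inherited from `Object` and is identity-based, so it can differ between JVM runs, which breaks anything that serializes or persists the hash (the "serialized-hashcode-on-enums problem" mentioned below). `ordinal()` is stable across runs. A minimal sketch with stand-in types (not the actual SimpleChimera class):

```java
// Hypothetical simplified version of the strand-switch enum and key class.
enum StrandSwitch { NO_SWITCH, FORWARD_TO_REVERSE, REVERSE_TO_FORWARD }

final class ChimeraKey {
    final String sourceContigName;
    final StrandSwitch strandSwitch;

    ChimeraKey(final String name, final StrandSwitch ss) {
        this.sourceContigName = name;
        this.strandSwitch = ss;
    }

    @Override
    public int hashCode() {
        int result = sourceContigName.hashCode();
        // ordinal() is deterministic across JVM runs; strandSwitch.hashCode()
        // (Object identity hash) is not, so IDE-generated code must be fixed up.
        result = 31 * result + strandSwitch.ordinal();
        return result;
    }
}
```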
@@ -165,7 +165,16 @@ public final boolean hasEquallyGoodAlnConfigurations() {
    return hasEquallyGoodAlnConfigurations;
}

public final boolean hasIncompletePicture() {
/**
 * This method exist because extending the assembly contig in either direction
"method exists"
corrected
 *
 * If the assembly contig, with its two (picked) alignments, shows any of the following signature,
 * If the assembly contig's alignments show any of the following signature,
"signatures"
corrected
 *
 * If the assembly contig, with its two (picked) alignments, shows any of the following signature,
 * If the assembly contig's alignments show any of the following signature,
 * it will be classified as incomplete.
 * it is definitely not giving the whole picture of the alt haplotype,
With your edits I'd now start this sentence off, "Incomplete alignments do not give the whole picture.."
Added.
@@ -279,8 +279,13 @@ private boolean hasIncompletePicture() {
    return AssemblyContigWithFineTunedAlignments.hasIncompletePictureFromTwoAlignments(regionWithLowerCoordOnContig, regionWithHigherCoordOnContig);
}

// TODO: 5/5/18 Note that the use of the following predicate is currently obsoleted by
I'm a bit confused by this comment: this method is still being called in several places, so how is it obsolete?
@@ -186,7 +189,7 @@ protected void runTool( final JavaSparkContext ctx ) {

// TODO: 1/14/18 this is the next version of precise variant calling
if ( expInterpret != null ) {
    experimentalInterpretation(ctx, assembledEvidenceResults, svDiscoveryInputMetaData);
    experimentalInterpretation(ctx, expInterpret, assembledEvidenceResults, svDiscoveryInputMetaData);
What's the use of this boolean variable? Why not just add the check that it's true to the if clause on the previous line, then you don't need the extra parameter.
Right! The original command-line option was a String telling where to store the whole directory of experimental interpretation. When I removed the directory creation and switched to the flag, I forgot to remove this parameter.
private void writeSAMfilesForUnknown(final String outputPrefix, final JavaRDD<GATKRead> assemblyRawAlignments,
                                     final SAMFileHeader header) {

    final Map<ReasonForAlignmentClassificationFailure, Iterable<String>> reasonToContigNames =
This is better, but you are still collecting the RDD and passing over the collection three times. What I meant by my original suggestion was this: why not make the map go the other way, i.e. make a Map<String, ReasonForAlignmentClassificationFailure> that maps contig names to their reasons? Then make a Map<ReasonForAlignmentClassificationFailure, SAMFileWriter> with three entries. Then you only have to iterate over the collection of reads once to write everything out (you just look up the writer for each entry).
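The suggested single-pass scheme can be sketched with stdlib-only stand-ins (`Reason` for ReasonForAlignmentClassificationFailure, and a `List<String>` sink per reason standing in for a SAMFileWriter). Each read is routed with two map lookups, so the collection is traversed exactly once:

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for ReasonForAlignmentClassificationFailure.
enum Reason { AMBIGUOUS, INCOMPLETE, MISASSEMBLY_SUSPECT }

final class SinglePassRouter {
    // reads are modeled as String[]{contigName, record}; sinks stand in for
    // the three SAMFileWriters keyed by classification reason.
    static Map<Reason, List<String>> route(final Map<String, Reason> contigToReason,
                                           final List<String[]> reads) {
        final Map<Reason, List<String>> sinks = new EnumMap<>(Reason.class);
        for (final Reason r : Reason.values()) {
            sinks.put(r, new ArrayList<>());
        }
        // Single pass: look up the reason for the read's contig, then the sink.
        for (final String[] read : reads) {
            final Reason r = contigToReason.get(read[0]);
            if (r != null) {
                sinks.get(r).add(read[1]);
            }
        }
        return sinks;
    }
}
```

In the real code the sinks would be `SAMFileWriter`s opened up front and closed after the loop; the routing logic is the same.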
Looks better after the changes. The most important thing remaining to fix is the hashCode method on SimpleChimera, which looks vulnerable to the serialized-hashcode-on-enums problem again. Also consider my comments on writing the three BAM files.
For record keeping, as the comments and replies may be buried in the many commits:

On the problem of too many splits of RDDs and performance concerns
- Initial comment by @cwhelan
- Reply by @SHuang-Broad
- Reply by @cwhelan
- Reply by @SHuang-Broad

On the problem of having a confusing TODO
- Comment by @cwhelan
- Reply by @SHuang-Broad (also copied to update the "TODO" message)

On the problem of writing out SAM records of "Unknown" contigs efficiently
- First round comment by @cwhelan
- Second round comment by @cwhelan
- Reply by @SHuang-Broad
* move method parseOneContig() in ChimericAlignments to DiscoverVariantsFromContigAlignmentsSAMSpark considering it is only used there
* refactor and update SvDiscoveryInputData, also rename it to SvDiscoveryInputMetaData and move contig alignments out of it
* refactor to simplify contig alignment signature basic type (Suspicious, Simple, Complex) classification, hence removing class AssemblyContigAlignmentSignatureClassifier
* simplify SimpleNovelAdjacencyAndChimericAlignmentEvidence

(SV) Representation change commit:
* change SVLEN for CPX variants to the difference between [alt haplotype sequence length] and [affected reference region length]
* change how replacement (RPL) variants are represented in NEW CODE PATH ONLY
* change how DUP variants with short duplicated ref region are represented in NEW CODE PATH ONLY
(force-pushed from fa389b0 to ffbca98)
Part of road map laid out in #4111
Consolidate logic, update variant representation (PR#4663)
Consolidate logic in the following classes:
- AssemblyContigAlignmentSignatureClassifier is now gone; its inner enum class RawTypes is moved to AssemblyContigWithFineTunedAlignments.AlignmentSignatureBasicTypes and reduced into fewer cases (Suspicious, Simple and Complex)
- static method BreakpointsInference.inferFromSimpleChimera() is now moved to state query method ChimericAlignment.inferType()
- AssemblyContigWithFineTunedAlignments.hasIncompletePictureFromTwoAlignments() is merged with ChimericAlignment.hasIncompletePicture()
Update how variants are represented:
- change SVLEN for CPX variants to the difference between [alt haplotype sequence length] and [affected reference region length], which follows the technical definition of SVLEN in the VCF spec
- change RPL output to one of these (note that test coverage is expected): EVENT
- change SVTYPE=DUP to SVTYPE=INS when the duplicated region is shorter than 50 bp (tests). Note that this will lead to INS records with DUP_REPEAT_UNIT_REF_SPAN and DUP_SEQ_CIGARS (when available). In addition, we are currently treating duplication expansion as insertion. The VCF spec doesn't force DUP records as such. If we decide to allow POS and END to designate the beginning and end of the duplicated reference region, we need to make at least the following change:
  downstreamBreakpointRefPos = complication.getDupSeqRepeatUnitRefSpan().getEnd();
Bump test coverage:
- SimpleNovelAdjacencyAndChimericAlignmentEvidence serialization test
- NovelAdjacencyAndAltHaplotype.toSimpleOrBNDTypes() (but only related to SimpleSvType's; BND's will wait for a later PR)
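The new SVLEN definition for CPX records described above reduces to a simple difference; the sketch below uses hypothetical names (the real computation lives in the CPX-variant interpretation code):

```java
// SVLEN for CPX variants, per the PR's representation change:
// (alt haplotype sequence length) - (affected reference region length).
// Positive means a net gain of sequence relative to the reference,
// negative a net loss, matching the VCF spec's notion of SVLEN.
final class CpxSvLen {
    static int svLen(final int altHaplotypeLength, final int affectedRefRegionLength) {
        return altHaplotypeLength - affectedRefRegionLength;
    }
}
```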