Switch to the updated type & location inference tool in SV pipeline #4111
Updated plan

Small improvements in new interpretation tool: consolidate logic, bump test coverage, and update how variants are represented
- consolidate logic: the initial prototype left redundant logic for simple variants; it is now time to consolidate.
- bump test coverage: once the code above is consolidated, bump test coverage, particularly for the classes above and the following poorly-covered classes
- update how variants are represented: implement the following representation changes that should make type-based evaluation easier

CPX variant re-interpretation: send CPX variants for re-interpretation of simple basic types, and check for consistency (this might be the difficult part)

We have finished implementing the updated logic for how variants are interpreted, and their locations inferred, by studying local assembly contig alignment signatures. It is time to clean up the corresponding package in the pipeline and switch to the updated implementation. The updated implementation now outputs not only insertions, deletions, small tandem duplications, and inversions, but also novel adjacencies (BND records whose meanings cannot be fully resolved solely from assembly alignment signatures) as well as complex variants (`<CPX>`) that theoretically could be arbitrarily complex, as long as we have assembled across the full event.

Planned organization
The `discovery` package could now be divided roughly into:
- interface: `SvDiscoveryDataBundle`, `SvDiscoverFromLocalAssemblyContigAlignmentsSpark`, `SvType`, `AnnotatedVariantProducer`
- alignment prep (sub package): `AlignmentInterval`, `AlignedContig` (refactor `AssemblyContigWithFineTunedAlignments` into `AlignedContig`), `AlignedContigGenerator`, `AlignedAssembly`, `ContigAlignmentsModifier` (refactor `AlnModType` into it), `GappedAlignmentSplitter`, `StrandSwitch`, `FilterLongReadAlignmentsSAMSpark` (factor out the major methods in the new alignment filter by score into a 1st level class)
- type & location inference (sub package)
  - imprecise: refactor out methods from to-be-deprecated `DiscoverVariantsFromContigAlignmentsSAMSpark`
  - alignment classification: `ChimericAlignment` and `NovelAdjacencyReferenceLocations` (very tricky to decouple the functionalities because both have over 50 uses), `AssemblyContigAlignmentSignatureClassifier`, `VariantDetectorFromLocalAssemblyContigAlignments`
  - simple: `SimpleSVType`, `SvTypeInference`, `InsDelVariantDetector`, `BreakpointComplications` (rename to `BreakpointComplicationsForSimpleTypes`)
  - complex: `BreakEndVariantType`, `SuspectedTransLocDetector`, `SimpleStrandSwitchVariantDetector`
- deprecated: `DiscoverVariantsFromContigAlignmentsSAMSpark`

It currently provides 3 groups of functionalities:
- `ChimericAlignment.parseOneContig` and `NovelAdjacencyReferenceLocations(ChimericAlignment chimericAlignment, byte[] contigSequence, SAMSequenceDictionary)`; this should be deprecated
- type inference (delegated to `SvTypeInference.inferFromNovelAdjacency()`) and annotation (delegated to `AnnotatedVariantProducer.produceAnnotatedVcFromInferredTypeAndRefLocations()`); this should be deprecated

Planned steps
- make `StructuralVariationDiscoveryPipelineSpark` call into `SvDiscoverFromLocalAssemblyContigAlignmentsSpark` by default and optionally into `DiscoverVariantsFromContigAlignmentsSAMSpark`, i.e. the opposite of what we currently do.
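
The planned switch of defaults could look roughly like the sketch below. It is only an illustration, not the pipeline's actual code: the `DiscoveryTool` enum, the `selectTools` method, and the `alsoRunLegacy` flag are hypothetical names introduced here, and the sketch assumes that "optionally" means running the legacy tool in addition to the new one rather than instead of it.

```java
import java.util.ArrayList;
import java.util.List;

class DiscoveryDispatchSketch {

    // Hypothetical stand-ins for the two discovery entry points named in this issue.
    enum DiscoveryTool {
        NEW_INTERPRETATION, // stands in for SvDiscoverFromLocalAssemblyContigAlignmentsSpark
        LEGACY              // stands in for DiscoverVariantsFromContigAlignmentsSAMSpark
    }

    // Returns the tools to run, in order. The new interpretation tool always runs;
    // the legacy tool runs only when explicitly requested (assumed semantics).
    static List<DiscoveryTool> selectTools(boolean alsoRunLegacy) {
        List<DiscoveryTool> tools = new ArrayList<>();
        tools.add(DiscoveryTool.NEW_INTERPRETATION);
        if (alsoRunLegacy) {
            tools.add(DiscoveryTool.LEGACY);
        }
        return tools;
    }

    public static void main(String[] args) {
        System.out.println(selectTools(false));
        System.out.println(selectTools(true));
    }
}
```

Keeping the legacy path behind an explicit opt-in would let results from both implementations be compared during the transition before the old tool is removed.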