Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(SV) clean up and add documentation to AssemblyContigAlignmentsConfigPicker #4971

Merged
merged 1 commit into from
Jul 19, 2018

Conversation

SHuang-Broad
Copy link
Contributor

No description provided.

@codecov-io
Copy link

codecov-io commented Jun 30, 2018

Codecov Report

Merging #4971 into master will decrease coverage by 0.157%.
The diff coverage is 97.222%.

@@               Coverage Diff               @@
##              master     #4971       +/-   ##
===============================================
- Coverage     86.352%   86.194%   -0.157%     
- Complexity     28592     29040      +448     
===============================================
  Files           1782      1782               
  Lines         132367    133721     +1354     
  Branches       14746     15130      +384     
===============================================
+ Hits          114301    115260      +959     
- Misses         12753     13065      +312     
- Partials        5313      5396       +83
Impacted Files Coverage Δ Complexity Δ
...iscovery/TestUtilsForAssemblyBasedSVDiscovery.java 95.522% <100%> (+10.448%) 13 <0> (+1) ⬆️
...covery/inference/TestUtilsCpxVariantInference.java 99.512% <100%> (-0.021%) 7 <1> (ø)
...lignment/AssemblyContigAlignmentsConfigPicker.java 93.515% <91.667%> (+0.526%) 95 <27> (+8) ⬆️
.../AssemblyContigAlignmentsConfigPickerUnitTest.java 99.605% <99.429%> (+0.09%) 20 <14> (+2) ⬆️
...der/tools/walkers/mutect/M2ArgumentCollection.java 50% <0%> (-43.333%) 3% <0%> (ø)
...itute/hellbender/tools/walkers/mutect/Mutect2.java 52% <0%> (-40.857%) 18% <0%> (ø)
...hellbender/tools/walkers/mutect/Mutect2Engine.java 64.151% <0%> (-27.065%) 54% <0%> (+1%)
...broadinstitute/hellbender/utils/IntervalUtils.java 81.575% <0%> (-9.941%) 204% <0%> (+23%)
...ava/org/broadinstitute/hellbender/utils/Utils.java 73.376% <0%> (-8.156%) 238% <0%> (+80%)
...tute/hellbender/utils/samples/SampleDBBuilder.java 75% <0%> (-5%) 23% <0%> (+8%)
... and 11 more

@SHuang-Broad
Copy link
Contributor Author

Forgot to add information about effect of adding in the optional commit:

Based on the concordance tool, turning on/off the optional commit, the trade off is a handful of TP/FP.
That's why I made it optional.

@SHuang-Broad SHuang-Broad force-pushed the sh-sv-cleanup-aln-picker branch from 9c98466 to 9eb6dfe Compare July 10, 2018 16:17
@SHuang-Broad SHuang-Broad force-pushed the sh-sv-cleanup-aln-picker branch 2 times, most recently from 8cf10a3 to 3028331 Compare July 12, 2018 15:00
Copy link
Member

@cwhelan cwhelan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly fine; most of my comments are just wording changes to your documentation.

* or:
* has more than 1 mapping, but only 1 mapping has MQ strictly above this threshold and it has a large gap in it.
* <p>
* A filter that is used to remove contigs upfront which doesn't meet the following criteria
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"which don't"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* Filters an input of SAM file containing alignments of a single-ended long read that
* aims at providing an "optimal coverage" of the assembly contig, based on an heuristic scoring scheme
* Filters input alignments of single-ended long reads, e.g. local assembly contigs,
* with the purpose being to provide "optimal coverage" of the assembly contig.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"with the objective of choosing a set of alignments that provide "optimal coverage" of the assembly contig"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* first group the alignments of each local assembly contig together
* </li>
* <li>
* then it exhaustively iterate through all possible combinations (named configuration) of the alignments,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"then exhaustively iterate"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* there are multiple "optimal" configurations, and when this happens, all of them are returned
* </li>
* <li>
* alignments containing large gaps are finally split into multiple ones
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"are finally split at gap start and end locations"

this applies to insertions too, right? Maybe replace the word gap with "insertions and deletions"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct. by "gap" I mean insertion and deletions but not N's

* <li>
* alignments containing large gaps are finally split into multiple ones
* (note, see warning in {@link #splitGaps(GoodAndBadMappings, boolean)})
* A final round of pruning, in order to further trim away alignments
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be its own list item in the algorithm?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe "remove uninformative alignments" instead of "trim away alignments that offers more noise than signal"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done and done

* picture: ref span by gapped original alignment ------------- --------
* ref span by overlapping other alignment ---------------------------
*
* TODO:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand on what the TODO is in the comment? Ie. "choose one of these options"..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. What I wanted to say is that here we have two options, choose one, and evaluate (right now the evaluation says the choosing either is fine) when other changes are made.

* better read coverage/breadth compared to the other, if so, keep the children alignments from the gap split,
* and ditch the other alignment, otherwise ditch the alignment gapped original alignment,
* the aim is to keep the split children alignments together
* 2) drop the gap-split children alignment whose read span is contained in other's.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"drop the gap-split child alignments whose read spans are contained in other alignment spans"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if ( ! stateAndAlignment._1 )
continue;
final AlignmentInterval one = stateAndAlignment._2;
for (int j = 0; j < count; ++j) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be initialized at j = i + 1 to save work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Thanks!

final int threshold) {
static GoodAndBadMappings splitGapsAndDropAlignmentContainedByOtherOnRead(final GoodAndBadMappings configuration) {
final List<AlignmentInterval> originalGoodMappings = configuration.goodMappings;
final List<Tuple2<Boolean, AlignmentInterval>> alignmentSplitChildren = originalGoodMappings.stream()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems weird to put these in a List<Tuple2<Boolean, AligmentInterval>>. Why not first just make a List<AlignmentInterval>. Then loop through it with i and j indices as below, but instead of resetting the tuple in this list just pop them into the good and bad lists below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. done

}
}

//==================================================================================================================
// functionality group 3.2: after gap split, further reclassify some good mapping as bad
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a little description of which mappings are getting filtered out here to the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a short link to removeNonUniqueMappings(), which explains the criteria

Copy link
Contributor Author

@SHuang-Broad SHuang-Broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've addressed all comments, Chris.
Can you take a look please? Thanks!

* or:
* has more than 1 mapping, but only 1 mapping has MQ strictly above this threshold and it has a large gap in it.
* <p>
* A filter that is used to remove contigs upfront which doesn't meet the following criteria
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* Filters an input of SAM file containing alignments of a single-ended long read that
* aims at providing an "optimal coverage" of the assembly contig, based on an heuristic scoring scheme
* Filters input alignments of single-ended long reads, e.g. local assembly contigs,
* with the purpose being to provide "optimal coverage" of the assembly contig.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* first group the alignments of each local assembly contig together
* </li>
* <li>
* then it exhaustively iterate through all possible combinations (named configuration) of the alignments,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* there are multiple "optimal" configurations, and when this happens, all of them are returned
* </li>
* <li>
* alignments containing large gaps are finally split into multiple ones
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct. by "gap" I mean insertion and deletions but not N's

* <li>
* alignments containing large gaps are finally split into multiple ones
* (note, see warning in {@link #splitGaps(GoodAndBadMappings, boolean)})
* A final round of pruning, in order to further trim away alignments
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done and done

* picture: ref span by gapped original alignment ------------- --------
* ref span by overlapping other alignment ---------------------------
*
* TODO:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. What I wanted to say is that here we have two options, choose one, and evaluate (right now the evaluation says the choosing either is fine) when other changes are made.

* better read coverage/breadth compared to the other, if so, keep the children alignments from the gap split,
* and ditch the other alignment, otherwise ditch the alignment gapped original alignment,
* the aim is to keep the split children alignments together
* 2) drop the gap-split children alignment whose read span is contained in other's.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if ( ! stateAndAlignment._1 )
continue;
final AlignmentInterval one = stateAndAlignment._2;
for (int j = 0; j < count; ++j) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Thanks!

}
}

//==================================================================================================================
// functionality group 3.2: after gap split, further reclassify some good mapping as bad
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a short link to removeNonUniqueMappings(), which explains the criteria

final int threshold) {
static GoodAndBadMappings splitGapsAndDropAlignmentContainedByOtherOnRead(final GoodAndBadMappings configuration) {
final List<AlignmentInterval> originalGoodMappings = configuration.goodMappings;
final List<Tuple2<Boolean, AlignmentInterval>> alignmentSplitChildren = originalGoodMappings.stream()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. done

return Utils.stream(split).map(ai -> new Tuple2<>(true, ai));
}).collect(Collectors.toList());
final int count = alignmentSplitChildren.size();
final List<AlignmentInterval> gapSplit = new ArrayList<>(originalGoodMappings.size());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestion but I think formulating this as stream -> flatMap -> "collection as list" might be a little cleaner than creating the list upfront and then adding things to it inside forEaches.

Copy link
Member

@cwhelan cwhelan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Looks good.

 * clean up and bump test coverage
 * logic change: new option to keep gap-split alignments together or not, and
   opts to NOT to keep them together when one is contained by other alignments
@SHuang-Broad SHuang-Broad force-pushed the sh-sv-cleanup-aln-picker branch from 14868f3 to 93ef08f Compare July 19, 2018 14:06
@SHuang-Broad SHuang-Broad merged commit 9c0a867 into master Jul 19, 2018
@SHuang-Broad SHuang-Broad deleted the sh-sv-cleanup-aln-picker branch July 19, 2018 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants