-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(SV) clean up and add documentation to AssemblyContigAlignmentsConfigPicker #4971
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4971 +/- ##
===============================================
- Coverage 86.352% 86.194% -0.157%
- Complexity 28592 29040 +448
===============================================
Files 1782 1782
Lines 132367 133721 +1354
Branches 14746 15130 +384
===============================================
+ Hits 114301 115260 +959
- Misses 12753 13065 +312
- Partials 5313 5396 +83
|
Forgot to add information about effect of adding in the optional commit: Based on the concordance tool, turning on/off the optional commit, the trade off is a handful of TP/FP. |
9c98466
to
9eb6dfe
Compare
8cf10a3
to
3028331
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly fine; most of my comments are just wording changes to your documentation.
* or: | ||
* has more than 1 mapping, but only 1 mapping has MQ strictly above this threshold and it has a large gap in it. | ||
* <p> | ||
* A filter that is used to remove contigs upfront which doesn't meet the following criteria |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"which don't"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* Filters an input of SAM file containing alignments of a single-ended long read that | ||
* aims at providing an "optimal coverage" of the assembly contig, based on an heuristic scoring scheme | ||
* Filters input alignments of single-ended long reads, e.g. local assembly contigs, | ||
* with the purpose being to provide "optimal coverage" of the assembly contig. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"with the objective of choosing a set of alignments that provide "optimal coverage" of the assembly contig"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* first group the alignments of each local assembly contig together | ||
* </li> | ||
* <li> | ||
* then it exhaustively iterate through all possible combinations (named configuration) of the alignments, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"then exhaustively iterate"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* there are multiple "optimal" configurations, and when this happens, all of them are returned | ||
* </li> | ||
* <li> | ||
* alignments containing large gaps are finally split into multiple ones |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"are finally split at gap start and end locations"
this applies to insertions too, right? Maybe replace the word gap with "insertions and deletions"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct. by "gap" I mean insertion and deletions but not N
's
* <li> | ||
* alignments containing large gaps are finally split into multiple ones | ||
* (note, see warning in {@link #splitGaps(GoodAndBadMappings, boolean)}) | ||
* A final round of pruning, in order to further trim away alignments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be its own list item in the algorithm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe "remove uninformative alignments" instead of "trim away alignments that offers more noise than signal"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done and done
* picture: ref span by gapped original alignment ------------- -------- | ||
* ref span by overlapping other alignment --------------------------- | ||
* | ||
* TODO: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand on what the TODO is in the comment? Ie. "choose one of these options"..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. What I wanted to say is that here we have two options, choose one, and evaluate (right now the evaluation says the choosing either is fine) when other changes are made.
* better read coverage/breadth compared to the other, if so, keep the children alignments from the gap split, | ||
* and ditch the other alignment, otherwise ditch the alignment gapped original alignment, | ||
* the aim is to keep the split children alignments together | ||
* 2) drop the gap-split children alignment whose read span is contained in other's. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"drop the gap-split child alignments whose read spans are contained in other alignment spans"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if ( ! stateAndAlignment._1 ) | ||
continue; | ||
final AlignmentInterval one = stateAndAlignment._2; | ||
for (int j = 0; j < count; ++j) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be initialized at j = i + 1
to save work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! Thanks!
final int threshold) { | ||
static GoodAndBadMappings splitGapsAndDropAlignmentContainedByOtherOnRead(final GoodAndBadMappings configuration) { | ||
final List<AlignmentInterval> originalGoodMappings = configuration.goodMappings; | ||
final List<Tuple2<Boolean, AlignmentInterval>> alignmentSplitChildren = originalGoodMappings.stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems weird to put these in a List<Tuple2<Boolean, AligmentInterval>>
. Why not first just make a List<AlignmentInterval>
. Then loop through it with i and j indices as below, but instead of resetting the tuple in this list just pop them into the good
and bad
lists below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. done
} | ||
} | ||
|
||
//================================================================================================================== | ||
// functionality group 3.2: after gap split, further reclassify some good mapping as bad |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a little description of which mappings are getting filtered out here to the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a short link to removeNonUniqueMappings()
, which explains the criteria
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've addressed all comments, Chris.
Can you take a look please? Thanks!
* or: | ||
* has more than 1 mapping, but only 1 mapping has MQ strictly above this threshold and it has a large gap in it. | ||
* <p> | ||
* A filter that is used to remove contigs upfront which doesn't meet the following criteria |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* Filters an input of SAM file containing alignments of a single-ended long read that | ||
* aims at providing an "optimal coverage" of the assembly contig, based on an heuristic scoring scheme | ||
* Filters input alignments of single-ended long reads, e.g. local assembly contigs, | ||
* with the purpose being to provide "optimal coverage" of the assembly contig. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* first group the alignments of each local assembly contig together | ||
* </li> | ||
* <li> | ||
* then it exhaustively iterate through all possible combinations (named configuration) of the alignments, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* there are multiple "optimal" configurations, and when this happens, all of them are returned | ||
* </li> | ||
* <li> | ||
* alignments containing large gaps are finally split into multiple ones |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct. by "gap" I mean insertion and deletions but not N
's
* <li> | ||
* alignments containing large gaps are finally split into multiple ones | ||
* (note, see warning in {@link #splitGaps(GoodAndBadMappings, boolean)}) | ||
* A final round of pruning, in order to further trim away alignments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done and done
* picture: ref span by gapped original alignment ------------- -------- | ||
* ref span by overlapping other alignment --------------------------- | ||
* | ||
* TODO: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. What I wanted to say is that here we have two options, choose one, and evaluate (right now the evaluation says the choosing either is fine) when other changes are made.
* better read coverage/breadth compared to the other, if so, keep the children alignments from the gap split, | ||
* and ditch the other alignment, otherwise ditch the alignment gapped original alignment, | ||
* the aim is to keep the split children alignments together | ||
* 2) drop the gap-split children alignment whose read span is contained in other's. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if ( ! stateAndAlignment._1 ) | ||
continue; | ||
final AlignmentInterval one = stateAndAlignment._2; | ||
for (int j = 0; j < count; ++j) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! Thanks!
} | ||
} | ||
|
||
//================================================================================================================== | ||
// functionality group 3.2: after gap split, further reclassify some good mapping as bad |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a short link to removeNonUniqueMappings()
, which explains the criteria
final int threshold) { | ||
static GoodAndBadMappings splitGapsAndDropAlignmentContainedByOtherOnRead(final GoodAndBadMappings configuration) { | ||
final List<AlignmentInterval> originalGoodMappings = configuration.goodMappings; | ||
final List<Tuple2<Boolean, AlignmentInterval>> alignmentSplitChildren = originalGoodMappings.stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. done
return Utils.stream(split).map(ai -> new Tuple2<>(true, ai)); | ||
}).collect(Collectors.toList()); | ||
final int count = alignmentSplitChildren.size(); | ||
final List<AlignmentInterval> gapSplit = new ArrayList<>(originalGoodMappings.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion but I think formulating this as stream
-> flatMap
-> "collection as list" might be a little cleaner than creating the list upfront and then adding things to it inside forEach
es.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Looks good.
* clean up and bump test coverage * logic change: new option to keep gap-split alignments together or not, and opts to NOT to keep them together when one is contained by other alignments
14868f3
to
93ef08f
Compare
No description provided.