Fix for right-shifting deletions on AA sequences. #512
closes #498
closes #499
** This is a combination of 4 commits.
This is the 1st commit message:
Problem:
The previous code addressed the problem of deletions introducing
frameshifts by avoiding right-shifting in those cases altogether,
by checking change.getAlt().length() != 0 in
AminoAcidChangeNormalizer#normalizeDeletion.
This may be wrong if the frame-shifted sequence introduces the same
amino acid(s) that would have followed in the wild type as well.
It also does not address the following corner case:
- Consider a deletion that begins at the start of a codon, so that
  changeBeginPos.getFrameshift() == 0 in the calling code
  (DeletionAnnotationBuilder). This means that insAA will be empty, and
  thus normalization will be attempted by
  AminoAcidChangeNormalizer#normalizeDeletion.
- Assume the length of the deletion is not divisible by three, so the
  deletion itself introduces a frameshift and varAASeq (the AA sequence
  containing the variant) differs significantly from wtAASeq (the
  wild-type AA sequence) at the position of the deletion and downstream.
- Since AminoAcidChangeNormalizer#normalizeDeletion only operates on the
  wild-type AA sequence (wtAASeq), this goes unnoticed. If the sequence
  of deleted amino acids starts with the same amino acid(s) as those
  following the deletion in the wild type, the variant is erroneously
  right-shifted on that wild-type sequence (the right-shifting did not
  consider varAASeq at all).
Example:
GGAAACAT|ACT|GGG|GAG|AAA|CCC| TTT|GAG|TGT|CCCAAATGTGGGAAGTGTTACTTTCGG...
GGAAACAT|ACT|GGG|---|---|---|-TTG|AGT|GTC|CCAAATGT..
                             | L | S | V |
Before AminoAcidChangeNormalizer#normalizeDeletion gets called,
AminoAcidChange will be 'EKPF'>''. However, due to the above problem
this will be right-shifted to 'KPFE'>'', which leads to a wrong position
and a wrong AA displayed in the variant's protein change HGVS
annotation.
In this example, the protein change should be p.(Glu316Leufs*25)
(in one-letter codes, p.(E316Lfs*25)) but is erroneously changed to
p.(Lys317Serfs*24) (p.(K317Sfs*24)).
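
The erroneous shift can be reproduced with a minimal sketch. This is
not the project's actual code: the class and method names are
hypothetical, and the wild-type string is only the translation of the
in-frame codons shown in the example above (ACT GGG GAG AAA CCC TTT
GAG TGT), with 0-based positions local to that fragment.

```java
// Hypothetical illustration, not the AminoAcidChangeNormalizer code.
class NaiveShiftSketch {
    /**
     * Right-shifts a deletion of len residues starting at 0-based pos,
     * looking only at the wild-type AA sequence.
     */
    static int naiveRightShift(String wtAASeq, int pos, int len) {
        // Shift while the first deleted residue equals the first residue
        // after the deletion; the variant sequence is never consulted.
        while (pos + len < wtAASeq.length()
                && wtAASeq.charAt(pos) == wtAASeq.charAt(pos + len)) {
            pos++;
        }
        return pos;
    }

    public static void main(String[] args) {
        String wt = "TGEKPFEC"; // fragment translated from the example codons
        int shifted = naiveRightShift(wt, 2, 4);
        // prints "EKPF -> KPFE"
        System.out.println(wt.substring(2, 6) + " -> "
                + wt.substring(shifted, shifted + 4));
    }
}
```

Because the deleted stretch starts with E and the residue after it is
also E, the position moves by one even though the frameshift means the
variant no longer contains that E.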
Solution:
In contrast to the right-shifting code for nucleotides, the better
approach here seems to be to simply compare the wild-type and variant
AA sequences, as both have already been computed by the
AminoAcidChangeNormalizer anyhow. So, varAASeq is now passed into
AminoAcidChangeNormalizer#normalizeDeletion and compared to the
wild-type AA sequence. This also allows us to apply right-shifting to
deletions that do not coincide with the beginning of a codon.
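
One way to picture the comparison is the sketch below. It is an
assumption about the general idea, not the code in this change: the
class name, method name and signature are hypothetical, and the
sequences are toy fragments. The deletion start is moved right only
while the variant sequence still agrees with the wild type at that
position.

```java
// Hypothetical illustration, not the AminoAcidChangeNormalizer code.
class VariantAwareShiftSketch {
    /**
     * Right-shifts a deletion of len residues starting at 0-based pos,
     * but only across residues that are identical in the wild-type and
     * the variant AA sequence.
     */
    static int shiftDeletion(String wtAASeq, String varAASeq, int pos, int len) {
        while (pos + len < wtAASeq.length()
                && pos < varAASeq.length()
                && wtAASeq.charAt(pos) == varAASeq.charAt(pos)) {
            pos++;
        }
        return pos;
    }

    public static void main(String[] args) {
        // Frameshift case from the example: the variant reads L, S, V
        // where the wild type has E, K, P, so nothing is shifted.
        System.out.println(shiftDeletion("TGEKPFEC", "TGLSV", 2, 4)); // prints 2

        // In-frame toy case: deleting "EK" at position 1 of "MEKEKL"
        // yields "MEKL", and the deletion is shifted to position 3.
        System.out.println(shiftDeletion("MEKEKL", "MEKL", 1, 2)); // prints 3
    }
}
```

For an in-frame deletion the variant residue at the deletion start
equals the wild-type residue just after the deleted stretch, so this
condition reduces to the usual right-shift test, while a frameshift
stops the shift as soon as the two sequences diverge.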
Tests:
This adds both an 'integration' test with the above example to
DeletionAnnotationBuilderTest and some unit tests to
AminoAcidChangeTest. The latter has also been adjusted to make it clear
that the tested operations should work on AA sequences, not nucleotide
sequences (globally replaced 'A' by 'L').
This is the commit message #2:
#498: fix normalizeDeletion(...) javadoc
This is the commit message #3:
#498: refactor+fix shift for synonymous AA changes
This introduces the changes from 2190296 to
BlockSubstitutionAnnotationBuilder and InsertionAnnotationBuilder
as well.
Additionally, this removes now-obsolete code and moves the call to
'trim' the AA change before shifting to AminoAcidChangeNormalizer
itself, and also adds and fixes tests.
Co-authored-by: Roland Ewald [email protected]