Fix for right-shifting deletions on AA sequences. #512

Merged
merged 1 commit into from
Jun 7, 2021

holtgrewe
Member

closes #498
closes #499

@holtgrewe holtgrewe self-assigned this Jun 7, 2021
**This is a combination of 4 commits.**

**This is the 1st commit message:**

Problem:

The previous code addressed the problems with deletions introducing
frameshifts by avoiding right-shifting in those cases altogether,
by checking change.getAlt().length() != 0 in
AminoAcidChangeNormalizer#normalizeDeletion.
This may be wrong if the frame-shifted sequence introduces the same
amino acid(s) that would have followed in the wild type as well.

It also does not address the following corner case:

- consider a deletion coinciding with the start of a codon, so that
changeBeginPos.getFrameshift() == 0 in the calling code
(DeletionAnnotationBuilder), which means that insAA will be empty and
thus normalization will be attempted by
AminoAcidChangeNormalizer#normalizeDeletion

- assume the length of the deletion is not divisible by three, so it
will itself introduce a frameshift, so that varAASeq (the AA sequence
containing the variant) will differ significantly from wtAASeq
(the wildtype AA sequence) at the position of the deletion
(and downstream)

- since AminoAcidChangeNormalizer#normalizeDeletion only operates on the
wildtype AA sequence (wtAASeq), this goes unnoticed, and in case the
sequence of deleted amino acids starts with the same amino acid(s) as
those after the deletion in the wild type, the variant will be
erroneously right-shifted on that wild-type sequence
(the right-shifting did not consider varAASeq at all)

Example:

      H | T | G | E | K | P |fs F| E | C |
GGAAACAT|ACT|GGG|GAG|AAA|CCC| TTT|GAG|TGT|CCCAAATGTGGGAAGTGTTACTTTCGG...
GGAAACAT|ACT|GGG|---|---|---|-TTG|AGT|GTC|CCAAATGT..
                            |  L | S | V |
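
To check the reading frames above by hand, here is a minimal sketch that translates the codons of the example before and after the 10-nt deletion. The class `FrameshiftExample`, the method `translate`, and the partial codon table are hypothetical helpers written for illustration; they are not part of Jannovar.

```java
import java.util.Map;

/** Hypothetical helper for illustration only; not Jannovar code. */
final class FrameshiftExample {
    // Partial codon table: only the codons that occur in the example.
    static final Map<String, Character> CODONS = Map.ofEntries(
            Map.entry("CAT", 'H'), Map.entry("ACT", 'T'), Map.entry("GGG", 'G'),
            Map.entry("GAG", 'E'), Map.entry("AAA", 'K'), Map.entry("CCC", 'P'),
            Map.entry("TTT", 'F'), Map.entry("TGT", 'C'), Map.entry("TTG", 'L'),
            Map.entry("AGT", 'S'), Map.entry("GTC", 'V'));

    /** Translates all complete codons; a trailing partial codon is ignored. */
    static String translate(String nts) {
        StringBuilder aa = new StringBuilder();
        for (int i = 0; i + 3 <= nts.length(); i += 3) {
            // Codons missing from the partial table are rendered as 'X'.
            aa.append(CODONS.getOrDefault(nts.substring(i, i + 3), 'X'));
        }
        return aa.toString();
    }

    /** Removes len nucleotides starting at the 0-based offset start. */
    static String delete(String nts, int start, int len) {
        return nts.substring(0, start) + nts.substring(start + len);
    }

    public static void main(String[] args) {
        // In-frame excerpt of the wild type: CAT|ACT|GGG|GAG|AAA|CCC|TTT|GAG|TGT|CCC
        String wt = "CATACTGGGGAGAAACCCTTTGAGTGTCCC";
        // The deletion starts on a codon boundary (offset 9) and spans
        // 10 nt, which is not divisible by three, so the frame shifts.
        String var = delete(wt, 9, 10);
        System.out.println(translate(wt));  // HTGEKPFECP
        System.out.println(translate(var)); // HTGLSV
    }
}
```

The variant frame reads L, S, V directly after the unchanged H, T, G, which is the bottom row of the diagram.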

Before AminoAcidChangeNormalizer#normalizeDeletion gets called,
AminoAcidChange will be 'EKPF'>''. However, due to the above problem
this will be right-shifted to 'KPFE'>'', which leads to a wrong position
and a wrong AA displayed in the variant's protein change HGVS
annotation.

In this example, the protein change should be p.(Glu316Leufs*25)
(in one-letter codes, p.(E316Lfs*25)) but is erroneously changed to
p.(Lys317Serfs*24) (p.(K317Sfs*24)).
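
The faulty shift can be reproduced with a simplified reconstruction of the behaviour described above. This is a sketch under the assumption that shifting only inspects the wild-type sequence; `NaiveShiftSketch` and `shiftOnWildTypeOnly` are hypothetical names and this is not the previous Jannovar implementation.

```java
/** Hypothetical sketch of wild-type-only right-shifting; not Jannovar code. */
final class NaiveShiftSketch {
    /**
     * Right-shifts a deletion of len amino acids that starts at the 0-based
     * index pos, looking only at the wild-type sequence.
     * Returns the shifted start index.
     */
    static int shiftOnWildTypeOnly(String wtAASeq, int pos, int len) {
        // Shift while the first deleted AA equals the AA right after the deletion.
        while (pos + len < wtAASeq.length()
                && wtAASeq.charAt(pos) == wtAASeq.charAt(pos + len)) {
            pos++;
        }
        return pos;
    }

    public static void main(String[] args) {
        String wt = "HTGEKPFECP"; // index 3 is Glu316 in the example
        int shifted = shiftOnWildTypeOnly(wt, 3, 4); // deleted AAs: EKPF
        // Prints "4 KPFE": the deletion moved from Glu316 to Lys317.
        System.out.println(shifted + " " + wt.substring(shifted, shifted + 4));
    }
}
```

Because the E following 'EKPF' matches its first residue, the start moves by one position even though the variant sequence already differs at Glu316.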

Solution:

In contrast to the right-shifting code for nucleotides, it seems a
better approach here would be to simply compare the wild-type and
variant AA sequences, as they already have been computed by the
AminoAcidChangeNormalizer anyhow. So, varAASeq is now passed into
AminoAcidChangeNormalizer#normalizeDeletion and compared to the wildtype
AA sequence. This also allows us to apply right-shifting to deletions
that do not coincide with the beginning of a codon.
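
The idea of comparing both sequences can be sketched as follows. This is only an illustration of the principle, assuming that the normalized start of a change is the first position at which wtAASeq and varAASeq disagree; `CompareShiftSketch` and `firstDifference` are hypothetical names, and the actual normalizeDeletion code in this PR may differ in its details.

```java
/** Hypothetical sketch of comparison-based shifting; not Jannovar code. */
final class CompareShiftSketch {
    /**
     * Returns the first 0-based index >= start at which the two sequences
     * differ. If they agree up to the end of the shorter one, the length
     * of the shorter sequence is returned.
     */
    static int firstDifference(String wtAASeq, String varAASeq, int start) {
        int n = Math.min(wtAASeq.length(), varAASeq.length());
        int i = start;
        while (i < n && wtAASeq.charAt(i) == varAASeq.charAt(i)) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // Frameshift example: the sequences already differ at index 3
        // (Glu316 vs. Leu), so the change is not shifted.
        System.out.println(firstDifference("HTGEKPFECP", "HTGLSV", 3)); // 3

        // In-frame deletion of one L from a run of three: the first
        // difference is at the last L, i.e. the deletion is right-shifted.
        System.out.println(firstDifference("MLLLK", "MLLK", 1)); // 3
    }
}
```

With this comparison, right-shifting happens only where the variant sequence really reproduces the wild type, which covers both the frameshift case and ordinary in-frame deletions in repeats.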

Tests:

This adds both an 'integration' test with the above example to
DeletionAnnotationBuilderTest and some unit tests to
AminoAcidChangeTest. The latter has also been adjusted to make it clear
that the tested operations should work on AA sequences, not nucleotide
sequences (globally replaced 'A' by 'L').

**This is the commit message #2:**

**#498: fix normalizeDeletion(...) javadoc**

**This is the commit message #3:**

**498: refactor+fix shift for synonymous AA changes**

This introduces the changes from 2190296 to
BlockSubstitutionAnnotationBuilder and InsertionAnnotationBuilder
as well.

Additionally, this removes now-obsolete code and moves the call to
'trim' the AA change before shifting to AminoAcidChangeNormalizer
itself, and also adds and fixes tests.

Co-authored-by: Roland Ewald <[email protected]>
@holtgrewe holtgrewe force-pushed the limbus-medtec-fix-498 branch from 1d36999 to 6f70e98 Compare June 7, 2021 15:23
@holtgrewe
Member Author

@roland-ewald OK, will merge once CI runs through. Thanks again. This will be part of the next release, which is coming soon.

@holtgrewe holtgrewe merged commit f8497b4 into master Jun 7, 2021
@holtgrewe holtgrewe deleted the limbus-medtec-fix-498 branch June 7, 2021 15:29
Development

Successfully merging this pull request may close these issues.

Faulty right-shifting of deletions for protein annotations
2 participants