Fix variant region for ins and dup on intron-exon boundary #748

b0d0nne11 · 2024-09-15T17:39:20Z

Fixes #655

This PR is offered as an alternative to #709. It solves the same problem, that certain ins or dup variants that occur at the intron/exon boundaries have non-zero offsets but are still coding. The approach taken here varies from the earlier PR in a few ways:

It avoids overriding any methods
It keeps most of the replace reference changes to the variant mapper
It avoids the intermediate mapping operations
It keeps the new tests to a single new issues test file

b0d0nne11 · 2024-09-15T17:40:24Z

@reece and @andreasprlic would you mind taking a look at this and letting me know if you like the approach here better?

andreasprlic · 2024-09-16T03:30:51Z

Thanks, this looks technically more concise to me. I have a scientific question though: Is it established that insertions at the intron/exon boundary always end up as part of the exon?

In other words, do we always want to report such insertions as frameshift variants, or is there a chance they are actually not impacting the coding sequence and remain part of the intron? If we are not sure, it feels as if there should be some config, that would allow to customize the behavior of the code. I am thinking of something like the cross_boundaries parameter in normalization, where we can influence, if we want to normalize across intron / exon boundaries.

b0d0nne11 · 2024-09-16T13:39:57Z

AFAIK they will always be coding, along with the new conditions for duplications that were added. I can certainly create a feature flag for this but IMO I don't think its needed. I think the only thing that would change would be that these niche mapping cases would again return incorrect results. @gostachowiak do you have anything you want to add?

gostachowiak · 2024-09-16T20:55:30Z

@andreasprlic For the scientific question-- no, it is not established that insertions at the boundary will always be part of the exon. For any given mutation, it is impossible to know what will happen in real life.

The idea here is that since the entire intron (including canonical splice sites) is intact, it will default to treating the inserted material as being part of the exon. Since the canonical splice site is well-defined, this seems like the most rational default assumption.

I believe this is strictly better than the current behavior, where sometimes insertions at boundaries are treated as intronic, and sometimes as exonic, based on what offset values happen to appear in the nomenclature. The updated behavior is also consistent with other tools such as Mutalyzer.

A feature flag would make sense, although in my opinion it would be better to deal with in a follow-up issue, because (1) it would be non-trivial, and (2) the current pull request at least establishes a consistent baseline (i.e. eliminates the arbitrary/inconsistent results that are currently returned).

andreasprlic · 2024-09-23T20:31:06Z

Thanks @gostachowiak for the explanation. I agree, more consistent behavior would be desirable.

Having read your comment, my first thought was "this will make the variants look more pathogenic as a default". As such I had a chat with a variant scientist about this issue. The recommendation from that person is that such variants always need to be reviewed carefully and that as a general rule the goal would be to try to represent in the more benign representation, which would be as an intronic variant and potentially impacting splicing. (3' shuffling rule in HGVS nomenclature is not helpful here and would not play a role in the naming)

As such, how hard would it be to have two options here: move such insertions into the intron and as an alternative move into exon? (One could create synonyms and let a scientist chose the preferred name)

From a "hgvs as a library" perspective I can easily see that some labs have certain SOPs how they would report out variants in certain situations, such as this. However it is better if the library offers choice, rather than impose behavior when there are more than one possible answers.

github-actions · 2024-10-24T02:17:14Z

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions · 2024-11-24T02:28:35Z

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions · 2024-12-01T02:47:34Z

This PR was closed because it has been stalled for 7 days with no activity.

jsstevenson · 2024-12-02T15:22:00Z

I think this fell off all of our collective radars. Picking up from the last comment from @andreasprlic

As such, how hard would it be to have two options here: move such insertions into the intron and as an alternative move into exon? (One could create synonyms and let a scientist chose the preferred name)

@b0d0nne11 thoughts?

b0d0nne11 · 2024-12-26T15:41:46Z

Sorry I missed this for so long. Yes, I can make this change. It doesn't look too hard so should have something here soon. If you have any recommendations for the configuration value name that would be great. I'll just put a placeholder in there for now.

@jsstevenson and @andreasprlic

b0d0nne11 · 2024-12-26T19:36:45Z

Added hgvs.global_config.mapping.ref_at_boundary_is_intronic which defaults to True which I believe is the original behavior. I'm not sure is this name makes sense. I will happily change it if needed. I also looked into adding it as an optional parameter on VariantMapper and AssemblyMapper but it would have to be passed way down through the callstack to where it actually gets used in AltSeqBuilder. Its possible but it would be pretty ugly.

b0d0nne11 · 2024-12-26T19:39:40Z

Also just to note, the automated tests appear to be broken which I don't have time to look into today but the tests do pass locally for me fwiw.

b0d0nne11 · 2024-12-30T15:33:34Z

@gostachowiak just pointed out to me that my change is not quite correct. As written, if ref_at_boundary_is_intronic is set to false the original logic is maintained but that doesn't mean the all refs at the boundary are labeled as intronic. They could be labeled as intronic or exonic depending on which side of the boundary they fall on. In order to make it consistent I need to add logic and test cases to handle the other side of the boundary (the side containing the exon). I'm going to try to accommodate this but it will mean that some variants will be mapped differently regardless of how things are configured.

andreasprlic · 2025-01-21T03:04:50Z

src/hgvs/variantmapper.py


        if len(seq) != seq_end - seq_start:
            # tried to read beyond seq end; this is an out-of-bounds variant
            return var

+        if _type == "g" and mapper and mapper.strand == -1:


Interesting, just out of curiosity, when can the strand be negative for a g.? Inversions?

I'm not really sure. I don't have a simple, science-y explanation for why this is the case. My first pass at this PR didn't include it and we noticed some mappings gave the reverse compliments of the expected sequences. After some more digging we found that only g type variants on the negative strand as described by the alignment mapper were affected.

andreasprlic · 2025-01-21T03:05:31Z

tests/issues/test_655.py

+@pytest.fixture()
+def ref_at_boundary_is_exonic():
+    original = hgvs.global_config.mapping.ref_at_boundary_is_intronic
+    hgvs.global_config.mapping.ref_at_boundary_is_intronic = False


If I am reading this correctly, we are running all tests with this fixture as ref_at_boundary_is_intronic = False, but the default is True. Do we have tests that verify the default behavior as well? Besides that, this all looks good to me. Thank you for the detailed oriented work that is necessary for this PR!

In my most recent changes I updated the tests I added in this PR to cover both cases of the new configuration value. In addition, all of the existing tests also cover the case where ins_at_boundary_is_intronic is True since that's the default value.

gostachowiak · 2025-01-25T14:36:36Z

tests/data/gcp/real.tsv

@@ -50,7 +50,7 @@ ID00048	NC_000010.10:g.89692922T>C	NM_000314.4:c.406T>C	NP_000305.3:p.(Cys136Arg
 ID00049	NC_000010.10:g.89692921dupA	NM_000314.4:c.405dupA	NP_000305.3:p.(Cys136Metfs*44)
 ID00050	NC_000010.10:g.89692923_89692939delGTGCATATTTATTACAT	NM_000314.4:c.407_423delGTGCATATTTATTACAT	NP_000305.3:p.(Cys136Serfs*38)
 ID00051	NC_000010.10:g.89712015C>A	NM_000314.4:c.633C>A	NP_000305.3:p.(Cys211*)
-ID00052	NC_000010.10:g.89685314dupT	NM_000314.4:c.209dupT	NP_000305.3:p.(Cys71Leufs*3)


I verified that this is indeed a duplication of the last base of the exon. Therefore it will now be p.? since the requested default behavior was to have insertions at the exon-intron boundary count as intronic.

b0d0nne11 · 2025-01-27T20:10:31Z

Made a few updates to address Matt's offline feedback I described in my comment on 12/30:

Rebased on main
Added cases for duplications that are adjacent to the intron-exon boundary but are in the exon. This ensures that all insertions on the boundary or duplications adjacent to the boundary are treated consistently regardless of the value of the new configuration variable.
Updated new test coverage to include both values of the new configuration value.
Renamed the new configuration variable global_config.mapping.ins_at_boundary_is_intronic

b0d0nne11 requested a review from a team as a code owner September 15, 2024 17:39

b0d0nne11 mentioned this pull request Sep 15, 2024

Fix variant region for ins and dup on intron-exon boundary #709

Closed

github-actions bot added the stale Issue is stale and subject to automatic closing label Oct 24, 2024

korikuzma removed the stale Issue is stale and subject to automatic closing label Oct 24, 2024

github-actions bot added the stale Issue is stale and subject to automatic closing label Nov 24, 2024

github-actions bot added the closed-by-stale label Dec 1, 2024

github-actions bot closed this Dec 1, 2024

jsstevenson reopened this Dec 2, 2024

jsstevenson removed stale Issue is stale and subject to automatic closing closed-by-stale labels Dec 2, 2024

b0d0nne11 force-pushed the 655-intron-exon-boundaries-take2 branch from a97258c to 2547575 Compare December 26, 2024 19:22

b0d0nne11 mentioned this pull request Jan 14, 2025

Fix ins or dups where splice region is preserved #719

Open

andreasprlic reviewed Jan 21, 2025

View reviewed changes

b0d0nne11 added 3 commits January 21, 2025 17:02

Fix variant region for ins and dup on intron-exon boundary

ac8ffa4

Make variant region at boundary configurable

6081fa7

Update test cache

6cc7a79

b0d0nne11 force-pushed the 655-intron-exon-boundaries-take2 branch from 2547575 to 6cc7a79 Compare January 21, 2025 22:03

Also handle variants that are in exons but adjacent to introns

b311115

b0d0nne11 force-pushed the 655-intron-exon-boundaries-take2 branch from 03f8165 to b311115 Compare January 24, 2025 17:46

Rename ref_at_boundary_is_intronic to ins_at_boundary_is_intronic

ffa29cb

gostachowiak reviewed Jan 25, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix variant region for ins and dup on intron-exon boundary #748

Fix variant region for ins and dup on intron-exon boundary #748

b0d0nne11 commented Sep 15, 2024

b0d0nne11 commented Sep 15, 2024

andreasprlic commented Sep 16, 2024

b0d0nne11 commented Sep 16, 2024

gostachowiak commented Sep 16, 2024

andreasprlic commented Sep 23, 2024

github-actions bot commented Oct 24, 2024

github-actions bot commented Nov 24, 2024

github-actions bot commented Dec 1, 2024

jsstevenson commented Dec 2, 2024

b0d0nne11 commented Dec 26, 2024

b0d0nne11 commented Dec 26, 2024

b0d0nne11 commented Dec 26, 2024

b0d0nne11 commented Dec 30, 2024

andreasprlic Jan 21, 2025

b0d0nne11 Jan 27, 2025 •

edited

Loading

andreasprlic Jan 21, 2025

b0d0nne11 Jan 27, 2025

gostachowiak Jan 25, 2025

b0d0nne11 commented Jan 27, 2025 •

edited

Loading

Fix variant region for ins and dup on intron-exon boundary #748

Are you sure you want to change the base?

Fix variant region for ins and dup on intron-exon boundary #748

Conversation

b0d0nne11 commented Sep 15, 2024

b0d0nne11 commented Sep 15, 2024

andreasprlic commented Sep 16, 2024

b0d0nne11 commented Sep 16, 2024

gostachowiak commented Sep 16, 2024

andreasprlic commented Sep 23, 2024

github-actions bot commented Oct 24, 2024

github-actions bot commented Nov 24, 2024

github-actions bot commented Dec 1, 2024

jsstevenson commented Dec 2, 2024

b0d0nne11 commented Dec 26, 2024

b0d0nne11 commented Dec 26, 2024

b0d0nne11 commented Dec 26, 2024

b0d0nne11 commented Dec 30, 2024

andreasprlic Jan 21, 2025

Choose a reason for hiding this comment

b0d0nne11 Jan 27, 2025 • edited Loading

Choose a reason for hiding this comment

andreasprlic Jan 21, 2025

Choose a reason for hiding this comment

b0d0nne11 Jan 27, 2025

Choose a reason for hiding this comment

gostachowiak Jan 25, 2025

Choose a reason for hiding this comment

b0d0nne11 commented Jan 27, 2025 • edited Loading

b0d0nne11 Jan 27, 2025 •

edited

Loading

b0d0nne11 commented Jan 27, 2025 •

edited

Loading