Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Track config options, mimick sequence length clamping as implicitly done in hgvs py #206

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

tedil
Copy link
Contributor

@tedil tedil commented Oct 4, 2024

  • In biocommons hgvs, del seq[start:end] will implicitly clamp end to len(seq), so we mimick that here. However, this needs more investigation in the future.
  • Shuffle direction and window size options could not be configured further up the hierarchy chain, now they can.

Summary by CodeRabbit

  • New Features

    • Expanded configuration options across data processing steps, offering adjustable settings for directional behavior and window size.
    • Made key normalization functionality accessible for improved interaction.
  • Bug Fixes

    • Improved safety checks for sequence modifications by ensuring operations remain within valid boundaries and logging warnings when adjustments occur.

@tedil tedil force-pushed the track-config-options branch from df98033 to 25376fb Compare February 12, 2025 16:06
Copy link

coderabbitai bot commented Feb 12, 2025

Walkthrough

The changes implement additional safety checks and configuration options across multiple modules. A new conditional block in sequence-alteration methods logs a warning and clamps the altered range to avoid out-of-bounds errors. The configuration structs in several modules now include new fields (shuffle_direction and window_size) with updated default initializations. Additionally, the visibility of the normalization method in one module is updated, and the Direction enum has been enhanced to derive a default value.

Changes

Files Change Summary
src/mapper/altseq.rs, src/mapper/variant.rs Added range checks with logging in methods (incorporate_delins and get_altered_sequence) to clamp altered sequence ranges when indices exceed sequence length.
src/mapper/assembly.rs, src/mapper/variant.rs, src/validator/mod.rs Extended Config structs by adding shuffle_direction and window_size fields with updated default values; updated maybe_normalize in assembly to public for external access.
src/normalizer.rs Updated the Direction enum to derive Default (defaulting to FiveToThree) and modified Config initialization to use Default::default().

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant AltSeqBuilder
    participant Sequence
    Caller->>AltSeqBuilder: call incorporate_delins(seq, range)
    AltSeqBuilder->>AltSeqBuilder: Check if range end > seq.len()
    alt Out-of-bound condition met
        AltSeqBuilder->>AltSeqBuilder: Log warning and clamp range
    end
    AltSeqBuilder->>Sequence: call replace_range(valid_range)
    Sequence-->>AltSeqBuilder: Updated sequence
    AltSeqBuilder-->>Caller: Return updated sequence
Loading
sequenceDiagram
    participant Mapper
    participant Normalizer
    Mapper->>Mapper: Prepare normalizer config (incl. shuffle_direction, window_size)
    Mapper->>Normalizer: Initialize Normalizer with config
    Normalizer-->>Mapper: Return Normalizer instance
Loading

Possibly related PRs

Suggested reviewers

  • holtgrewe

Poem

I'm just a rabbit with paws so nimble,
Clamping ranges with a cheerful symbol.
New config fields hop into view,
Logging warnings as the sequence grew.
With every bound now safely kept—
Code and carrots in harmony are adept!
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 305058d and 25376fb.

📒 Files selected for processing (5)
  • src/mapper/altseq.rs (1 hunks)
  • src/mapper/assembly.rs (5 hunks)
  • src/mapper/variant.rs (6 hunks)
  • src/normalizer.rs (2 hunks)
  • src/validator/mod.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Testing (full)
  • GitHub Check: Testing (fast)
🔇 Additional comments (7)
src/validator/mod.rs (1)

137-138: LGTM! Configuration fields added correctly.

The new configuration fields shuffle_direction and window_size are properly initialized with appropriate default values, enhancing the configurability of the validator.

src/mapper/assembly.rs (1)

51-52: LGTM! Configuration fields consistently added across structs.

The new configuration fields shuffle_direction and window_size are properly added and initialized in both the Config struct and the Mapper::new() method, maintaining consistency with the validator implementation.

Also applies to: 69-70, 119-120

src/mapper/altseq.rs (1)

452-462: LGTM! Sequence range clamping implemented safely.

The new conditional block properly handles cases where the altered sequence range exceeds the sequence length by:

  1. Logging a descriptive warning message with relevant context
  2. Safely clamping the range to the sequence length
  3. Preventing potential out-of-bounds access

This change successfully mimics the implicit sequence length clamping behavior of the biocommons hgvs library.

src/normalizer.rs (1)

55-60: LGTM! Default direction properly configured.

The changes effectively:

  1. Add #[default] attribute to FiveToThree variant, making it the default direction
  2. Update Config to use Default::default() for shuffle_direction

This leverages Rust's default trait capabilities and maintains consistency with the configuration changes across the codebase.

Also applies to: 81-81

src/mapper/variant.rs (3)

10-11: LGTM! Configuration changes look good.

The implementation properly adds and initializes new configuration options for shuffle direction and window size, with appropriate default values and proper propagation to the normalizer.

Also applies to: 40-41, 54-55, 158-159


790-800: LGTM! Sequence length clamping implementation looks good.

The implementation safely handles out-of-bounds sequence ranges by clamping to sequence length and logging appropriate warnings. This matches the Python implementation's behavior.


808-812: Verify clamping behavior with additional test cases.

The FIXME comment indicates uncertainty about whether the implicit clamping behavior for deletions is intended. Consider adding test cases to validate this behavior.

Run the following script to find existing test cases and identify gaps in coverage:

❌ Verification inconclusive

🏁 Script executed:

#!/bin/bash
# Description: Search for test cases related to sequence length clamping.

# Test: Search for test cases involving sequence length clamping.
rg -A 5 'get_altered_sequence|del seq\[' tests/

Length of output: 49


ACTION REQUIRED: Add Test Cases to Validate Clamping Behavior

The search for tests involving sequence deletion clamping (e.g., using seq.replace_range) found no matches in the tests folder. This indicates that there are currently no test cases verifying the behavior when the deletion range exceeds the sequence length. Please add tests to ensure that this implicit clamping behavior behaves as intended.

  • No tests referencing clamping (e.g., del seq[...] or seq.replace_range) were found.
  • Consider adding cases where pos_end > len(seq) to confirm that the deletion does not error and clamps as expected.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

codecov bot commented Feb 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.42%. Comparing base (305058d) to head (25376fb).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #206      +/-   ##
==========================================
+ Coverage   92.40%   92.42%   +0.01%     
==========================================
  Files          18       18              
  Lines       11455    11481      +26     
==========================================
+ Hits        10585    10611      +26     
  Misses        870      870              
Files with missing lines Coverage Δ
src/mapper/altseq.rs 95.11% <100.00%> (+0.04%) ⬆️
src/mapper/assembly.rs 72.52% <100.00%> (+0.35%) ⬆️
src/mapper/variant.rs 85.43% <100.00%> (+0.19%) ⬆️
src/normalizer.rs 91.19% <100.00%> (ø)
src/validator/mod.rs 72.64% <100.00%> (+0.52%) ⬆️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant