Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AuxAlignment for parsing secondary/supplementary alignments in SAM tags #209

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

clintval
Copy link
Member

@clintval clintval commented Jan 10, 2025

Prior drafts are here and here.

  • Add a few more unit tests to get back up to 100% coverage

@clintval clintval requested review from nh13 and tfenne as code owners January 10, 2025 16:56
Copy link
Contributor

coderabbitai bot commented Jan 10, 2025

Walkthrough

The pull request introduces significant changes to the SAM module by replacing the SupplementaryAlignment class with the new AuxAlignment class, which manages auxiliary alignments from SAM tags such as SA, XA, and XB. The AuxAlignment class includes attributes for various alignment metrics and validation checks. The deprecated SupplementaryAlignment class has had its parsing functionality transferred to AuxAlignment, and related examples have been updated.

New methods for creating auxiliary alignments from primary alignments have been added, along with a constant to indicate missing query bases. Comprehensive unit tests for the AuxAlignment class focus on input validation and parsing of auxiliary alignment tags. Additionally, tests for the deprecated SupplementaryAlignment class have been added, enhancing overall test coverage.

Finishing Touches

  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/fgpyo/sam/test_supplementary_alignments.py (1)

Line range hint 8-24: Evaluate the necessity of tests for deprecated SupplementaryAlignment.

Since SupplementaryAlignment is deprecated, consider focusing on testing the new AuxAlignment class. Maintaining tests for a deprecated class might not be essential unless required for backward compatibility.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0b4227 and e0700de.

📒 Files selected for processing (3)
  • fgpyo/sam/__init__.py (5 hunks)
  • tests/fgpyo/sam/test_aux_alignment.py (1 hunks)
  • tests/fgpyo/sam/test_supplementary_alignments.py (4 hunks)
🔇 Additional comments (2)
fgpyo/sam/__init__.py (1)

1255-1528: Implementation of AuxAlignment is solid.

The AuxAlignment class is well-designed, effectively replacing SupplementaryAlignment with enhanced functionality for parsing auxiliary alignments from SAM tags. The parsing methods and validation checks are thorough.

tests/fgpyo/sam/test_aux_alignment.py (1)

1-612: Comprehensive test coverage for AuxAlignment.

The test suite thoroughly validates input parsing, exception handling, and key functionalities of AuxAlignment, ensuring robustness against various edge cases.

fgpyo/sam/__init__.py Show resolved Hide resolved
fgpyo/sam/__init__.py Outdated Show resolved Hide resolved
fgpyo/sam/__init__.py Outdated Show resolved Hide resolved
Copy link

codecov bot commented Jan 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.52%. Comparing base (b0b4227) to head (7055cb5).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #209      +/-   ##
==========================================
+ Coverage   91.06%   91.52%   +0.46%     
==========================================
  Files          18       18              
  Lines        2283     2396     +113     
  Branches      337      354      +17     
==========================================
+ Hits         2079     2193     +114     
  Misses        133      133              
+ Partials       71       70       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@tfenne tfenne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this direction!

fgpyo/sam/__init__.py Show resolved Hide resolved
fgpyo/sam/__init__.py Outdated Show resolved Hide resolved
fgpyo/sam/__init__.py Outdated Show resolved Hide resolved
fgpyo/sam/__init__.py Outdated Show resolved Hide resolved
fgpyo/sam/__init__.py Outdated Show resolved Hide resolved
Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after @tfenne's comments look good and you're happy with it

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/fgpyo/sam/test_template_iterator.py (1)

225-241: Test looks good but could use more edge cases.

Test follows AAA pattern and verifies core functionality.

Consider adding test cases for:

  • Empty/malformed tag values
  • Multiple auxiliary alignments
  • Hard-clipped alignments
fgpyo/sam/__init__.py (2)

142-164: Add more comprehensive examples.

Current examples show basic usage. Consider adding:

  • Multiple auxiliary alignment parsing
  • Error handling examples
  • Template integration examples

1482-1486: Improve hard-clipping sequence handling.

Current implementation sets sequence to "*" for any hard-clipping. Consider implementing the TODO to properly pad/clip sequences.

Would you like me to propose an implementation for proper sequence padding/clipping?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e0700de and f8d56d0.

📒 Files selected for processing (3)
  • fgpyo/sam/__init__.py (6 hunks)
  • tests/fgpyo/sam/test_aux_alignment.py (1 hunks)
  • tests/fgpyo/sam/test_template_iterator.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/fgpyo/sam/test_aux_alignment.py
🧰 Additional context used
🪛 GitHub Actions: Python package
fgpyo/sam/__init__.py

[error] 1381-1381: f-string: expecting '}' - Syntax error in f-string formatting

🔇 Additional comments (3)
fgpyo/sam/__init__.py (3)

Line range hint 172-214: Constants and imports look good.


1261-1537: Well-structured class implementation.

The AuxAlignment class is well-documented and implements comprehensive validation. Good separation of concerns between parsing and SAM record generation.

🧰 Tools
🪛 GitHub Actions: Python package

[error] 1381-1381: f-string: expecting '}' - Syntax error in f-string formatting


1348-1350: Resolve alignment score validation TODO.

The commented validation for alignment scores should be resolved.

Let's check BWA's behavior with negative alignment scores:

fgpyo/sam/__init__.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
fgpyo/sam/__init__.py (5)

142-164: Add more comprehensive examples.

Add examples showing:

  • Error cases and exception handling
  • Multiple auxiliary alignments in a single tag
  • Usage with Template class

1225-1230: Add usage examples in docstring.

Add examples showing:

  • How the method affects the template structure
  • Common use cases

1339-1353: Add more validations in post_init.

Add validation for:

  • reference_name format (non-empty, valid characters)
  • cigar string format

1482-1487: Implement TODO for handling hard clips.

The current implementation sets bases/quals to "*" when hard clips exist. Consider implementing the suggested padding/clipping logic.

Would you like me to help implement the sequence padding/clipping logic?


1519-1523: Consider making RG tag mandatory.

The Read Group (RG) tag is optional but highly recommended. Consider requiring it for consistency with SAM spec.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f8d56d0 and e7f2ae2.

📒 Files selected for processing (1)
  • fgpyo/sam/__init__.py (6 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
fgpyo/sam/__init__.py

[warning] 1376-1376: fgpyo/sam/init.py#L1376
Added line #L1376 was not covered by tests


[warning] 1387-1387: fgpyo/sam/init.py#L1387
Added line #L1387 was not covered by tests


[warning] 1437-1437: fgpyo/sam/init.py#L1437
Added line #L1437 was not covered by tests


[warning] 1469-1469: fgpyo/sam/init.py#L1469
Added line #L1469 was not covered by tests

🔇 Additional comments (2)
fgpyo/sam/__init__.py (2)

213-214: LGTM!

The constant follows the module's naming convention and is well-documented.


1348-1350: Resolve TODO comment about alignment scores.

The comment about negative alignment scores needs resolution. Either:

  • Add validation
  • Document why negative scores are allowed

fgpyo/sam/__init__.py Show resolved Hide resolved
@clintval clintval requested a review from tfenne January 10, 2025 20:48
fgpyo/sam/__init__.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
fgpyo/sam/__init__.py (1)

142-164: Add error case examples to documentation.

Add examples showing:

  • Invalid tag values
  • Multi-value parsing
  • Error handling
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e7f2ae2 and da497e7.

📒 Files selected for processing (3)
  • fgpyo/sam/__init__.py (6 hunks)
  • tests/fgpyo/sam/test_aux_alignment.py (1 hunks)
  • tests/fgpyo/sam/test_template_iterator.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/fgpyo/sam/test_template_iterator.py
  • tests/fgpyo/sam/test_aux_alignment.py
🔇 Additional comments (4)
fgpyo/sam/__init__.py (4)

213-215: LGTM!

Well-documented constant following established patterns.


1225-1230: LGTM!

Clean implementation with proper null handling and efficient iterator usage.


1348-1350: Resolve TODO comment about alignment score validation.

The comment about alignment score validation needs resolution. If BWA allows negative scores, document this. If not, implement the validation.


1433-1438: Add test coverage for error paths.

Missing test coverage for:

  • Empty tag values
  • Invalid tag values
  • Multi-value string parsing errors
✅ Verification successful

Error paths are well tested.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check test coverage for error paths
fd "test_.*\.py" tests/fgpyo/sam/ -x rg -l "def test.*error" {}

Length of output: 135


Script:

#!/bin/bash
# Check specific error test cases
echo "=== Test cases for aux alignment errors ==="
rg "def test.*error" -A 5 tests/fgpyo/sam/test_aux_alignment.py

echo -e "\n=== Implementation of from_tag_value ==="
ast-grep --pattern 'def from_tag_value($$$) {$$$}'

echo -e "\n=== Test cases with empty or invalid values ==="
rg "empty|invalid|malformed" tests/fgpyo/sam/test_aux_alignment.py

Length of output: 1685

fgpyo/sam/__init__.py Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (7)
tests/fgpyo/sam/test_aux_alignment.py (4)

13-58: Add test cases for alignment_score validation and None handling.

Add test cases for:

  • Negative alignment_score validation
  • None values in optional fields (mapping_quality, edit_distance, alignment_score)

60-164: Add test cases for cigar string validation.

Add test cases for:

  • Empty cigar string ("")
  • Invalid cigar string ("50X")

166-215: Add test cases for invalid field formats.

Add test cases for:

  • Invalid mapping quality format ("abc")
  • Invalid edit distance format ("xyz")

589-603: Track hard-clipping sequence handling enhancement.

The TODO about padding/clipping sequence with hard clips should be tracked in an issue.

Want me to create a GitHub issue to track this enhancement?

fgpyo/sam/__init__.py (3)

1261-1350: Add validation for alignment_score.

Add validation to ensure alignment_score is not negative when provided.

 def __post_init__(self) -> None:
     """Perform post-initialization validation on this dataclass."""
     errors: list[str] = []
     if self.reference_start < 0:
         errors.append(f"Reference start cannot be less than 0! Found: {self.reference_start}")
     if self.mapping_quality is not None and self.mapping_quality < 0:
         errors.append(f"Mapping quality cannot be less than 0! Found: {self.mapping_quality}")
     if self.edit_distance is not None and self.edit_distance < 0:
         errors.append(f"Edit distance cannot be less than 0! Found: {self.edit_distance}")
+    if self.alignment_score is not None and self.alignment_score < 0:
+        errors.append(f"Alignment score cannot be less than 0! Found: {self.alignment_score}")
     if len(errors) > 0:
         raise ValueError("\n".join(errors))

1356-1422: Refactor tag parsing into separate methods.

Split parsing logic into tag-specific methods for better maintainability:

  • _parse_sa_tag
  • _parse_xa_tag
  • _parse_xb_tag

1423-1438: Add type hint for values list.

Add explicit type hint for clarity:

-            values: list[str] = cast(str, rec.get_tag(tag)).rstrip(";").split(";")
+            values: list[str] = cast(str, rec.get_tag(tag)).rstrip(";").split(";") 
+            aux_values: list[str] = [v for v in values if v]
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da497e7 and 7055cb5.

📒 Files selected for processing (2)
  • fgpyo/sam/__init__.py (6 hunks)
  • tests/fgpyo/sam/test_aux_alignment.py (1 hunks)
🔇 Additional comments (1)
fgpyo/sam/__init__.py (1)

1225-1230: LGTM!

Clean implementation of template reconstruction with auxiliary alignments.

fgpyo/sam/__init__.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants