-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a function for determining read pair orientation #201
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #201 +/- ##
==========================================
+ Coverage 90.32% 91.31% +0.99%
==========================================
Files 18 18
Lines 2232 2281 +49
Branches 322 334 +12
==========================================
+ Hits 2016 2083 +67
+ Misses 144 126 -18
Partials 72 72 ☔ View full report in Codecov by Sentry. |
WalkthroughThe pull request introduces a new enumeration Tip CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
fgpyo/sam/__init__.py (2)
669-712
: Consider addressing the TODO for insert size consistency.
You are already noting in the comments that usingabs(isize(r1, r2))
might be more consistent than relying ontemplate_length
. It might help ensure correctness whenset_pair_info
hasn’t been invoked but the read pair is in FR orientation.
814-815
: Add test coverage for different query names.
Currently, raising aValueError
ifr1.query_name != r2.query_name
is untested according to static analysis. Add a targeted test to ensure this error path is covered.Would you like me to propose a new unit test to ensure coverage?
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 815-815: fgpyo/sam/init.py#L815
Added line #L815 was not covered by tests
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
fgpyo/sam/__init__.py
(4 hunks)tests/fgpyo/sam/test_sam.py
(2 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
fgpyo/sam/__init__.py
[warning] 815-815: fgpyo/sam/init.py#L815
Added line #L815 was not covered by tests
🔇 Additional comments (4)
fgpyo/sam/__init__.py (3)
268-269
: Logic looks consistent.
The approach to auto-detect the file type when it is None
is clear and aligns with your design of delegating to SamFileType.from_path()
.
610-667
: Orientation logic is clear and closely follows established conventions.
The PairOrientation
enum and its build()
method handle forward/reverse logic well, including fallback to None
for unmapped or cross-chromosomal reads. The checks are clear and match typical orientation definitions.
830-832
: Ensure MC
and MQ
tags are valid when copying from an unmapped mate.
When one read is unmapped, its cigar might be None
. It’s good practice to confirm that src.cigarstring
is valid before setting MC
. Otherwise, consider skipping it or clearing the tag on the mate.
tests/fgpyo/sam/test_sam.py (1)
20-22
: Imports look good.
Importing PairOrientation
and is_properly_paired
ensures that the new functionality is fully tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
fgpyo/sam/__init__.py (1)
669-711
: Good logic inis_proper_pair
.
Consider lettingisize(r1, r2)
accept an optionalr2
to avoid usingtemplate_length
directly if you want more consistency.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
fgpyo/sam/__init__.py
(4 hunks)tests/fgpyo/sam/test_sam.py
(2 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
fgpyo/sam/__init__.py
[warning] 815-815: fgpyo/sam/init.py#L815
Added line #L815 was not covered by tests
🔇 Additional comments (15)
tests/fgpyo/sam/test_sam.py (13)
20-20
: Nice import.
22-22
: Good addition to test coverage.
305-342
: Solid test case for pair orientation with R2.
344-361
: Good check for unmapped reads.
363-386
: Broad coverage with missing R2.
388-408
: Comprehensive scenario for unmapped reads without explicit R2.
410-419
: Exception testing is well done.
421-432
: Verifies proper pairing in typical alignment.
434-445
: Checks proper pairing with only R1.
447-467
: Correctly confirms improper pair status for mismatched orientation.
469-489
: Extensions of the orientation checks with no R2.
491-497
: Properly validates pairs that exceed distance limits.
Line range hint 814-815
: Add a negative test for query-name mismatch.
The existing coverage is good, but there’s no scenario testing a mismatch in query names. Please add a test that calls set_pair_info
with two different query names to confirm the ValueError
is raised.
fgpyo/sam/__init__.py (2)
610-667
: Clean enumeration for pair orientation.
830-832
: Mate info tags updated properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved so long as you agree with my comments, otherwise let's discuss any differences!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
fgpyo/sam/__init__.py (2)
624-668
: Consider renaming build() method.Previous discussion suggested alternatives to
build()
. Considerfrom_recs()
for clarity.The implementation is solid:
- Handles missing R2 case
- Validates MC tag presence
- Follows HTSJDK logic
674-712
: Consider implementing TODO for isize().Function is well-implemented but could be improved:
- Replace template_length check with isize() call
- Would need isize() modification to handle optional R2
Want me to implement the isize() modification?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
fgpyo/sam/__init__.py
(5 hunks)
🔇 Additional comments (4)
fgpyo/sam/__init__.py (4)
161-161
: LGTM: Import needed for type hints.
611-622
: LGTM: Well-documented enum for pair orientations.
Clear documentation of FR ("innie"), RF ("outie"), and TANDEM orientations.
670-671
: LGTM: Clear default for proper pair orientation.
815-816
: LGTM: Improved error handling and tag management.
- Better error handling with ValueError
- Added mate_is_unmapped property
- Added MQ tag for mapping quality
Also applies to: 831-833
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
fgpyo/sam/__init__.py (1)
611-676
: Enum logic is robust but somewhat dense.You might consider splitting out orientation checks for clarity. It’s only a suggestion; the flow is already correct.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
fgpyo/sam/__init__.py
(5 hunks)tests/fgpyo/sam/test_sam.py
(2 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
fgpyo/sam/__init__.py
[warning] 824-824: fgpyo/sam/init.py#L824
Added line #L824 was not covered by tests
🔇 Additional comments (18)
tests/fgpyo/sam/test_sam.py (14)
20-20
: Import confirmation.
Thanks for importing PairOrientation
. Looks good.
22-22
: Separate import is clean.
Bringing in is_proper_pair
is also fine. Nice separation of concerns.
305-336
: Comprehensive test coverage for R1/R2 orientation.
The multiple orientation checks (FR, RF, TANDEM) offer solid confidence in correctness.
338-354
: Overlapping reads scenario well-tested.
Tests confirm opposite directions produce an FR orientation. Good job.
356-363
: Edge case for single base alignment.
This ensures minimal overlap is still recognized as FR. Nicely done.
365-388
: Unmapped cases properly handled.
Verifies that None
is returned when either read is unmapped. This is thorough.
390-413
: Partial pair tests for R1 only.
This confirms orientation derivation from a single read plus embedded mate info. Useful coverage.
415-429
: Non-existent MC tag leads to exception.
Excellent negative test verifying an exception when MC tag is missing. Good boundary check.
431-446
: Ensures mate cigar tag is required for RF orientation.
You covered the raising of an error if MC is missing again. Nicely consistent.
448-459
: Proving proper pairing for FR.
Testing the “retrieved” orientation correctness. Nice demonstration of actual real pairing logic.
461-472
: Proper pairing check without explicit R2 reference.
Another valid scenario ensuring coverage. Straightforward and clear.
474-494
: Invalid orientation test.
Good to see you confirm how incorrectly oriented reads fail is_proper_pair
. No concerns here.
496-516
: Orientation test with missing second read reference.
Same negative scenario for a single read. Great consistency.
518-524
: Distance-based negative test.
Checks that large insert sizes violate “proper pair” status. This completes the coverage nicely.
fgpyo/sam/__init__.py (4)
161-161
: New import for type hinting.
Using Collection
clarifies the function signatures. Minimal overhead.
269-270
: File type inference logic is concise.
Autodetection for read paths is tidy and remains consistent with the rest of the method’s flow.
678-720
: Proper pair criteria well-defined.
Your authority checks and orientation gating are in line with standard SAM specs. No issues here.
Line range hint 823-841
: Missing test coverage for different query names exception.
A separate test for the ValueError
would boost coverage for line 824.
This was pointed out previously as well.
@tfenne if you have time to re-review:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: Tim Fennell <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
fgpyo/sam/__init__.py (1)
Test coverage exists, but not for query name validation
While set_pair_info is well tested for orientation, proper pairing, and mate info, the new query name validation is untested.
- Add test in
tests/fgpyo/sam/test_sam.py
to verify ValueError is raised when query names differ🔗 Analysis chain
Line range hint
823-841
: Add test coverage for query name validation.Line 824 raising ValueError for mismatched query names lacks test coverage.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check existing test coverage rg -B 2 -A 2 "set_pair_info.*different.*names" tests/fgpyo/sam/test_sam.pyLength of output: 76
Script:
#!/bin/bash # Search for test cases related to set_pair_info rg -B 3 -A 3 "set_pair_info" tests/fgpyo/sam/test_sam.py # Find all test files that might contain relevant tests fd "test.*\.py" tests/fgpyo/sam/ # Check implementation details of set_pair_info rg -B 3 -A 3 "def set_pair_info" fgpyo/sam/Length of output: 5747
🧹 Nitpick comments (1)
fgpyo/sam/__init__.py (1)
682-720
: Address TODO for insert size calculation.The TODO suggests replacing template_length check with isize() function.
Would you like me to help implement the suggested isize() modification to handle optional R2?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
fgpyo/sam/__init__.py
(4 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
fgpyo/sam/__init__.py
[warning] 824-824: fgpyo/sam/init.py#L824
Added line #L824 was not covered by tests
🔇 Additional comments (4)
fgpyo/sam/__init__.py (4)
161-161
: LGTM!
The Collection import is appropriately used for type hinting.
611-622
: LGTM!
The enum clearly defines the three possible pair orientations with descriptive docstrings.
678-679
: LGTM!
FR as default proper pair orientation aligns with Illumina standards.
624-676
: Verify edge cases in from_recs implementation.
The implementation handles both single record and paired record cases correctly. Let's verify the edge cases:
✅ Verification successful
Let me check more test cases to verify the implementation thoroughly.
Edge cases in from_recs are properly handled
Tests confirm correct handling of:
- Unmapped reads (both/either)
- Missing R2 with mapped mate
- Overlapping reads
- FR/RF orientations
- Missing mate CIGAR tag
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for tests covering edge cases
rg -A 5 "from_recs.*None" tests/fgpyo/sam/test_sam.py
Length of output: 911
Script:
#!/bin/bash
# Search for more test cases related to pair orientation
rg -A 10 "test_pair_orientation" tests/fgpyo/sam/test_sam.py
Length of output: 3391
I have a project where I am re-building a complex alignment situation for a template and it involves an assessment of read pair orientation and whether the new pair is "proper" or not.
The functions in this PR allow for an optional R2 such as when you are iterating through a BAM in coordinate order and don't have easy access to the R2. When this happens, R1 is required to have enough mate information to derive the pair status and whether or not the read pair is "proper".
The implementation honors what is implemented in HTSJDK/fgbio but with:
None
instead of an exception, when possible