Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dunder methods to the Python Record class #91

Merged
merged 25 commits into from
Feb 1, 2025

Conversation

apcamargo
Copy link
Contributor

@apcamargo apcamargo commented Dec 19, 2024

This PR introduces several dunder methods to the Python Record class:

  • __new__: Constructor that allows the direct creation of Record objects:
>>> fasta_record = needletail.Record("SRR1749083.HWI-ST1309F", "NGACTCGTC")
>>> fastq_record = needletail.Record("SRR1749083.HWI-ST1309F", "NGACTCGTC", "#<<BBF/FF")
  • __hash__: Returns an integer hash based on the record's ID, sequence, and quality (if available). This overrides the default __hash__, which generates different hashes for identical objects.
  • __eq__: Compares two Record objects by checking if their IDs, sequences, and qualities (if available) are identical. While we could check for equality based on hashes, this would involve additional computation.
  • __len__: Returns the length of the sequence, similar to BioPython's behavior.
  • __str__: Returns a string formatted in FASTA/FASTQ style to facilitate writing FASTX files. Currently, sequences are wrapped at 60 characters for FASTA records, though the wrapping can be adjusted.
  • __repr__: Provides a simple string representation that includes the record ID, a sequence snippet, and the quality string (when available):
>>> needletail.Record("SRR1749083.HWI-ST1309F", "NGACTCGTC")
# Record(id=SRR1749083.HWI-ST1309F, sequence=NGACTCGTC, quality=None)

Additionally, is_fasta and is_fastq have been converted to properties.

Test coverage for these methods has not been added yet, but I will implement it if this PR is approved.

@apcamargo apcamargo changed the title Add dunder methods for the Python Record class Add dunder methods to the Python Record class Dec 19, 2024
src/python.rs Outdated Show resolved Hide resolved
src/python.rs Outdated Show resolved Hide resolved
src/python.rs Show resolved Hide resolved
@audy
Copy link
Contributor

audy commented Jan 22, 2025

@apcamargo thank you very much! I left some comments. Otherwise, these changes look like great additions to the library and we'd be happy to merge them in.

@apcamargo
Copy link
Contributor Author

Thanks for the review, @audy! I just pushed commits that address all your comments.

I prefer wrapping since that's how other libraries/tools format FASTAs, and I think users expect it. That said, I’m not too attached to it :)

@apcamargo apcamargo requested a review from audy January 22, 2025 20:56
@audy
Copy link
Contributor

audy commented Jan 22, 2025

@apcamargo thanks for making those changes. Let me know if you're good to merge.

@apcamargo
Copy link
Contributor Author

apcamargo commented Jan 23, 2025

@audy I’ve added docstrings to all classes, methods, and functions. The only potential change I’d consider is adding a default value to the iupac parameter in both normalize_seq and normalize.

@apcamargo
Copy link
Contributor Author

apcamargo commented Jan 24, 2025

@audy I also updated parse_fastx_file to make it able to take pathlib.Path objects, in addition to str.

@audy
Copy link
Contributor

audy commented Jan 24, 2025

@apcamargo Awesome! Do you mind adding some tests?

The only potential change I’d consider is adding a default value to the iupac parameter in both normalize_seq and normalize.

I support this change but I'm not sure if it should default to True or False. Thoughts?

@apcamargo
Copy link
Contributor Author

Apologies, @audy! I had forgotten about the tests.

I've now added tests for all the new functions and methods, and set the default value of iupac to False.

test_python.py Show resolved Hide resolved
test_python.py Show resolved Hide resolved
Copy link
Contributor

@audy audy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments

@apcamargo
Copy link
Contributor Author

@audy Can you check if your comments were solved?

@apcamargo
Copy link
Contributor Author

Ah, I see what you mean about test_pathlib_path_input now. The purpose of that method in StrParsingTestCase was to avoid getting it inherited from FileParsingTestCase.

To avoid having an empty method with just pass, I flipped things around and made FileParsingTestCase inherit from StrParsingTestCase instead.

Copy link
Contributor

@audy audy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@apcamargo
Copy link
Contributor Author

Thanks, @audy!

@apcamargo
Copy link
Contributor Author

@audy, I have a few questions:

  • What do you think about modifying the normalize method to return a new Record object instead of only applying in-place modifications? I could introduce an inplace parameter to control this behavior. Personally, I prefer setting inplace=False by default, but if maintaining compatibility is a priority, we could default it to True.
  • Would it be useful to add a property that returns a list of integers representing quality scores? Alternatively, it could be a method with a parameter to specify the encoding base (64 or 33). I can include this change in this PR or address it in a separate one after this is merged.

Let me know your thoughts!

@audy
Copy link
Contributor

audy commented Jan 30, 2025

What do you think about modifying the normalize method to return a new Record object instead of only applying in-place modifications?

This seems more Pythonic but doesn't match the Rust implementation (it might be clearer for the Rust and Python methods to have the same behavior). So I would maybe add a second method called .to_normalized_record() or similar.

Would it be useful to add a property that returns a list of integers representing quality scores? Alternatively, it could be a method with a parameter to specify the encoding base (64 or 33).

I think that would be useful.

I can include this change in this PR or address it in a separate one after this is merged.

I think this PR is good to go and both of these items can be addressed as separate PRs.

@apcamargo
Copy link
Contributor Author

Agreed. I can work on that once this is merged.

Also, would you consider making a release once this is merged? I'm working on an API that uses needletail and it would be nice to have these features there.

No worries if you plan to have releases only when the Rust library gets updated

@audy
Copy link
Contributor

audy commented Feb 1, 2025

@apcamargo I can push a release to pypi. May I go ahead and merge?

@apcamargo
Copy link
Contributor Author

Sure! Thank you!

@audy audy merged commit 0010f64 into onecodex:master Feb 1, 2025
7 checks passed
@apcamargo
Copy link
Contributor Author

Thanks, @audy!

Just one request: Could you also upload the source to PyPI? I’d like to update the Bioconda recipe to fetch the library from PyPI, and the source code is required for building in Bioconda's CI. I think you just need to include the --sdist flag when calling maturin build.

@apcamargo apcamargo deleted the dunder-methods branch February 14, 2025 23:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants