-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add manual seq type #478
Add manual seq type #478
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, that was quite quick! Overall your solution looks very good to me, I would only like to request some minor changes. What do you think?
@@ -16,7 +16,7 @@ | |||
"get_alignment", "set_alignment"] | |||
|
|||
|
|||
def get_sequence(fasta_file, header=None): | |||
def get_sequence(fasta_file, seq_type=None, header=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for backwards compatibility reasons the new parameter should be the last parameter in the list, e.g. in case someone uses the header
as positional argument in their code. Although Biotite is still in 0.x
version, where compatibility is not strictly required, I would prefer the more compatible way, if there is no clear advantage of the other option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I placed the arguments based on gut feeling. However, I agree that compatibility should take priority here.
seq_str = process_protein_sequence(seq_str) | ||
# Return the converted sequence | ||
return seq_type(seq_str) | ||
|
||
# Biotite alphabets for nucleotide and proteins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part of the comment is also obsolete with the code changes.
# Biotite alphabets for nucleotide and proteins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went over the comments and removed all remainders of the previous implementation. I also added/moved a comment describing the purpose of each lambda function.
tests/sequence/test_fasta.py
Outdated
assert seq.NucleotideSequence("ACGCTACGT") == fasta.get_sequence( | ||
file, seq.NucleotideSequence | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also refactored the test by splitting it into three functionally different tests:
- Ambiguous Sequences (i.e. nucleotide sequences which can also be loaded as amino acid sequences)
- Protein Sequences (should fail when loaded as nucleotide sequences)
- Invalid Sequences (should fail in all cases)
I think we can ignore the error in |
Thank you for your review! I applied all suggestions. From my side, the PR is ready to be merged. |
This PR implements the
seq_type
parameter as described in #477.