Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add manual seq type #478

Merged
merged 10 commits into from
Jun 19, 2023
Merged

Conversation

t0mdavid-m
Copy link
Member

This PR implements the seq_type parameter as described in #477.

@t0mdavid-m t0mdavid-m requested a review from padix-key June 18, 2023 11:56
Copy link
Member

@padix-key padix-key left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, that was quite quick! Overall your solution looks very good to me, I would only like to request some minor changes. What do you think?

@@ -16,7 +16,7 @@
"get_alignment", "set_alignment"]


def get_sequence(fasta_file, header=None):
def get_sequence(fasta_file, seq_type=None, header=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for backwards compatibility reasons the new parameter should be the last parameter in the list, e.g. in case someone uses the header as positional argument in their code. Although Biotite is still in 0.xversion, where compatibility is not strictly required, I would prefer the more compatible way, if there is no clear advantage of the other option.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I placed the arguments based on gut feeling. However, I agree that compatibility should take priority here.

src/biotite/sequence/io/fasta/convert.py Outdated Show resolved Hide resolved
seq_str = process_protein_sequence(seq_str)
# Return the converted sequence
return seq_type(seq_str)

# Biotite alphabets for nucleotide and proteins
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the comment is also obsolete with the code changes.

Suggested change
# Biotite alphabets for nucleotide and proteins

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went over the comments and removed all remainders of the previous implementation. I also added/moved a comment describing the purpose of each lambda function.

tests/sequence/test_fasta.py Outdated Show resolved Hide resolved
Comment on lines 62 to 64
assert seq.NucleotideSequence("ACGCTACGT") == fasta.get_sequence(
file, seq.NucleotideSequence
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also refactored the test by splitting it into three functionally different tests:

  • Ambiguous Sequences (i.e. nucleotide sequences which can also be loaded as amino acid sequences)
  • Protein Sequences (should fail when loaded as nucleotide sequences)
  • Invalid Sequences (should fail in all cases)

tests/sequence/test_fasta.py Outdated Show resolved Hide resolved
@padix-key
Copy link
Member

I think we can ignore the error in biotite.databasefor this PR, there seems to be some change in the RCSB and PubChem database, that can be addressed another time.

@t0mdavid-m
Copy link
Member Author

t0mdavid-m commented Jun 19, 2023

Thank you for your review! I applied all suggestions. From my side, the PR is ready to be merged.

@padix-key padix-key merged commit dc72831 into biotite-dev:master Jun 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants