Add manual seq type #478

t0mdavid-m · 2023-06-18T11:56:45Z

This PR implements the seq_type parameter as described in #477.

padix-key

Thank you, that was quite quick! Overall your solution looks very good to me, I would only like to request some minor changes. What do you think?

padix-key · 2023-06-18T12:18:31Z

src/biotite/sequence/io/fasta/convert.py

@@ -16,7 +16,7 @@
           "get_alignment", "set_alignment"]


-def get_sequence(fasta_file, header=None):
+def get_sequence(fasta_file, seq_type=None, header=None):


I think for backwards compatibility reasons the new parameter should be the last parameter in the list, e.g. in case someone passes the header as a positional argument in their code. Although Biotite is still at a 0.x version, where compatibility is not strictly required, I would prefer the more compatible option if there is no clear advantage to the other one.

I placed the arguments based on gut feeling. However, I agree that compatibility should take priority here.
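To make the compatibility concern concrete, here is a minimal sketch with three hypothetical stand-in functions (not Biotite's actual `get_sequence()`). If `seq_type` is inserted before `header`, an existing call that passes the header positionally silently binds it to `seq_type`. Appending the parameter keeps such calls working.

```python
# Hypothetical stand-ins illustrating parameter order; not Biotite's code.

def get_sequence_old(fasta_file, header=None):
    """Original signature: header is the second positional parameter."""
    return {"file": fasta_file, "header": header}


def get_sequence_inserted(fasta_file, seq_type=None, header=None):
    """New parameter inserted before header (breaks positional callers)."""
    return {"file": fasta_file, "seq_type": seq_type, "header": header}


def get_sequence_appended(fasta_file, header=None, seq_type=None):
    """New parameter appended last (positional callers keep working)."""
    return {"file": fasta_file, "seq_type": seq_type, "header": header}


# Existing user code that passes the header positionally
old = get_sequence_old("seqs.fasta", "seq1")
inserted = get_sequence_inserted("seqs.fasta", "seq1")
appended = get_sequence_appended("seqs.fasta", "seq1")

print(old["header"])       # seq1
print(inserted["header"])  # None ("seq1" was bound to seq_type instead)
print(appended["header"])  # seq1
```

Passing the new argument by keyword (`seq_type=...`) sidesteps the issue regardless of its position, which is presumably why the test calls were changed to set `seq_type` explicitly.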

src/biotite/sequence/io/fasta/convert.py

padix-key · 2023-06-18T12:21:09Z


+            seq_str = process_protein_sequence(seq_str)
+        # Return the converted sequence
+        return seq_type(seq_str)    
+
    # Biotite alphabets for nucleotide and proteins


This part of the comment is also obsolete after the code changes.

Suggested change

# Biotite alphabets for nucleotide and proteins

I went over the comments and removed all remnants of the previous implementation. I also added/moved a comment describing the purpose of each lambda function.
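The resulting control flow (use `seq_type` as given, otherwise fall back to automatic detection) can be sketched as follows. This is a hypothetical simplification, not Biotite's `_convert_to_sequence()`: `NucleotideSeq`, `ProteinSeq`, `AlphabetError`, `convert_to_sequence()` and the alphabets are illustrative stand-ins, and the per-type preprocessing lambdas only upper-case the input.

```python
# Hypothetical sketch of explicit vs. automatic sequence-type selection.
# The classes and alphabets below are simplified stand-ins, not Biotite's.

class AlphabetError(ValueError):
    """Raised when a symbol is not part of a sequence type's alphabet."""


class SimpleSequence:
    ALPHABET = frozenset()

    def __init__(self, seq_str):
        invalid = set(seq_str) - self.ALPHABET
        if invalid:
            raise AlphabetError(
                f"Symbols {sorted(invalid)} are not in the alphabet"
            )
        self.seq_str = seq_str


class NucleotideSeq(SimpleSequence):
    ALPHABET = frozenset("ACGT")


class ProteinSeq(SimpleSequence):
    ALPHABET = frozenset("ACDEFGHIKLMNPQRSTVWY")


def convert_to_sequence(seq_str, seq_type=None):
    # Preprocessing per sequence type, applied in both code paths
    # (only upper-casing here, as a placeholder)
    process_nucleotide_sequence = lambda x: x.upper()
    process_protein_sequence = lambda x: x.upper()
    preprocess = {
        NucleotideSeq: process_nucleotide_sequence,
        ProteinSeq: process_protein_sequence,
    }

    if seq_type is not None:
        # Manually selected type: no guessing, alphabet errors propagate
        seq_str = preprocess.get(seq_type, lambda x: x)(seq_str)
        return seq_type(seq_str)

    # Automatic detection: try nucleotides first, then proteins
    for candidate in (NucleotideSeq, ProteinSeq):
        try:
            return candidate(preprocess[candidate](seq_str))
        except AlphabetError:
            continue
    raise ValueError("Sequence is neither a nucleotide nor a protein sequence")


print(type(convert_to_sequence("acgt")).__name__)                   # NucleotideSeq
print(type(convert_to_sequence("ACGT", seq_type=ProteinSeq)).__name__)  # ProteinSeq
print(type(convert_to_sequence("MKL")).__name__)                    # ProteinSeq
```

The design point is that an explicit `seq_type` skips detection entirely: an ambiguous string like `ACGT` can be forced to load as a protein, while an unsuitable type raises immediately instead of falling through to another type.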

tests/sequence/test_fasta.py

padix-key · 2023-06-18T12:24:36Z


+    assert seq.NucleotideSequence("ACGCTACGT") == fasta.get_sequence(
+        file, seq.NucleotideSequence
+    )


Same as above

I also refactored the test by splitting it into three functionally different tests:

Ambiguous Sequences (i.e. nucleotide sequences which can also be loaded as amino acid sequences)

Protein Sequences (should fail when loaded as nucleotide sequences)

Invalid Sequences (should fail in all cases)
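The three categories map naturally onto a table of cases, roughly what `pytest.mark.parametrize` would consume in the real test module. In this hypothetical sketch each sequence type is modelled as a plain alphabet set and `loads_as()` is an illustrative helper; the actual tests go through `fasta.get_sequence()` and Biotite's sequence classes, and Biotite's real alphabets differ from these simplified ones.

```python
# Hypothetical, table-driven version of the three test categories.
# Sequence types are modelled as plain alphabet sets (simplified).

NUCLEOTIDE = frozenset("ACGT")
PROTEIN = frozenset("ACDEFGHIKLMNPQRSTVWY")


def loads_as(seq_str, alphabet):
    """Return True if every symbol of seq_str is in the alphabet."""
    return set(seq_str) <= alphabet


# (category, sequence, loads as nucleotide?, loads as protein?)
CASES = [
    ("ambiguous", "ACGCTACGT", True, True),
    ("protein", "MKLVWY", False, True),
    ("invalid", "ACG?T", False, False),
]

for category, seq_str, nuc_ok, prot_ok in CASES:
    assert loads_as(seq_str, NUCLEOTIDE) == nuc_ok, category
    assert loads_as(seq_str, PROTEIN) == prot_ok, category
    print(f"{category}: nucleotide={nuc_ok}, protein={prot_ok}")
```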


padix-key · 2023-06-18T12:30:45Z

I think we can ignore the error in biotite.database for this PR. There seem to be some changes in the RCSB and PubChem databases that can be addressed another time.

t0mdavid-m · 2023-06-19T17:12:45Z

Thank you for your review! I applied all suggestions. From my side, the PR is ready to be merged.

t0mdavid-m added 4 commits June 18, 2023 13:21

add parameter seq_type

3fefaa8

add new parameter to tests

8fcfca1

fix conversion

680c88d

refactor _convert_to_sequence()

6e4921e

t0mdavid-m requested a review from padix-key June 18, 2023 11:56

padix-key reviewed Jun 18, 2023


t0mdavid-m added 6 commits June 18, 2023 16:25

explicitely explain automatic sequence detection

628c5cb

fix comments

bc7ffd0

parametrize test_access_high_level()

39a6579

change argument order

3d25d5c

explicitely set seq_type

47fcae1

refactor test_sequence_conversion()

a3f1174

padix-key merged commit dc72831 into biotite-dev:master Jun 19, 2023
