AMPlify ignores sequences containing stop codon indicator #17
Comments
Thank you for your message. We understand how the inclusion of stop codon indicators (such as *) in sequence outputs from annotation tools like Prodigal, Prokka, and Bakta can cause issues when used with AMPlify. While the current behaviour was designed to strictly accept only the 20 standard amino acids to ensure clean inputs, we acknowledge that many annotation tools append the stop codon symbol (*) by default, and this can indeed interfere with direct input into AMPlify.

We appreciate your suggestion to automatically handle stop codons by removing the asterisk internally. This could enhance AMPlify’s usability, especially for users working with outputs from a variety of annotation pipelines (or users who do not know about AMPlify's behaviour or have not read the documentation). We will certainly consider adding this functionality to future versions, as it could streamline workflows and reduce the need for additional preprocessing.

In the meantime, as you’ve mentioned, a simple one-liner in Perl or another scripting language can resolve this issue by removing the asterisks prior to running AMPlify. We will also make sure to update our documentation to better highlight this behaviour for users who may not be familiar with it. Thanks again for your valuable feedback and interest in AMPlify.
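For readers who need the interim workaround, the preprocessing step mentioned above could look roughly like the following Python sketch. It is not part of AMPlify; the function name `strip_stop_indicators` is hypothetical, and it assumes plain FASTA text in which sequences may wrap over several lines. Only trailing asterisks are removed, so internal ones are left untouched.

```python
def strip_stop_indicators(fasta_text: str) -> str:
    """Return FASTA text with trailing '*' removed from every record.

    Hypothetical helper, not part of AMPlify. Wrapped sequences are
    joined onto one line; asterisks inside a sequence are kept as-is.
    """
    records = []              # collected (header, sequence) pairs
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue          # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    # rstrip("*") drops one or more asterisks at the end of the sequence only
    return "".join(f"{h}\n{s.rstrip('*')}\n" for h, s in records)


if __name__ == "__main__":
    example = ">orf1\nMKTLL*\n>orf2\nACDEF\n"
    print(strip_stop_indicators(example), end="")
```

The cleaned text can then be written to a new FASTA file and passed to AMPlify in place of the annotation tool's raw output.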
That sounds great, thanks @warrenlr for considering this request for a next AMPlify release 🚀
Hi @jasmezz,

Thank you once again for your valuable suggestion regarding AMPlify! We truly appreciate your insights. After carefully considering your feature request, we have decided to implement functionality to handle stop codon indications (*), as they provide users with an important distinction between biologically ‘complete’ peptides and right-truncated ones.

With the release of version 2.0.1, we now process asterisks by internally clipping them from sequences. However, users utilizing the predict script will still be able to see the original sequences, including the asterisks, as they were initially provided. This ensures a seamless experience for both training and prediction purposes. Please note that we still do not support asterisks located within the sequence or non-standard amino acids.

We sincerely appreciate your contribution in helping make AMPlify a more comprehensive tool.
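The behaviour described in this comment can be illustrated with a small Python sketch. This is not AMPlify's actual code: the name `prepare_sequence` is hypothetical, the sketch assumes uppercase one-letter codes, and it clips a single trailing asterisk, since the comment does not say how repeated trailing asterisks are treated.

```python
# The 20 standard amino acids, as one-letter codes.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def prepare_sequence(original: str):
    """Return (original, model_input) for one input sequence.

    Hypothetical illustration, not AMPlify's implementation.
    model_input is None when the sequence would still be rejected.
    """
    # Clip one trailing stop codon indicator, if present (assumption).
    clipped = original[:-1] if original.endswith("*") else original
    # An internal '*' or any non-standard residue is still unsupported.
    if clipped and set(clipped) <= STANDARD_AA:
        return original, clipped
    return original, None


if __name__ == "__main__":
    for seq in ("MKTLL*", "MKTLL", "MK*TLL", "MKXLL"):
        shown, used = prepare_sequence(seq)
        print(f"{shown!r}: {'accepted' if used is not None else 'rejected'}")
```

Returning the original string alongside the clipped one mirrors the point that the user still sees the sequence as provided, asterisk included, while only the clipped form is used internally.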
That's really cool, thank you for the quick response and release!
We noticed that AMPlify strictly sticks to the 20 standard amino acids in input sequences and ignores all others, as stated in its help message:
So far, so clear. But even if a stop codon is indicated with the commonly used asterisk `*`, the sequence is ignored. I believe this behaviour might not be desired, because several sequence annotation tools (e.g. Pyrodigal, Prodigal, Bakta, Prokka) append the `*` by default; for Prodigal, Prokka, and Bakta it is not even possible to deactivate the `*` as stop codon indicator. Thus, one cannot simply use the output from such annotation tools as input for AMPlify without first removing all `*`.

My feature request is thus to have AMPlify accept sequences with a stop codon indicator and remove the asterisk internally if necessary.
Minimum reproducible example:
I'll link another issue where this behaviour was observed.