Compatible file for NCBI submission? #69
Comments
Hi @michoug, thanks a lot for reporting this. So far we've tested the submission only for ENA. Of course, we're keen to make NCBI submissions as smooth as possible, too. I'll encode the products as requested in the GFF3 specifications. For the 1st and 3rd point, I think it might be best to add a Is this a complete list of all issues you encountered?
Hi, Here is the command that I used to generate submission files:
Here is the link to the documentation (https://www.ncbi.nlm.nih.gov/genbank/genomes_gff/#run)
Thanks for the detailed information - that helps a lot. However, fixing the |
interesting side effect: adhering to the GFF3 comma encoding convention ( |
Hi @michoug, I've added a couple of fixes and improvements for GFF3-based GenBank submissions via I'll release Please let me know if there are any further issues - I'm looking forward to your feedback.
Hi,
Hi,
These are the low-hanging fruit. All the remaining issues are far more complex to fix - if they can be handled automatically at all.
Hi,
I'm in the process of submitting genomes annotated with Bakta to NCBI. While checking the annotation for quality and errors (via table2asn and the GFF3 file, https://www.ncbi.nlm.nih.gov/genbank/genomes_gff/#run), I encountered several issues.
I understand if full compatibility is out of scope for this tool, as it can be quite tricky. Still, here is a list of some of the issues I ran into that could be addressed in the GFF file:
Commas that are intended to be part of a name should be encoded (%2C) according to the GFF3 specifications. However, literal commas should only be included when they are part of enzymatic names. Semi-colons generally should not be included in product names.
SO: in dbxref as they are not yet recognized (https://www.ncbi.nlm.nih.gov/genbank/collab/db_xref/)
Best,
Greg
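For illustration, the comma encoding requested in the list above could be sketched roughly as follows. This is a minimal sketch, not Bakta's actual code: the function name `encode_gff3_value` and the example product name are hypothetical. It assumes the escaping rules of the GFF3 specification for column 9, where `;`, `=`, `&`, `,`, tab, newline, carriage return and `%` itself are percent-encoded.

```python
from urllib.parse import unquote

# Characters with a reserved meaning in GFF3 attribute values (column 9),
# plus '%' itself, mapped to their percent-encoded form.
GFF3_RESERVED = {
    "%": "%25",
    ";": "%3B",
    "=": "%3D",
    "&": "%26",
    ",": "%2C",
    "\t": "%09",
    "\n": "%0A",
    "\r": "%0D",
}


def encode_gff3_value(value: str) -> str:
    """Percent-encode reserved characters in one GFF3 attribute value.

    Hypothetical helper for illustration. The string is processed in a
    single pass, so a '%' produced by an escape is never encoded twice.
    """
    return "".join(GFF3_RESERVED.get(ch, ch) for ch in value)


# Hypothetical product name containing a comma that belongs to the name.
product = "DNA polymerase III, alpha subunit"
encoded = encode_gff3_value(product)
print(f"product={encoded}")

# Standard percent-decoding recovers the original name.
print(unquote(encoded) == product)
```

Whether a given comma should be kept, encoded or removed from a product name is a curation decision that a helper like this cannot make; it only covers the encoding step.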