Bakta having problems handling Ns - invalid DNA characters #87
Comments
Hi @RotimiDada, thanks for reporting. Since you mentioned aligned contigs: We strive to have Bakta accept almost all valid IUPAC nucleotide characters in a DNA Fasta file. Currently, these are: Because it is not supported by 3rd party tools involved in the workflow, the only character that is excluded on purpose is
Thank you Oliver for a super fast response. I have checked the contigs and can't seem to find a "-" character in them. I am attaching the fasta file so you can see whether you can reproduce this error. By the way, Prokka annotates these files without encountering errors, but I need the annotation to conform to the nomenclature of the databases that I used for calling my genes of interest (e.g. Virulencefinder) - FAIR.......
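For anyone hitting the same error and unsure which character triggers it, here is a minimal sketch that reports every character outside an allowed alphabet, per record. The function name `find_invalid_characters` and the allowed set are assumptions made for illustration: the set is the standard IUPAC DNA alphabet, which may not match exactly what Bakta accepts.

```python
from collections import Counter

# Assumption: standard IUPAC DNA codes. Bakta's accepted set may differ.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def find_invalid_characters(fasta_text, allowed=IUPAC_DNA):
    """Return {record_id: Counter} of characters not in `allowed`."""
    report = {}
    record_id = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record id is the first whitespace-delimited token of the header.
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            continue
        if record_id is None:
            continue  # sequence data before any header: skipped
        bad = [c for c in line.upper() if c not in allowed]
        if bad:
            report.setdefault(record_id, Counter()).update(bad)
    return report


# Hypothetical two-record example; only the second record contains dashes.
example = ">c1\nACGTNN\n>c2\nAC--GT\nN-A\n"
for rec_id, counts in find_invalid_characters(example).items():
    print(rec_id, dict(counts))
```

Ns pass this check because N is part of the IUPAC alphabet, so a report listing only "-" would point at alignment gaps rather than Ns.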
That sounds interesting. Unfortunately, the file you've provided is not the Fasta file. Could you attach the input Fasta file you've used for the annotation so I can take a look at that?
Thank you once again Oliver. I am sorry, I mistakenly sent you an annotation file. Please find attached the fasta file. Kind regards,
Dear Rotimi, These are not compatible with 3rd party tools, e.g. Infernal, that are used in the workflow. When I remove all these dashes (Prokka does that automatically) Bakta successfully annotates this amended Fasta file: Thank you very much for reporting and bringing up this issue. As this might affect other users as well, I will add an automated removal of dashes soon.
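Until dash removal is automated, the workaround described above (removing the dashes from the sequences before annotation) could be sketched as follows. This is not Bakta's implementation; the function name `strip_gap_characters` and its `gap_chars` parameter are hypothetical, and header lines are left untouched so that contig ids containing "-" survive.

```python
def strip_gap_characters(fasta_text, gap_chars="-"):
    """Remove gap characters from sequence lines; keep header lines as-is."""
    # Translation table that deletes every character in gap_chars.
    table = str.maketrans("", "", gap_chars)
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            cleaned.append(line)  # headers may legitimately contain '-'
        else:
            line = line.translate(table)
            if line:  # drop lines that consisted only of gaps
                cleaned.append(line)
    return "".join(line + "\n" for line in cleaned)


# Hypothetical example record with alignment gaps.
aligned = ">contig-1\nAC--GT\n---\nNNA-\n"
print(strip_gap_characters(aligned), end="")
```

Note that removing gaps shifts sequence coordinates relative to the original alignment, so annotation positions will refer to the ungapped contigs.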
Dear Oliver, Many thanks for your help. I can also confirm that after removing the dashes, bakta ran successfully. I am sorry that you had to spot the dashes yourself after I failed to detect them in my first attempt. By the way, thank you for planning to automate dash removal. Warm regards,
You're welcome! For the sake of documentation, a soon-to-come commit will address this issue. Therefore, I'll keep this issue open for a while.
My thoughts exactly. Cheers!
Thank you @oschwengers and the team for introducing this tool. Your adherence to the FAIR principles is a huge contribution.
I have contigs from reference-based alignment and some genes of interest in the contigs. I need to annotate the contigs to get some information for downstream analyses. My problem is with a call from bakta that "fasta sequence contains invalid DNA characters". My guess is that Ns are called invalid DNA characters by bakta.
Here is the content of the log file showing the error message:
15:30:05.858 - ERROR - FASTA - import: Fasta sequence contains invalid DNA characters! id=%s
15:30:05.859 - ERROR - MAIN - wrong genome file format!
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/bakta/main.py", line 124, in main
    contigs = fasta.import_contigs(cfg.genome_path)
  File "/usr/local/lib/python3.9/site-packages/bakta/io/fasta.py", line 26, in import_contigs
    raise ValueError(f'Fasta sequence contains invalid DNA characters! id={record.id}')
ValueError: Fasta sequence contains invalid DNA characters! id=INOLLH026C
15:30:05.862 - INFO - MAIN - removed tmp dir: /tmp/tmpzubdk5dw
Thank you for your help
----Rotimi