Thanks for developing bakta and for quickly adding new features and improving this great tool.
When running bakta on some genomes, I've noticed that it leaves the coordinates of tmRNAs that cross the origin uncorrected, so the start coordinate ends up negative. For example, the GFF3 file would look like this:
While Biopython can write negative coordinates to EMBL and GenBank files, it cannot parse these locations back from those files. A simple example is:
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
import io
with io.StringIO() as stream:
    seq = Seq("ACGTTT" * 30)
    record = SeqRecord(seq, id="foobar", annotations={"molecule_type": "DNA"})
    record.features.append(SeqFeature(FeatureLocation(-12, 64, strand=+1), type='tmRNA'))
    print(record.features[0].location)
    SeqIO.write(record, stream, 'genbank')
    stream.seek(0)
    records = list(SeqIO.parse(stream, 'genbank'))
    print(records[0].features[0].location)
The way bakta handles CDS that cross the origin seems to be different and the coordinates look correct. Would it be possible to fix this for aragorn predictions? One example where this issue happens is GCF_900101355.1.
Kind regards,
Luis
Hi Luis,
thanks a lot for reporting this. I took a quick look at the results for the GCF_900101355.1 genome you suggested. Of course, the coordinates should not be negative; they should be positive and point to the 3' end of the sequence.
Interestingly, aragorn keeps predicting edge tmRNAs even though the -l flag is set. This was not expected and hasn't occurred so far (at least it hasn't been reported). I added a fix to catch these corner cases of edge-spanning tmRNA within draft genomes.
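To illustrate what "positive, pointing to the 3' end" means for an origin-spanning feature, here is a minimal sketch. It is not the actual fix in bakta; the function name `wrap_origin_spanning` is hypothetical, and it assumes 0-based, half-open coordinates (as in the Biopython example above) on a circular sequence of known length.

```python
def wrap_origin_spanning(start, end, seq_len):
    """Rewrite an interval with a negative start as non-negative segments.

    Assumes 0-based, half-open coordinates on a circular sequence.
    Hypothetical helper for illustration only, not bakta's implementation.
    """
    if not (-seq_len < start < end <= seq_len) or end - start > seq_len:
        raise ValueError("interval out of range for this sequence length")
    if start >= 0:
        # Nothing to do: the interval does not reach back past the origin.
        return [(start, end)]
    if end <= 0:
        # The whole interval lies before the origin: shift it to the 3' end.
        return [(seq_len + start, seq_len + end)]
    # The interval crosses the origin: one segment at the 3' end of the
    # sequence, followed by one segment at the 5' end.
    return [(seq_len + start, seq_len), (0, end)]


# The tmRNA from the example above: -12..64 on a 180 bp sequence.
segments = wrap_origin_spanning(-12, 64, 180)
print(segments)  # [(168, 180), (0, 64)]
```

The two segments could then be written as a joined location in GenBank or EMBL output, which parsers can read back, unlike a negative start.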
I'm currently working on v1.3.0, which might still take a while, but in the meantime you can install and try this fix from main via: