tmRNA that cross the origin #90

Closed
LuisFF opened this issue Jan 12, 2022 · 2 comments

LuisFF commented Jan 12, 2022

Hi Oliver,

Thanks for developing bakta and for quickly adding new features and improving this great tool.

When running bakta on some genomes, I've noticed that it leaves the coordinates of tmRNAs that cross the origin unchanged, so the start position is reported as a negative number. For example, the GFF3 file would look like this:

contig_1       Bakta   region  1       2946    .       +       .   [...]
contig_1       Aragorn tmRNA   -17     541     .       -       .   [...]    

While Biopython allows negative coordinates to be written to EMBL and GenBank files, it cannot parse these locations back from the files. A simple example is:

from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
import io

with io.StringIO() as stream:
    seq = Seq("ACGTTT" * 30)
    record = SeqRecord(seq, id="foobar", annotations={"molecule_type": "DNA"})
    # Feature whose start lies before the origin (negative coordinate)
    record.features.append(SeqFeature(FeatureLocation(-12, 64, strand=+1), type='tmRNA'))
    print(record.features[0].location)
    # Write the record as GenBank into the in-memory stream
    SeqIO.write(record, stream, 'genbank')
    stream.seek(0)
    # Read it back; the negative location is not recovered by the parser
    records = list(SeqIO.parse(stream, 'genbank'))
    print(records[0].features[0].location)

The way bakta handles CDS that cross the origin seems to be different and the coordinates look correct. Would it be possible to fix this for aragorn predictions? One example where this issue happens is GCF_900101355.1.

Kind regards,
Luis

@LuisFF LuisFF added the bug Something isn't working label Jan 12, 2022
@oschwengers oschwengers self-assigned this Jan 12, 2022
@oschwengers oschwengers added this to the v1.3.0 milestone Jan 12, 2022
oschwengers (Owner) commented Jan 12, 2022

Hi Luis,
thanks a lot for reporting this. I took a quick look at the results for the GCF_900101355.1 genome you mentioned. Of course, the coordinates should not be negative; they should be positive and point to the 3' end of the sequence.

Interestingly, aragorn keeps predicting edge tmRNAs even though the -l flag is set. This was not expected and hasn't occurred so far (at least it hasn't been reported). I added a fix to catch these corner cases of edge-spanning tmRNA within draft genomes.
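
For illustration only, and not the actual bakta fix: a minimal Python sketch of how a 1-based, inclusive interval with a non-positive start (as in the `-17..541` example above) could be mapped to positive coordinates. It assumes a circular sequence and that position 0 corresponds to the last base; the function name `wrap_origin_coordinates` is hypothetical.

```python
from typing import List, Tuple


def wrap_origin_coordinates(start: int, stop: int, seq_length: int) -> List[Tuple[int, int]]:
    """Split a 1-based, inclusive interval whose start lies before position 1.

    Hypothetical helper for illustration; it is not taken from bakta's code.
    Assumes a circular sequence where position 0 is the last base.
    """
    if stop < 1 or stop > seq_length:
        raise ValueError("stop must lie within the sequence")
    if start >= 1:
        if start > stop:
            raise ValueError("start must not exceed stop")
        # Nothing to wrap: the interval is already fully positive.
        return [(start, stop)]
    # Position 0 maps to the last base, -1 to the one before it, and so on.
    wrapped_start = seq_length + start
    if wrapped_start <= stop:
        raise ValueError("feature is longer than the sequence")
    # First part runs to the 3' end, second part restarts at position 1.
    return [(wrapped_start, seq_length), (1, stop)]


# The tmRNA from the report: start -17, stop 541 on a 2946 bp contig.
print(wrap_origin_coordinates(-17, 541, 2946))  # [(2929, 2946), (1, 541)]
```

On a linear draft contig there is nothing to wrap around to, so a real fix would have to handle such a prediction differently; this sketch only covers the circular case.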

I'm currently working on v1.3.0, which might still take a while, but in the meantime you can install and try this fix from main via:

git clone https://github.com/oschwengers/bakta.git
python -m pip install --no-deps --ignore-installed bakta/

Please let me know if this is working for you.

oschwengers (Owner) commented

I'll close this for now. Please do not hesitate to re-open it in any case.
