Question about GBFF output reporting all 0 annotations (vs. txt and gff3 files) #354

Closed
patriciatran opened this issue Dec 13, 2024 · 5 comments
Labels: bug (Something isn't working)

Comments

@patriciatran

Hello,

I am running Bakta v1.10.2 as follows:

# $1 = sample name, $2 = number of threads
bakta --db /path/hidden/db \
        "${1}_assembly.fasta" \
        --output "bakta_${1}" \
        --threads "$2"

All output files are written, but I have a question about the GBFF output file.

Could you explain why the GBFF file reports 0 for every annotation count, while the .txt and GFF3 files report non-zero counts?
I am comparing this with Bakta output from an earlier analysis that was run with a previous version:

Current output:
The gbff file looks like this:

LOCUS       contig_1             2902592 bp    DNA     linear   UNK 13-DEC-2024
DEFINITION  contig_1, whole genome shotgun sequence.
ACCESSION   contig_1
VERSION     contig_1
KEYWORDS    .
SOURCE      None
  ORGANISM  .
            .
COMMENT     Annotated with Bakta
            Software: v1.10.2
            Database: v5.1, full
            DOI: 10.1099/mgen.0.000685
            URL: github.com/oschwengers/bakta
            
            ##Genome Annotation Summary:##
            Annotation Date                :: 12/13/2024, 15:06:35
            CDSs                           ::     0
            tRNAs                          ::     0
            tmRNAs                         ::     0
            rRNAs                          ::     0
            ncRNAs                         ::     0
            regulatory ncRNAs              ::     0
            CRISPR Arrays                  ::     0
            oriCs/oriVs                    ::     0
            oriTs                          ::     0
            gaps                           ::     0
            pseudogenes                    ::     0

However, the sample.txt output looks like this:

Sequence(s):
Length: 2902592
Count: 1
GC: 32.8
N50: 2902592
N90: 2902592
N ratio: 0.0
coding density: 85.5

Annotation:
tRNAs: 60
tmRNAs: 1
rRNAs: 16
ncRNAs: 90
ncRNA regions: 25
CRISPR arrays: 0
CDSs: 2704
pseudogenes: 4
hypotheticals: 39
sORFs: 16
gaps: 0
oriCs: 4
oriVs: 0
oriTs: 1

Bakta:
Software: v1.10.2
Database: v5.1, full
DOI: 10.1099/mgen.0.000685
URL: github.com/oschwengers/bakta

For comparison, here is the output from a previous run with another Bakta version.
Note: Ignore the actual numbers; this was not run on the same genome. I am just pasting the output here as an example.

LOCUS       contig_1             2921883 bp    DNA     linear   UNK 17-OCT-2024
DEFINITION  contig_1, whole genome shotgun sequence.
ACCESSION   contig_1
VERSION     contig_1
KEYWORDS    .
SOURCE      None
  ORGANISM  .
            .
COMMENT     Annotated with Bakta
            Software: v1.6.1
            Database: v4.0
            DOI: 10.1099/mgen.0.000685
            URL: github.com/oschwengers/bakta
            
            ##Genome Annotation Summary:##
            Annotation Date                :: 10/17/2024, 22:36:47
            Annotation Pipeline            :: Bakta
            Annotation Software version    ::  v1.6.1
            Annotation Database version    ::  v4.0
            CDSs                           :: 2,733
            tRNAs                          ::    61
            tmRNAs                         ::     1
            rRNAs                          ::    19
            ncRNAs                         ::    88
            regulatory ncRNAs              ::    25
            CRISPR Arrays                  ::     0
            oriCs/oriVs                    ::     2
            oriTs                          ::     0
            gaps                           ::     0
            pseudogenes                    ::     5

The corresponding .txt output:

Sequence(s):
Length: 3011165
Count: 3
GC: 32.7
N50: 2921883
N ratio: 0.0
coding density: 85.3

Annotation:
tRNAs: 61
tmRNAs: 1
rRNAs: 19
ncRNAs: 92
ncRNA regions: 25
CRISPR arrays: 0
CDSs: 2825
pseudogenes: 7
hypotheticals: 171
signal peptides: 0
sORFs: 11
gaps: 0
oriCs: 3
oriVs: 0
oriTs: 1

Bakta:
Software: v1.6.1
Database: v4.0
DOI: 10.1099/mgen.0.000685
URL: github.com/oschwengers/bakta

Thank you in advance for your explanation.

Best,
Patricia
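
The symptom reported above (all-zero counts in the GBFF summary while the .txt summary is populated) can be checked with a short script. This is only a minimal sketch, not part of Bakta: the function names are hypothetical, the file layout is assumed to match the excerpts pasted in this report, and only labels that are spelled identically in both files (for example `CDSs`, `tRNAs`, `rRNAs`) are handled. Because the earlier v1.6.1 example shows that the two summaries need not be equal for multi-contig genomes, the check flags only the case where the GBFF count is 0 and the .txt count is positive.

```shell
# Hypothetical helpers (not part of Bakta), assuming the layouts shown in this issue.

# Print the count for a label from the first "##Genome Annotation Summary:##"
# block of a GBFF file. $1 = label (e.g. CDSs), $2 = GBFF file
gbff_summary_count() {
    awk -v key="$1" '
        index($0, "::") {
            split($0, parts, "::")
            label = parts[1]; value = parts[2]
            gsub(/^[ \t]+|[ \t]+$/, "", label)   # trim the label
            gsub(/[ \t,]/, "", value)            # "2,733" becomes "2733"
            if (label == key) { print value; exit }
        }' "$2"
}

# Print the count for a label from the .txt summary ("CDSs: 2704").
# $1 = label, $2 = .txt file
txt_summary_count() {
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Flag a label whose GBFF count is 0 although the .txt count is positive.
# $1 = label, $2 = GBFF file, $3 = .txt file
check_feature() {
    gbff_n=$(gbff_summary_count "$1" "$2")
    txt_n=$(txt_summary_count "$1" "$3")
    if [ "${gbff_n:-0}" -eq 0 ] && [ "${txt_n:-0}" -gt 0 ]; then
        echo "SUSPECT $1: gbff=${gbff_n:-0} txt=$txt_n"
    else
        echo "OK $1: gbff=${gbff_n:-0} txt=${txt_n:-0}"
    fi
}
```

With the v1.10.2 excerpts from this report saved as `sample.gbff` and `sample.txt` (assumed file names), `check_feature CDSs sample.gbff sample.txt` prints `SUSPECT CDSs: gbff=0 txt=2704`.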

@manalcric

manalcric commented Dec 16, 2024

Hello,
I am running Bakta 1.10.2 and have the same problem, but I don't know the cause. I suspect that the INSDC export to EMBL and GBFF is not working, since the rest of the annotation files are correct.
I renamed the contigs to very short names, but that did not resolve the problem.
Could it be related to this NumPy warning? UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero

Thanks in advance,
BW,
Manuel

@Dx-wmc

Dx-wmc commented Dec 17, 2024

I have encountered the same problem.

@simone-pignotti

simone-pignotti commented Dec 17, 2024

I am also running into this bug!

EDIT: downgrading to 1.10.1 works

oschwengers self-assigned this on Dec 17, 2024
oschwengers added this to the c1.10.3 milestone on Dec 17, 2024
@oschwengers
Owner

Hi, yes, this is obviously a severe bug that I could reproduce. I'm working on it.

@oschwengers
Owner

OK, so this is now fixed by https://github.com/oschwengers/bakta/releases/tag/v1.10.3.

@manalcric In fact, this was just a critical typo and is not related to the NumPy warnings.
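
For pipelines that consume the GBFF or EMBL summary, a version guard along the following lines may help avoid the affected release. It is a sketch based only on what this thread states (1.10.2 is affected, 1.10.1 works, v1.10.3 contains the fix). The function name is hypothetical, and the install commands in the comments assume a pip- or conda-based setup.

```shell
# Hypothetical guard (not part of Bakta): succeed only for the release that
# this thread reports as affected.
# $1 = version string, with or without a leading "v" (e.g. "v1.10.2")
is_affected_bakta_version() {
    case "${1#v}" in
        1.10.2) return 0 ;;   # reported in this issue
        *)      return 1 ;;   # 1.10.1 reported working, 1.10.3 contains the fix
    esac
}

# Moving to a fixed release (assumes a pip or conda installation):
#   python3 -m pip install 'bakta>=1.10.3'
#   conda install -c conda-forge -c bioconda 'bakta>=1.10.3'
```

For example, `is_affected_bakta_version v1.10.2 && echo "re-run with a newer Bakta"` prints the message, while the same call with `1.10.3` prints nothing.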
