Question about GBFF output reporting all 0 annotations (vs. txt and gff3 file) #355

Closed
patriciatran opened this issue Dec 13, 2024 · 2 comments
Labels
bug Something isn't working

patriciatran commented Dec 13, 2024

Hello,

I am running Bakta v1.10.2 like this:

# $1 = sample name
bakta --db /path/hidden/db \
        "${1}_assembly.fasta" \
        --output "bakta_${1}" \
        --threads 8

It outputs all the files, but I have a question about the gbff output file.

Could you explain why the gbff file reports 0 for every annotation count, while the .txt and gff3 files report non-zero counts?
I am comparing this with Bakta output from an earlier analysis that was run with a previous version:

Current output:
The gbff file looks like this:

LOCUS       contig_1             2902592 bp    DNA     linear   UNK 13-DEC-2024
DEFINITION  contig_1, whole genome shotgun sequence.
ACCESSION   contig_1
VERSION     contig_1
KEYWORDS    .
SOURCE      None
  ORGANISM  .
            .
COMMENT     Annotated with Bakta
            Software: v1.10.2
            Database: v5.1, full
            DOI: 10.1099/mgen.0.000685
            URL: github.com/oschwengers/bakta
            
            ##Genome Annotation Summary:##
            Annotation Date                :: 12/13/2024, 15:06:35
            CDSs                           ::     0
            tRNAs                          ::     0
            tmRNAs                         ::     0
            rRNAs                          ::     0
            ncRNAs                         ::     0
            regulatory ncRNAs              ::     0
            CRISPR Arrays                  ::     0
            oriCs/oriVs                    ::     0
            oriTs                          ::     0
            gaps                           ::     0
            pseudogenes                    ::     0

However, the sample.txt output looks like this:

Sequence(s):
Length: 2902592
Count: 1
GC: 32.8
N50: 2902592
N90: 2902592
N ratio: 0.0
coding density: 85.5

Annotation:
tRNAs: 60
tmRNAs: 1
rRNAs: 16
ncRNAs: 90
ncRNA regions: 25
CRISPR arrays: 0
CDSs: 2704
pseudogenes: 4
hypotheticals: 39
sORFs: 16
gaps: 0
oriCs: 4
oriVs: 0
oriTs: 1

Bakta:
Software: v1.10.2
Database: v5.1, full
DOI: 10.1099/mgen.0.000685
URL: github.com/oschwengers/bakta
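
The discrepancy between the two summaries above can be checked mechanically. The following is a minimal Python sketch, not part of Bakta: the function names (`parse_gbff_summary`, `parse_txt_summary`, `find_mismatches`) are hypothetical, the excerpts are shortened copies of the output pasted above, and only labels that appear in both summaries (compared case-insensitively) are checked. Labels that differ between the two files, such as "oriCs/oriVs", are skipped rather than guessed at.

```python
import re

# Shortened excerpts of the two summaries pasted in this issue.
GBFF_EXCERPT = """\
            Annotation Date                :: 12/13/2024, 15:06:35
            CDSs                           ::     0
            tRNAs                          ::     0
            tmRNAs                         ::     0
            rRNAs                          ::     0
            ncRNAs                         ::     0
            CRISPR Arrays                  ::     0
            oriTs                          ::     0
            gaps                           ::     0
            pseudogenes                    ::     0
"""

TXT_EXCERPT = """\
tRNAs: 60
tmRNAs: 1
rRNAs: 16
ncRNAs: 90
CRISPR arrays: 0
CDSs: 2704
pseudogenes: 4
gaps: 0
oriTs: 1
"""


def parse_gbff_summary(text):
    """Collect 'label :: integer' lines from a GBFF COMMENT block."""
    counts = {}
    for line in text.splitlines():
        # Thousands separators such as "2,733" are accepted and stripped.
        m = re.match(r"\s*(.+?)\s*::\s*(\d[\d,]*)\s*$", line)
        if m:
            counts[m.group(1).lower()] = int(m.group(2).replace(",", ""))
    return counts


def parse_txt_summary(text):
    """Collect 'label: integer' lines from the .txt summary."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*([^:]+?)\s*:\s*(\d+)\s*$", line)
        if m:
            counts[m.group(1).lower()] = int(m.group(2))
    return counts


def find_mismatches(gbff_counts, txt_counts):
    """Return {label: (gbff_value, txt_value)} for shared labels that differ."""
    shared = gbff_counts.keys() & txt_counts.keys()
    return {
        label: (gbff_counts[label], txt_counts[label])
        for label in sorted(shared)
        if gbff_counts[label] != txt_counts[label]
    }


if __name__ == "__main__":
    mismatches = find_mismatches(
        parse_gbff_summary(GBFF_EXCERPT), parse_txt_summary(TXT_EXCERPT)
    )
    for label, (gbff_value, txt_value) in mismatches.items():
        print(f"{label}: gbff={gbff_value} txt={txt_value}")
```

For a multi-sequence genome the two summaries may not be directly comparable, since the GBFF header shown here belongs to a single contig while the .txt file summarizes all sequences.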

For comparison, here is the output from a previous run with another Bakta version.
Note: Ignore the actual numbers; this run was not on the same genome. I am just pasting the output here as an example.

LOCUS       contig_1             2921883 bp    DNA     linear   UNK 17-OCT-2024
DEFINITION  contig_1, whole genome shotgun sequence.
ACCESSION   contig_1
VERSION     contig_1
KEYWORDS    .
SOURCE      None
  ORGANISM  .
            .
COMMENT     Annotated with Bakta
            Software: v1.6.1
            Database: v4.0
            DOI: 10.1099/mgen.0.000685
            URL: github.com/oschwengers/bakta
            
            ##Genome Annotation Summary:##
            Annotation Date                :: 10/17/2024, 22:36:47
            Annotation Pipeline            :: Bakta
            Annotation Software version    ::  v1.6.1
            Annotation Database version    ::  v4.0
            CDSs                           :: 2,733
            tRNAs                          ::    61
            tmRNAs                         ::     1
            rRNAs                          ::    19
            ncRNAs                         ::    88
            regulatory ncRNAs              ::    25
            CRISPR Arrays                  ::     0
            oriCs/oriVs                    ::     2
            oriTs                          ::     0
            gaps                           ::     0
            pseudogenes                    ::     5

Sequence(s):
Length: 3011165
Count: 3
GC: 32.7
N50: 2921883
N ratio: 0.0
coding density: 85.3

Annotation:
tRNAs: 61
tmRNAs: 1
rRNAs: 19
ncRNAs: 92
ncRNA regions: 25
CRISPR arrays: 0
CDSs: 2825
pseudogenes: 7
hypotheticals: 171
signal peptides: 0
sORFs: 11
gaps: 0
oriCs: 3
oriVs: 0
oriTs: 1

Bakta:
Software: v1.6.1
Database: v4.0
DOI: 10.1099/mgen.0.000685
URL: github.com/oschwengers/bakta
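
Since the gff3 file is the third output mentioned in this question, one independent check is to count feature types directly from it. The sketch below relies only on the general GFF3 layout (nine tab-separated columns, feature type in column 3, `#` comment lines, optional `##FASTA` section). The function name `count_gff3_feature_types` and the demo lines are invented for illustration; the exact type names Bakta writes are not shown in this thread, so treat the labels as assumptions.

```python
from collections import Counter


def count_gff3_feature_types(lines):
    """Count column-3 feature types in GFF3 lines, ignoring comments and FASTA."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence data
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and other directives
        fields = line.split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts


# Invented demo records (not taken from any real Bakta output).
DEMO_LINES = [
    "##gff-version 3",
    "contig_1\ttool\tCDS\t1\t300\t.\t+\t0\tID=a",
    "contig_1\ttool\tCDS\t400\t900\t.\t-\t0\tID=b",
    "contig_1\ttool\ttRNA\t1000\t1075\t.\t+\t.\tID=c",
    "##FASTA",
    ">contig_1",
    "ACGT",
]

if __name__ == "__main__":
    for feature_type, n in sorted(count_gff3_feature_types(DEMO_LINES).items()):
        print(feature_type, n)
```

To run it on a real file, pass an open file handle, for example `count_gff3_feature_types(open(path))`, where `path` points at the .gff3 output.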

Thank you in advance for your explanation.

Best,
Patricia

patriciatran added the bug label Dec 13, 2024

@patriciatran (Author)

Note: This is a duplicate of issue #354. I'm sorry, I clicked "post" twice.

@oschwengers (Owner)

No problem. I'll close this one and focus on the other.
