Failed bakta run for some genomes #265

Closed
ZarulHanifah opened this issue Dec 24, 2023 · 8 comments

@ZarulHanifah

I am running bakta on a bunch of genomes. Many worked wonderfully, but a few failed due to something CRISPR-related. One of the genomes is GCA_025196405.1. Here is the error message.

Traceback (most recent call last):
  File "/home/mzar0002/miniconda3/envs/bakta/bin/bakta", line 10, in <module>
    sys.exit(main())
  File "/home/mzar0002/miniconda3/envs/bakta/lib/python3.10/site-packages/bakta/main.py", line 210, in main
    genome['features'][bc.FEATURE_CRISPR] = crispr.predict_crispr(genome, contigs_path)
  File "/home/mzar0002/miniconda3/envs/bakta/lib/python3.10/site-packages/bakta/features/crispr.py", line 121, in predict_crispr
    assert len(crispr_array['repeats']) == int(copies), print(f"len(reps)={len(crispr_array['repeats'])}, int(copies)={int(copies)}")
AssertionError: None

The commands (part of a snakemake workflow):

bakta --db {input.db} \
            --output $outdir \
            --prefix $prefix \
            --locus-tag $locustag \
            --threads {threads} \
            --force --debug \
            {input.genome} 2> {log}

The log:

[after so many lines]
...
05:02:20.536 - INFO - NC_RNA_REGION - contig=contig_27, start=170467, stop=170494, strand=-, label=L19-Flavobacteria, product=L19-Flavobacteria ribosomal protein leader, length=28, truncated=None, score=39.9, evalue=8.3e-07
05:02:20.536 - INFO - NC_RNA_REGION - contig=contig_30, start=161750, stop=161875, strand=+, label=FMN, product=FMN riboswitch (RFN element), length=126, truncated=None, score=120.0, evalue=3.0e-20
05:02:20.537 - INFO - NC_RNA_REGION - contig=contig_31, start=45483, stop=45584, strand=-, label=SAM, product=SAM riboswitch (S box leader), length=102, truncated=None, score=75.1, evalue=3.8e-13
05:02:20.537 - INFO - NC_RNA_REGION - contig=contig_31, start=88433, stop=88529, strand=-, label=SAM, product=SAM riboswitch (S box leader), length=97, truncated=None, score=70.9, evalue=2.9e-12
05:02:20.537 - INFO - NC_RNA_REGION - contig=contig_31, start=179828, stop=179923, strand=-, label=SAM, product=SAM riboswitch (S box leader), length=96, truncated=None, score=65.5, evalue=4.0e-11
05:02:20.537 - INFO - NC_RNA_REGION - predicted=17
05:02:20.537 - DEBUG - MAIN - start CRISPR prediction
05:02:20.537 - DEBUG - CRISPR - cmd=['pilercr', '-in', '/tmp/tmp_z4vcynq/contigs.fna', '-out', '/tmp/tmp_z4vcynq/crispr.txt', '-noinfo', '-quiet']
05:02:21.448 - INFO - CRISPR - contig=contig_6, start=3, stop=822, spacer-length=30, repeat-length=47, # repeats=11, repeat-consensus=GTTGTGTTATATCACAAAGATATCCAAAATTGAAAGCAATTCACAAC, nt=[GTTGTGTTAT..AATTCACAAC]

I installed bakta through conda.

Thank you.

@ZarulHanifah ZarulHanifah added the bug Something isn't working label Dec 24, 2023
@marade

marade commented Jan 4, 2024

I also got this error with a run on Bakta v1.9.1.

predict CRISPR arrays...
len(reps)=5, int(copies)=6
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/py310/bin/bakta", line 10, in <module>
    sys.exit(main())
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bakta/main.py", line 210, in main
    genome['features'][bc.FEATURE_CRISPR] = crispr.predict_crispr(genome, contigs_path)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bakta/features/crispr.py", line 121, in predict_crispr
    assert len(crispr_array['repeats']) == int(copies), print(f"len(reps)={len(crispr_array['repeats'])}, int(copies)={int(copies)}")
AssertionError: None
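
A side note on why the traceback ends in "AssertionError: None" while the counts show up on a separate line above it: the second operand of the `assert` is a `print(...)` call, which writes the text to stdout and returns `None`, so `None` becomes the assertion message. The sketch below reproduces that pattern in isolation. The function names and the sample values (5 repeats, 6 copies) are made up for illustration and are not Bakta code.

```python
def check_with_print(repeats, copies):
    # Same shape as the assert in the traceback: print() runs first,
    # returns None, and None is used as the assertion message.
    assert len(repeats) == int(copies), print(
        f"len(reps)={len(repeats)}, int(copies)={int(copies)}"
    )


def check_with_message(repeats, copies):
    # Passing the f-string directly keeps the details inside the exception.
    assert len(repeats) == int(copies), (
        f"len(reps)={len(repeats)}, int(copies)={int(copies)}"
    )


def failure_text(check, repeats, copies):
    """Run a check and return the last traceback line it would produce."""
    try:
        check(repeats, copies)
    except AssertionError as err:
        return f"AssertionError: {err}"
    return "ok"


if __name__ == "__main__":
    repeats = ["r"] * 5  # hypothetical parsed repeats
    print(failure_text(check_with_print, repeats, "6"))
    print(failure_text(check_with_message, repeats, "6"))
```

With the second form the mismatch details travel with the exception itself, which helps when stdout and stderr are redirected to different files, as in the Snakemake command earlier in this thread.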

@oschwengers
Owner

Hi @ZarulHanifah / @marade ,
thanks for reporting. Could you provide me with a genome sequence to reproduce & potentially debug this error? I'd like to take a deeper look into this.

@oschwengers oschwengers self-assigned this Jan 8, 2024
@ZarulHanifah
Author

ZarulHanifah commented Jan 8, 2024

Thank you @oschwengers . Here you go.
GCA_025196405.1_ASM2519640v1.fasta.txt

@wsowens

wsowens commented Jan 10, 2024

Thanks for your work on this project! Just commenting to say that I am experiencing this issue as well, with a similar backtrace. Happy to provide more example genomes if that would be helpful.

Edit: rerunning bakta with the --skip-crispr flag circumvents this issue.
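
For batch runs, the workaround above could be automated roughly as follows: run bakta normally and, only if it fails with a CRISPR-related error, rerun with `--skip-crispr`. This is a minimal sketch rather than anything from the Bakta project. The helper names, the `runner` parameter and the check for "crispr" in stderr are assumptions; only the command-line flags are taken from this thread.

```python
import subprocess


def build_bakta_cmd(genome, db, outdir, prefix, threads=1, skip_crispr=False):
    """Assemble a bakta command line using the flags shown in this thread."""
    cmd = [
        "bakta",
        "--db", str(db),
        "--output", str(outdir),
        "--prefix", prefix,
        "--threads", str(threads),
        "--force",
    ]
    if skip_crispr:
        cmd.append("--skip-crispr")
    cmd.append(str(genome))
    return cmd


def run_bakta_with_fallback(genome, db, outdir, prefix, threads=1,
                            runner=subprocess.run):
    """Run bakta; retry without CRISPR prediction if that step crashed."""
    first = runner(build_bakta_cmd(genome, db, outdir, prefix, threads),
                   capture_output=True, text=True)
    if first.returncode == 0:
        return "annotated"
    # Heuristic (assumption): a CRISPR-related crash mentions crispr.py
    # somewhere in the traceback written to stderr.
    if "crispr" not in first.stderr.lower():
        raise RuntimeError(f"bakta failed for another reason:\n{first.stderr}")
    second = runner(
        build_bakta_cmd(genome, db, outdir, prefix, threads, skip_crispr=True),
        capture_output=True, text=True,
    )
    if second.returncode != 0:
        raise RuntimeError(f"bakta failed even with --skip-crispr:\n{second.stderr}")
    return "annotated without CRISPR arrays"
```

Genomes that come back as "annotated without CRISPR arrays" are worth collecting in a list so they can be rerun once a fixed release is installed.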

@oschwengers
Owner

@ZarulHanifah & @marade ,
I've merged PR #267, which fixes this. I wrongly assumed that there is always an even number of spacers & repeats in each CRISPR array. I fixed this and improved the PILER-CR parser. You can already use this from https://github.com/oschwengers/bakta/tree/main or wait until I've released a patch v1.9.2 - maybe sometime this week.

@ZarulHanifah
Author

Thank you @oschwengers ... unfortunately, another AssertionError from PILER-CR:

Bakta v1.9.2
Options and arguments:
        input: /fs03/jm41/Zarul/C002_B2_results/derep/dereplicated_genomes/metabat.641.fasta
        db: /fs03/ie79/db/bakta_db, version 5.0, full
        output: /fs03/jm41/Zarul/C002_B2_results/bakta/metabat.641
        force: True
        tmp directory: /tmp/tmpbg0fpfp9
        prefix: metabat.641
        threads: 2
        debug: True
        translation table: 11
        locus tag prefix: METABAT.641

Bakta runs in DEBUG mode! Temporary data will not be destroyed at: /tmp/tmpbg0fpfp9

parse genome sequences...
        imported: 388
        filtered & revised: 388
        contigs: 388

start annotation...
predict tRNAs...
        found: 112
predict tmRNAs...
        found: 1
predict rRNAs...
        found: 0
predict ncRNAs...
        found: 2
predict ncRNA regions...
        found: 13
predict CRISPR arrays...
Traceback (most recent call last):
  File "/fs03/ie79/Zarul/status_nanopore/C002_B2/.snakemake/conda/22185ec851ca2597fabecb499d58e23d_/bin/bakta", line 10, in <module>
    sys.exit(main())
  File "/fs03/ie79/Zarul/status_nanopore/C002_B2/.snakemake/conda/22185ec851ca2597fabecb499d58e23d_/lib/python3.10/site-packages/bakta/main.py", line 210, in main
    genome['features'][bc.FEATURE_CRISPR] = crispr.predict_crispr(genome, contigs_path)
  File "/fs03/ie79/Zarul/status_nanopore/C002_B2/.snakemake/conda/22185ec851ca2597fabecb499d58e23d_/lib/python3.10/site-packages/bakta/features/crispr.py", line 105, in predict_crispr
    assert spacer_seq == spacer_genome_seq  # assure PILER-CR provided sequence equals sequence extracted from genome
AssertionError
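
To narrow down a failure like this one, the comparison can be repeated by hand on the contig and the spacer that PILER-CR reported. The sketch below is a hypothetical debugging helper, not Bakta's code. It assumes 1-based inclusive coordinates, and the causes it probes for (letter case, a one-base offset) are only candidate explanations, not the diagnosed cause of this issue.

```python
def extract_1based(seq, start, stop):
    """Return seq[start..stop] for 1-based, inclusive coordinates."""
    return seq[start - 1:stop]


def compare_spacer(contig_seq, start, stop, reported_spacer):
    """Classify how a reported spacer relates to the genome sequence."""
    genome_spacer = extract_1based(contig_seq, start, stop)
    if genome_spacer == reported_spacer:
        return "match"
    if genome_spacer.upper() == reported_spacer.upper():
        return "case mismatch"
    # Probe for an off-by-one in the coordinates (hypothetical cause).
    for shift in (-1, 1):
        if start + shift < 1:
            continue  # cannot shift before the first base
        shifted = extract_1based(contig_seq, start + shift, stop + shift)
        if shifted.upper() == reported_spacer.upper():
            return f"offset by {shift:+d}"
    return "mismatch"


if __name__ == "__main__":
    contig = "AACCGGTTACGT"  # toy sequence, not from the attached genome
    for spacer in ("CCGG", "ccgg", "CGGT", "TTTT"):
        print(spacer, "->", compare_spacer(contig, 3, 6, spacer))
```

Since the run above was started with `--debug`, the temporary directory it names is kept, so the contigs and the PILER-CR output should still be there to feed into a check like this.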

@oschwengers
Owner

Hmm... ok, could you provide the metabat.641.fasta input file so I can debug this?

@ZarulHanifah
Author

Right, here you go!
metabat.641.fasta.txt

Thank you!

oschwengers added a commit that referenced this issue Mar 6, 2024
@oschwengers oschwengers added this to the v1.9.3 milestone Mar 8, 2024