Problems recognizing the BUSCO databases with nextflow BTK pipeline v0.6.0 #122

Closed
gitcruz opened this issue Nov 14, 2024 · 4 comments
Labels
bug Something isn't working

gitcruz commented Nov 14, 2024

Description of the bug

Dear developers,

I downloaded and installed the pipeline v0.6.0.

As described in the usage documentation, I downloaded the full set of BUSCO v5 databases and untarred them. Because I kept getting the same BUSCO error, I then also decompressed refseq_db.faa.gz for every database. However, the error persists and looks like this:

2024-11-14 12:32:41 ERROR: Unable to run BUSCO in offline mode. Dataset /scratch_tmp/32318106/nxf.LMCX5N46Un/lineages/lineages/viridiplantae_odb10 does not exist.
mv: cannot stat 'tnRamLact8_Nhpy_mq10-viridiplantae_odb10-busco//short_summary..json': No such file or directory
mv: cannot stat 'tnRamLact8_Nhpy_mq10-viridiplantae_odb10-busco//short_summary..txt': No such file or directory

Work dir:
/scratch_isilon/groups/assembly/data/projects/BGE/tnRamLact/assembly/curation/nextdenovo.hypo1.purged.yahs_mq10/1_blobtoolkit/blobtoolkit_nextflow/work/e3/910c9ec4ebbab08e742510c3a50ee8

I don't really know why it is not finding the BUSCO databases! All of them are stored here: /scratch_isilon/groups/assembly/data/databases/BUSCO_2024_11/v5/data/lineages/

This is my nextflow command:

nextflow run /software/assembly/pipelines/nf-core-pipelines/blobtoolkit_sanger-tol/blobtoolkit-0.6.0/main.nf \
  -c /software/assembly/pipelines/nf-core-pipelines/cluster_config/cnag_nextflow_queue.config \
  -profile singularity \
  --input tnRamLact8_samplesheet_s3.csv \
  --outdir out \
  --fasta tnRamLact8_Nhpy_mq10.fasta \
  --taxon 947578 \
  --align true \
  --taxdump /scratch_isilon/groups/assembly/data/databases/taxdump_2024_10_01 \
  --blastp /scratch_isilon/groups/assembly/data/databases/uniprot_2024_10_03/reference_proteomes.dmnd \
  --blastx /scratch_isilon/groups/assembly/data/databases/uniprot_2024_10_03/reference_proteomes.dmnd \
  --blastn /scratch_isilon/groups/assembly/data/databases/nt_2024_10_03 \
  --busco /scratch_isilon/groups/assembly/data/databases/BUSCO_2024_11/v5/data/lineages/ \
  --busco_lineages metazoa_odb10,viridiplantae_odb10,fungi_odb10,apicomplexa_odb10,euglenozoa_odb10,diptera_odb10,alphaproteobacteria_odb10,mycoplasmatales_odb10,proteobacteria_odb10,nematoda_odb10,rickettsiales_odb10

I am attaching the full log and the sbatch command so you can check them in full. I would really appreciate it if you could help me overcome this error and get the pipeline running.

Thanks.
btk_v0.6.0_nextflow.log
run_blobtoolkit_v060_on_tnRamLact8.sbatch.txt

Command used and terminal output

No response

Relevant files

No response

System information

No response

@gitcruz gitcruz added the bug Something isn't working label Nov 14, 2024
Member

muffato commented Nov 14, 2024

Hi @gitcruz. Can you try with /scratch_isilon/groups/assembly/data/databases/BUSCO_2024_11/v5/data/, i.e. without the trailing lineages component?

The way we run the pipeline, --busco gets the path to a directory that contains a lineages subdirectory, cf.:

├── information
├── lineages
│   ├── acidobacteria_odb10
│   │   ├── hmms
│   │   └── info
│   ├── aconoidasida_odb10
│   │   ├── hmms
│   │   ├── info
│   │   └── prfl
│   (...)
│   ├── viridiplantae_odb10
│   │   ├── hmms
│   │   ├── info
│   │   └── prfl
│   └── xanthomonadales_odb10
│       ├── hmms
│       └── info
└── placement_files

In our copy, all the refseq_db.faa.gz files have already been decompressed (as you did). I should mention that in the docs.

Matthieu
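
The layout described above can be checked before launching the pipeline. The following is a minimal sketch, assuming only what this thread states: the directory given to --busco must contain lineages/<name> for each requested lineage. The doubled lineages/lineages in the reported error message is consistent with that expectation. The function name check_busco_dir is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper (not part of the pipeline): verify that the directory
# intended for --busco contains lineages/<name> for every requested lineage.
# $1 = candidate --busco directory
# $2 = comma-separated lineage list, as given to --busco_lineages
check_busco_dir() (
    # The ( ... ) body runs in a subshell, so IFS and the variables
    # below do not leak into the calling shell.
    busco_dir="${1%/}"   # drop one trailing slash, if present
    missing=0
    IFS=','
    for lineage in $2; do
        if [ ! -d "$busco_dir/lineages/$lineage" ]; then
            echo "missing: $busco_dir/lineages/$lineage" >&2
            missing=1
        fi
    done
    exit "$missing"      # becomes the function's return status
)

# Example with the path suggested in this thread:
# check_busco_dir /scratch_isilon/groups/assembly/data/databases/BUSCO_2024_11/v5/data \
#     "metazoa_odb10,viridiplantae_odb10" && echo "layout OK"
```

Passing the lineages directory itself makes the check fail for every lineage, which mirrors the situation reported in this issue.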

Author

gitcruz commented Nov 15, 2024

Thanks for the quick response Matthieu,

I'm trying it that way. So far the Nextflow job has been running for more than 2 hours.

Regarding the database paths, is it necessary to add the trailing slash or not (e.g. --blastn /scratch_isilon/groups/assembly/data/databases/nt_2024_10_03/)?

Also, I don't understand the guide's examples for the DIAMOND databases. I have just one, while you show two:
--blastp /path/to/buscogenes.dmnd
--blastx /path/to/buscoregions.dmnd

I am using only one:
--blastp /scratch_isilon/groups/assembly/data/databases/uniprot_2024_10_03/reference_proteomes.dmnd
--blastx /scratch_isilon/groups/assembly/data/databases/uniprot_2024_10_03/reference_proteomes.dmnd

Is this correct? I followed the guide and built just one diamond db...

Regards,
Fernando

Member

muffato commented Nov 15, 2024

> Regarding the database paths, is it necessary to add the trailing slash or not (e.g. --blastn /scratch_isilon/groups/assembly/data/databases/nt_2024_10_03/)?

I think it should work the same with and without.
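
If you would rather not depend on that, one option is to normalise the path before passing it to the pipeline. This is a minimal sketch; strip_trailing_slash is a hypothetical helper name, not something the pipeline provides.

```shell
# Hypothetical helper: remove trailing slashes so that both spellings of a
# path become identical. A lone "/" is left untouched.
strip_trailing_slash() (
    path="$1"
    while [ "${#path}" -gt 1 ] && [ "${path%/}" != "$path" ]; do
        path="${path%/}"
    done
    printf '%s\n' "$path"
)

# Example with the --blastn path from this thread:
# strip_trailing_slash /scratch_isilon/groups/assembly/data/databases/nt_2024_10_03/
```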

> Also, I don't understand the guide's examples for the DIAMOND databases. I have just one, while you show two: --blastp /path/to/buscogenes.dmnd --blastx /path/to/buscoregions.dmnd
>
> I am using only one: --blastp /scratch_isilon/groups/assembly/data/databases/uniprot_2024_10_03/reference_proteomes.dmnd --blastx /scratch_isilon/groups/assembly/data/databases/uniprot_2024_10_03/reference_proteomes.dmnd
>
> Is this correct? I followed the guide and built just one diamond db...

Yes, that's correct. They are separate parameters in case people want to use different databases. I could imagine someone optimising the pipeline by using a more restricted database for the blastp search (which happens first) in order to get the blastp jobs done quicker, while using the complete database for the blastx search (which happens after).
In practice, the way we run it on all our assembled genomes, we use the same, complete, database for both.

Best,
Matthieu
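
Putting the two answers in this thread together, the database arguments can be sketched as below: one DIAMOND database reused for both searches, and a --busco directory that is the parent of lineages/. The function name btk_db_args is hypothetical; the flags and paths are the ones quoted in this thread.

```shell
# Hypothetical helper (not part of the pipeline): print the database
# arguments, reusing one DIAMOND database for both --blastp and --blastx.
# $1 = DIAMOND database (.dmnd)
# $2 = directory that contains lineages/ (not the lineages directory itself)
btk_db_args() {
    printf '%s\n' \
        --blastp "$1" \
        --blastx "$1" \
        --busco "$2"
}

# Example with the paths from this thread (prints one argument per line):
# btk_db_args \
#     /scratch_isilon/groups/assembly/data/databases/uniprot_2024_10_03/reference_proteomes.dmnd \
#     /scratch_isilon/groups/assembly/data/databases/BUSCO_2024_11/v5/data
```

Printing the arguments first makes it easy to review them before appending them to the nextflow run command.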

Author

gitcruz commented Nov 18, 2024

Hi Matthieu,

It has worked well for three species: two vertebrates and one nemertean worm. I may have more questions, but I'll post them as separate comments or issues.

Thanks,
Fernando
