Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bwa index not found with -r #112

Closed
bglindner opened this issue May 2, 2022 · 14 comments
Closed

bwa index not found with -r #112

bglindner opened this issue May 2, 2022 · 14 comments

Comments

@bglindner
Copy link

bglindner commented May 2, 2022

Hi Ben,

I've been competitively mapping reads to a collection of ~7,000 draft genomes using bwa-mem2. Recently, I've tried switching to coverm to complete both mapping (still with bwa-mem) and depth calculation (seqdepth calculation is much faster than the ad hoc python scripts I was using previously) for these genomes.

My longterm aim is to run multiple metagenomes through coverm but since my genome set is so large, I want to avoid building a new index each time I start the process over with a new metagenome. So I wanted to offer coverm genome my bwa index via -r for each run. Based on reading the help documentation, I'm attempting to call coverm genome in this way:

coverm genome -1 ${fwd} -2 ${rev} -p bwa-mem -r ${index} -d ${g_dir} \
--min-read-percent-identity 95 --min-read-aligned-percent 75 --output-format sparse \
--trim-max 90 --trim-min 10 -m trimmed_mean -o ${output} -t 24

Where the variables follow:
fwd/rev: forward and reverse gzipped FASTQ,
index: path to a bwa-mem index, including the stem
g_dir: path to directory containing genomes (suffix matches default .fna)
output: what I want the output .tsv saved as

When I run this, coverm is not able to recognize my BWA index (passed with my index variable above) despite a lot of fiddling on my end. I see the following error before coverm exits:

thread 'main' panicked at 'The reference specified '/storage/coda1/p-ktk3/0/shared/rich_project_bio-konstantinidis/shared3/mst/01_pipeline/xx_references/xx_fulldbs/MST_db' does not appear to exist', src/mapping_parameters.rs:159:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Any thoughts or feedback would be greatly appreciated.

Cheers,

Blake

@wwood
Copy link
Owner

wwood commented May 5, 2022

Hi Blake,

bwa-mem2 usage was put in the code (#72 ) but not as part of a release yet. Are you using 0.6.1 or newer?
Thanks.

@bglindner
Copy link
Author

Hey Ben,

Thanks for your reply. Sorry for any confusion -- I'm not attempting to run bwa-mem2 within CoverM.

I am running CoverM v0.6.1. I've used bwa v0.7.17 to index my genome database. But CoverM won't accept that via -r (path and stem for the index files being provided are correct).

I have hundreds of metagenomes I want to map to thousands of MAGs, so I want to avoid building the index with each pass. I could map those myself and then handle the BAM files into CoverM but wanted to check here first if -r is indeed intended to allow me to provide my own bwa index (thus avoiding this). And if so, what I might be doing wrong above that is preventing me from doing so.

Thanks for your time!

@wwood
Copy link
Owner

wwood commented May 5, 2022

Ah, OK.

My testing just below suggests coverm already picks up the stem of the DB as it is supposed to (this is for bwa-mem v0.7.17 / coverm 0.6.1 but the same held true for bwa-mem2 / coverm newest). Using 7seqs.fna from the tests/data directory.

(coverm-v0.6.1)cl5n008:20220506:~/git/coverm$ bwa index /tmp/7seqs.fna
[bwa_index] Pack FASTA... 0.00 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.01 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 0.00 sec
[bwa_index] Construct SA from BWT and Occ... 0.00 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index /tmp/7seqs.fna
[main] Real time: 0.013 sec; CPU: 0.013 sec
(coverm-v0.6.1)cl5n008:20220506:~/git/coverm$ coverm contig -p bwa-mem -r /tmp/7seqs.fna --single tests/data/2seqs.fasta
[2022-05-05T19:37:07Z INFO  coverm] CoverM version 0.6.1
[2022-05-05T19:37:07Z INFO  coverm] Using min-covered-fraction 0%
[2022-05-05T19:37:07Z INFO  bird_tool_utils::external_command_checker] Found samtools version 1.15.1
[2022-05-05T19:37:07Z INFO  coverm::mapping_index_maintenance] BWA index appears to be complete, so going ahead and using it.
[2022-05-05T19:37:07Z INFO  coverm::contig] In sample '7seqs.fna/2seqs.fasta', found 2 reads mapped out of 2 total (100.00%)
Contig  7seqs.fna/2seqs.fasta Mean
genome1~random_sequence_length_11000    0
genome1~random_sequence_length_11010    0
genome2~seq1    1
genome3~random_sequence_length_11001    0
genome4~random_sequence_length_11002    0
genome5~seq2    1
genome6~random_sequence_length_11003    0

Are you using it in the above way?
ben

@fbeghini
Copy link

Hi Ben,
I've been encountering the same issue when using coverm with bwa (coverm version 0.6.1, bwa version 0.7.17).
When using a pre-built bwa index and TMPDIR is set, coverm tries to build the index in the TMP directory without checking if the index is complete in the provided location.

[2022-06-13T15:45:11Z INFO coverm::mapping_index_maintenance] Generating BWA_MEM index for /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/representatives_deduplicated.mmi.fa .. [2022-06-13T15:45:11Z DEBUG coverm::mapping_index_maintenance] Running DB indexing command: "bwa" "index" "-p" "/home/fb343/scratch60/coverm-mapping-index.xJlBNZi30ad2/representatives_deduplicated.mmi.fa" "/home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/representatives_deduplicated.mmi.fa"

TMPDIR has been set to /home/fb343/scratch60/

@wwood
Copy link
Owner

wwood commented Jun 13, 2022

Hi @fbeghini I would be surprised if TMPDIR being set would change the behaviour. Can you let me know the command line you are using and an ls of the directory containing the bwa index? Since two people have reported this now I imagine there is something untoward going on, but I cannot reproduce as shown in the comment just above.

Thanks, ben

@fbeghini
Copy link

fbeghini commented Jun 13, 2022

Hi @wwood ,
The command line I'm using is

TMPDIR=/home/fb343/scratch60 coverm genome -t 20 -1 /home/fb343/project/HMBP/reads/021-0001-05/021-0001-05.bigtrim_1.fq.gz -2 /home/fb343/project/HMBP/reads/021-0001-05/021-0001-05.bigtrim_2.fq.gz -p bwa-mem -r /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded.fa --genome-definition /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/contig_to_bin_map --no-zeros --min-read-aligned-percent 70 --min-read-percent-identity 90 --min-covered-fraction 10 --bam-file-cache-directory /home/fb343/christakis_loan/coverm/021-0001-05/ -o /home/fb343/christakis_loan/coverm/021-0001-05/021-0001-05.relab --verbose

The folder has just the files produced by bwa
$ ls /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/ /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded.amb /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded.bwt /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded.pac /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded.ann /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded.fa /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded.sa

@wwood
Copy link
Owner

wwood commented Jun 13, 2022

Hi,

In the call to coverm you specified the reference as repded.fa but the root of the index is just repded. In this case the reference should be specified to coverm as the root i.e. /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded.

I guess you used bwa index with -p as something other than default?

Did I get that right?
Thanks, ben

@fbeghini
Copy link

I tried first using just the root of the index but coverm failed to run giving
thread 'main' panicked at 'The reference specified '/home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded' does not appear to exist', src/mapping_parameters.rs:159:9

I found this post while looking if someone else was having the same issue and tried to run it by adding the file extension wondering if coverm would try to build the index after checking if all the index files are present.

bwa index was run with -b 10000000000 -p repded

@wwood
Copy link
Owner

wwood commented Jun 13, 2022

Ah, right. That is a slight bug then. To work around that, you can just create an empty file i.e. touch /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded and then specify -r /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded to coverm.

I didn't quite understand that last part. Was there some reason that you needed to use -p with bwa index?

@fbeghini
Copy link

The empty file trick works, thanks. Also, no reason to use -p, wanted to make sure that was not a problem of the file name when I was creating the index.

wwood added a commit that referenced this issue Jun 13, 2022
@wwood
Copy link
Owner

wwood commented Jun 13, 2022

Good good. Closing this now since I guess the underlying issue has been solved for the next release.

@wwood wwood closed this as completed Jun 13, 2022
@jianshu93
Copy link

Hello Ben,

Can you please update the crates.io page for CoverM first?

Thanks,

Jianshu

@wwood
Copy link
Owner

wwood commented Jun 14, 2022

Hi Jianshu,

I consider issues as closed when they are fixed in the main branch, which will make it into the next release (here, the one after 0.6.1).

Having said that, a new release is quite overdue - hope to get to it soon.

@jianshu93
Copy link

Thanks ben, I can compile it now but for those who have less knowledge about compiling, they may need a new binary or something.

Jianshu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants