[Question] Createindex command takes a huge amount of time #962

nicoceres · 2025-02-24T10:08:53Z

Hi,
It is my first time running mmseqs.

I'm at the stage where I want to index one of my target databases, namely BFD (>700 GB).

The log file looks like this:

createindex path_to_mmseqs_db/bfd/bfd path_to_my_local_tmp

MMseqs Version:          	14-7e284+ds-1+b2
Seed substitution matrix 	aa:VTML80.out,nucl:nucleotide.out
k-mer length             	0
Alphabet size            	aa:21,nucl:5
Compositional bias       	1
Compositional bias       	1
Max sequence length      	65535
Max results per query    	300
Mask residues            	1
Mask residues probability	0.9
Mask lower case residues 	0
Spaced k-mers            	1
Spaced k-mer pattern     	
Sensitivity              	7.5
k-score                  	seq:0,prof:0
Check compatible         	0
Search type              	0
Split database           	0
Split memory limit       	0
Verbosity                	3
Threads                  	32
Min codons in orf        	30
Max codons in length     	32734
Max orf gaps             	2147483647
Contig start mode        	2
Contig end mode          	2
Orf start mode           	1
Forward frames           	1,2,3
Reverse frames           	1,2,3
Translation table        	1
Translate orf            	0
Use all table starts     	false
Offset of numeric ids    	0
Create lookup            	0
Compressed               	0
Add orf stop             	false
Overlap between sequences	0
Sequence split mode      	1
Header split mode        	0
Strand selection         	1
Remove temporary files   	false

indexdb ../../data/mmseqs_alphafold_db/bfd/bfd ../../data/mmseqs_alphafold_db/bfd/bfd --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 0 --alph-size aa:21,nucl:5 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-seq-len 65535 --max-seqs 300 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --spaced-kmer-mode 1 -s 7.5 --k-score seq:0,prof:0 --check-compatible 0 --search-type 0 --split 0 --split-memory-limit 0 -v 3 --threads 32 

Target split mode. Searching through 29 splits
Estimated memory consumption: 551G
Write VERSION (0)
Write META (1)
Write SCOREMATRIX3MER (4)
Write SCOREMATRIX2MER (3)
Write SCOREMATRIXNAME (2)
Write SPACEDPATTERN (23)
Write GENERATOR (22)
Write DBR1INDEX (5)
Write DBR1DATA (6)
Write HDR1INDEX (18)
Write HDR1DATA (19)
Index table: counting k-mers
[=================================================================] 88.79M 12m 13s 185ms
Index table: Masked residues: 273185904
Index table: fill
[=================================================================

and it has stayed like this for a while (days!).

The output folder looks like this:

When I look at the RAM of the machine I use for the calculation, I get this:

Is there a problem, in your opinion?

Thanks in advance for your advice.

milot-mirdita · 2025-02-26T04:44:07Z

Is there no further output? This looks very broken. Can you please check whether this issue is resolved in the latest release (17)?

nicoceres · 2025-02-26T09:52:26Z

No further output.
I'll check release 17, thanks for your answer.
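
When reporting or comparing runs (for example, before and after upgrading to release 17), it can help to pull the key fields out of the log rather than pasting all of it. Below is a minimal sketch, assuming the log has the same layout as the one posted above. The function name `summarize_createindex_log` and the regular expressions are illustrative assumptions, not part of MMseqs2.

```python
import re


def summarize_createindex_log(log_text):
    """Extract a few diagnostic fields from a createindex log (hypothetical helper)."""
    # These patterns are assumptions based on the log lines shown in this thread.
    patterns = {
        "version": r"MMseqs Version:\s*(\S+)",
        "threads": r"Threads\s+(\d+)",
        "splits": r"Searching through (\d+) splits",
        "estimated_memory": r"Estimated memory consumption:\s*(\S+)",
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log_text)
        summary[key] = match.group(1) if match else None

    # The last "Index table:" line indicates which stage the run had reached.
    stages = re.findall(r"^Index table: (.+)$", log_text, flags=re.MULTILINE)
    summary["last_stage"] = stages[-1] if stages else None
    return summary


# Excerpt of the log from this thread, used as sample input.
sample = (
    "MMseqs Version:          \t14-7e284+ds-1+b2\n"
    "Threads                  \t32\n"
    "Target split mode. Searching through 29 splits\n"
    "Estimated memory consumption: 551G\n"
    "Index table: counting k-mers\n"
    "Index table: Masked residues: 273185904\n"
    "Index table: fill\n"
)

print(summarize_createindex_log(sample))
```

For the log in this thread, the summary shows version 14-7e284+ds-1+b2, 32 threads, 29 splits, an estimated memory consumption of 551G, and "fill" as the last stage reached, which is where the run appears to hang.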
