CUDA error: invalid configuration argument : /work/lib/libmarv/src/pssm.cuh #963

Open
saro2-a opened this issue Feb 24, 2025 · 3 comments

@saro2-a

saro2-a commented Feb 24, 2025

I'm running colabfold/mmseqs, but I get a CUDA error.

Any advice on what could be wrong? I ran quite a few trials but couldn't figure it out.

I installed MMseqs2 via:

RUN wget https://mmseqs.com/latest/mmseqs-linux-gpu.tar.gz && \
    tar xvfz mmseqs-linux-gpu.tar.gz

ENV PATH=/app/mmseqs/bin/:$PATH

Log output:

Using pre-initialized databases from /workspace/db
S3 upload is not enabled, skipping upload
Starting in server mode...
Starting MMseqs2 GPU servers...
No specific CUDA devices set, using all available GPUs
Starting colabfold_envdb server...
Starting uniref30 server...
GPU servers are running. Use CUDA_VISIBLE_DEVICES to control GPU allocation.
Database location: /workspace/db
Server PIDs: 394, 395
Server mode started, now waiting for servers to close
gpuserver /workspace/db/uniref30_2302_db --max-seqs 10000 --db-load-mode 0 --prefilter-mode 1
MMseqs Version:      	a2815df9a6c6da173589fb65b3f71639ea08336d
Use GPU              	0
Max results per query	10000
Preload mode         	0
Prefilter mode       	1
7971804618108345187
gpuserver /workspace/db/colabfold_envdb_202108_db --max-seqs 10000 --db-load-mode 0 --prefilter-mode 1
MMseqs Version:      	a2815df9a6c6da173589fb65b3f71639ea08336d
Use GPU              	0
Max results per query	10000
Preload mode         	0
Prefilter mode       	1
13217853096240131807
CUDA error: invalid configuration argument : /work/lib/libmarv/src/pssm.cuh, line 346
(colabfold_env) root@4ed5491a9855:/app# echo ">seq1\nMKLPVREQVITVQQRGTVYQPPQRDYVLLVSENESSEITQELTVKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLTKGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLTLTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLE" > input_sequences.fasta && colabfold_search --mmseqs $(which mmseqs) --gpu 1 --gpu-server 1 input_sequences.fasta ${DB_DIR} msas
INFO:colabfold.mmseqs.search:Running /app/mmseqs/bin/mmseqs createdb msas/query.fas msas/qdb --shuffle 0
createdb msas/query.fas msas/qdb --shuffle 0 

Converting sequences

Time for merging to qdb_h: 0h 0m 0s 0ms
Time for merging to qdb: 0h 0m 0s 0ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 1ms
INFO:colabfold.mmseqs.search:Running /app/mmseqs/bin/mmseqs search msas/qdb /workspace/db/uniref30_2302_db msas/res msas/tmp --threads 64 --num-iterations 3 --db-load-mode 0 -a -e 0.1 --max-seqs 10000 --gpu 1 --prefilter-mode 1 --gpu-server 1
Create directory msas/tmp
search msas/qdb /workspace/db/uniref30_2302_db msas/res msas/tmp --threads 64 --num-iterations 3 --db-load-mode 0 -a -e 0.1 --max-seqs 10000 --gpu 1 --prefilter-mode 1 --gpu-server 1 

ungappedprefilter msas/qdb /workspace/db/uniref30_2302_db.idx msas/tmp/16599252575445546166/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -c 0 -e 0.1 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --min-ungapped-score 15 --max-seqs 10000 --db-load-mode 0 --gpu 1 --gpu-server 1 --gpu-server-wait-timeout 600 --prefilter-mode 1 --threads 64 --compressed 0 -v 3 

Index version: 16
Generated by:  17.b804f
ScoreMatrix:  VTML80.out
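
A side note on the `echo` command in the log above: if the shell is bash, its builtin `echo` does not expand `\n` unless called with `-e`, so `input_sequences.fasta` may end up as a single header line with no sequence. Whether that is related to the CUDA error is not established in this thread. A minimal sketch that writes the record with `printf` instead (the helper name `write_fasta` is hypothetical, not part of ColabFold or MMseqs2):

```shell
# Write a single FASTA record as two lines: ">header" and the sequence.
# printf interprets \n in its format string in every POSIX shell.
# $1 = output path, $2 = header without the leading '>', $3 = sequence
write_fasta() {
  printf '>%s\n%s\n' "$2" "$3" > "$1"
}

# Sequence copied from the command in the log above.
SEQ="MKLPVREQVITVQQRGTVYQPPQRDYVLLVSENESSEITQELTVKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLTKGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLTLTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLE"
write_fasta input_sequences.fasta seq1 "$SEQ"

# Sanity check: the first line should be the header, the second the sequence.
head -n 1 input_sequences.fasta
```

Checking the file with `head` or `wc -l` before calling `colabfold_search` is a cheap way to rule this out.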

Nvidia SMI

(colabfold_env) root@4ed5491a9855:/app# nvidia-smi 
Mon Feb 24 17:12:16 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          On  |   00000000:81:00.0 Off |                    0 |
| N/A   31C    P0             62W /  300W |   39821MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

NVCC


(colabfold_env) root@4ed5491a9855:/app# nvcc --version 
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
@milot-mirdita
Member

How did you set up the databases? Given the error message, that is the most likely thing to have gone wrong.

@saro2-a
Author

saro2-a commented Feb 28, 2025

Hi @milot-mirdita, thank you for answering. I used https://github.com/sokrypton/ColabFold/blob/main/setup_databases.sh

  1. Does it make any difference which mmseqs version I use when running setup_databases.sh?
  2. I re-initialized the databases twice, but no luck so far. I also noticed that the file sizes differ across re-initializations. Is that expected?

-rw-rw-rw- 1 root root 117965643010 Feb 24 19:32 colabfold_envdb_202108.tar.gz
-rw-rw-rw- 1 root root  40226840989 Sep 10  2021 colabfold_envdb_202108.tsv
-rw-rw-rw- 1 root root  55577947622 Sep 10  2021 colabfold_envdb_202108_aln.tsv
-rw-rw-rw- 1 root root  38269691744 Feb 25 04:04 colabfold_envdb_202108_db
-rw-rw-rw- 1 root root            0 Feb 25 04:40 colabfold_envdb_202108_db.GPU_READY
-rw-rw-rw- 1 root root            4 Feb 25 04:19 colabfold_envdb_202108_db.dbtype
-rw-rw-rw- 1 root root 335125282816 Feb 25 07:23 colabfold_envdb_202108_db.idx
-rw-rw-rw- 1 root root            4 Feb 25 07:24 colabfold_envdb_202108_db.idx.dbtype
-rw-rw-rw- 1 root root          394 Feb 25 07:24 colabfold_envdb_202108_db.idx.index
-rw-rw-rw- 1 root root   5123470946 Feb 25 04:19 colabfold_envdb_202108_db.index
-rw-rw-rw- 1 root root   8201100965 Feb 25 04:40 colabfold_envdb_202108_db.lookup
-rw-rw-rw- 1 root root  48167910209 Feb 25 06:18 colabfold_envdb_202108_db_aln
-rw-rw-rw- 1 root root            4 Feb 25 06:20 colabfold_envdb_202108_db_aln.dbtype
-rw-rw-rw- 1 root root   5082428674 Feb 25 06:20 colabfold_envdb_202108_db_aln.index
-rw-rw-rw- 1 root root   9584855450 Feb 25 04:21 colabfold_envdb_202108_db_h
-rw-rw-rw- 1 root root            4 Feb 25 04:35 colabfold_envdb_202108_db_h.dbtype
-rw-rw-rw- 1 root root   4898131461 Feb 25 04:35 colabfold_envdb_202108_db_h.index
-rw-rw-rw- 1 root root 130858705931 Feb 25 02:56 colabfold_envdb_202108_db_seq
-rw-rw-rw- 1 root root            4 Feb 25 03:04 colabfold_envdb_202108_db_seq.dbtype
-rw-rw-rw- 1 root root  19118858828 Feb 25 03:04 colabfold_envdb_202108_db_seq.index
-rw-rw-rw- 1 root root  25108896515 Feb 25 03:18 colabfold_envdb_202108_db_seq_h
-rw-rw-rw- 1 root root            4 Feb 25 03:25 colabfold_envdb_202108_db_seq_h.dbtype
-rw-rw-rw- 1 root root  18036930897 Feb 25 03:25 colabfold_envdb_202108_db_seq_h.index
-rw-rw-rw- 1 root root  31646045634 Sep 13  2021 colabfold_envdb_202108_h.tsv
-rw-rw-rw- 1 root root 137395855050 Sep 13  2021 colabfold_envdb_202108_seq.tsv
drwxrwxrwx 4 root root      3007478 Feb 24 19:41 pdb
-rw-rw-rw- 1 root root     64929576 Feb 25 07:24 pdb100_230517
-rw-rw-rw- 1 root root            4 Feb 25 07:24 pdb100_230517.dbtype
-rw-rw-rw- 1 root root     28432889 Feb 24 19:32 pdb100_230517.fasta.gz
-rw-rw-rw- 1 root root    178491392 Feb 25 07:24 pdb100_230517.idx
-rw-rw-rw- 1 root root            4 Feb 25 07:24 pdb100_230517.idx.dbtype
-rw-rw-rw- 1 root root          275 Feb 25 07:24 pdb100_230517.idx.index
-rw-rw-rw- 1 root root      6082553 Feb 25 07:24 pdb100_230517.index
-rw-rw-rw- 1 root root      6715287 Feb 25 07:24 pdb100_230517.lookup
-rw-rw-rw- 1 root root     27989933 Feb 25 07:24 pdb100_230517_h
-rw-rw-rw- 1 root root            4 Feb 25 07:24 pdb100_230517_h.dbtype
-rw-rw-rw- 1 root root      6114340 Feb 25 07:24 pdb100_230517_h.index
-rw-rw-rw- 1 root root     27989933 Feb 25 07:24 pdb100_230517_tmp_h
-rw-rw-rw- 1 root root            4 Feb 25 07:24 pdb100_230517_tmp_h.dbtype
-rw-rw-rw- 1 root root      6116273 Feb 25 07:24 pdb100_230517_tmp_h.index
-rw-rw-rw- 1 root root  64064274015 Jun 13  2023 pdb100_a3m.ffdata
-rw-rw-rw- 1 root root      6389810 Jun 13  2023 pdb100_a3m.ffindex
-rw-rw-rw- 1 root root  19189110724 Feb 24 19:41 pdb100_foldseek_230517.tar.gz
drwxrwxrwx 3 root root      1000181 Feb 25 00:37 tmp1
drwxrwxrwx 3 root root      1000181 Feb 25 06:20 tmp2
drwxrwxrwx 3 root root      1000181 Feb 25 07:24 tmp3
-rw-rw-rw- 1 root root          337 May 22  2023 uniref30_2302.md5sum
-rw-rw-rw- 1 root root 102918187842 Feb 24 18:41 uniref30_2302.tar.gz
-rw-rw-rw- 1 root root   9071701972 May 16  2023 uniref30_2302.tsv
-rw-rw-rw- 1 root root  30961144274 May 16  2023 uniref30_2302_aln.tsv
-rw-rw-rw- 1 root root   8737721556 Feb 24 23:49 uniref30_2302_db
-rw-rw-rw- 1 root root            0 Feb 24 23:57 uniref30_2302_db.GPU_READY
-rw-rw-rw- 1 root root            4 Feb 24 23:52 uniref30_2302_db.dbtype
-rw-rw-rw- 1 root root 244437757952 Feb 25 01:22 uniref30_2302_db.idx
-rw-rw-rw- 1 root root            4 Feb 25 01:22 uniref30_2302_db.idx.dbtype
-rw-rw-rw- 1 root root          388 Feb 25 01:22 uniref30_2302_db.idx.index
lrwxrwxrwx 1 root root           24 Feb 25 01:22 uniref30_2302_db.idx_mapping -> uniref30_2302_db_mapping
lrwxrwxrwx 1 root root           25 Feb 25 01:22 uniref30_2302_db.idx_taxonomy -> uniref30_2302_db_taxonomy
-rw-rw-rw- 1 root root    833527570 Feb 24 23:52 uniref30_2302_db.index
-rw-rw-rw- 1 root root   1439030968 Feb 24 23:57 uniref30_2302_db.lookup
-rw-rw-rw- 1 root root  27652568723 Feb 25 00:37 uniref30_2302_db_aln
-rw-rw-rw- 1 root root            4 Feb 25 00:37 uniref30_2302_db_aln.dbtype
-rw-rw-rw- 1 root root    833745236 Feb 25 00:37 uniref30_2302_db_aln.index
-rw-rw-rw- 1 root root   4400678360 Feb 24 23:53 uniref30_2302_db_h
-rw-rw-rw- 1 root root            4 Feb 24 23:55 uniref30_2302_db_h.dbtype
-rw-rw-rw- 1 root root    849484835 Feb 24 23:55 uniref30_2302_db_h.index
-rw-rw-rw- 1 root root   5797891705 May 22  2023 uniref30_2302_db_mapping
-rw-rw-rw- 1 root root 134187960766 Feb 24 23:17 uniref30_2302_db_seq
-rw-rw-rw- 1 root root            4 Feb 24 23:21 uniref30_2302_db_seq.dbtype
-rw-rw-rw- 1 root root   9067667758 Feb 24 23:21 uniref30_2302_db_seq.index
-rw-rw-rw- 1 root root  43200163261 Feb 24 23:33 uniref30_2302_db_seq_h
-rw-rw-rw- 1 root root            4 Feb 24 23:36 uniref30_2302_db_seq_h.dbtype
-rw-rw-rw- 1 root root   8910693488 Feb 24 23:36 uniref30_2302_db_seq_h.index
-rw-rw-rw- 1 root root    667957493 May 22  2023 uniref30_2302_db_taxonomy
-rw-rw-rw- 1 root root  46247602628 May 16  2023 uniref30_2302_h.tsv
-rw-rw-rw- 1 root root 137235400133 May 16  2023 uniref30_2302_seq.tsv
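
To see exactly which files change size between re-initializations, one option is to record a size manifest after each run and diff the two. This is only a sketch: the function name `db_manifest` and the manifest file names are hypothetical, and a size difference by itself does not show that a file is corrupt.

```shell
# Print "<size-in-bytes> <name>" for every regular file in a directory,
# in glob (name) order, so that manifests from two runs can be diffed.
# $1 = database directory
db_manifest() {
  (
    cd "$1" || exit 1
    for f in *; do
      if [ -f "$f" ]; then
        # wc -c reports the byte count; tr strips padding some wc versions add.
        printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
      fi
    done
  )
}

# Example usage (paths taken from the log above):
#   db_manifest /workspace/db > manifest_run1.txt
#   ... re-run the database setup ...
#   db_manifest /workspace/db > manifest_run2.txt
#   diff manifest_run1.txt manifest_run2.txt
```

Files that are only downloaded and unpacked would be expected to keep the same size, so a diff limited to generated files (for example the `.idx` files) would narrow down where the runs diverge.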

@milot-mirdita
Member

Did you call it with the GPU=1 env var (as described in the readme)? Please use the latest mmseqs to create the database.
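
The log in the first post hints at why the mmseqs version matters: the binary reports version `a2815df9a6c6da173589fb65b3f71639ea08336d`, while the index says `Generated by:  17.b804f`. A minimal sketch for comparing the two from a saved log follows. The function names and the log file name are hypothetical, and the two version strings may be formatted differently between builds, so a mismatch is a prompt to rebuild rather than proof of incompatibility.

```shell
# Extract the version recorded in an index from a saved search log.
# Assumes the log contains a line "Generated by:  <version>", as above.
# $1 = path to the saved log
index_generated_by() {
  awk -F': *' '/^Generated by:/ { print $2; exit }' "$1"
}

# Compare it with the version string the installed binary reports.
# $1 = path to the saved log
check_index_version() {
  index_ver=$(index_generated_by "$1")
  binary_ver=$(mmseqs version 2>/dev/null || echo "unknown")
  if [ "$index_ver" = "$binary_ver" ]; then
    echo "match: $binary_ver"
  else
    echo "mismatch: index=$index_ver binary=$binary_ver"
  fi
}

# Example usage (search.log is a hypothetical file holding the search output):
#   check_index_version search.log
#
# To rebuild with the GPU variable set, as suggested above
# (the target directory is a placeholder):
#   GPU=1 ./setup_databases.sh /path/to/db_folder
```

Running the rebuild with the same mmseqs binary that later serves the databases avoids the mismatch altogether.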
