--hf-num-gpus parameter has no effect #1609

yating0823 · 2024-10-15T05:33:12Z

I'm using OpenCompass to evaluate the hf_internlm2_5_1_8b_chat model on the long_bench dataset, and found that the --hf-num-gpus parameter does not take effect.
Server configuration: 4090 × 4, 28GB
Command: CUDA_VISIBLE_DEVICES=0,1,2,3 opencompass --models hf_internlm2_5_1_8b_chat --datasets longbench_passage_retrieval_zh_gen_01cca2 --batch-size 2 --hf-num-gpus 1
This fails with a CUDA OOM error.
I added logging to the submit function in opencompass/opencompass/runners/local.py and found that gpu_ids, as computed by gpu_ids = np.where(gpus)[0][:num_gpus], is empty.
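To make the empty result easier to reason about, here is a minimal sketch that isolates the quoted expression. It assumes `gpus` holds a nonzero value for each GPU that is currently free and `num_gpus` is the number of GPUs requested per task. The helper name `select_gpu_ids` and the sample arrays are hypothetical. The sketch only shows the two conditions under which the expression comes back empty; it does not establish which one applied in this run.

```python
import numpy as np


def select_gpu_ids(gpus, num_gpus):
    """Return the first `num_gpus` indices whose entry in `gpus` is nonzero."""
    # np.where with a single argument returns the indices of nonzero elements.
    return np.where(gpus)[0][:num_gpus]


# Hypothetical state: 4 visible GPUs, each with one free slot.
free = np.ones(4, dtype=int)

print(select_gpu_ids(free, 1))  # [0]
print(select_gpu_ids(free, 4))  # [0 1 2 3]
# Slicing with [:0] yields an empty array, so num_gpus == 0 gives no IDs.
print(select_gpu_ids(free, 0))  # []
# An all-zero array (no GPU marked as free) also yields no IDs.
print(select_gpu_ids(np.zeros(4, dtype=int), 2))  # []
```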
After reading #1457, I found that the number of GPUs used for model inference can be controlled directly by setting the run_cfg parameter in opencompass/configs/models/hf_internlm/hf_internlm2_5_1_8b_chat.py. So I set it to: run_cfg=dict(num_gpus=4, num_procs=1)
and then ran: opencompass --models hf_internlm2_5_1_8b_chat --datasets longbench_passage_retrieval_zh_gen_01cca2
With this change, gpu_ids = np.where(gpus)[0][:num_gpus] returned valid GPU IDs.
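For reference, here is a sketch of where that override sits in the model config. It assumes the file defines a `models` list of dicts, which is the usual layout of OpenCompass model configs. Only the `run_cfg` value is taken from this post, and the other keys of the model dict are left out.

```python
# Sketch of opencompass/configs/models/hf_internlm/hf_internlm2_5_1_8b_chat.py:
# other keys of the model dict (e.g. type, path, batch_size) are omitted here.
models = [
    dict(
        # The override described above: 4 GPUs and 1 process for this model.
        run_cfg=dict(num_gpus=4, num_procs=1),
    )
]
```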

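I don't quite understand why. Was there something wrong with my first command? Also, OpenCompass has a --max-num-worker command-line parameter, which the README describes as enabling data parallelism. Is it related to batch_size? And how does it relate to --hf-num-gpus?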
