
[Windows 11] Is GPU-based instruction fine-tuning currently impossible on this platform? #565

Closed · 7 tasks done
AceyKubbo opened this issue Jun 11, 2023 · 3 comments
Comments

AceyKubbo commented Jun 11, 2023

Detailed problem description

Since the deepspeed and nccl libraries cannot be used on Windows, does that mean training is currently not feasible on the Windows platform?
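
As a quick sanity check (a minimal sketch; these availability helpers are part of torch.distributed), the following reports which backends the local PyTorch build ships with. On Windows wheels, NCCL is expected to be absent while Gloo is normally present:

# Minimal sketch: report which torch.distributed backends this build includes.
import torch
import torch.distributed as dist

print("CUDA available:     ", torch.cuda.is_available())
print("distributed support:", dist.is_available())
print("NCCL built in:      ", dist.is_nccl_available())
print("Gloo built in:      ", dist.is_gloo_available())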

Reference information

Dependencies (must be provided for code-related issues)

Package       Version
transformers  4.29.1
torch         2.0.1+cu118
peft          0.3.0.dev0

Runtime log or screenshot

Traceback (most recent call last):
  File "E:\pyCode\Chinese-LLaMA-Alpaca\scripts\training\run_clm_sft_with_peft.py", line 468, in <module>
    main()
  File "E:\pyCode\Chinese-LLaMA-Alpaca\scripts\training\run_clm_sft_with_peft.py", line 205, in main
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "D:\Python310\lib\site-packages\transformers\hf_argparser.py", line 346, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 118, in __init__
  File "D:\Python310\lib\site-packages\transformers\training_args.py", line 1333, in __post_init__
    and (self.device.type != "cuda")
  File "D:\Python310\lib\site-packages\transformers\training_args.py", line 1697, in device
    return self._setup_devices
  File "D:\Python310\lib\site-packages\transformers\utils\generic.py", line 54, in __get__
    cached = self.fget(obj)
  File "D:\Python310\lib\site-packages\transformers\training_args.py", line 1631, in _setup_devices
    self.distributed_state = PartialState(backend=self.ddp_backend)
  File "D:\Python310\lib\site-packages\accelerate\state.py", line 143, in __init__
    torch.distributed.init_process_group(backend=self.backend, **kwargs)
  File "D:\Python310\lib\site-packages\torch\distributed\distributed_c10d.py", line 907, in init_process_group
    default_pg = _new_process_group_helper(
  File "D:\Python310\lib\site-packages\torch\distributed\distributed_c10d.py", line 1013, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
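
One possible workaround for this specific error (an untested sketch; ddp_backend is a standard transformers TrainingArguments field, but whether the rest of the training script then runs on Windows is not verified here) is to select the Gloo backend, which PyTorch does build on Windows, instead of the NCCL default:

# Untested sketch: pick Gloo instead of NCCL so torch.distributed can
# initialize on Windows. output_dir is a hypothetical placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",
    ddp_backend="gloo",  # avoids "Distributed package doesn't have NCCL built in"
)

Since run_clm_sft_with_peft.py builds TrainingArguments from the command line via HfArgumentParser, the equivalent when launching the script would be adding --ddp_backend gloo.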

Required checklist

  • Base model: Alpaca-Plus-7B
  • Operating system: Windows
  • Issue category: model training and fine-tuning
  • Model correctness check: be sure to verify the model against SHA256.md; with an incorrect model, proper results and normal operation cannot be guaranteed.
  • (Required) Since the related dependencies are updated frequently, make sure you have followed the relevant steps in the Wiki
  • (Required) I have read the FAQ section and searched the existing issues, and found no similar problem or solution
  • (Required) Third-party plugin issues: e.g. llama.cpp, text-generation-webui, LlamaChat, etc.; it is also recommended to look for a solution in the corresponding project
ymcui (Owner) commented Jun 12, 2023

I suggest you look into how to install these two libraries on Windows. We have not trained these models on Windows, so we are unable to help.

AceyKubbo (Author) commented Jun 12, 2023

> I suggest you look into how to install these two libraries on Windows. We have not trained these models on Windows, so we are unable to help.

I found it in the deepspeed issues; amusingly, it doesn't even support Microsoft's own OS.

Dropping the deepspeed option and trying nccl gives the same result; their issues mention that 2.0 is under development, so I'll check back then to see whether it supports Windows.

For now it looks like the only option is to set up a Linux virtual machine for training.

AceyKubbo (Author) commented

As a solution, consider WSL, the Linux subsystem that ships with Windows; that way you can still run training under Windows. Although it is a virtual machine, it feels much more convenient than a conventional VM.
