
Hello, running your code, neither the multi version nor the default one deploys automatically across multiple GPUs. Do I need to modify the code to fix this? The GPUs are Tesla T4*4 #14

Open
carlson99999 opened this issue Nov 22, 2023 · 3 comments

Comments

@carlson99999

No description provided.

@xxw1995
Owner

xxw1995 commented Nov 23, 2023

At the moment finetune.py does not support multi-GPU training. Strictly speaking, the multi version is not true multi-GPU either: although it can use several GPUs during training, it just maps layers to devices manually. For real multi-GPU training, the code needs to be rewritten as a torchrun distributed job.
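A minimal sketch of what the torchrun rewrite would look like, using DistributedDataParallel. The tiny linear model and training step are placeholders, not the repo's finetune.py; torchrun supplies the RANK/WORLD_SIZE/LOCAL_RANK environment variables, and the script falls back to a single-process run when they are absent.

```python
# Hypothetical torchrun-style DDP skeleton (illustrative model, not finetune.py).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets these env vars; default to a single-process dry run otherwise.
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend, rank=rank, world_size=world_size)

    device = torch.device(f"cuda:{rank}" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(8, 2).to(device)
    model = DDP(model)  # gradients are all-reduced across ranks automatically

    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    x = torch.randn(4, 8, device=device)  # stand-in for one training batch
    loss = model(x).pow(2).mean()
    loss.backward()
    opt.step()

    dist.destroy_process_group()
    return loss.item()

if __name__ == "__main__":
    main()
```

Launched on a 4-GPU box such as the T4*4 above, this would run as `torchrun --nproc_per_node=4 train_ddp.py` (script name is illustrative).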

@carlson99999
Author

OK, thanks. I got the multi version running, but the server lost its network connection halfway through. Is there any way to resume training from output/checkpoint?

@xxw1995
Owner

xxw1995 commented Jan 2, 2024

> OK, thanks. I got the multi version running, but the server lost its network connection halfway through. Is there any way to resume training from output/checkpoint?

Just load the latest lora.pt from the checkpoint via peft and continue training from there. DeepSpeed is now supported as well.
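A hedged sketch of the resume step. It assumes checkpoints are laid out as `output/checkpoint-&lt;step&gt;/lora.pt`, each file holding the LoRA adapter state dict (the directory layout and file name are assumptions, not confirmed from the repo); `strict=False` lets the adapter-only state dict load into the full peft-wrapped model.

```python
# Hypothetical helper: resume LoRA training from the newest checkpoint.
# Assumes checkpoints saved as output/checkpoint-<step>/lora.pt (illustrative layout).
import glob
import os
import torch

def latest_lora_checkpoint(output_dir="output"):
    """Return the lora.pt path with the highest step number, or None."""
    ckpts = glob.glob(os.path.join(output_dir, "checkpoint-*", "lora.pt"))
    if not ckpts:
        return None
    # "output/checkpoint-200/lora.pt" -> 200
    step = lambda p: int(p.split("checkpoint-")[-1].split(os.sep)[0])
    return max(ckpts, key=step)

def resume_lora(model, output_dir="output"):
    """Load the newest LoRA adapter weights into `model`, if any exist."""
    path = latest_lora_checkpoint(output_dir)
    if path is None:
        return model  # no checkpoint found: start fresh
    state = torch.load(path, map_location="cpu")
    # strict=False: the file contains only the LoRA adapter parameters,
    # not the frozen base-model weights.
    model.load_state_dict(state, strict=False)
    return model
```

After `resume_lora(model)`, training continues with the usual loop; the optimizer state is not restored here, so the learning-rate schedule effectively restarts unless it is checkpointed separately.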
