Static-graph AMP O2 strategy: bug when loading a checkpoint #39050
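For checkpoints produced by Paddle static-graph AMP O2 training, saving the master weights is now supported. You can use the following approach:
Dynamic-graph (dygraph) mode does not yet support saving master weights; support is planned. #39121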
Since you haven't replied for more than a year, we have closed this issue/pr.
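If you did not find a similar issue, please provide the following details when creating the issue so it can be resolved quickly:
1) PaddlePaddle version: develop
2) CPU: N/A
3) GPU: N/A
4) System environment: N/A
Note: you can obtain the information above by running summary_env.py.
1) Single machine / multiple machines, single GPU / multiple GPUs
2) GPU memory information
3) Operator information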
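In the static-graph AMP O2 strategy, the startup program inserts a `cast` op that converts each param into an FP32 master param. Suppose I have a checkpoint that I want to load into the model: whether I load it before or after `amp_init`, there is a bug.

`amp_init` before loading the checkpoint:

`amp_init` after loading the checkpoint:

So no matter where the checkpoint load is placed, the result is wrong and the checkpoint cannot be loaded correctly.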
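One plausible reading of why both orderings fail can be sketched with a toy model in plain Python (no Paddle involved). It assumes the following: the FP32 master copy is written only by the startup `cast` and by the optimizer update; loading a checkpoint overwrites only the param; `amp_init` casts the param to FP16; and each optimizer step re-derives the FP16 param from the master copy. `ToyScope`, `to_fp16`, and `train_one_step` are hypothetical names for illustration, not Paddle APIs.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE half precision (what a cast to FP16 does)."""
    return struct.unpack("e", struct.pack("e", x))[0]


class ToyScope:
    """Hypothetical stand-in for a static-graph scope holding one parameter."""

    def __init__(self) -> None:
        self.param = None         # the model parameter
        self.master_param = None  # FP32 master copy kept for the optimizer

    def run_startup(self, init_value: float) -> None:
        # Startup program: initialize the param, then the inserted `cast`
        # op copies it into the FP32 master param. This runs exactly once.
        self.param = init_value
        self.master_param = float(self.param)

    def amp_init(self) -> None:
        # Assumption: amp_init casts the parameter itself to FP16.
        self.param = to_fp16(self.param)

    def load_checkpoint(self, value: float) -> None:
        # Assumption: loading overwrites only the param, not the master copy.
        self.param = value

    def optimizer_step(self, grad: float, lr: float = 0.1) -> None:
        # Assumption: the update is applied to the master param and the FP16
        # param is re-derived from it, discarding whatever was loaded.
        self.master_param -= lr * grad
        self.param = to_fp16(self.master_param)


def train_one_step(load_before_amp_init: bool, ckpt_value: float = 5.0) -> ToyScope:
    """Run startup, amp_init and a checkpoint load in the given order, then one step."""
    scope = ToyScope()
    scope.run_startup(init_value=0.0)
    if load_before_amp_init:
        scope.load_checkpoint(ckpt_value)
        scope.amp_init()
    else:
        scope.amp_init()
        scope.load_checkpoint(ckpt_value)
    scope.optimizer_step(grad=1.0)
    return scope


if __name__ == "__main__":
    for load_first in (True, False):
        s = train_one_step(load_first)
        print(f"load before amp_init={load_first}: param={s.param}, master={s.master_param}")
```

In this toy model, both orderings end with the master param still derived from the startup value, so the checkpoint value is lost after the first step. That is consistent with the report above, but it is a simplification and not a description of Paddle's actual implementation.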