[dygraph] refine dygraph backward error info #70709

@wanghuancoder wanghuancoder commented Jan 8, 2025

PR Category

Execute Infrastructure

PR Types

Improvements

Description

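Improve the error messages reported during dygraph (dynamic graph) backward.

  1. When loss.backward() fails, report GradNode.name() wherever possible, so users can see which node crashed.
  2. With export FLAGS_call_stack_level=3, a crash in loss.backward() also reports the forward call stack.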
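
The idea behind both points can be illustrated with a toy sketch in plain Python. This is not Paddle's implementation: `ToyGradNode`, `run_backward`, and `record_forward_stack` are hypothetical names used only for illustration. The sketch assumes each node remembers the Python stack that was active when the forward op created it, and that the backward loop wraps each node call so a failure can be tagged with the node name.

```python
import traceback


class ToyGradNode:
    """Hypothetical stand-in for a gradient node; not Paddle's GradNodeBase."""

    def __init__(self, name, backward_fn, record_forward_stack=False):
        self.name = name
        self.backward_fn = backward_fn
        # Capture the forward-time Python stack, dropping this constructor's frame.
        self.forward_stack = (
            "".join(traceback.format_stack()[:-1]) if record_forward_stack else None
        )


def run_backward(nodes, grad):
    """Run nodes in order; on failure, re-raise with the node name attached."""
    for node in nodes:
        try:
            grad = node.backward_fn(grad)
        except Exception as exc:
            parts = [f"[GradNode < {node.name} > error]"]
            if node.forward_stack:
                parts.append(
                    "Forward Traceback (most recent call last):\n" + node.forward_stack
                )
            raise RuntimeError("\n".join(parts)) from exc
    return grad


def failing_hook(grad):
    raise ValueError("test")


def build_graph():
    # Stands in for the forward pass: the node is created here, so this
    # frame appears in the recorded forward stack.
    return [ToyGradNode("AddGradNode", failing_hook, record_forward_stack=True)]


try:
    run_backward(build_graph(), 1.0)
except RuntimeError as err:
    message = str(err)
    cause = err.__cause__
    print(message)
```

Recording the stack is opt-in in this sketch because capturing it on every forward op has a cost, which is presumably why the PR ties it to a flag level.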

Take the error raised by the following case as an example:

import paddle
def hook_fn(grad):
    grad = grad * 2
    raise ValueError("test")
    return grad

x = paddle.to_tensor([0., 1., 2., 3.], stop_gradient=False)
y = paddle.to_tensor([4., 5., 6., 7.], stop_gradient=False)
z = paddle.to_tensor([1., 2., 3., 4.])
w = x + y
w.register_hook(hook_fn)
o = z.matmul(w)
o.backward()
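
For reproducing this from a test harness rather than a shell, the `export` step can be mirrored by passing the variable to a child process. The following is a minimal sketch; the helper names `env_with_call_stack_level` and `run_script` are hypothetical, and the only Paddle-specific detail assumed is the `FLAGS_call_stack_level` environment variable named in the description.

```python
import os
import subprocess
import sys


def env_with_call_stack_level(level=3):
    """Return a copy of the current environment with FLAGS_call_stack_level set."""
    return dict(os.environ, FLAGS_call_stack_level=str(level))


def run_script(script_path, level=3):
    """Run a Python script in a child process with the flag exported."""
    return subprocess.run(
        [sys.executable, script_path],
        env=env_with_call_stack_level(level),
        capture_output=True,
        text=True,
    )
```

Calling `run_script("test.py")` would then return a result whose `stderr` holds the error report, assuming the script above is saved as `test.py`.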

With FLAGS_call_stack_level=3 enabled, the error message is as follows:

W0108 08:46:32.082053 25409 backward.cc:436] While running Node (AddGradNode) raises an EnforceNotMet exception
Traceback (most recent call last):
  File "/data/Eager/Paddle2/test.py", line 13, in <module>
    o.backward()
  File "/usr/local/lib/python3.9/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/data/Eager/Paddle2/build/python/paddle/base/wrapped_decorator.py", line 40, in __impl__
    return wrapped_func(*args, **kwargs)
  File "/data/Eager/Paddle2/build/python/paddle/base/framework.py", line 704, in __impl__
    return func(*args, **kwargs)
  File "/data/Eager/Paddle2/build/python/paddle/base/dygraph/tensor_patch_methods.py", line 357, in backward
    core.eager.run_backward([self], grad_tensor, retain_graph)
OSError: 

  Forward Traceback (most recent call last):
    File "/data/Eager/Paddle2/test.py", line 10, in <module>
    w = x + y


--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   egr::Backward(std::vector<paddle::Tensor, std::allocator<paddle::Tensor> > const&, std::vector<paddle::Tensor, std::allocator<paddle::Tensor> > const&, bool)
1   egr::RunBackward(std::vector<paddle::Tensor, std::allocator<paddle::Tensor> > const&, std::vector<paddle::Tensor, std::allocator<paddle::Tensor> > const&, bool, bool, std::vector<paddle::Tensor, std::allocator<paddle::Tensor> > const&, bool, std::vector<paddle::Tensor, std::allocator<paddle::Tensor> > const&)
2   AddGradNode::operator()(paddle::small_vector<std::vector<paddle::Tensor, std::allocator<paddle::Tensor> >, 15u>&, bool, bool)
3   egr::GradNodeBase::ApplyGradientHooks(paddle::small_vector<std::vector<paddle::Tensor, std::allocator<paddle::Tensor> >, 15u> const&)
4   common::enforce::EnforceNotMet::EnforceNotMet(common::ErrorSummary const&, char const*, int)
5   common::enforce::GetCurrentTraceBackString[abi:cxx11](bool)

----------------------
Error Message Summary:
----------------------
ExternalError: ValueError: test

At:
  /data/Eager/Paddle2/test.py(4): hook_fn
  /data/Eager/Paddle2/build/python/paddle/base/dygraph/tensor_patch_methods.py(357): backward
  /data/Eager/Paddle2/build/python/paddle/base/framework.py(704): __impl__
  /data/Eager/Paddle2/build/python/paddle/base/wrapped_decorator.py(40): __impl__
  /usr/local/lib/python3.9/dist-packages/decorator.py(232): fun
  /data/Eager/Paddle2/test.py(13): <module>

  [Hint: res should not be null.] (at /data/Eager/Paddle2/paddle/fluid/pybind/eager_utils.cc:2546)
  [GradNode < AddGradNode > error]

Because GradNodeAccumulation is used in many different ways, the forward stack recorded for it may be inaccurate.
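
When triaging logs, the two new pieces of information can be pulled out of the report with a few lines of Python. This is a rough sketch that assumes the exact text layout of the sample output shown earlier (the `[GradNode < ... > error]` tag and the `Forward Traceback` header); the format is not a documented interface and may change, and the function names are hypothetical.

```python
import re

# Matches the tag seen in the sample output, e.g. "[GradNode < AddGradNode > error]".
GRAD_NODE_RE = re.compile(r"\[GradNode < (\w+) > error\]")


def extract_grad_node(message):
    """Return the GradNode name from an error report, or None if absent."""
    match = GRAD_NODE_RE.search(message)
    return match.group(1) if match else None


def extract_forward_traceback(message):
    """Return the stripped lines that follow the 'Forward Traceback' header."""
    collected = []
    capturing = False
    for line in message.splitlines():
        if line.strip().startswith("Forward Traceback"):
            capturing = True
            continue
        if capturing:
            if not line.strip():
                if collected:
                    break  # a blank line after content ends the section
                continue
            collected.append(line.strip())
    return collected


sample = """OSError:

  Forward Traceback (most recent call last):
    File "/data/Eager/Paddle2/test.py", line 10, in <module>
    w = x + y


  [Hint: res should not be null.]
  [GradNode < AddGradNode > error]
"""

node_name = extract_grad_node(sample)
forward_lines = extract_forward_traceback(sample)
print(node_name, forward_lines)
```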

Pcard-67164


paddle-bot bot commented Jan 8, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@wanghuancoder wanghuancoder changed the title Refine dygraph backward error info [dygraph] refine dygraph backward error info Jan 8, 2025

@xiaoguoguo626807 xiaoguoguo626807 left a comment

Does set_python_stack() have to be added by hand, based on experience, wherever it is needed in the code?

@wanghuancoder wanghuancoder merged commit 8d12057 into PaddlePaddle:develop Jan 9, 2025
31 of 32 checks passed
2 participants