[fix] missing input log higher order. #15331
Conversation
@mxnet-label-bot add [Operator, Backend]
I'm still not sure what the meaning of the backward output for the head-gradient input is, as we discussed before. This week we are at a conference, so we might be slow to respond. I'm not sure how to test this; I think I would need to dump the graph and think about it, as I'm not sure now which Python variable the gradient of the head gradient is stored in. I think the PR fixes the issue, though. Would the operator have failed on a division without an argument? It looks like the tests don't execute the op, or do they? Would it be better to set those outputs to zero, since we don't know how to use them? I'm fine with the fix proposed in this PR, though.
@larroy Those outputs are needed for 3rd-order and higher gradients.
@kshitij12345 https://github.com/apache/incubator-mxnet/pull/15331/files#diff-0dad60704ce39e602a1907aec6835375R1121: the comment should actually be dL/dygrad. Could you please update it as well?
@sxjscience Do we have a use case where the gradient on the gradient of output y is needed?
Please update the comment, otherwise LGTM.
https://github.com/apache/incubator-mxnet/blob/5171e1d92cfc5eefa2c20dfe8ac3fac5351ad19a/src/operator/tensor/elemwise_unary_op_basic.cc#L1120 @larroy @apeforest, I was also wondering whether we can check the number of inputs passed at compile time. I have observed the
I think ograd[0] is dL/dx_grad. About the number of inputs, you are right that we could check. If it's more than one or two function calls, I think it's too much overhead, and it's going to get caught by the Python tests anyway. Also, if you don't return enough gradients, there's a check after the calls to FGradient, so I think it's not a big deal. Up to you if you can come up with something concise.
@@ -1117,15 +1117,15 @@ MXNET_OPERATOR_REGISTER_BINARY_WITH_SPARSE_CPU_DR(_backward_log10,
                                  unary_bwd<mshadow_op::log10_grad>)
 .set_attr<nnvm::FGradient>("FGradient",
   [](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
-      // ograds[0]: dL/dxgrad
+      // ograds[0]: dL/dygrad
I think this is dL/dx_grad. The head gradient is the gradient with respect to the previous output, right? The previous output is x_grad, or dL/dx, so this thing is dL/(dL/dx), or dL/dx_grad for lack of a better notation.
I guess it should be dL/dy_grad, as we are computing/returning dL/dx_grad.
E.g.
y = f(x_grad)
L = g(y)  # x_grad formed part of the network and affected the loss
During backprop, by the chain rule,
dL/dx_grad = dL/dy * dy/dx_grad
In the comments, we have called the dL/dy from the example above dL/dy_grad. That is why we have
https://github.com/apache/incubator-mxnet/blob/5b95fb3ee3581ba20fe1def336621d68a811e17f/src/operator/tensor/elemwise_unary_op_basic.cc#L1111-L1112
with these multiplications performing
dL/dx_grad = dL/dy * dy/dx_grad
I think the notation is complicating things in excess, as it gets pretty hairy. It's the head gradient of the previous (output) node, which is x_grad and has the shape of x. So it has to be related to x, not y.
I think in Lagrange notation it would be
Oh, I get it now. If I understand correctly then, crudely, ograds[0] is how much x_grad affects L, and then we compute how x_grad changes with x. Makes sense now.
Thank you very much. Will reflect it in this and the other PRs.
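To make that conclusion concrete, here is a worked sketch for _backward_log10, assuming (as mshadow_op::log10_grad suggests) that the first-order gradient of y = log10(x) is 1/(x ln 10); the x_grad/y_grad naming follows this thread rather than the source:

$$
\begin{aligned}
x_{\mathrm{grad}} &= y_{\mathrm{grad}} \cdot \frac{1}{x \ln 10} && \text{(output of the backward node)} \\
\frac{\partial L}{\partial y_{\mathrm{grad}}} &= \mathtt{ograds[0]} \cdot \frac{\partial x_{\mathrm{grad}}}{\partial y_{\mathrm{grad}}} = \mathtt{ograds[0]} \cdot \frac{1}{x \ln 10} \\
\frac{\partial L}{\partial x} &= \mathtt{ograds[0]} \cdot \frac{\partial x_{\mathrm{grad}}}{\partial x} = \mathtt{ograds[0]} \cdot \left(-\frac{y_{\mathrm{grad}}}{x^{2} \ln 10}\right)
\end{aligned}
$$

In this reading, ograds[0] is the head gradient dL/dx_grad, and dL/dy_grad and dL/dx are the two gradients the FGradient node has to return, one per input.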
@kshitij12345 I think what you wrote makes sense. I'm also unsure about the notation; maybe you can come up with a better one. If not, maybe we leave the comment out so we can merge the PR, as the code seems to do what's needed.
Sure. Thanks again.
   // inputs[0]: dL/dy
-  // inputs[1]: x
+  // inputs[1]: x (ElemwiseGradUseIn)
nice comment, helps.
LGTM
@kshitij12345 Could you please resolve the merge conflict and ping the reviewers again to merge it? Thanks!
Sure. Thanks
@larroy @apeforest Gentle ping for review.
@apeforest @larroy Gentle ping.
I guess we need to add a test case.
@sxjscience I am not sure how to test this. I was expecting that the missing input would cause a problem when computing higher-order gradients. I tried the 3rd order, which was computed successfully. For the fourth order,
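As a possible starting point for such a test, here is a minimal sketch of probing higher-order gradients from Python with autograd, in the style of the existing second-order tests. The input values, variable names, and the choice of log10 are illustrative, and the create_graph/retain_graph pattern is assumed to extend to the third order as described above.

```python
import mxnet as mx
from mxnet import autograd, nd

x = nd.array([0.5, 1.0, 2.0])
x.attach_grad()

with autograd.record():
    y = nd.log10(x)
    # First-order gradient dy/dx, kept in the graph so it can be
    # differentiated again (this runs _backward_log10).
    x_grad = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
    # Second-order gradient d(x_grad)/dx, which goes through the
    # FGradient of _backward_log10 that this PR touches.
    x_grad_grad = autograd.grad(x_grad, x, create_graph=True, retain_graph=True)[0]

# Third-order gradient, accumulated into x.grad.
x_grad_grad.backward()
print(x.grad)  # could be compared against the analytic 2 / (x^3 * ln(10))
```

If the analytic comparison only passes with the extra input in place, that would give a concrete regression test for this fix.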
@sxjscience @apeforest @larroy Gentle ping.
Sorry we are all quite busy. I think it's fine to merge this. We can do any additional refinements later.
LGTM
LGTM. Sorry for the delayed response. We have been extremely busy in the past month.
@apeforest Sure, no worries. Thanks.
@larroy Thank you very much for catching this. Sorry for the silly mistake.
Can we have a way to test this? @apeforest @larroy