feat: implement backward computation for more operators #921

Open: wants to merge 10 commits into `master`

@Ronsor (Contributor) commented Aug 12, 2024

This PR will add backward computations for most operators once completed.

  • Tanh
  • Sigmoid
  • GELU + GELU (quick)
  • ELU
  • clamp
  • LeakyReLU
  • mean
  • concat

I'm leaving pad, im2col, and norm for a future PR.

I'm currently unsure whether I should fuse the multiplication with the gradient computation for `gelu_back`/`gelu_quick_back`, as is done for `silu_back`.

Ronsor added 9 commits August 12, 2024 12:58
We use the following formulas to compute the gradients:

Let g be `tensor->grad`, let x be `src0`, and let y be `tensor`.
For tanh, `g * (1 - tanh^2(x)) = g * (1 - y^2) = g - gy^2`.
For sigmoid, `g * (sigmoid(x) * (1 - sigmoid(x))) = g * (y * (1 - y)) = gy - gy^2`.
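Both formulas only need the forward output `y`, so the input `x` does not have to be kept around. A minimal sketch in plain C, assuming contiguous `float` arrays instead of ggml tensors; the function names are hypothetical and not part of ggml's API:

```c
#include <assert.h>
#include <stddef.h>

/* tanh backward from the forward output y = tanh(x):
 * dx = g * (1 - y^2) = g - g*y^2 */
static void tanh_back_f32(size_t n, float *dx, const float *y, const float *g) {
    assert(n == 0 || (dx && y && g));
    for (size_t i = 0; i < n; ++i) {
        dx[i] = g[i] - g[i] * y[i] * y[i];
    }
}

/* sigmoid backward from the forward output y = sigmoid(x):
 * dx = g * y * (1 - y) = g*y - g*y^2 */
static void sigmoid_back_f32(size_t n, float *dx, const float *y, const float *g) {
    assert(n == 0 || (dx && y && g));
    for (size_t i = 0; i < n; ++i) {
        const float gy = g[i] * y[i];
        dx[i] = gy - gy * y[i];
    }
}
```

For example, with `y = 0.5` and `g = 1`, the tanh gradient is `1 - 0.25 = 0.75` and the sigmoid gradient is `0.5 - 0.25 = 0.25`.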
This comes with a breaking change: `ggml_clamp` is no longer an in-place
operation. If you still want/need that behavior, use `ggml_clamp_inplace`.
I hope no one depended on that.

Also introduces `GGML_OP_CLAMP_BACK`, whose implementations for other
backends will be added in a subsequent commit.

The definition of `clamp_back` is as follows:
                           { 0 if x < min
d/dx(clamp(x, min, max)) = { 1 if min <= x <= max
                           { 0 if x > max
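The piecewise definition above amounts to passing the incoming gradient through where the input lies inside `[min, max]` and zeroing it elsewhere. A minimal reference sketch, assuming contiguous `float` arrays; `clamp_back_f32` is a hypothetical name, not ggml's kernel. Note that it needs the input `x`, since the clamped output alone cannot tell which side of a bound a value came from:

```c
#include <assert.h>
#include <stddef.h>

/* dx = g where min <= x <= max, 0 otherwise (boundaries count as inside,
 * matching the piecewise definition) */
static void clamp_back_f32(size_t n, float *dx, const float *x, const float *g,
                           float min, float max) {
    assert(min <= max);
    for (size_t i = 0; i < n; ++i) {
        dx[i] = (x[i] >= min && x[i] <= max) ? g[i] : 0.0f;
    }
}
```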
Slice the gradient using a view operation, reshape, and then add
to the inputs' gradients.
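For concat, each input's gradient is simply the part of the output gradient that covers that input. The sketch below shows this for two row-major matrices concatenated along columns, where each slice is strided, which is why a view is needed rather than a single contiguous copy. It is an illustration with plain arrays under that layout assumption; `concat_cols_back_f32` is a hypothetical helper, not the PR's ggml code:

```c
#include <assert.h>
#include <stddef.h>

/* forward: out (rows x (ca + cb)) = [ a (rows x ca) | b (rows x cb) ]
 * backward: ga += g[:, 0:ca], gb += g[:, ca:ca+cb]
 * Gradients are accumulated, not overwritten. */
static void concat_cols_back_f32(size_t rows, size_t ca, size_t cb,
                                 const float *g, float *ga, float *gb) {
    assert(g != NULL && ga != NULL && gb != NULL);
    const size_t row_stride = ca + cb; /* row length of g */
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < ca; ++c) {
            ga[r * ca + c] += g[r * row_stride + c];
        }
        for (size_t c = 0; c < cb; ++c) {
            gb[r * cb + c] += g[r * row_stride + ca + c];
        }
    }
}
```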
Introduces `GGML_UNARY_OP_ELU_BACK`, defined as the following:

ELU'(x) = { e^x if x <= 0
          { x   if x > 0
d/dx(LeakyReLU(x, negative_slope)) = { 1              if x > 0
                                     { negative_slope if x <= 0

The equivalent formula `negative_slope * step(-x) + step(x)` is used
for backward computation.
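A sketch of the step-based formula, assuming `step(v)` is 1 for `v > 0` and 0 otherwise; `step_f32` and `leaky_relu_back_f32` are hypothetical names working on plain `float` arrays:

```c
#include <assert.h>
#include <stddef.h>

/* assumed definition: step(v) = 1 if v > 0, else 0 */
static float step_f32(float v) { return v > 0.0f ? 1.0f : 0.0f; }

/* dx = g * (negative_slope * step(-x) + step(x)) */
static void leaky_relu_back_f32(size_t n, float *dx, const float *x, const float *g,
                                float negative_slope) {
    assert(n == 0 || (dx && x && g));
    for (size_t i = 0; i < n; ++i) {
        const float d = negative_slope * step_f32(-x[i]) + step_f32(x[i]);
        dx[i] = g[i] * d;
    }
}
```

With this strict `step`, both terms vanish at exactly `x = 0`, so the formula yields 0 there instead of `negative_slope`. The two definitions differ only at that single kink point.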
…GELU_BACK`

Introduces corresponding `*_BACK` operators for both. Backend-specific accelerated
implementations forthcoming.
@Ronsor marked this pull request as ready for review on August 14, 2024 02:31

@ggerganov (Owner) left a comment

Should add tests to `tests/test-grad0.cpp`.

Review thread on `src/ggml.c` (outdated, resolved).

@JohannesGaessler (Collaborator)

I'm currently working on adding training support for the MNIST example in #908. I have a working backward pass for im2col and pool2d (the ops needed for the convolutional neural network) and am now cleaning up the code to get it into a reviewable state. When I added tests to test-grad0, I also added a fix for discontinuous gradients when numerically calculating the reference gradients that backpropagation is compared against; this fix, or an equivalent one, will also be needed for clamp.
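Why clamp needs special handling: a central difference taken across a kink averages the slopes on both sides, so it disagrees with the analytic gradient near `min` and `max`. The sketch below shows this and one possible mitigation (skipping sample points within `h` of a kink). The actual fix in #908 is not shown in this thread; `near_kink` and the other helpers are hypothetical:

```c
#include <assert.h>
#include <math.h>

static float clamp_f32(float x, float lo, float hi) {
    assert(lo <= hi);
    return x < lo ? lo : (x > hi ? hi : x);
}

/* central difference (f(x+h) - f(x-h)) / 2h of clamp at one point */
static float clamp_central_diff(float x, float lo, float hi, float h) {
    return (clamp_f32(x + h, lo, hi) - clamp_f32(x - h, lo, hi)) / (2.0f * h);
}

/* hypothetical mitigation: a sample point is unreliable if the interval
 * (x - h, x + h) straddles a kink */
static int near_kink(float x, float lo, float hi, float h) {
    return fabsf(x - lo) < h || fabsf(x - hi) < h;
}
```

At `x = max = 1` with `h = 0.5`, the estimate is `(1 - 0.5) / 1 = 0.5`, while the analytic `clamp_back` gradient there is 1.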

d/dx(ELU(x)) is 1 if x >= 0, not x
@ggerganov (Owner)

It might be better to wait for @JohannesGaessler to merge #908 and then continue this PR?

@Ronsor (Contributor, Author) commented Aug 16, 2024

That's probably best, considering the changes needed for the tests.

@JohannesGaessler (Collaborator)

I extended the code in `test-backend-ops` so that it can check gradients from backpropagation against numerically calculated gradients. New gradient tests should be implemented there if possible (the only thing that currently doesn't work is FP16 support). In principle, all that should be necessary is to add `ggml_set_param` to the existing tests, though tuning the parameters so that the numerically calculated reference values are precise enough can be tricky.

@Ronsor (Contributor, Author) commented Sep 3, 2024

Perfect. I plan to finish this PR this weekend.
