Optimize gradient for silu a bit #2393

MilkFather · 2024-08-03T15:50:27Z

This pull request exploits some mathematical properties of the silu operation. By reusing the computed forward pass results as much as possible, the optimized code reduces the running time by a sliver of a millisecond.
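For readers who want the algebra spelled out, here is a minimal scalar sketch of the kind of identity being used. It is not the code from this pull request: it works on plain `f64` values rather than tensors, and the function names are made up for illustration. With s = sigmoid(x) and y = silu(x) = x * s, the usual gradient s * (1 + x * (1 - s)) can be rewritten as s * (1 - y) + y, so the output of the forward pass can be reused.

```rust
/// Logistic sigmoid.
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// Forward pass: silu(x) = x * sigmoid(x).
fn silu(x: f64) -> f64 {
    x * sigmoid(x)
}

/// Textbook gradient, written only in terms of the input x.
fn silu_grad_naive(x: f64) -> f64 {
    let s = sigmoid(x);
    s * (1.0 + x * (1.0 - s))
}

/// Gradient that reuses the forward output y = silu(x) (hypothetical helper):
/// s * (1 + x * (1 - s)) = s + y * (1 - s) = s * (1 - y) + y
fn silu_grad_reusing_output(x: f64, y: f64) -> f64 {
    let s = sigmoid(x);
    s * (1.0 - y) + y
}
```

Both functions should agree up to floating-point rounding. The second form needs one fewer multiplication per element once `y` is already available.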

Ideally, since the forward pass also calculates the sigmoid of the input, caching that intermediate result would let us avoid computing it again and cut the running time down further.
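As a rough sketch of what such caching could look like, assuming scalar `f64` values and a made-up `SiluForward` struct (nothing here reflects how candle actually stores intermediate results), the forward pass keeps the sigmoid next to the output and the backward pass needs no further `exp` call:

```rust
/// Hypothetical record of what the forward pass would keep around.
struct SiluForward {
    /// y = x * sigmoid(x)
    output: f64,
    /// s = sigmoid(x), cached for the backward pass
    sigmoid: f64,
}

/// Forward pass that returns the output together with the cached sigmoid.
fn silu_forward(x: f64) -> SiluForward {
    let sigmoid = 1.0 / (1.0 + (-x).exp());
    SiluForward { output: x * sigmoid, sigmoid }
}

/// Backward pass: upstream gradient times d silu / dx = s * (1 - y) + y.
/// Only cached values are used, so the sigmoid is not recomputed.
fn silu_backward(fwd: &SiluForward, upstream: f64) -> f64 {
    upstream * (fwd.sigmoid * (1.0 - fwd.output) + fwd.output)
}
```

The trade-off is extra memory for the cached sigmoid, which may or may not be worth it depending on tensor size.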

By the way, the type of silu_grad is Result<Tensor, Error>, yet the compiler still accepts the code. Perhaps we don't need so many ? operators after all.

LaurentMazare · 2024-08-04T09:24:14Z

Thanks. Re ?: there are some hacks that make it possible not to use them with binary operators. It's still a bit unclear to me whether that's a good or a bad thing: it's a bit less explicit about which bit can fail, but it's also a lot less verbose. So the status quo is to let whoever writes the code decide which style they prefer.
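To illustrate the general mechanism being discussed, here is a toy sketch of how operator overloads can accept a `Result` operand so that chained expressions compile without `?`. The `Toy` type, the `Res` alias, and `zip_with` are all hypothetical; candle's real trait implementations may be shaped differently.

```rust
use std::ops::{Add, Mul};

/// Hypothetical stand-in for a tensor type; not candle's `Tensor`.
#[derive(Debug, Clone, PartialEq)]
struct Toy(Vec<f64>);

type Res = Result<Toy, String>;

impl Toy {
    /// Fallible element-wise operation: fails when the lengths differ.
    fn zip_with(&self, rhs: &Toy, f: impl Fn(f64, f64) -> f64) -> Res {
        if self.0.len() != rhs.0.len() {
            return Err(format!("shape mismatch: {} vs {}", self.0.len(), rhs.0.len()));
        }
        Ok(Toy(self.0.iter().zip(&rhs.0).map(|(a, b)| f(*a, *b)).collect()))
    }
}

// `&Toy * &Toy` yields a Result because the shapes may not match.
impl Mul<&Toy> for &Toy {
    type Output = Res;
    fn mul(self, rhs: &Toy) -> Res {
        self.zip_with(rhs, |a, b| a * b)
    }
}

// Accepting a Result on the left-hand side lets `&a * &b + &c` chain
// without `?`: an earlier error is passed through instead of unwrapped.
impl Add<&Toy> for Res {
    type Output = Res;
    fn add(self, rhs: &Toy) -> Res {
        self?.zip_with(rhs, |a, b| a + b)
    }
}
```

With these impls, `&a * &b + &c` has type `Res`, and an error from the multiplication surfaces only when the final result is inspected. That is the explicitness versus verbosity trade-off mentioned above.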

optimize gradient for silu a bit

ab84d23

LaurentMazare merged commit c0a559d into huggingface:main Aug 4, 2024
10 checks passed

EricLBuehler pushed a commit to EricLBuehler/candle that referenced this pull request Aug 14, 2024

optimize gradient for silu a bit (huggingface#2393)

0f55c37
