As far as I can tell from the source code, this activation doesn't need to cache intermediate values to compute gradients, since it recomputes the forward pass during the backward pass: https://github.com/thomasbrandon/mish-cuda/blob/master/csrc/mish.h#L26

Is that an accurate statement? Sorry if this is a dumb question; I haven't written any C++ PyTorch code, so I'm not sure how their API handles caching activations.
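For anyone unfamiliar with the pattern being described, here is a minimal pure-PyTorch sketch of the same idea: a custom `autograd.Function` that saves only the input tensor and recomputes the softplus/tanh intermediates in `backward`, trading a bit of extra compute for lower activation memory. The class name `MishRecompute` is just illustrative (not from the mish-cuda repo); the gradient formula is the standard Mish derivative.

```python
import torch
import torch.nn.functional as F

class MishRecompute(torch.autograd.Function):
    """Mish = x * tanh(softplus(x)), with backward recomputed from the input
    rather than from cached intermediates."""

    @staticmethod
    def forward(ctx, x):
        # Only the input is saved; the softplus/tanh results are not cached.
        ctx.save_for_backward(x)
        return x * torch.tanh(F.softplus(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        sp = F.softplus(x)        # recomputed here instead of cached
        tsp = torch.tanh(sp)
        # d/dx [x * tanh(softplus(x))]
        #   = tanh(sp) + x * (1 - tanh(sp)^2) * sigmoid(x)
        grad = tsp + x * torch.sigmoid(x) * (1.0 - tsp * tsp)
        return grad_output * grad

# Usage:
x = torch.randn(4, dtype=torch.double, requires_grad=True)
torch.autograd.gradcheck(MishRecompute.apply, (x,))  # verifies the gradient
```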