
Optimize gradient for silu a bit #2393

Merged 1 commit into huggingface:main on Aug 4, 2024

Conversation

MilkFather (Contributor)

This pull request exploits some mathematical properties of the silu operation. By reusing the computed forward pass results as much as possible, the optimized code reduces the running time by a sliver of a millisecond.

Ideally, since the forward pass also computes the sigmoid of the input, caching that intermediate result would let us avoid recomputing it in the backward pass, cutting the time down further.
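
For reference, here is a minimal scalar sketch of the algebra being exploited (plain Rust, not candle's Tensor API; the function names are illustrative). Since the forward output is y = x · sigmoid(x), the gradient sigmoid(x) · (1 + x · (1 − sigmoid(x))) simplifies to sigmoid(x) · (1 − y) + y, so only the sigmoid has to be recomputed:

```rust
// Scalar sketch of the identity behind the optimization, not candle code.

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

fn silu(x: f32) -> f32 {
    x * sigmoid(x)
}

/// Naive gradient: d/dx silu(x) = sigmoid(x) * (1 + x * (1 - sigmoid(x))).
fn silu_grad_naive(x: f32) -> f32 {
    let s = sigmoid(x);
    s * (1.0 + x * (1.0 - s))
}

/// Gradient reusing the forward output y = silu(x):
/// since x * sigmoid(x) = y, the expression above simplifies to
/// sigmoid(x) * (1 - y) + y.
fn silu_grad_from_forward(x: f32, y: f32) -> f32 {
    let s = sigmoid(x);
    s * (1.0 - y) + y
}

fn main() {
    for &x in &[-2.0f32, -0.5, 0.0, 1.0, 3.0] {
        let y = silu(x);
        let g1 = silu_grad_naive(x);
        let g2 = silu_grad_from_forward(x, y);
        assert!((g1 - g2).abs() < 1e-6);
        println!("x = {x:5.2}  silu = {y:8.5}  grad = {g1:8.5}");
    }
}
```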

By the way, the type of silu_grad is Result<Tensor, Error>, yet the compiler still accepts the code. Perhaps we don't need so many ? operators after all.

@LaurentMazare (Collaborator)

Thanks. Re ?: there are some hacks that make it possible not to use them when using binary operators. It's still a bit unclear to me whether that's a good or a bad thing, as it's a bit less explicit which bit can fail, but it's also a lot less verbose, so the status quo is to let whoever writes the code decide which style they prefer.
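
The "hacks" in question amount to operator-trait implementations that also accept Result-wrapped operands. A hedged, self-contained sketch of the pattern (not candle's actual code; the type names here are placeholders):

```rust
// Sketch: implementing the arithmetic traits for Result-wrapped values as
// well as plain ones, so an expression can chain without a `?` at every step.

use std::ops::Mul;

#[derive(Debug, Clone)]
struct T(f32); // stand-in for a tensor

#[derive(Debug)]
struct Error(String);

impl Mul for &T {
    type Output = Result<T, Error>;
    fn mul(self, rhs: &T) -> Result<T, Error> {
        Ok(T(self.0 * rhs.0)) // a real tensor op could fail (shape mismatch, ...)
    }
}

// The trick: also accept a Result on the right-hand side, propagating any
// earlier error instead of forcing a `?` after every intermediate op.
impl Mul<Result<T, Error>> for &T {
    type Output = Result<T, Error>;
    fn mul(self, rhs: Result<T, Error>) -> Result<T, Error> {
        let rhs = rhs?;
        Ok(T(self.0 * rhs.0))
    }
}

fn main() -> Result<(), Error> {
    let a = T(2.0);
    let b = T(3.0);
    let c = T(4.0);
    // `&b * &c` yields a Result; the outer `&a * ...` consumes it directly,
    // so a single `?` covers the whole expression.
    let d = (&a * (&b * &c))?;
    println!("{d:?}");
    Ok(())
}
```

The trade-off described above follows directly: the expression no longer shows which individual operation can fail, but it reads far less noisily.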

@LaurentMazare LaurentMazare merged commit c0a559d into huggingface:main Aug 4, 2024
10 checks passed
EricLBuehler pushed a commit to EricLBuehler/candle that referenced this pull request on Aug 14, 2024