ggml : make GeLU faster and more accurate on CPU
This change makes GeLU go 8x faster on AVX2, 3x faster on Apple Silicon,
and 2x faster on Threadripper. GeLU is the world's most popular activation
function, used by models such as Whisper and Gemma, where this change can
lead to a noticeable improvement in performance, because GeLU is usually
the most time-consuming operation after matrix multiplication.

In addition to improving performance, this change also improves accuracy.
On ARM64 and AMD64 systems, we no longer need to rely on a 16-bit lookup
table; we now use SIMD instead. The GeLU lookup table is still here, but
it has been converted from fp16 to bf16. This possibly aligns inference
more closely with training, and it lets us avoid the two extra lookups
into the fp16 table. Therefore this change should have a positive impact
on performance for platforms like OpenPOWER and RISC-V too.
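To illustrate why a bf16 table is cheaper to index than an fp16 one: bf16 is just the top 16 bits of an IEEE-754 float32, so the table index falls out of a bit shift, with no format-conversion step. This is a hedged sketch, not the actual ggml code; the helper names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

// Index into a 65,536-entry bf16 lookup table: truncate the float's bit
// pattern, keeping the sign, exponent, and top 7 mantissa bits.
static inline uint16_t bf16_index(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);   // reinterpret the float's bits
    return (uint16_t)(u >> 16);
}

// Convert a table index back to the float32 value it represents.
static inline float bf16_to_f32(uint16_t i) {
    uint32_t u = (uint32_t)i << 16;
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}
```

By contrast, fp16 has a different exponent width than float32, so converting to or from it requires extra work on platforms without hardware support, which is the overhead the commit message refers to.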
jart committed Aug 5, 2024
1 parent bc0f887 commit 12e2ebc
Showing 1 changed file with 345 additions and 44 deletions.