Add QuickGELU (lookup-table based) #561
Comments
Note: I meant to post this to https://github.com/FluxML/NNlib.jl, but it might not matter (same people reading?). It could be moved there, or added here and/or there. I followed a link and ended up here by accident. I think the activations here may be legacy, or only the most needed ones, so perhaps this is not wanted here? Since NNlib is usable from Flux, it is probably better placed there, where it would also be available from Lux etc.
FYI, unrelated: while 4-bit quantization is mainstream (or getting there), with 2-bit available and 1-bit too (Microsoft's BitNets), I also see the following. It is "lossless" and "post-training quantization", which I guess is its advantage over BitNets, which are not post-training:
https://github.com/SqueezeAILab/SqueezeLLM SqueezeLLM: Dense-and-Sparse Quantization
Do we want to support whatever quantization they have, at least the emerging schemes?
Motivation and description
See here:
https://github.com/ggerganov/ggml/pull/254/files
I think we may need QuickGELU for compatibility, not merely as an optimization, if it is not the same as GELU.
It's probably just an optimization, since it's an approximation, but then why have both definitions there? https://zeta.apac.ai/en/latest/zeta/nn/modules/quickgeluactivation/
ggerganov/ggml#253
Also used with:
https://github.com/facebookresearch/MetaCLIP
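To make the compatibility question concrete, here is a minimal sketch comparing the two definitions. It assumes the usual formulas: exact GELU as `x * Phi(x)` and QuickGELU as `x * sigmoid(1.702 * x)` (the sigmoid approximation from the GELU paper). The function names and the sampled range are mine, not taken from ggml or NNlib.

```python
import math

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def quick_gelu(x: float) -> float:
    """Sigmoid approximation: x * sigmoid(1.702 * x)."""
    z = 1.702 * x
    if z >= 0.0:
        return x / (1.0 + math.exp(-z))
    # For negative z use the equivalent form that avoids exp overflow.
    e = math.exp(z)
    return x * e / (1.0 + e)

# Sample [-6, 6] in steps of 0.01 and record the largest absolute gap.
xs = [i / 100.0 for i in range(-600, 601)]
max_abs_diff = max(abs(gelu(x) - quick_gelu(x)) for x in xs)
print(f"max |GELU - QuickGELU| on [-6, 6]: {max_abs_diff:.4f}")
```

The gap is small but not zero (on the order of 0.02), so a model trained with one variant does not get bit-identical activations from the other. That would be a reason to offer both.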
They have two 128KB tables each for Float16 (but no table for ggml_gelu_quick_f32).
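The 128 KB figure follows from the format: Float16 has 65536 bit patterns, and one 2-byte result per pattern gives 65536 * 2 bytes. Below is a rough sketch of such a table, not ggml's actual code. It assumes the table is indexed by the raw 16-bit pattern of the input, and all names in it are hypothetical.

```python
import math
import struct
from array import array

def quick_gelu(x: float) -> float:
    """x * sigmoid(1.702 * x), written to avoid exp overflow."""
    if math.isinf(x) and x < 0:
        return 0.0  # the limit as x -> -inf is 0; avoids -inf * 0 = NaN
    z = 1.702 * x
    if z >= 0.0:
        return x / (1.0 + math.exp(-z))
    e = math.exp(z)
    return x * e / (1.0 + e)

def f16_bits_to_float(bits: int) -> float:
    """Reinterpret a 16-bit pattern as an IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def float_to_f16_bits(x: float) -> int:
    """Round to half precision and return the bit pattern.
    Raises OverflowError if a finite x is too large for Float16."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

# One precomputed Float16 result for every possible Float16 input.
TABLE = array(
    "H",
    (float_to_f16_bits(quick_gelu(f16_bits_to_float(b))) for b in range(1 << 16)),
)

def quick_gelu_f16(x: float) -> float:
    """Evaluate QuickGELU by a single table lookup on the input's bits."""
    return f16_bits_to_float(TABLE[float_to_f16_bits(x)])

print(len(TABLE) * TABLE.itemsize, "bytes")
```

After the one-time build, each evaluation is a single indexed load with no `exp` call, which is presumably where the speedup on CPUs would come from.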
I thought lookup tables had gone out of favor (for CPUs and GPUs) because computing the value is faster, but apparently not, so the table is most likely faster, at least in this case. I really don't think they would do this unless it really helped, at least for CPUs (I believe it's the most optimized and most used library). So maybe consider this for other activation functions too?
I'm not sure, but lookup tables probably do not make sense on GPUs, since latency is not as big a deal there and threading compensates. I think the code there may only apply to CPUs. Can anyone confirm, or does it also apply to GPUs?
Would it make sense to have a table for 8-bit floats too? And maybe to use it, or some other small table, for Float16 with some extra computation?
I think I could implement this (in the same way as there), i.e. just the activations (a starting point, not everything they use the tables for).
I also see there: "initialize GELU, Quick GELU, SILU and EXP F32 tables". I didn't think FP32 tables(!?) were used, or tables for EXP. I also see the unrelated GGML_OP_SILU_BACK and GGML_OP_ALIBI.
And FYI, the 2016 GELU paper was updated in 2023 for some reason:
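For a sense of scale, an 8-bit table needs only 256 entries. The sketch below assumes the E5M2 flavor of 8-bit float (my choice; nothing above picks a format). E5M2 has the same sign, exponent and bias layout as Float16 with the mantissa cut to 2 bits, so an E5M2 byte `b` can be decoded as the Float16 pattern `b << 8`. The results are kept as Float16 bit patterns, another assumption, which makes the table 512 bytes. All names are hypothetical.

```python
import math
import struct

def quick_gelu(x: float) -> float:
    """x * sigmoid(1.702 * x), written to avoid exp overflow."""
    if math.isinf(x) and x < 0:
        return 0.0  # the limit as x -> -inf is 0; avoids -inf * 0 = NaN
    z = 1.702 * x
    if z >= 0.0:
        return x / (1.0 + math.exp(-z))
    e = math.exp(z)
    return x * e / (1.0 + e)

def f16_bits_to_float(bits: int) -> float:
    """Reinterpret a 16-bit pattern as an IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def float_to_f16_bits(x: float) -> int:
    """Round to half precision and return the bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

# Assumption: input is E5M2, i.e. the top 8 bits of a Float16 pattern.
E5M2_TABLE = [
    float_to_f16_bits(quick_gelu(f16_bits_to_float(b << 8))) for b in range(256)
]

def quick_gelu_e5m2(byte: int) -> float:
    """Look up QuickGELU for an E5M2 input byte; the result is a Float16 value."""
    return f16_bits_to_float(E5M2_TABLE[byte])

print(quick_gelu_e5m2(0x3C))  # 0x3C encodes 1.0 in E5M2
```

A table this small fits comfortably in L1 cache, unlike the 128 KB Float16 one. Whether that helps in practice would need benchmarking.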
https://arxiv.org/abs/1606.08415
[v1] Mon, 27 Jun 2016 19:20:40 UTC (435 KB) [..]
[v3] Sun, 11 Nov 2018 07:40:32 UTC (3,013 KB) [..]
[v5] Tue, 6 Jun 2023 01:53:32 UTC (3,016 KB)
Possible Implementation