NTK RoPE scaling. #115
Comments
This sounds pretty good! But I'm wondering how it would be implemented in ExLlama. compress_pos_emb is already a RoPE scaler. There's
But it seems to be used for training purposes.
@Panchovix Someone posted this code on 4chan. I haven't had the time to verify it as I'm on the move, but maybe that's it.
@alkeryn Thanks! It seems to work.
max_seq_len should be set the same way as you do for SuperHOT models (via -l).
And like this in model_init.py:
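Something along these lines, as a rough sketch only; the flag name (--alpha) and the config attribute (alpha_value) are placeholder names I'm using for illustration, not necessarily what the PR actually uses:

```python
# Hypothetical sketch of the kind of model_init.py change described above.
# The --alpha flag and config.alpha_value attribute are illustrative names,
# not necessarily the ones in the actual PR.
import argparse

def add_args(parser: argparse.ArgumentParser):
    parser.add_argument("-l", "--length", type=int, default=2048,
                        help="Maximum sequence length (set it like you would for SuperHOT models)")
    parser.add_argument("--alpha", type=float, default=1.0,
                        help="NTK RoPE scaling alpha (1.0 = no scaling)")

def apply_args(config, args):
    config.max_seq_len = args.length   # same as the existing -l behaviour
    config.alpha_value = args.alpha    # consumed later when the rotary base is computed
```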
Okay, I did an experimental PR to see if turbo wants to add it, or maybe to test it another way.
I'd like to see some results from finetuning before I go and add even more config options. If I built out ExLlama every time someone had an interesting idea on reddit it'd be an unmaintainable behemoth by now. It's already kind of unwieldy.
So to use this feature, should we first tune the model with a LoRA or something similar?
@laoda513 For NTK RoPE scaling, finetuning is not needed. But based on my tests, SuperHOT models work better with both NTK RoPE scaling and compress_pos_emb scaling. For now, no loader supports NTK RoPE; that PR adds experimental support only for ExLlama at the moment.
@Panchovix I don't quite understand how it would work better with RoPE + compress_pos_emb scaling, but that's interesting. So you put 4 for each?
I have tested the change and get better results with compression at 4 and alpha at 4. Using TheBloke_nous-hermes-13b-superhot-8k-GPTQ-4bit-128g, if I only have either compression or NTK RoPE enabled, it tells me it cannot find the secret messages I left embedded in the paper, but with alpha 4 and compression at 4 it retrieves them correctly.
@ottobunge Interesting, have you tried alpha 8 or more with no compression on a normal model?
Trying alpha 10 and then alpha 4 + compression 4 on this same model, to see the differences.
@ottobunge That makes sense, since the model was trained for 8k RoPE.
That would be this
I'm downloading a non-finetuned version, but on the finetuned one I can run with no compression at alpha 10 and get good results. In fact, it follows the formatting of the prompt better than compression 4 + alpha 4.
I have updated the PR. Before, the alpha value wasn't being applied correctly (it stayed at 1.0). Now it is applied correctly, so just setting alpha for NTK RoPE scaling should be enough (without needing to set compress_pos_emb to the same value). @ottobunge @alkeryn Can you guys test and see how it goes now? Results are WAY different, and IMO, better.
For tulu-30B-GPTQ (non-SuperHOT):
For Tulu-30B-SuperHOT-8K-4bit-32g:
Basically, it seems that NTK RoPE scaling is better than we expected.
How about the memory cost increase for inference and training? Is it linear? For example, 1x for 2k and 2x for 4k? I think this is very exciting and interesting! When I think more about it, if we can easily extend a model trained with 2k to 8k...
For training itself, sadly I'm not sure how it would be applied :(. Also, thanks turbo for the PR merge! Now NTK RoPE scaling can be used in ExLlama.
Thank you everyone, I'm closing the issue! :)
According to this post, this is a method of RoPE scaling that results in less perplexity loss and allows a larger scaling factor:
https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/
the code can be found in this notebook :
https://colab.research.google.com/drive/1VI2nhlyKvd5cw4-zHvAIk00cAVj2lCCC#scrollTo=b80b3f37
and the code for it seems to be a small change:
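For reference, here is a minimal sketch of the trick as I understand it from the post; the function name, parameter names, and the 10000 default base are my assumptions, see the linked notebook for the actual code:

```python
# Minimal sketch of NTK-aware RoPE scaling as described in the linked post.
# Names and the 10000 default base are assumptions, not the notebook's exact code.
import torch

def ntk_scaled_inv_freq(dim: int, alpha: float = 8.0, base: float = 10000.0) -> torch.Tensor:
    """RoPE inverse frequencies with the rotary base stretched by alpha.

    Unlike compress_pos_emb, which divides the position ids by the scale
    factor, this leaves positions untouched and enlarges the base, so
    high-frequency components keep their resolution while low-frequency
    ones are interpolated, extending the usable context.
    """
    base = base * alpha ** (dim / (dim - 2))
    return 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
```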
Maybe it would be nice to add that option to ExLlama as well; with this technique, finetuning for higher context may not even be necessary.