Create llamacpp_HF loader #3062
Conversation
The evaluation fails due to the following problem. The forward call

`logits = torch.tensor(self.model.model.eval_logits).view(1, 1, -1).to(kwargs['input_ids'].device)`

returns a tensor with shape (1, 1, vocab_size), i.e. logits for the last position only. I don't understand why the AutoGPTQ / transformers call returns a logit vector for each position in the input array, or how to reproduce that behavior. I'll tag @TheBloke because maybe he can help.
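For reference, one way to get a logit row for every input position out of llama-cpp-python is to construct the model with `logits_all=True`, so that `eval_logits` keeps one row per evaluated token instead of only the last one. A minimal sketch of that idea (not this PR's code; the model path and token ids are placeholders):

```python
import torch
from llama_cpp import Llama

# logits_all=True makes llama.cpp store logits for every position,
# not just the final one. Model path is a placeholder.
model = Llama(model_path="path/to/model.bin", logits_all=True)

tokens = [1, 15043, 3186]  # placeholder token ids
model.reset()
model.eval(tokens)

# eval_logits now holds len(tokens) rows of vocab_size floats; stacking them
# reproduces the (batch, seq_len, vocab_size) shape that transformers returns.
logits = torch.tensor(list(model.eval_logits)).unsqueeze(0)
print(logits.shape)  # torch.Size([1, 3, vocab_size])
```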
Still experimental.
Motivation
This adds a wrapper similar to ExLlama_HF, but for llama.cpp. The main goal is to evaluate llama.cpp perplexity and compare it directly to AutoGPTQ/ExLlama (see the sketch below). As a bonus, it allows the transformers samplers to be used with llama.cpp.
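Because the wrapper exposes the standard transformers forward interface, the same perplexity code can be pointed at a llamacpp_HF, AutoGPTQ, or ExLlama_HF model. A minimal sketch, assuming `model` and `tokenizer` are a transformers-compatible causal LM pair and that the wrapper follows the usual labels-to-loss convention (the function and names below are illustrative, not from this PR):

```python
import torch

def perplexity(model, tokenizer, text, max_length=1024):
    # Tokenize and truncate to the model's context window.
    input_ids = tokenizer(text, return_tensors="pt").input_ids[:, :max_length]
    with torch.no_grad():
        # When labels are passed, transformers causal LMs return the mean
        # cross-entropy over shifted positions as outputs.loss.
        outputs = model(input_ids=input_ids, labels=input_ids)
    # Perplexity is exp of the mean negative log-likelihood.
    return torch.exp(outputs.loss).item()
```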
How to use it
--loader llamacpp_HF
or by setting the loader to llamacpp_HF in the UI.
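For example, assuming text-generation-webui's usual entry point (the model folder name is a placeholder): `python server.py --model your-llama-ggml-model --loader llamacpp_HF`.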
Current status

Functional, but 3x slower:
[generation speed comparison between llamacpp_HF and llamacpp omitted]