
Create llamacpp_HF loader #3062

Merged: 12 commits merged into dev from llamacpp_hf on Jul 16, 2023
Conversation

oobabooga (Owner) commented Jul 9, 2023

Still experimental.

Motivation

Having a wrapper similar to ExLlama_HF but for llama.cpp. The main goal is to evaluate llama.cpp perplexity and compare it to AutoGPTQ/ExLlama directly. As a bonus, it will allow the transformers samplers to be used for llama.cpp.

How to use it

  1. Download oobabooga/llama-tokenizer using the UI or the download script:
python download-model.py oobabooga/llama-tokenizer
  2. Load a ggml model with --loader llamacpp_HF or by setting the loader to llamacpp_HF in the UI (see the example command below).
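For example, from the command line (assuming the standard server.py entry point; the model filename is only a placeholder):

python server.py --model <your-ggml-model> --loader llamacpp_HF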

Current status

Functional, but roughly 3x slower than the regular llamacpp loader:

llamacpp_HF:

Output generated in 15.79 seconds (12.60 tokens/s, 199 tokens, context 7, seed 313220017)

llamacpp:

Output generated in 5.50 seconds (36.39 tokens/s, 200 tokens, context 7, seed 1413621777)

oobabooga (Owner, Author) commented Jul 9, 2023

The evaluation fails due to the following problem. The forward call

logits = torch.tensor(self.model.model.eval_logits).view(1, 1, -1).to(kwargs['input_ids'].device)

returns a tensor of shape torch.Size([1, 1, 32000]), while the evaluation code expects a tensor of shape torch.Size([1, 1200, 32000]), where the second dimension is the chosen context length.

I don't understand why the AutoGPTQ / transformers call returns a logit vector for each position in the input sequence, or how to reproduce that behavior here. Tagging @TheBloke in case he can help.
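For reference, a minimal sketch of one way to collect per-position logits, assuming the underlying llama-cpp-python model is created with logits_all=True (so that eval_logits keeps one row of logits per evaluated token instead of only the last one); gather_all_logits is a hypothetical helper name, not part of the loader:

import torch

def gather_all_logits(llama_model, input_ids):
    # llama_model is the llama_cpp.Llama instance (self.model.model in the loader).
    # Assumption: it was created with logits_all=True, so eval_logits holds one row
    # of vocab_size logits per evaluated token. Stacking those rows yields the
    # (1, seq_len, vocab_size) tensor that the perplexity evaluation expects.
    rows = list(llama_model.eval_logits)              # length == number of evaluated tokens
    logits = torch.tensor(rows, dtype=torch.float32)  # shape: (seq_len, vocab_size)
    return logits.unsqueeze(0).to(input_ids.device)   # shape: (1, seq_len, vocab_size)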

@oobabooga oobabooga added the help wanted Extra attention is needed label Jul 13, 2023
@oobabooga oobabooga changed the base branch from main to dev July 13, 2023 23:18
@oobabooga oobabooga removed the help wanted Extra attention is needed label Jul 14, 2023
@oobabooga oobabooga merged commit 5e3f7e0 into dev Jul 16, 2023
@oobabooga oobabooga deleted the llamacpp_hf branch July 18, 2023 01:37