Create llamacpp_HF loader #3062
Conversation
The evaluation fails due to the following problem. The forward call

`logits = torch.tensor(self.model.model.eval_logits).view(1, 1, -1).to(kwargs['input_ids'].device)`

returns a tensor with shape (1, 1, vocab_size), i.e. logits for the last position only. I don't understand why the AutoGPTQ / transformers call returns a logit vector for each position in the input array, or how to reproduce that behavior. I'll tag @TheBloke because maybe he can help.
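For reference, one way to get a logit row for every input position out of llama-cpp-python is to construct the model with `logits_all=True`, so that `eval_logits` keeps one row per evaluated token instead of only the last one. A minimal sketch of that idea (not this PR's code; the model path and token ids are placeholders):

```python
import torch
from llama_cpp import Llama

# logits_all=True makes llama.cpp store logits for every position,
# not just the final one. Model path is a placeholder.
model = Llama(model_path="path/to/model.bin", logits_all=True)

tokens = [1, 15043, 3186]  # placeholder token ids
model.reset()
model.eval(tokens)

# eval_logits now holds len(tokens) rows of vocab_size floats; stacking them
# reproduces the (batch, seq_len, vocab_size) shape that transformers returns.
logits = torch.tensor(list(model.eval_logits)).unsqueeze(0)
print(logits.shape)  # torch.Size([1, 3, vocab_size])
```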
Still experimental.
Motivation
This adds a wrapper similar to ExLlama_HF, but for llama.cpp. The main goal is to evaluate llama.cpp perplexity and compare it directly to AutoGPTQ/ExLlama (see the sketch below). As a bonus, it allows the transformers samplers to be used with llama.cpp.
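Because the wrapper exposes the standard transformers forward interface, the same perplexity code can be pointed at a llamacpp_HF, AutoGPTQ, or ExLlama_HF model. A minimal sketch, assuming `model` and `tokenizer` are a transformers-compatible causal LM pair and that the wrapper follows the usual labels-to-loss convention (the function and names below are illustrative, not from this PR):

```python
import torch

def perplexity(model, tokenizer, text, max_length=1024):
    # Tokenize and truncate to the model's context window.
    input_ids = tokenizer(text, return_tensors="pt").input_ids[:, :max_length]
    with torch.no_grad():
        # When labels are passed, transformers causal LMs return the mean
        # cross-entropy over shifted positions as outputs.loss.
        outputs = model(input_ids=input_ids, labels=input_ids)
    # Perplexity is exp of the mean negative log-likelihood.
    return torch.exp(outputs.loss).item()
```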
How to use it
--loader llamacpp_HF
or by setting the loader to llamacpp_HF in the UI.
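For example, assuming text-generation-webui's usual entry point (the model folder name is a placeholder): `python server.py --model your-llama-ggml-model --loader llamacpp_HF`.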
Current status

Functional, but 3x slower:
[generation speed comparison between llamacpp_HF and llamacpp omitted]