OpenLLaMA 3B support #1588

Merged: 3 commits into master, May 30, 2023

Conversation

SlyEcho (Collaborator) commented May 24, 2023

Initially not working, perplexity around 2000.

The code doesn't crash and the model can be loaded.

ggml files: huggingface.co/SlyEcho/open_llama_3b_ggml

Source data here: huggingface.co/openlm-research/open_llama_3b_600bt_preview

More info: #1291


Perplexity on wiki.test.raw with -b 512 -c 512

Q chunk perplexity
F16 [616] 8.4656
Q8_0 [616] 8.4667
Q5_1 [616] 8.5072
Q5_0 [616] 8.5156
Q4_1 [616] 8.6102
Q4_0 [616] 8.6674

SlyEcho added the "help wanted" label May 24, 2023
FNsi (Contributor) commented May 25, 2023

I can confirm the following changes help to solve that:

1. n_mult = 200
2. char buf[256] to char buf[200]
(in your file, around line 299)

Update:
ne still doesn't change with step 2. 😂

I only got it running because I deleted the lt.ne != ne check, and it gives me reasonable responses of no more than 3 sentences.

No clue yet, but I'd like to find out how ne ends up calculated as 3200 x 8600; there must be some integer rounding involved.

Tested with n_mult = 216, plus changing the // 128 in the n_head calculation to // 100, and it all works well.

FNsi (Contributor) commented May 25, 2023

ne should be 3200 * 8640, the same as hidden_size * intermediate_size.
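To spell that out, here is an illustrative sketch (names are illustrative, not llama.cpp's actual variables) of the mismatch being described: the loader derives a feed-forward width of 8600 from the wrong n_mult, while the checkpoint's tensors are 3200 x 8640, so the shape check fails.

```python
n_embd = 3200             # hidden_size of OpenLLaMA 3B
intermediate_size = 8640  # the checkpoint's real feed-forward width

expected = (n_embd, 8600)             # what the loader derived from a wrong n_mult
stored = (n_embd, intermediate_size)  # what the file actually contains: 3200 x 8640

# roughly the lt.ne != ne comparison that was deleted above to force the load
print(stored != expected)  # True, so the loader rejects the tensor
```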

SlyEcho (Collaborator, Author) commented May 25, 2023

> char buf[256] to char buf[200]

I don't think it needs to be a smaller buffer. (Although now I see two redundant strlen() calls there, but that is inconsequential in the grand scheme of things.)

> n_mult = 200

Are you sure? How did you calculate this number?

When I force it to load with n_ff = 8600, as with your changes, the output seems to be even more garbage.

To patch the model file without going through the reconversion, you can use xxd like this to change n_mult:

# c8 = 200, 6c = 108
echo "0010: c8" | xxd -r - /models/open_llama_3b_preview_600bt_f16.bin

FNsi (Contributor) commented May 25, 2023

> n_mult = 200
>
> Are you sure? How did you calculate this number?

Sorry, I did n_mult = 216.

And in convert.py, line 610,
shape[1] // 128
should change to shape[1] // 100.

Then it all works.

(screenshot)

SlyEcho (Collaborator, Author) commented May 25, 2023

> Sorry, I did n_mult = 216.

It seems 216 and 108 have the same effect on the n_ff calculation.

> And in convert.py, line 610, shape[1] // 128 should change to shape[1] // 100.

Yes, this actually matters 👍
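For illustration, a minimal sketch (not part of the patch) of why that divisor matters: convert.py guesses the head count from the embedding width assuming a head dimension of 128, which is wrong for this model.

```python
n_embd = 3200        # OpenLLaMA 3B hidden size
real_n_head = 32     # the model's actual head count (head dimension 100)

print(n_embd // 128)  # 25 -> convert.py's default guess, wrong for this model
print(n_embd // 100)  # 32 -> matches the real head count
```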

FNsi (Contributor) commented May 25, 2023

> It seems 216 and 108 have the same effect on the n_ff calculation.

My initial thinking was that the number should be close to 200, unless there's a reason for a sudden drop 😹

SlyEcho (Collaborator, Author) commented May 25, 2023

> My initial thinking was that the number should be close to 200, unless there's a reason for a sudden drop 😹

With n_mult = 256, n_ff comes out as 8600; I guess they wanted 8640, and 256 - 40 = 216. Maybe that is more logical, then.
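For context, a minimal sketch of the rounding llama.cpp uses to derive n_ff from n_embd and n_mult (my reading of the loader at the time of this PR, so treat the exact expression as an assumption); it shows why 216 and 108 have the same effect here:

```python
def guessed_n_ff(n_embd: int, n_mult: int) -> int:
    # round 2/3 of 4*n_embd up to the next multiple of n_mult
    return ((2 * (4 * n_embd) // 3 + n_mult - 1) // n_mult) * n_mult

n_embd = 3200  # OpenLLaMA 3B hidden size
print(guessed_n_ff(n_embd, 216))  # 8640, the model's real intermediate size
print(guessed_n_ff(n_embd, 108))  # 8640 as well, hence the same effect
```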

FNsi (Contributor) commented May 25, 2023

> With n_mult = 256, n_ff comes out as 8600; I guess they wanted 8640, and 256 - 40 = 216. Maybe that is more logical, then.

No real clue about that; maybe you are right, even if it doesn't make much more sense...

SlyEcho (Collaborator, Author) commented May 25, 2023

> No real clue about that; maybe you are right, even if it doesn't make much more sense...

Ideally it should be read from the JSON file and saved in the model as n_ff; n_mult is otherwise not needed at all.

SlyEcho removed the "help wanted" label May 25, 2023
SlyEcho marked this pull request as ready for review May 25, 2023 09:16
FNsi (Contributor) commented May 25, 2023

> Ideally it should be read from the JSON file and saved in the model as n_ff; n_mult is otherwise not needed at all.

I agree with that; it might mean another file format change is needed.

SlyEcho (Collaborator, Author) commented May 25, 2023

> I agree with that; it might mean another file format change is needed.

Ugh, I'd rather not. Maybe the field could be used for backward compat: if it is 256 it means n_mult = 256, otherwise it is n_ff.
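A minimal sketch of that backward-compat idea on the loading side (hypothetical, just the idea being floated here, not what the PR implements):

```python
def resolve_n_ff(header_field: int, n_embd: int) -> int:
    # Hypothetical: reuse the existing n_mult slot in the file header.
    if header_field == 256:
        # old files: the field is n_mult, so derive n_ff the old way
        return ((2 * (4 * n_embd) // 3 + header_field - 1) // header_field) * header_field
    # new files: the field stores n_ff directly
    return header_field
```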

FNsi (Contributor) commented May 25, 2023

Yep, maybe 20B, 40B, 50B, or 120B LLaMA models will come in the future...

Sovenok-Hacker left a review:

I think it's all OK.

Sovenok-Hacker commented

> Yep, maybe 20B, 40B, 50B, or 120B LLaMA models will come in the future...

Yes

SlyEcho (Collaborator, Author) commented May 25, 2023

Perplexity done, in the description ↑

Sovenok-Hacker commented

I uploaded a working quantized version to HuggingFace

SlyEcho (Collaborator, Author) commented May 25, 2023

and checksums as well:

0103204cb367a4ae78a6dcc107ee95a0f0f216e6d276082a534e0dc337dd7452  open_llama_3b_preview_600bt-q5_1.bin
878a64232542f174ecd41ca76f18b959cdf41944fb878b5cf6cb89ab264bd59b  open_llama_3b_preview_600bt-q4_0.bin
6e3b1e60f3135395bd32d8bb10388051c24b79bc5c0b5bc5e9cab11ebea253c3  open_llama_3b_preview_600bt-q4_1.bin
7ed15048e392ce43abae56668f8df6cb0f7f1d48e4c8e924a9fc58a82510e6ac  open_llama_3b_preview_600bt-q5_0.bin
d4d4f2425f355dd57cae7c6766bbd99cf482c8b374cbf775c230f1a8c038c617  open_llama_3b_preview_600bt-q8_0.bin
4461ccd289eed0190045fa79447262401fe432b63e6d9a7919637c420814e90b  open_llama_3b_preview_600bt-f16.bin

SlyEcho (Collaborator, Author) commented May 25, 2023

> I uploaded a working quantized version to HuggingFace

Confirmed to have the same checksum as mine.

@Sovenok-Hacker, do you have the F16 version of the 350bt checkpoint available somewhere?

SlyEcho changed the title from "[wip] OpenLLaMA 3B support" to "OpenLLaMA 3B support" May 26, 2023
SlyEcho (Collaborator, Author) commented May 26, 2023

Currently it seems to be working. The only problem I see right now is that convert.py does not load the correct parameters from the JSON file, but I don't really know enough about it to know how to change that.

FNsi (Contributor) commented May 27, 2023

Somehow I like this 3B: it's easy to train, and it seems like another way to achieve unlimited context length...

SlyEcho (Collaborator, Author) commented May 27, 2023

It seems like they trained it more on popular culture; it seems to understand Star Trek characters better than even LLaMA 30B: https://mdon.ee/@slyecho/110437699369998728

FNsi (Contributor) commented May 27, 2023

As the team has labelled it, it uses RedPajama-Data-1T, the same as MPT-7B, and then I found that MPT has a 1B version...

xingchensong (Contributor) commented

> and checksums as well: [...]

Appreciate your great work! Do you have all those ggml .bin files available on HF or somewhere else?

SlyEcho (Collaborator, Author) commented May 30, 2023

I put a link in the description with the files.

Green-Sky (Collaborator) commented

The llama.cpp file changes look very mergeable, but in convert.py we should load the value from the config.json file.

It would be cool to have at least the llama.cpp changes merged before the finished 3B OpenLLaMA drops ("end of last week").

SlyEcho (Collaborator, Author) commented May 30, 2023

> The llama.cpp file changes look very mergeable, but in convert.py we should load the value from the config.json file.

I can remove the convert.py changes for now and create an issue to update it.

In the worst case we can just provide the model files for users right now.

Can I just use the changes view?
SlyEcho requested a review from Green-Sky May 30, 2023 13:44
FNsi (Contributor) commented May 30, 2023

> But in convert.py we should load the value from the config.json file.

Somehow I think that is complex, or it will end up depending on transformers 😅 Is there a simple way to replace n_mult?

Sovenok-Hacker commented

> @Sovenok-Hacker, do you have the F16 version of the 350bt checkpoint available somewhere?

Yes

Green-Sky (Collaborator) left a review:

Quickly ran some tests (edit: I let them predict for >10 min each):

  • q4_0
  • q5_1
  • f16

so I think this can be merged as is :)

@@ -58,6 +59,7 @@ static const size_t MB = 1024*1024;
static const std::map<e_model, size_t> & MEM_REQ_SCRATCH0()
{
static std::map<e_model, size_t> k_sizes = {
+    { MODEL_3B, 128ull * MB },
Review comment (Collaborator):

I was suspicious here, since it's so much less, but it ran without any issue for me, so I guess the others might be too large. :)

SlyEcho (Collaborator, Author) replied:

I tested before merging with the f16 model, --no-mmap, --memory-f32, -c 2048 -n 4096, which should be the worst case, but it worked.

SlyEcho merged commit ffb06a3 into master May 30, 2023
SlyEcho (Collaborator, Author) commented May 30, 2023

Just for reference, the diff for the converter script:

--- a/convert.py        2023-05-30 20:48:07.687486627 +0300
+++ b/convert.py        2023-05-30 20:47:55.854142065 +0300
@@ -143,12 +143,22 @@
     def guessed(model: 'LazyModel', file_type: GGMLFileType) -> 'Params':
         n_vocab, n_embd = model["tok_embeddings.weight"].shape

+        n_mult=256
+        n_head=n_embd // 128
+        n_layer=next(i for i in itertools.count() if f"layers.{i}.attention.wq.weight" not in model)
+
+        # TODO: hack for open_llama_3b
+        if n_embd == 3200:
+            n_mult = 216
+            n_head = 32
+            n_layer = 26
+
         return Params(
             n_vocab=n_vocab,
             n_embd=n_embd,
-            n_mult=256,
-            n_head=n_embd // 128,
-            n_layer=next(i for i in itertools.count() if f"layers.{i}.attention.wq.weight" not in model),
+            n_mult=n_mult,
+            n_head=n_head,
+            n_layer=n_layer,
             file_type=file_type,
         )

@@ -597,7 +607,9 @@
     out["norm.weight"] = model["model.norm.weight"]
     out["output.weight"] = model["lm_head.weight"]

-    n_head = model["model.layers.0.self_attn.q_proj.weight"].shape[1] // 128
+    # TODO: hack for open_llama_3b
+    n_embd = model["model.layers.0.self_attn.q_proj.weight"].shape[1]
+    n_head = 32 if n_embd == 3200 else n_embd // 128
     for i in itertools.count():
         if f"model.layers.{i}.self_attn.q_proj.weight" not in model:
             break

Green-Sky added the "enhancement" and "model" labels May 30, 2023
ggerganov deleted the open_llama_3b branch May 30, 2023 20:22
LostRuins (Collaborator) commented

Hi @SlyEcho, I just noticed that the scratch buffers for 3B are a bit too small to use batch size 512. I suggest increasing them from 128 MB to 256 MB.

SlyEcho (Collaborator, Author) commented Jun 5, 2023

Alright, created a new PR.
