BLOOM Refactor #141

Closed · wants to merge 25 commits

Conversation

@danforbes (Contributor) commented Apr 16, 2023

Code refactor and BLOOM model implementation. This is an attempt to take the best parts of #85 from @hhamud and #125 from @iacore.

@philpax (Collaborator) commented Apr 16, 2023

Nice work! I really like the traits; I think this is definitely the direction to go (although I suspect there'll be some change here as we accommodate more eccentric models - but that's OK!)

Here's the feedback I had while looking through:

  • Rename llama-rs to llm-base, so it's clear what the relationship between bloom, llama and llm-base is. If possible, try to avoid users of bloom and llama having to bring in llm-base themselves (i.e. re-export any relevant types).
  • Expose a llm crate that includes both bloom and llama. I'd suggest feature-gating these, but specifying them as default features.
  • Consolidate the two CLIs into llm-cli, as a client crate to llm.
  • It doesn't seem like you've used most of the code from the standalone loader (#125)? That PR introduces a more generic way to load models of any GGML format and handles the other formats. It's not done yet, but it results in less code duplication and should be more versatile.
  • The Model::Model associated type is redundant, no? That would just be Self.
  • Model::Layer / the Layer types shouldn't be public; they're an implementation detail of the model
  • Document the changes made to GGML, to make it clearer that we've patched it with some new code (in CREDITS.txt)
  • The fields of InferenceSession, Vocabulary and other similar types were crate-public to ensure that users couldn't accidentally meddle/depend on the internals. Is making them public necessary?
  • Prefer using &dyn Model instead of &impl Model; monomorphisation is unnecessary here and comes at a cost if you use multiple models (see the sketch after this list).
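
To illustrate the last point, here's a minimal sketch; `Model` and `InferenceSession` are simplified stand-ins for the crate's real trait and session type, and `Llama` is a hypothetical concrete model:

```rust
pub struct InferenceSession;

pub trait Model {
    fn start_session(&self) -> InferenceSession;
}

struct Llama; // hypothetical concrete model

impl Model for Llama {
    fn start_session(&self) -> InferenceSession {
        InferenceSession
    }
}

// Generic version: the compiler emits a separate copy of this function
// for every concrete model type, growing code size with each model.
fn run_generic(model: &impl Model) {
    let _session = model.start_session();
}

// Trait-object version: a single compiled copy serves every model.
fn run_dynamic(model: &dyn Model) {
    let _session = model.start_session();
}

fn main() {
    let model = Llama;
    run_generic(&model);
    run_dynamic(&model);
}
```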

Can you also credit @hhamud and @iacore in your PR description?

@danforbes (Contributor, Author)

Thank you very much for the encouragement and detailed feedback! TBH I'm not a very experienced Rust programmer, nor do I know a ton about LLMs, so I'm very happy for the suggestions. I wasn't able to make a ton of sense out of #125, so I may need some more details there, but let me see what I can do.

@philpax (Collaborator) commented Apr 16, 2023

It's possible that we can get this going as-is, and then use BLOOM as the secondary test case for #125. Let's see how this looks after we've gotten through the other feedback 🚀

@danforbes (Contributor, Author)

Okay, I think I have made as many changes as I feel comfortable with...thanks again for the great suggestions! Other contributors can feel free to add their own commits to this branch (just let me know if you need perms). Some thoughts:

  • I've tested this a little locally and inference for both BLOOM and LLaMA models actually seems messed up 😅 There is text produced, but it is garbled.
  • I renamed llama-rs to llm-base, but I'm not sure if I re-exported the right types.
  • I didn't feel comfortable making the changes to the loader, which I think precludes me from moving forward with your suggestions around the llm and llm-cli crates.
  • I'm not sure what sort of changes need to be made to CREDITS.txt
  • When I tried using dyn instead of impl for Model, the compiler wanted me to spell out the associated type(s), so I wasn't sure how to implement that properly.

Thanks again and I'm happy to take another stab at more improvements if you have more suggestions 💯

@hhamud (Contributor) commented Apr 16, 2023

I've tested this a little locally and inference for both BLOOM and LLaMA models actually seems messed up 😅 There is text produced, but it is garbled.

You can try to print out the logit vectors of both models and compare them with the original C implementations in llama.cpp and bloom.cpp, or even just test the outputs of the models using the same prompts; if they are approximate enough, it should be fine.

Though I wonder if it is possible to implement a CI job that downloads weights and measures the difference in logit output for the same prompt (cosine similarity?), passing if it is within a specified range.
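
As a rough sketch of what that check could compute (illustrative only; the logit values and tolerance here are made up):

```rust
/// Cosine similarity between two logit vectors of equal length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Hypothetical logit dumps from the Rust port and the C reference.
    let rust_logits = [0.10f32, 0.90, -0.30];
    let c_logits = [0.11f32, 0.88, -0.29];
    // The CI check would pass if the implementations agree within a tolerance.
    assert!(cosine_similarity(&rust_logits, &c_logits) > 0.999);
}
```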

@philpax mentioned this pull request Apr 16, 2023
@danforbes (Contributor, Author)

@hhamud FWIW, I also received garbled output with a LLaMA 13B model on the current main branch, so I don't necessarily think it's a regression...I'm not sure if it's an existing bug or user error.

@philpax (Collaborator) commented Apr 18, 2023

Sorry about the delay on addressing your latest updates, I'll have a look after work soon!

@philpax mentioned this pull request Apr 19, 2023
@philpax (Collaborator) commented Apr 19, 2023

Hey! If all goes well, #125 will be landing soon. I'll take care of merging it into this PR in the next day or two and addressing the feedback, if that's OK with you?

@philpax mentioned this pull request Apr 20, 2023
@hhamud mentioned this pull request Apr 20, 2023
@danforbes (Contributor, Author)

All good 💯

@KerfuffleV2 (Contributor)

@danforbes Hey, just wondering. Did you already try to get a pull request with your alibi operation accepted upstream to the ggml or llama.cpp repo?

There's been some related discussion over here: 6b2765d#commitcomment-110389270

philpax added 2 commits Apr 26, 2023: "This disables BLOOM support for now."
@KerfuffleV2 (Contributor) commented Apr 26, 2023

I have also started

Better watch out, you're on the path to having an attention span as short as mi—

TBH the code was written by @hhamud. Maybe they would care to merge upstream?

It would be a lot nicer to get it merged than to have to maintain a custom GGML version indefinitely. Also, the llama.cpp/ggml repo gets a lot of pull requests so it's reasonably likely it could take a couple weeks to get something merged even if it was started today.

If no one else wants to do it, I can (once it's definitely working), but it would be better if the people who were actually involved were the ones who submitted the pull. (Not just because I'm lazy, but probably at least 50% that reason.) edit: There's currently a lot on my plate; I probably won't have the time to take something like this on.

@philpax (Collaborator) commented Apr 26, 2023

@philpax I have also started a branch for GPT2/Cerebras loading and inference. It crashes loading the vocabulary ATM. However, it produces garbled text.

Nice work! I would suggest not racing too far ahead with the model support; I'd like to make sure all of the ones we have now work well and share as much as they can. (Especially around the actual inference logic - I wonder how much can be shared?)

That being said, if it gets done and it works, it would be nice to support. Keep us posted 🚀

It would be a lot nicer to get it merged than to have to maintain a custom GGML version indefinitely. Also, the llama.cpp/ggml repo gets a lot of pull requests so it's reasonably likely it could take a couple weeks to get something merged even if it was started today.

If no one else wants to do it, I can (once it's definitely working), but it would be better if the people who were actually involved were the ones who submitted the pull. (Not just because I'm lazy, but probably at least 50% that reason.)

Yeah, I agree. The functions came from bloomz.cpp, which is not being developed at present, so I don't think there's any immediate pressure from anyone to get it into GGML.

If you want to tackle it, that'd be appreciated, but I think we're all just as lazy as you 😅

Review thread on llm-base/src/loader.rs (outdated, resolved)
@danforbes (Contributor, Author)

Keep us posted 🚀

I think I've gotten it working...the problem was a silly user error when calculating the position tensor during evaluation. Let me know if you'd like me to pull the gpt crate into this PR.

@philpax (Collaborator) commented Apr 27, 2023

I'd suggest opening a separate PR, just so that we can focus on getting this one in first.

@KerfuffleV2 (Contributor)

Does this currently run? I think I figured out a way to do this without a patched GGML (assuming the alibi op is the only thing that was patched).

What you could do is generate an F32 tensor with the (j + 1) * m_k calculation at https://github.com/rustformers/llama-rs/blob/c608b4b81b53fe331989c896af7b60df23bdc28e/ggml-sys/ggml/ggml.c#L8651 and just add it to the other one. This wouldn't even require the map operations. Or am I missing something?
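
In plain Rust, filling that buffer might look roughly like this (a sketch based on the referenced ggml.c code; the slope formula is transcribed by hand and the names are illustrative, so treat it as a starting point rather than a drop-in replacement):

```rust
/// Build the ALiBi bias values ((j + 1) * m_k per head) as a flat f32
/// buffer that could back an F32 tensor and be added to the attention
/// scores with the stock add op, instead of a patched GGML op.
fn alibi_bias(n_head: usize, seq_len: usize) -> Vec<f32> {
    // Largest power of two <= n_head, mirroring n_heads_log2_floor in ggml.c.
    let n_head_log2 = 1usize << (usize::BITS - 1 - n_head.leading_zeros());
    let m0 = 2f32.powf(-8.0 / n_head_log2 as f32);
    let m1 = 2f32.powf(-4.0 / n_head_log2 as f32);
    let mut bias = vec![0.0f32; n_head * seq_len];
    for k in 0..n_head {
        // Per-head slope, as in the referenced implementation.
        let m_k = if k < n_head_log2 {
            m0.powi(k as i32 + 1)
        } else {
            m1.powi(2 * (k - n_head_log2) as i32 + 1)
        };
        for j in 0..seq_len {
            bias[k * seq_len + j] = (j + 1) as f32 * m_k;
        }
    }
    bias
}

fn main() {
    // e.g. a 16-head model; print the first head's first few bias values.
    let bias = alibi_bias(16, 4);
    println!("{:?}", &bias[..4]);
}
```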

@philpax (Collaborator) commented Apr 27, 2023

For BLOOM evaluation, there is some difference here: https://github.com/NouamaneTazi/bloomz.cpp/blob/main/main.cpp#L667-L676 ➡️ https://github.com/danforbes/llama-rs/blob/dfo/model/bloom/bloom/src/lib.rs#L303-L312

Hm, I know that we've messed around with that a bit in the LLaMA model, but I don't think it should cause the behaviour we're seeing here?

The best thing to do would be to run inference for a single token with both bloom and bloomz.cpp and then compare the logits, but that'll require some more concerted debugging.

Does this currently run?

It runs, but does not produce any output.

Or am I missing something?

Wouldn't know - didn't try! That sounds reasonable. The alibi op was indeed the only thing that was patched - if it can be replicated without a new operation, that'd be awesome.

You could try reimplementing it and seeing if the logits match what's produced now? If they do, we can switch to that / remove the patched GGML and then get back to figuring out what the issue is.


I bet the issue's something incredibly simple; I just haven't put the debugging time in.

@KerfuffleV2 (Contributor)

It runs, but does not produce any output.

I can't really test a different version of the alibi op if it doesn't produce output. :)

Making something that runs but produces no output is probably pretty easy even without correctly reproducing that operation's functionality...

So I'll wait until it's actually functioning before I try to do that.

@philpax (Collaborator) commented Apr 27, 2023

Yes, that's why I suggested comparing the logits 😉

As long as the logits are the same, the overall computation should be identical, and we can continue as-is to identify the issue. If they aren't, then we know it isn't a drop-in replacement.

That being said, no stress if you want to wait until it's properly done.

@philpax (Collaborator) commented Apr 27, 2023

Just leaving this as a note for either me or you to deal with: rename Model -> KnownModel, ErasedModel -> Model
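
For reference, a rough sketch of the split that rename implies (simplified; the method and types here are placeholders, not the crate's actual signatures):

```rust
/// Statically-typed trait: keeps associated types, used by code that
/// knows the concrete architecture at compile time.
trait KnownModel {
    type Layer; // implementation detail, varies per architecture
    fn evaluate(&self);
}

/// Object-safe trait: associated types erased, so callers can hold a
/// plain Box<dyn Model> without naming the architecture.
trait Model {
    fn evaluate(&self);
}

// Blanket impl: every KnownModel is automatically usable as a dyn Model.
impl<T: KnownModel> Model for T {
    fn evaluate(&self) {
        KnownModel::evaluate(self)
    }
}

struct Bloom; // hypothetical concrete model

impl KnownModel for Bloom {
    type Layer = ();
    fn evaluate(&self) {
        println!("evaluating");
    }
}

fn main() {
    // Callers hold any model behind the erased trait.
    let model: Box<dyn Model> = Box::new(Bloom);
    model.evaluate();
}
```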

danforbes added a commit to danforbes/ggml that referenced this pull request Apr 27, 2023
@danforbes (Contributor, Author)

Hm, I know that we've messed around with that a bit in the LLaMA model, but I don't think it should cause the behaviour we're seeing here?

This is the only difference I was able to find between the implementation in this repository and what is currently in the main branch of bloomz.cpp. I have added some commented-out code that shows what happens when I try to bring this codebase in line with the upstream...I've tried a few different permutations to get things to work, but I haven't had much luck.

https://github.com/danforbes/llama-rs/blob/e35f93b642b6e1fc9975c22de938b08451d77d97/bloom/src/lib.rs#L302-L320

@danforbes (Contributor, Author)

@philpax would you be opposed to me doing some refactoring with respect to the ggml crates? I'd like to combine them into one crate and split up the current ggml/src/lib.rs file into context.rs, tensor.rs, etc.
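
For illustration, the split might look something like this in the combined crate's lib.rs (a layout sketch only; the module list beyond context and tensor is hypothetical):

```rust
// lib.rs of the combined ggml crate: one thin module per concept,
// re-exported at the crate root so the public API stays unchanged.
mod context;
mod tensor;

pub use context::Context;
pub use tensor::Tensor;
```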

@philpax (Collaborator) commented Apr 27, 2023

Keep ggml-sys separate, but the rest is all yours - I was planning on merging the loader into the ggml crate myself, so if you want to get a start on that, go for it!

@philpax (Collaborator) commented Apr 27, 2023

Hm, I know that we've messed around with that a bit in the LLaMA model, but I don't think it should cause the behaviour we're seeing here?

This is the only difference I was able to find between the implementation in this repository and what is currently in the main branch of bloomz.cpp. I have added some commented-out code that shows what happens when I try to bring this codebase in line with the upstream...I've tried a few different permutations to get things to work, but I haven't had much luck.

https://github.com/danforbes/llama-rs/blob/e35f93b642b6e1fc9975c22de938b08451d77d97/bloom/src/lib.rs#L302-L320

Were you able to get bloomz.cpp working with the same model?

@danforbes (Contributor, Author)

Were you able to get bloomz.cpp working with the same model?

Yes, and I tried to compare the logits but I got overwhelmed 😅 Are you opposed to maybe jumping on a Jitsi call or some other kind of web conference sometime to work on this together? I feel like we could get a lot accomplished that way...

@danforbes (Contributor, Author)

@philpax I'd like to make a suggestion to get this across the line. What if we pull out the non-functional BLOOM crate and replace it with the GPT2 implementation I have sitting in another branch? There's been a lot of good development on this branch and I think it's high time we merged it in. I'd also suggest pulling in the GGML refactor, since @setzer22 seems on board with the approach I've taken. What do you think?

@danforbes mentioned this pull request Apr 30, 2023
@danforbes (Contributor, Author)

Superseded by #162

@danforbes closed this Apr 30, 2023
@danforbes deleted the dfo/model/bloom branch Apr 30, 2023 21:42