Nice work! I really like the traits, I think this is definitely the direction to go (although I suspect there'll be some change here as we accommodate more eccentric models - but that's OK!) Here's the feedback I had while looking through:
Can you also credit @hhamud and @iacore in your PR description?
Thank you very much for the encouragement and detailed feedback! TBH I'm not a very experienced Rust programmer, nor do I know a ton about LLMs, so I'm very happy for the suggestions. I wasn't able to make a ton of sense out of #125, so I may need some more details there, but let me see what I can do.
It's possible that we can get this going as-is, and then use BLOOM as the secondary test case for #125. Let's see how this looks after we've gotten through the other feedback 🚀
Okay, I think I have made as many changes as I feel comfortable with...thanks again for the great suggestions! Other contributors can feel free to add their own commits to this branch (just let me know if you need perms). Some thoughts:
Thanks again and I'm happy to take another stab at more improvements if you have more suggestions 💯
Force-pushed from e07eda5 to 54ad890.
You can try printing out the logit vectors of both models and comparing them with the original C implementations in llama.cpp and bloom.cpp, or even just test the outputs of the models using the same prompts; if they are close enough, it should be fine. I also wonder whether it would be possible to implement a CI job that downloads the weights, measures the difference in logit output for the same prompt (cosine similarity?), and passes if it is within a specified range.
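A minimal sketch of that check in Rust, assuming the two logit vectors have already been dumped from the C implementation and the Rust port for the same prompt (the 0.999 threshold is an arbitrary placeholder):

```rust
/// Cosine similarity between two logit vectors of equal length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // In a real CI job these would be loaded from files produced by
    // llama.cpp/bloom.cpp and by the Rust port for the same prompt.
    let reference = vec![0.10_f32, -1.25, 3.40, 0.02];
    let candidate = vec![0.11_f32, -1.20, 3.38, 0.01];

    let sim = cosine_similarity(&reference, &candidate);
    // 0.999 is a placeholder threshold for "approximate enough".
    assert!(sim > 0.999, "logit vectors diverged: cosine similarity {sim}");
    println!("cosine similarity: {sim}");
}
```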
@hhamud FWIW, I also received garbled output with a LLaMA 13B model on the current branch.
Sorry about the delay on addressing your latest updates; I'll have a look after work soon!
Hey! If all goes well, #125 will be landing soon. I'll take care of merging it into this PR and addressing the feedback in the next day or two, if that's OK with you?
All good 💯
@danforbes Hey, just wondering. Did you already try to get a pull request with your alibi operation accepted upstream to the ggml or llama.cpp repo? There's been some related discussion over here: 6b2765d#commitcomment-110389270
This disables BLOOM support for now.
Better watch out, you're on the path to having an attention span as short as mi—
It would be a lot nicer to get it merged than to have to maintain a custom GGML version indefinitely. Also, the llama.cpp/ggml repo gets a lot of pull requests, so it could reasonably take a couple of weeks to get something merged even if it was started today.
Nice work! I would suggest not racing too far ahead with the model support; I'd like to make sure all of the ones we have now work well and share as much as they can. (Especially around the actual inference logic - I wonder how much can be shared?) That being said, if it gets done and it works, it would be nice to support it. Keep us posted 🚀
Yeah, I agree. The functions came from […]. If you want to tackle it, that'd be appreciated, but I think we're all just as lazy as you 😅
I think I've gotten it working... the problem was a silly user error when calculating the position tensor during evaluation. Let me know if you'd like me to pull the GPT-2 work into this branch.
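For anyone hitting a similar issue: position ids for an incremental evaluation step generally need to be offset by the number of tokens already processed, and starting from zero on every call is an easy mistake. A hypothetical sketch in plain Rust (not the actual fix from this branch):

```rust
// Hypothetical sketch: position ids for one incremental evaluation step.
// `n_past` is the number of tokens already processed in this session;
// `n` is the number of new tokens in this batch.
fn position_ids(n_past: usize, n: usize) -> Vec<i32> {
    // A common bug is using (0..n) here instead of offsetting by n_past.
    (n_past..n_past + n).map(|p| p as i32).collect()
}
```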
I'd suggest opening a separate PR, just so that we can focus on getting this one in first.
For BLOOM evaluation, there is some difference here: https://github.com/NouamaneTazi/bloomz.cpp/blob/main/main.cpp#L667-L676 ➡️ https://github.com/danforbes/llama-rs/blob/dfo/model/bloom/bloom/src/lib.rs#L303-L312
Does this currently run? I think I figured out a way to do this without a patched GGML (assuming the […]). What you could do is generate a […].
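For what it's worth, ALiBi adds a fixed, head-specific linear bias to the attention scores, so the bias can in principle be precomputed as an ordinary tensor rather than requiring a patched GGML op. A rough sketch of just the math in plain Rust, with illustrative names and assuming the head count is a power of two (as in the ALiBi paper):

```rust
/// ALiBi head slopes for `n_heads` heads: 2^(-8/n), 2^(-16/n), ...
/// (assumes `n_heads` is a power of two, as in the original paper).
fn alibi_slopes(n_heads: usize) -> Vec<f32> {
    let base = 2f32.powf(-8.0 / n_heads as f32);
    (0..n_heads).map(|h| base.powi(h as i32 + 1)).collect()
}

/// Precomputed ALiBi bias for one head: bias[i][j] = -slope * (i - j)
/// for key position j <= query position i (future positions are masked
/// elsewhere, so their values here don't matter).
fn alibi_bias(slope: f32, n_ctx: usize) -> Vec<Vec<f32>> {
    (0..n_ctx)
        .map(|i| (0..n_ctx).map(|j| -slope * (i as f32 - j as f32)).collect())
        .collect()
}
```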
Hm, I know that we've messed around with that a bit in the LLaMA model, but I don't think it should cause the behaviour we're seeing here? The best thing to do would be to run inference for a single token with both implementations and compare the logits.
It runs, but does not produce any output.
Wouldn't know - didn't try! That sounds reasonable. The […]. You could try reimplementing it and seeing if the logits match what's produced now? If they do, we can switch to that / remove the patched GGML and then get back to figuring out what the issue is. I bet the issue's something incredibly simple; I just haven't put the debugging time in.
I can't really test a different version of the alibi op if it doesn't produce output. :) Making something that runs but produces no output is probably pretty easy even without correctly reproducing that operation's functionality... So I'll wait until it's actually functioning before I try to do that.
Yes, that's why I suggested comparing the logits 😉 As long as the logits are the same, the overall computation should be identical, and we can continue as-is to identify the issue. If it isn't, then we know that it isn't a drop-in replacement. That being said, no stress if you want to wait until it's properly done.
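Concretely, a drop-in replacement should reproduce the logits up to floating-point noise, so an element-wise check is enough; a small sketch (the 1e-5 tolerance is a placeholder):

```rust
/// Largest element-wise difference between two logit vectors.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max)
}

fn main() {
    // These would come from the patched-GGML op and the reimplementation.
    let patched = vec![0.5_f32, -2.0, 1.25];
    let reimplemented = vec![0.5_f32, -2.0, 1.25];
    // 1e-5 is a placeholder tolerance for f32 accumulation noise.
    assert!(max_abs_diff(&patched, &reimplemented) < 1e-5);
}
```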
Just leaving this as a note for either me or you to deal with: rename […]
See discussion here rustformers/llm#141 (comment) CC: @philpax @KerfuffleV2
This is the only difference I was able to find between the implementation in this repository and what is currently in the main branch for bloomz.cpp. I have added some commented-out code that indicates what happens when I try to bring this codebase in line with the upstream... I've tried a few different permutations to get things to work, but I wasn't having much luck.
@philpax would you be opposed to me doing some refactoring with respect to the GGML crate?
Keep […]
Were you able to get bloomz.cpp working with the same model?
Yes, and I tried to compare the logits, but I got overwhelmed 😅 Are you opposed to maybe jumping on a Jitsi or some other kind of web conference sometime to work on this together? I feel like we could get a lot accomplished that way...
@philpax I'd like to make a suggestion to get this across the line. What if we pull out the non-functional BLOOM crate and replace it with the GPT-2 implementation I have sitting in another branch? There's been a lot of good development on this branch and I think it's high time we merged it in. I'd also suggest pulling in the GGML refactor, since @setzer22 seems on board with the approach I've taken. What do you think?
Superseded by #162
Code refactor and BLOOM model implementation. This is an attempt to take the best parts of #85 from @hhamud and #125 from @iacore.