When will Baichuan2 be supported? #3270
The following message was displayed: CUDA error 9 at D:\llama.cpp\ggml-cuda.cu:6517: invalid configuration argument
If the first model loads, and there are no architectural changes in the second model, then it should load in CPU mode.
Something changed, but not much.
What do you see? I'm interested because we are trying to fix Aquila here, and I think I have heard of Baichuan being mentioned in the same context (@KerfuffleV2: any ideas?).
Sorry, I don't know. I've heard of Baichuan but never messed with it. There was a recent pull that was supposed to add support for Baichuan models in general: #3009. I guess Baichuan 2 is different and wouldn't have been included in that. @dansinboy Did you try what BarfingLemurs said and run it in pure CPU mode?
I have tried to use convert-baichuan-hf-to-gguf.py to convert Baichuan2-7B-Chat to GGUF format, which succeeded, and then I quantized the model, which also succeeded. But neither of these two models can be loaded. The error at the end:
I'm looking forward to this being supported, thanks.
Looks like an issue with the vocab size? We have:
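For anyone following along, here is a minimal sketch of how one could check for the kind of vocab-size mismatch suspected above, by comparing the config against the SentencePiece tokenizer. The paths are placeholders, not the exact files from this report.

```python
# Sketch: compare the vocab size declared in config.json with the number of
# pieces in the SentencePiece tokenizer. If they disagree, the converter has
# to pad or trim the token embedding rows to match.
import json
from pathlib import Path
from sentencepiece import SentencePieceProcessor

model_dir = Path("Baichuan2-7B-Chat")  # placeholder path
config = json.loads((model_dir / "config.json").read_text())
sp = SentencePieceProcessor(model_file=str(model_dir / "tokenizer.model"))

print("config vocab_size :", config["vocab_size"])
print("tokenizer pieces  :", sp.vocab_size())
```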
Please check out #3299 and see if that fixes your issue. Also, if anyone can test other Baichuan models like Baichuan1, that would be appreciated. I converted the 7B base model. It seems to work now:
edit: Also, yikes. What's with that story? Maybe he deserved to die for having such an uncreative name.
@KerfuffleV2 : We are checking at the same time :) For me the following commands seem to work without any change:
Same goes for
Edit: obvious question: does
I actually didn't even try with the plain
Can anyone confirm that Baichuan(2) is exactly the same architecture as LLaMA?
edit: It has its own
I'm not sure what the implications are of converting the Baichuan models as if they're LLaMA. Presumably someone at some point thought they needed to be handled differently (Baichuan1 at least). If that's not the case, then we can rip out a bunch of special case stuff in
edit: Also, apparently you can convert Baichuan2 to Baichuan1 pretty easily: https://github.com/baichuan-inc/Baichuan2/blob/main/README_EN.md#migrating-inference-optimizations-from-baichuan-1-to-baichuan-2
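For reference, a minimal sketch of what that Baichuan2-to-Baichuan1 migration step amounts to, assuming a single-file pytorch_model.bin checkpoint; the directory names are placeholders, and a sharded checkpoint would need each shard handled separately.

```python
# Sketch: L2-normalize lm_head.weight so a Baichuan2 checkpoint behaves like
# Baichuan1 for inference, per the migration notes linked above.
import os
import torch

ori_model_dir = "Baichuan2-7B-Chat"             # placeholder
new_model_dir = "Baichuan2-7B-Chat-normalized"  # placeholder

state = torch.load(os.path.join(ori_model_dir, "pytorch_model.bin"), map_location="cpu")
state["lm_head.weight"] = torch.nn.functional.normalize(state["lm_head.weight"])

os.makedirs(new_model_dir, exist_ok=True)
torch.save(state, os.path.join(new_model_dir, "pytorch_model.bin"))
```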
Here is some documentation I found.
Interesting.
Edit: I diff'd
I grabbed the 13B base version: https://huggingface.co/baichuan-inc/Baichuan2-13B-Base
The dedicated conversion script did convert it; however, it doesn't actually work properly. With the prompt "从前有一只小狐狸" ("once upon a time there was a little fox") it just repeats "一只大老虎。" ("a big tiger") forever. I don't think it matters what the prompt is; it gets stuck repeating the same thing. It's not just nonsense though, which is interesting. I thought it might have been because of
I think using the convert-to-Baichuan1 thing I mentioned above might make it work, but I didn't get a chance to try that yet. Doing:

```python
for name in model_part.keys():
    data = model_part[name]
    if name == 'lm_head.weight':
        print('>>> Normalizing lm_head.weight')
        data = torch.nn.functional.normalize(data)
```

does not seem to have any effect. No idea what's wrong at this point. According to https://github.com/baichuan-inc/Baichuan-13B/blob/main/README_EN.md#model-details, for Baichuan1 the 13B uses ALiBi instead of RoPE. It might be the same in Baichuan2; the result does look like an attention problem, so I can believe this is the issue. I guess this would mean it's not really a problem with conversion but with how llama.cpp handles the graph. It seems like there is code to use ALiBi for 13B though: llama.cpp lines 2975 to 2983 in 324f340
edit: Last edit, I promise. I tried hacking
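To make the ALiBi point concrete: this is not llama.cpp's actual code, just a sketch of the idea. With ALiBi there is no rotary embedding; instead each head adds a fixed linear penalty to attention scores based on how far back the key is, with per-head slopes (the sketch assumes a power-of-two head count).

```python
# Sketch of ALiBi: per-head geometric slopes and the bias added to attention
# scores before softmax. If the graph applied RoPE here instead, attention
# would be wrong in exactly the "broken but not random" way described above.
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Geometric sequence starting at 2**(-8/n_heads) with the same ratio
    # (the ALiBi paper's formulation for power-of-two head counts).
    start = 2.0 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    slopes = alibi_slopes(n_heads)                     # (n_heads,)
    pos = torch.arange(seq_len)
    dist = (pos[:, None] - pos[None, :]).clamp(min=0)  # distance to earlier keys
    return -slopes[:, None, None] * dist[None, :, :]   # (n_heads, seq, seq)

print(alibi_bias(n_heads=4, seq_len=5)[0])  # head 0: zero on the diagonal, more negative further back
```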
@KerfuffleV2 : 7B's
but 13B's contains only
So I'm now trying with
in
I'm pretty sure the context size saved in GGUF is purely cosmetic. You can set
The model (Baichuan2-13B-Base) converts via
and @KerfuffleV2's observation. The model runs, but the output looks way worse than 7B (pretty repetitive) to me, so maybe there is another deviation lurking. I also think we shouldn't derive
But we are talking about conversion, not inference?
Yeah, but maybe not correctly. Forcing it to act like a 7B made it repeat in a somewhat different way, so I'm pretty sure it was using ALiBi as expected.
"Pretty repetitive" or literally just repeating a word or short phrase forever?
Well, at conversion time the context size is just used to populate the context size field when it gets saved as GGUF. So it doesn't really make a difference in either place. I get that the conversion script crashes because it couldn't find it, but other than that I'm just saying you could set it to whatever random value you want and it won't actually matter.
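To illustrate what "populating the context size field" means here, a minimal sketch using the gguf Python package; the output file name, architecture string, and the 4096 value are placeholders rather than anything taken from this thread.

```python
# Sketch: at conversion time the context length is just one metadata key in
# the GGUF header; llama.cpp can still be run with a different -c at load time.
import gguf

writer = gguf.GGUFWriter("baichuan2-13b-f16.gguf", arch="baichuan")  # placeholder output name
writer.add_context_length(4096)  # whatever value the converter picked up (or was given)
# ... the rest of the metadata and all tensors would be added here ...
writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.close()
```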
Depending on the temperature, it seemed to range from 'pretty' to 'literally', based on a few tests.
Ah, I probably could use the
@KerfuffleV2 After trying many times, I finally successfully loaded the model on the GPU. The steps: 1. Normalize the lm_head_w in Baichuan2 in order to change it into Baichuan1.
2. Use convert-baichuan-hf-to-gguf.py to convert. But, but, but I can only offload a maximum of 32 layers to the GPU; if I set -ngl 35, errors occurred. It is weird.
Are you saying you don't have the problem with it being repetitive? Like if you run:
it doesn't repeat?
Are you sure you're not running out of VRAM?
I found this paper: https://arxiv.org/pdf/2309.10305.pdf, but I'm not working in this field.
But when I try to load this model, errors occurred:
It is successful, and I can successfully load this model.
Finally, this is a lovely model.
When I try to load this model, I got this error:
Conversion failed, with errors like this:
I really wish I knew how to fix it; it's a shame. I hope this is helpful.
I was mistaken, but using the earlier change
in `convert.py` and setting
(in
I haven't had a chance to mess with it, but have you seen #3290? Seems like that's an effort to replace/extend
I'm not jameswu but if the fix is just setting the architecture and checking for the context size in an extra field then certainly that's very simple. I can just update the pull I already have open to do that.
It works better than with the baichuan-specific conversion script? In other words, it doesn't have the repetition issue when converted with
Looks like a thin wrapper for conversion and quantization using the existing infrastructure.
I'm only testing in English. Here's a short example for
But I'm really not sure how good this is.
Tried and failed to compare the scripts: they are too different.
Edit: if I had some more time, I'd check perplexity for these models and add some more tokenizer tests (beautiful
This is actually pretty weird. I don't think converting with
With temperature 0, it seems like it always gets stuck repeating the same thing even with English. Using the default temperature (
It's much worse about getting stuck with Chinese, but I can get similar rambling output if I set the temperature higher, to
Something seems like it has to be wrong here, though whether it's at conversion or inference time isn't clear. I really thought normalizing the layer like the convert-to-Baichuan1 thing said was going to help, but as far as I can see it just has no effect whatsoever. Do you know what happens when layer norm is applied multiple times? I feel like it's probably just a no-op in that case, so maybe inference is already just running that operation on it, which is why performing it at conversion time makes no difference. I might be wrong though.
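If the operation in question is the torch.nn.functional.normalize call from the earlier snippet (L2 normalization, rather than a true layer norm), then applying it a second time is indeed a no-op, which would be consistent with the conversion-time change producing no visible difference. A quick check:

```python
# Sketch: F.normalize (L2 normalization) is idempotent -- rows that already
# have unit norm are unchanged by normalizing them again.
import torch

w = torch.randn(4, 8)
once = torch.nn.functional.normalize(w)      # scale each row to unit L2 norm
twice = torch.nn.functional.normalize(once)  # normalizing again changes nothing
print(torch.allclose(once, twice))           # True
```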
mark
You'll need to use #3299 to convert (or manually make changes to
It seems like I was wrong about the converted Baichuan2 13B model not working properly (using #3299 to be clear, not current
Should work now. Please leave a comment if you still have issues.
Same problem. Cannot offload more than 32 layers.
I think it is the best model in Chinese. I tried llama.cpp to run Baichuan2, but failed.