-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
May 2024 Binary Update (Take 2) #712
May 2024 Binary Update (Take 2) #712
Conversation
New spell checker in action! |
|
In my case Llama3 doesn´t work properly. This is the output with the example and 70b parameters:
The same test with 8B parameters:
|
@SignalRT Were you using the newly added LLaMA3 example or other common examples? |
@AsakusaRinne The new example. I will try to identify the problem. |
@SignalRT did you download a new llama3 file? Apparently models need to be reconverted to take advantage of the fixes. |
When I get the problem I update the model and downloaded https://huggingface.co/lmstudio-community/Meta-Llama-3-70B-Instruct-GGUF/tree/main If there is a model link that works for you, I could try it. |
There's one linked from here which should have the fixes: https://www.reddit.com/r/LocalLLaMA/comments/1cg3e8k/llama_3_8b_instruct_with_fixed_bpe_tokenizer/ |
@SignalRT I tried this 8B model and it worked for me with the newly added LLaMA3 example. However I have no idea about the 70B model. |
Tests are passing on Windows CUDA 12. |
Test pass on MacOS. Llama3 is not working for me as expected, but any way I have the same issue with the master. Llava seems to work in my first test, but asking about more images it seems it doesn't work properly. |
Is this a regression? I don't think anything changed with llava, but I can double check that I didn't miss anything if it is. |
I'm running into the same issues, it does seem to work, just not as good as master. I keep getting things like:
There are no donuts in any of my images by the way. It also keeps mentioning "patterns", "glitches" and "artifacts". |
There's no change to llava.h between the current version in master (https://github.com/ggerganov/llama.cpp/tree/f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7/examples/llava) and the version in this PR (https://github.com/ggerganov/llama.cpp/tree/b8c1476e44cc1f3a1811613f65251cf779067636/examples/llava) :/ |
After reviewing the status of llama.cpp master it seems that there is a problem: ggerganov/llama.cpp#7060 related with this PR: ggerganov/llama.cpp#6899 with the introduction of moondream vision |
I upgraded some projects to this PR and Windows OpenCL is also working without issues. 👍 I have been trying to reproduce the llava issue in The outputs in llamasharp do not match Edit
And
Both have the exact same issue. For example:
And a completely different image:
@SignalRT the issue you linked definitely seems related, especially the things mentioned here and here match what I'm running in to, but that doesn't explain why they get it after "a few" responses while I get it on the first response in llamasharp. Perhaps this binary update should not include that specific commit with moondream vision support (if we can revert it), or we should wait with merging this PR until their fix lands? To avoid a llamasharp release with broken llava. |
We don't have a way to remove a specific commit from llama.cpp in the build system, and I don't really want to have to add that! We'll have to wait until a fix lands upstream. If anyone sees a PR in llama.cpp that fixes this, please link it here. |
@martindevans , this problem should be solved after the revert of clip.cpp changes. |
- llama.cpp: a743d76a01f23038b2c85af1e9048ee836767b44 - https://github.com/SciSharp/LLamaSharp/actions/runs/9017784838
@SignalRT @m0nsky @AsakusaRinne this PR has been updated with a new set of binaries. If you guys could test it out again that'd be appreciated! |
LLava issues are solved, no more donuts. 🍩 Ran unit tests again on CUDA12 just in case and all tests passed. |
@martindevans Llava is working fine again with this binaries. The test pass also OK. |
@martindevans Could you please push an arbitrary commit to trigger the benchmark test? And I'll test it on my PC tonight. |
…pattern they should all use.
All things work well on Windows11+cuda11, Linux+cpu, Linux+cuda11 and Linux+cuda12. However there's problem in WSL2 with cuda11. Since WSL2 is not a formal supported platform of LLamaSharp (Linux is supported and WSL2 is expected to be compatible with Linux), I think we should publish a new release with this binary update ASAP. Instead I'll open an issue for the WSL2 problem. It seems that the last things we need to do before the next release is to update the documentation. |
A note to remember my latest test: With the llama.cpp binary version currently in this branch we don't support moondream multi-dimensional model (because they revert the PR to solve llava problems). That problems were fixed in ggerganov/llama.cpp@d11afd6 and after that commit I checked that both llava and moondream work properly in LlamaSharp. |
This PR has been updated with a new set of binaries, which should hopefully fix the llava issue.
Testing:
Previous version, for reference. No longer relevant.