
[SYCL] rm wait() to improve the performance #7233

Merged: 1 commit into ggml-org:master on May 13, 2024
Conversation

arthw
Collaborator

@arthw arthw commented May 12, 2024

This PR reverts the workaround introduced in #5895.
That workaround addressed a known oneMKL issue on the Intel MTL Arc GPU.
The new oneMKL (oneAPI Base Toolkit 2024.1) appears to fix that issue, so the workaround is no longer needed and can be reverted.

With this change, next-token throughput improves by +32% on the Intel MTL Arc GPU and +21% on the Arc 770, tested with llama2-7b-Q4.

Next token:

MTL: 7.06 tokens per second -> 9.37 tokens per second
Arc 770: 25.14 tokens per second -> 30.50 tokens per second

@arthw arthw requested a review from airMeng May 12, 2024 03:22
@mofosyne mofosyne added the labels "ggml (changes relating to the ggml tensor library for machine learning)" and "Review Complexity : High (generally requires in-depth knowledge of LLMs or GPUs)", and removed "Review Complexity : Medium" May 12, 2024
@airMeng
Collaborator

airMeng commented May 13, 2024

Can you paste the absolute performance numbers here?

@airMeng airMeng merged commit 948f4ec into ggml-org:master May 13, 2024
58 checks passed
teleprint-me pushed a commit to teleprint-me/llama.cpp that referenced this pull request May 17, 2024