[SYCL]rm wait() to improve the performance #7233

arthw · 2024-05-12T03:21:41Z

This PR reverts the workaround introduced in #5895.
That workaround addressed a known oneMKL issue on the Intel MTL Arc GPU.
The new oneMKL (oneAPI Base Toolkit 2024.1) appears to have fixed the issue,
so the workaround is no longer needed and this PR removes it.

Now we get +32% on the Intel MTL Arc GPU and +21% on the Arc 770, tested with llama2-7b-Q4.

Next token:

MTL
7.06 tokens per second -> 9.37 tokens per second

Arc 770
25.14 tokens per second -> 30.50 tokens per second

airMeng · 2024-05-13T00:03:46Z

Can you paste the absolute performance numbers here?

rm wait()

364c375

arthw requested a review from airMeng May 12, 2024 03:22

airMeng approved these changes May 13, 2024

airMeng merged commit 948f4ec into ggml-org:master May 13, 2024
58 checks passed

teleprint-me pushed a commit to teleprint-me/llama.cpp that referenced this pull request May 17, 2024

[SYCL] rm wait() (ggml-org#7233)

3fa36ac
