[SYCL] replace get_work_group_size() by local cache for performance #8286

Merged


NeoZhangJianyu
Collaborator

  1. Use the values cached in ggml_sycl_device_info() instead of calling get_work_group_size(), which performs poorly.
  2. Remove the function get_work_group_size().
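
The idea in item 1 can be sketched roughly as follows. This is a minimal illustration in plain C++, not the code from this PR: `query_max_work_group_size` is a hypothetical stand-in for a slow runtime query such as `device.get_info<sycl::info::device::max_work_group_size>()`, and `device_info_cache`, `cached_device_info`, `kMaxDevices`, and the returned sizes are assumptions made up for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an expensive runtime query, e.g.
// device.get_info<sycl::info::device::max_work_group_size>().
static int g_query_count = 0;  // counts how often the slow path runs

static std::size_t query_max_work_group_size(int device_id) {
    ++g_query_count;
    return device_id == 0 ? 1024 : 512;  // made-up values for illustration
}

constexpr int kMaxDevices = 2;  // assumed device count for this sketch

// Assumed shape of a per-device info table filled once at startup.
struct device_info_cache {
    std::array<std::size_t, kMaxDevices> max_work_group_sizes{};
};

// Queries every device once on first use, then returns the cached table.
static const device_info_cache & cached_device_info() {
    static const device_info_cache info = [] {
        device_info_cache c;
        for (int id = 0; id < kMaxDevices; ++id) {
            c.max_work_group_sizes[id] = query_max_work_group_size(id);
        }
        return c;
    }();
    return info;
}

// Hot-path lookup: reads the cached value instead of calling the runtime.
static std::size_t max_work_group_size(int device_id) {
    assert(device_id >= 0 && device_id < kMaxDevices);
    return cached_device_info().max_work_group_sizes[device_id];
}
```

With this layout, kernel-launch code can call `max_work_group_size(id)` as often as it likes; the slow query runs only once per device, when the table is first built.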

@NeoZhangJianyu NeoZhangJianyu requested a review from joeatodd July 4, 2024 01:51
@github-actions github-actions bot added the ggml and SYCL labels Jul 4, 2024
@NeoZhangJianyu NeoZhangJianyu requested a review from airMeng July 4, 2024 01:56
@airMeng
Collaborator

airMeng commented Jul 4, 2024

Have you run the UTs of norm and softmax on MTL/DG2?

@NeoZhangJianyu NeoZhangJianyu merged commit f09b7cb into ggerganov:master Jul 5, 2024
53 checks passed
@NeoZhangJianyu
Collaborator Author

Have you run the UTs of norm and softmax on MTL/DG2?

The norm UTs pass on Arc A770.
softmax was not tested because its UT is broken by MUL_MAT with B16.

Not tested on MTL.

3 participants