[SYCL] replace get_work_group_size() by local cache for performance #8286

NeoZhangJianyu · 2024-07-04T01:51:31Z

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

Use the cache in ggml_sycl_device_info() to replace the function get_work_group_size(), which has low performance.
Remove the function get_work_group_size().
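Below is a minimal C++ sketch of the caching pattern described above. Each per-device value is queried once at initialization, and later lookups read it from a local cache instead of calling the runtime again. The names `query_max_work_group_size`, `DeviceInfoCache`, and `g_query_calls`, along with the returned sizes, are hypothetical stand-ins and not the ggml or SYCL API. The PR itself reads these values from the cache in ggml_sycl_device_info().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a slow per-device runtime query (e.g. asking the
// SYCL runtime for the max work-group size). NOT the real ggml/SYCL API.
// The call counter makes the effect of caching observable.
static int g_query_calls = 0;

static std::size_t query_max_work_group_size(int device) {
    ++g_query_calls;                   // pretend this call is expensive
    return device == 0 ? 1024 : 512;   // made-up values for illustration
}

// Hypothetical cache, filled once at initialization, in the spirit of
// storing per-device info up front instead of re-querying per kernel launch.
struct DeviceInfoCache {
    std::vector<std::size_t> max_work_group_size;

    explicit DeviceInfoCache(int device_count) {
        max_work_group_size.reserve(static_cast<std::size_t>(device_count));
        for (int d = 0; d < device_count; ++d) {
            max_work_group_size.push_back(query_max_work_group_size(d));
        }
    }

    // Hot-path lookup: a plain vector read, no runtime query.
    std::size_t work_group_size(int device) const {
        assert(device >= 0 &&
               static_cast<std::size_t>(device) < max_work_group_size.size());
        return max_work_group_size[static_cast<std::size_t>(device)];
    }
};
```

With this layout, a kernel launcher that calls `work_group_size(dev)` thousands of times still triggers only one runtime query per device, at construction time.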

airMeng · 2024-07-04T01:58:15Z

Have you run the UTs of norm and softmax on MTL/DG2?

NeoZhangJianyu · 2024-07-05T03:22:28Z

> Have you run the UTs of norm and softmax on MTL/DG2?

The norm UTs pass on Arc A770.
Softmax is not tested because the UT is currently broken by MUL_MAT with B16.

Not tested on MTL.

rm get_work_group_size() by local cache for performance

5dece9f

NeoZhangJianyu requested a review from joeatodd July 4, 2024 01:51

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) labels Jul 4, 2024

NeoZhangJianyu requested a review from airMeng July 4, 2024 01:56

airMeng approved these changes Jul 4, 2024


ngxson mentioned this pull request Jul 4, 2024

[SYCL] Caching device_info in device_ext to restore TG performance #8301 (Closed)

NeoZhangJianyu merged commit f09b7cb into ggerganov:master Jul 5, 2024
53 checks passed
