[SYCL] Align GEMM dispatch #7566
Conversation
This is a welcome addition since it simplifies the MatMul dispatch 😄
We spotted a couple of issues with the code; I've added comments. I hope they're helpful.
ggml-sycl.cpp
Outdated
use_mul_mat_vec_q = use_mul_mat_vec_q; // Check dp4a
use_mul_mat_q = use_mul_mat_q; // check dp4a
These lines don't do anything.
There will be a refactoring of the SYCL compute capabilities in #6408; keeping these here as a reminder.
ggml-sycl.cpp
Outdated
use_mul_mat_vec_q = use_mul_mat_vec_q; // Check dp4a
use_mul_mat_q = use_mul_mat_q; // check dp4a
#ifdef SYCL_USE_XMX
    use_mul_mat_q = use_mul_mat_q && (!fp16_performance_good || src1->ne[1] <= MMQ_MAX_BATCH_SIZE);
The logic in this block is a little confusing:
(!fp16_performance_good || src1->ne[1] <= MMQ_MAX_BATCH_SIZE)
It says: use MMQ if either 1) FP16 performance is bad, or 2) the number of columns in src1 is less than or equal to the maximum. Is this the intention?
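To spell the branch out, here is a minimal standalone sketch of that condition as a small truth table. This is illustration only, not the ggml-sycl.cpp code; the MMQ_MAX_BATCH_SIZE value and the keep_mmq helper name are placeholders picked for the example.

```cpp
#include <cstdint>
#include <cstdio>

constexpr int64_t MMQ_MAX_BATCH_SIZE = 32; // placeholder value, for illustration only

// Mirrors the XMX-branch condition: MMQ stays enabled when device FP16 is
// slow, OR when the batch (src1->ne[1]) is small enough for the quantized kernel.
static bool keep_mmq(bool fp16_performance_good, int64_t src1_ncols) {
    return !fp16_performance_good || src1_ncols <= MMQ_MAX_BATCH_SIZE;
}

int main() {
    std::printf("%d\n", keep_mmq(true,  64)); // fast FP16, large batch -> 0: drop MMQ, use FP16 path
    std::printf("%d\n", keep_mmq(true,  16)); // fast FP16, small batch -> 1: keep MMQ
    std::printf("%d\n", keep_mmq(false, 64)); // slow FP16, any batch   -> 1: keep MMQ
    return 0;
}
```

Read this way, MMQ is only dropped when FP16 is fast AND the batch is larger than the threshold.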
This just aligns with the CUDA logic.
I am seeing new test failures on Arc A770; is this to be expected?
MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1]): GGML_ASSERT: /builds/perseus-performance-libraries/llama_ci/llama.cpp/ggml-sycl.cpp:13858: false
ggml_sycl_op_dequantize_mul_mat_vec unsupported GGML_TYPE 16
ggml-sycl.cpp
Outdated
bool ggml_sycl_supports_mmq(enum ggml_type type) {
    // TODO: accuracy issues in MMQ
    return false;
Could you elaborate on what accuracy issues you are having with MMQ?
The master branch uses ggml_sycl_op_mul_mat_sycl for these 5 cases; you can try to force MMQ to reproduce the issue (see the sketch after the list):
MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1]): ggml_sycl_op_mul_mat_sycl
OK
MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1]): ggml_sycl_op_mul_mat_sycl
OK
MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1]): ggml_sycl_op_mul_mat_sycl
OK
MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,1]): ggml_sycl_op_mul_mat_sycl
OK
MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,1]): ggml_sycl_op_mul_mat_sycl
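If it helps with reproducing, below is a sketch of one way to force the MMQ path locally. This is a throwaway test edit to ggml-sycl.cpp, not part of this PR, and the q4_0-only switch is just an assumption to match the cases above.

```cpp
// Local testing hack (not part of this PR): stop ggml_sycl_supports_mmq from
// always returning false, so the dispatcher no longer falls back to
// ggml_sycl_op_mul_mat_sycl for q4_0. Relies on ggml.h, which ggml-sycl.cpp
// already includes, for enum ggml_type / GGML_TYPE_Q4_0.
bool ggml_sycl_supports_mmq(enum ggml_type type) {
    switch (type) {
        case GGML_TYPE_Q4_0:   // force MMQ for the q4_0 cases listed above
            return true;
        default:
            return false;      // keep everything else on the existing path
    }
}
```

After rebuilding, re-running the MUL_MAT cases in test-backend-ops should surface the accuracy issue mentioned in the TODO.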
bcadd61 to bfed283 (compare)
Fixed.
Co-authored-by: Neo Zhang Jianyu <[email protected]>
Performance increases from 34 tokens/s to 37 on Arc A770 with llama2-7b-Q4. It's good!
The issue was exposed during #6408.
I will split #6408 into several smaller PRs for easier reviewing; the tasks will be updated according to the issues exposed.