
[SYCL] Align GEMM dispatch #7566

Merged · 13 commits · May 28, 2024

Conversation

@airMeng (Collaborator) commented May 27, 2024

The issue was exposed during #6408.

I will split #6408 into several smaller PRs for easier reviewing; the task list will be updated as further issues are exposed.

  • align GEMM dispatch with the CUDA backend
  • remove all global variables and pass the context instead
  • leave the async copy to the common backend
  • separate the GEMM code out of ggml-sycl.cpp for maintainability

@joeatodd (Contributor) left a comment:

This is a welcome addition since it simplifies the MatMul dispatch 😄

We spotted a couple of issues with the code, so I've added comments; I hope they're helpful.

CMakeLists.txt (2 outdated review threads, resolved)
ggml-sycl.cpp (2 outdated review threads, resolved)
ggml-sycl.cpp Outdated
Comment on lines 15237 to 15238
use_mul_mat_vec_q = use_mul_mat_vec_q; // Check dp4a
use_mul_mat_q = use_mul_mat_q ; // check dp4a
Contributor:

These lines don't do anything.

Author (Collaborator):

There will be refactoring of the SYCL compute-capability checks in #6408; these lines are kept here as a reminder.

ggml-sycl.cpp Outdated
use_mul_mat_vec_q = use_mul_mat_vec_q; // Check dp4a
use_mul_mat_q = use_mul_mat_q ; // check dp4a
#ifdef SYCL_USE_XMX
use_mul_mat_q = use_mul_mat_q && (!fp16_performance_good || src1->ne[1] <= MMQ_MAX_BATCH_SIZE);
Contributor:

The logic in this block is a little confusing:
(!fp16_performance_good || src1->ne[1] <= MMQ_MAX_BATCH_SIZE).

It says: use MMQ if either 1) FP16 performance is bad, or 2) the number of columns in src1 is less than or equal to MMQ_MAX_BATCH_SIZE.

Is this the intention?

Author (Collaborator):

This just aligns with the CUDA logic.

The github-actions bot added the build (Compilation issues) and SYCL labels on May 27, 2024.
@AidanBeltonS (Contributor) left a comment:

I am seeing a new test failure on Arc A770; is this to be expected?

  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1]): GGML_ASSERT: /builds/perseus-performance-libraries/llama_ci/llama.cpp/ggml-sycl.cpp:13858: false
ggml_sycl_op_dequantize_mul_mat_vec unsupported GGML_TYPE 16

ggml-sycl.cpp Outdated
Comment on lines 15176 to 15197
bool ggml_sycl_supports_mmq(enum ggml_type type) {
// TODO: accuracy issues in MMQ
return false;
Contributor:

Could you elaborate on what accuracy issues you are having with MMQ?

Author (Collaborator):

Master uses ggml_sycl_op_mul_mat_sycl for these 5 cases; you can try forcing MMQ to reproduce the accuracy issues:

  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1]): ggml_sycl_op_mul_mat_sycl
OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1]): ggml_sycl_op_mul_mat_sycl
OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1]): ggml_sycl_op_mul_mat_sycl
OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,1]): ggml_sycl_op_mul_mat_sycl
OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,1]): ggml_sycl_op_mul_mat_sycl

@mofosyne added the Review Complexity: Medium label (generally requires more time to grok, but manageable by beginner to medium expertise level) on May 28, 2024.
@airMeng force-pushed the sycl-gemm-dispatch branch from bcadd61 to bfed283 on May 28, 2024 at 06:07.
@airMeng (Collaborator, author) commented May 28, 2024

I am seeing a new test failure on Arc A770; is this to be expected?

  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1]): GGML_ASSERT: /builds/perseus-performance-libraries/llama_ci/llama.cpp/ggml-sycl.cpp:13858: false
ggml_sycl_op_dequantize_mul_mat_vec unsupported GGML_TYPE 16

Fixed.

@airMeng airMeng requested a review from NeoZhangJianyu May 28, 2024 07:30
ggml-sycl.cpp (5 outdated review threads, resolved)
airMeng and others added 5 commits May 28, 2024 20:59
Co-authored-by: Neo Zhang Jianyu <[email protected]> (4 commits)
@NeoZhangJianyu (Collaborator) left a comment:

Performance increased from 34 token/s to 37 token/s on Arc 770 with llama2-7b Q4. It's good!

@airMeng airMeng merged commit b864b50 into master May 28, 2024
67 checks passed
@airMeng airMeng deleted the sycl-gemm-dispatch branch May 30, 2024 01:56
@airMeng airMeng mentioned this pull request Jun 3, 2024
2 tasks