gemm split-k implementation #696

xiaohuguo2023 · 2025-01-07T22:40:05Z

add autotuning for split-k
add reduction kernel for split-k
use torch.sum to replace split-k reduction kernel

vgokhale · 2025-01-07T23:00:57Z

python/perf-kernels/gemm.py

@@ -145,11 +194,17 @@ def matmul(a, b, c, a_scale, b_scale, scale_a8_b8=False, activation=""):
    assert a.dtype == b.dtype, "Mixed dtype GEMMs are not supported!!!"
    M, K = a.shape
    K, N = b.shape
-    grid = lambda META: (triton.cdiv(M, META['BLOCK_SIZE_M']) * triton.cdiv(N, META['BLOCK_SIZE_N']), )
+    splitk = 1
+    c_buf = torch.zeros((M, N, splitk), device=a.device, dtype=torch.float32)


torch.empty
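
A minimal sketch of why `torch.empty` may be enough here, assuming each split-k slice of the buffer is fully overwritten before the `torch.sum` reduction. The PR's Triton kernel is not shown in this hunk, so that is an assumption. The helper name `splitk_matmul_reference` and the requirement that `K` be divisible by `splitk` are hypothetical and only for illustration. This is a plain PyTorch reference, not the PR's implementation.

```python
import torch


def splitk_matmul_reference(a, b, splitk):
    """Hypothetical PyTorch reference for split-k GEMM (not the Triton kernel)."""
    M, K = a.shape
    K_b, N = b.shape
    assert K == K_b, "inner dimensions must match"
    # Simplifying assumption for this sketch: K divides evenly into splitk chunks.
    assert K % splitk == 0, "sketch assumes K is divisible by splitk"

    # torch.empty skips the zero-fill that torch.zeros performs. That is only
    # safe because every c_buf[:, :, s] slice is fully written in the loop below.
    c_buf = torch.empty((M, N, splitk), device=a.device, dtype=torch.float32)

    k_chunk = K // splitk
    for s in range(splitk):
        k0 = s * k_chunk
        # Partial product over one chunk of the K dimension, accumulated in fp32.
        c_buf[:, :, s] = a[:, k0:k0 + k_chunk].float() @ b[k0:k0 + k_chunk, :].float()

    # Reduce the partial products over the split-k axis.
    return torch.sum(c_buf, dim=2)


torch.manual_seed(0)
a = torch.randn(8, 32)
b = torch.randn(32, 4)
out = splitk_matmul_reference(a, b, splitk=4)
```

If a kernel accumulated into the buffer instead (for example with atomic adds), or left any slot unwritten, the zero-initialized buffer would still be required.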

xiaohuguo2023 added 2 commits December 23, 2024 17:26

add splitk support in gemm kernel

aae9120

add two kernel of splik

98bbbc0

xiaohuguo2023 requested a review from vgokhale January 7, 2025 22:40

vgokhale reviewed Jan 7, 2025

xiaohuguo2023 added 2 commits January 8, 2025 03:45

use torch.empty

955e7e3

format changes

e8b4397
