GPUReduceBankConflicts does not keep track of shared memory usage #19675

nirvedhmeshram · 2025-01-10T22:27:07Z

While experimenting with GPUMMAHeuristicSeeds, I found a config whose shared memory usage was exactly maxSharedMemoryBytes. This is the dispatch that produced it:

    func.func @problem_dispatch(%13 : tensor<64x1280xf16>, %14 : tensor<40x64x1280xf16>, %15 : tensor<40x64xf32>) -> tensor<40x64x64xf16> {
        %cst = arith.constant 0.000000e+00 : f32
        %16 = tensor.empty() : tensor<40x64x64xf32>
        %17 = linalg.fill ins(%cst : f32) outs(%16 : tensor<40x64x64xf32>) -> tensor<40x64x64xf32>
        %18 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel", "reduction"]} ins(%13, %14 : tensor<64x1280xf16>, tensor<40x64x1280xf16>) outs(%17 : tensor<40x64x64xf32>) {
        ^bb0(%in: f16, %in_0: f16, %out: f32):
          %21 = arith.extf %in : f16 to f32
          %22 = arith.extf %in_0 : f16 to f32
          %23 = arith.mulf %21, %22 : f32
          %24 = arith.addf %out, %23 : f32
          linalg.yield %24 : f32
        } -> tensor<40x64x64xf32>
        %19 = tensor.empty() : tensor<40x64x64xf16>
        %20 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} ins(%18, %15 : tensor<40x64x64xf32>, tensor<40x64xf32>) outs(%19 : tensor<40x64x64xf16>) {
        ^bb0(%in: f32, %in_0: f32, %out: f16):
          %21 = arith.addf %in, %in_0 : f32
          %22 = arith.truncf %21 : f32 to f16
          linalg.yield %22 : f16
        } -> tensor<40x64x64xf16>
        return %20 : tensor<40x64x64xf16>
    }

However, the GPUReduceBankConflicts pass then pushed usage past the limit, since it adds to the shared memory allocations without tracking how much shared memory is already in use. In the short term, to prevent this, we need to make GPUReduceBankConflicts aware of the current usage. In the long term, we should look at smarter ways of reducing bank conflicts.
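
To make the short-term fix concrete, here is a rough sketch of the kind of check the pass could do before padding. It is plain Python arithmetic, not IREE code: the function names, the two 128x128 f16 tiles, the 128-bit pad, and the 64 KiB limit are all assumptions chosen for illustration, not values taken from the dispatch above or from the actual pass.

```python
from math import prod


def alloc_bytes(shape, elem_bits):
    """Size in bytes of one shared memory allocation."""
    return prod(shape) * elem_bits // 8


def padded_shape(shape, elem_bits, padding_bits):
    """Pad the innermost dimension by padding_bits worth of elements."""
    pad_elems = padding_bits // elem_bits
    return (*shape[:-1], shape[-1] + pad_elems)


def pad_if_fits(allocs, padding_bits, limit_bytes):
    """Return the padded allocations only if their total stays within the limit.

    allocs is a list of (shape, elem_bits) pairs. If padding would exceed
    limit_bytes, the original allocations are returned unchanged.
    """
    padded = [(padded_shape(s, b, padding_bits), b) for s, b in allocs]
    total = sum(alloc_bytes(s, b) for s, b in padded)
    return padded if total <= limit_bytes else allocs


# Hypothetical config: two 128x128 f16 tiles use exactly 64 KiB.
allocs = [((128, 128), 16), ((128, 128), 16)]
limit = 64 * 1024  # assumed value standing in for maxSharedMemoryBytes
before = sum(alloc_bytes(s, b) for s, b in allocs)
after = sum(alloc_bytes(padded_shape(s, b, 128), b) for s, b in allocs)
print(before, after, pad_if_fits(allocs, 128, limit) == allocs)
```

This sketch skips padding entirely when the padded total does not fit. Padding only some of the allocations, or using a smaller pad, would be alternatives; which is better is a separate question.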


nirvedhmeshram added the "good first issue" (🌱 Good for newcomers) and "help wanted" (Extra attention is needed) labels Jan 10, 2025

nirvedhmeshram mentioned this issue Jan 10, 2025

[GPU] Match TileAndFuse Matmul heuristics to VectorDistribute #19666

Open
