
[QST] Should this line be divided by Layout::kFactor? #502

Closed
peisun1115 opened this issue May 26, 2022 · 6 comments
Labels
question Question

Comments

peisun1115 (Contributor) commented May 26, 2022

Should this line be divided by Layout::kFactor? Or should it just use ref.stride(0) (the ref from the constructor)?

stride_ in this class is multiplied by kFactor in line 526, but add_tile_offset is supposed to add an offset at the threadblock-tile level.

Very likely I am wrong :) but could it be that this goes unnoticed because the code path is rarely exercised, since no caller sets coord.stride() > 0?

hwu36 (Collaborator) commented May 27, 2022

I actually think you are mostly correct, though I am not 100% sure without a unit test to confirm it. For the Crosswise layout, when we move along the K dimension to compute a GEMM, we move along the contiguous dimension, so coord.strided() is always 0.

Instead of dividing by kFactor, I think we need to divide by sections_. If you are interested, you can make the change and run all the unit tests first.

peisun1115 (Contributor, Author) commented

Thank you for your response! Why should it be sections_?

Suppose the tensor shape is pitch-linear (128, 64), i.e. contiguous c = 128, strided s = 64. The threadblock tile is (32, 32) and Element = half_t.

If I understand correctly:
ref.stride(0) = 128
Crosswise = 32
kFactor = 2
sections_ = 4
sections_per_stage_ = 1
Shape::kStrided = 32
kElementsPerAccess = 128 / 16 = 8
stride_ = 128 * 2 / 8 = 32

One strided-dim increment should add 128 * 32 elements?

But 1 * 32 * 32 * 8 = 128 * 32 * 2, which is kFactor = 2 times larger than that.

hwu36 (Collaborator) commented May 27, 2022

Thinking about it again, now I think you are correct. In units of elements:

coord.strided() * Crosswise * kFactor * Shape::kStrided / kFactor * sections_
= coord.strided() * Crosswise * Shape::kStrided * sections_
= coord.strided() * stride_ / kFactor * Shape::kStrided

peisun1115 (Contributor, Author) commented May 27, 2022

Thank you!

I created a pull request, but somehow I am not able to test it locally (it looks related to NVlabs/instant-ngp#119).

It failed to compile; maybe my local gcc version is not compatible. Maybe you can just fix it on your side. Thanks!

```
~/git/cutlass/build/examples/12_gemm_bias_relu$ make
Building CUDA object examples/12_gemm_bias_relu/CMakeFiles/12_gemm_bias_relu.dir/gemm_bias_relu.cu.o
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |                                 ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                  ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
make[2]: *** [examples/12_gemm_bias_relu/CMakeFiles/12_gemm_bias_relu.dir/build.make:76: examples/12_gemm_bias_relu/CMakeFiles/12_gemm_bias_relu.dir/gemm_bias_relu.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:3593: examples/12_gemm_bias_relu/CMakeFiles/12_gemm_bias_relu.dir/all] Error 2
make: *** [Makefile:166: all] Error 2
```

mnicely (Collaborator) commented Jun 18, 2022

@peisun1115 did you resolve your issue?

peisun1115 (Contributor, Author) commented

Yes, it is resolved.

jgli pushed a commit to jgli/cutlass that referenced this issue Nov 14, 2024