[QST] Should this line be divided by Layout::kFactor? #502
Comments
I actually think you are mostly correct, though I am not 100% sure without a unit test to confirm it. For the Instead of divide
Thank you for your response! Why should it be sections_? Suppose the tensor shape is pitch linear (128, 64), i.e. contiguous 128, strided 64. The threadblock tile is (32, 32) and Element = half_t. If I understand correctly, one strided dim increment should add which is the same as 1 * 32 * 32 * 8 = 128 * 32 * 2
Thinking about it again, I now think you are correct. In the unit of elements,
Thank you! I created a pull request, but I am not able to test it locally because it fails to compile (it looks related to NVlabs/instant-ngp#119). Maybe my local gcc version is not compatible? Maybe you can just fix it. Thanks! ~/git/cutlass/build/examples/12_gemm_bias_relu$ make
@peisun1115 did you resolve your issue?
Yes, it is resolved.
when using checkpoint_lvl=2, we all_gather_raw(x) without async_op=True. So we don't need to wait for handle. Just skip.
Should this line be divided by Layout::kFactor? Or should it just use ref.stride(0) (ref from the ctor)?
stride_ in this class is multiplied by kFactor in line 526, but add_tile_offset is supposed to add an offset at the threadblock tile level.
Very likely I am wrong :) but could this have gone unnoticed because the code path is rarely used, as no code needs to set coord.strided() > 0?
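To make the question concrete, here is a minimal sketch of the arithmetic using the numbers quoted in the comments (row stride 128, threadblock tile 32 along the strided dimension, half_t). It does not use CUTLASS itself: every name below is a hypothetical stand-in, and kFactor = 2 and 8 elements per access are assumptions chosen so that stride_ comes out to the 32 used in the thread.

```cpp
// Hypothetical stand-ins for the quantities discussed in this issue.
// None of these values are read from CUTLASS; they are assumptions.
constexpr int kRefStride = 128;        // ref.stride(0): elements per row
constexpr int kFactor = 2;             // assumed Layout::kFactor
constexpr int kElementsPerAccess = 8;  // assumed: eight half_t per 128-bit access
constexpr int kTileStrided = 32;       // threadblock tile extent, strided dim

// stride_ as described in the issue: ref.stride(0) scaled by kFactor,
// expressed in units of vector accesses.
constexpr int kStride = kRefStride * kFactor / kElementsPerAccess;

// Element offset for moving `tiles` tiles along the strided dimension,
// without dividing by kFactor.
constexpr int offset_without_division(int tiles) {
  return tiles * kTileStrided * kStride * kElementsPerAccess;
}

// The same offset with the proposed division by kFactor.
constexpr int offset_with_division(int tiles) {
  return tiles * kTileStrided * kStride * kElementsPerAccess / kFactor;
}

// One tile step should skip 32 rows of 128 elements each.
constexpr int kExpected = kTileStrided * kRefStride;

static_assert(kStride == 32, "stride_ matches the value used in the thread");
static_assert(offset_without_division(1) == 128 * 32 * 2, "kFactor times too large");
static_assert(offset_with_division(1) == kExpected, "matches 32 rows of 128");
```

Under these assumptions the undivided offset is larger than one tile step by exactly kFactor, which is what the question is pointing at. Compiling the snippet is the whole check, since the static_assert lines fail to build if the arithmetic does not hold.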