I see the code snippet here:
if (warp_group_idx == 0) {
    // do computation..
    __syncthreads();
    cutlass::arch::NamedBarrier::arrive(kNThreads, static_cast<int>(NamedBarriers::SReady));
}
I printed kNThreads, and it is 256 here.
The CUDA block size is also 256, divided into 2 warp groups of 128 threads each.
My question is:
Is it OK to wait for 256 threads on a path that only 128 threads go through?
Doesn't that cause the CUDA block to hang?
I have the same question about the use of __syncthreads in a branch that only half of the threads take.
The other warp group also goes through a path with the same barrier.
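To illustrate the point above, here is a minimal sketch, not the project's actual code. It assumes that the named barrier counts arrivals per barrier ID across the whole block, so 128 threads calling arrive in one branch and 128 threads calling sync in the other add up to the expected 256. The names two_warp_groups, named_barrier_arrive, named_barrier_sync, kBarrierId and run_demo are hypothetical; the two wrappers use the PTX instructions bar.arrive and bar.sync directly as stand-ins for the cutlass::arch::NamedBarrier calls.

```cuda
#include <cuda_runtime.h>

constexpr int kNThreads = 256;  // block size, as in the question
constexpr int kBarrierId = 1;   // hypothetical ID; barrier 0 is the one __syncthreads() uses

// Hypothetical wrapper: mark arrival at the barrier without waiting.
__device__ void named_barrier_arrive(int id, int num_threads) {
    asm volatile("bar.arrive %0, %1;" : : "r"(id), "r"(num_threads) : "memory");
}

// Hypothetical wrapper: mark arrival and wait until num_threads threads have arrived.
__device__ void named_barrier_sync(int id, int num_threads) {
    asm volatile("bar.sync %0, %1;" : : "r"(id), "r"(num_threads) : "memory");
}

__global__ void two_warp_groups(int* out) {
    __shared__ int smem[128];
    const int warp_group_idx = threadIdx.x / 128;
    const int idx = threadIdx.x % 128;

    if (warp_group_idx == 0) {
        // Producer warp group: 128 threads write, then arrive without blocking.
        smem[idx] = 2 * idx;
        __threadfence_block();  // conservative: make the write visible before arriving
        named_barrier_arrive(kBarrierId, kNThreads);
    } else {
        // Consumer warp group: 128 threads wait on the same barrier ID.
        // The barrier completes once 128 arrivals + 128 syncs = 256 threads.
        named_barrier_sync(kBarrierId, kNThreads);
        out[idx] = smem[idx];
    }
}

// Host helper: launches one block and checks that the consumers saw the producers' data.
bool run_demo() {
    int* d_out = nullptr;
    int h_out[128];
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return false;
    two_warp_groups<<<1, kNThreads>>>(d_out);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    ok = ok && (cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost) == cudaSuccess);
    cudaFree(d_out);
    for (int i = 0; ok && i < 128; ++i) ok = (h_out[i] == 2 * i);
    return ok;
}
```

Calling run_demo() from a host main should return without hanging, because the thread count passed to the barrier is the total across both branches, not the number of threads on a single path. If only one warp group ever touched that barrier ID with a count of 256, the waiting side would block.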
So the same goes for the __syncthreads function. It's a totally new understanding for me. Thank you!