Question about kernel synchronization in a warp group #41

Open
sleepwalker2017 opened this issue Feb 25, 2025 · 2 comments

Comments

@sleepwalker2017

sleepwalker2017 commented Feb 25, 2025

I see the code snippet here:

    if (warp_group_idx == 0) {
        // do computation...
        __syncthreads();
        cutlass::arch::NamedBarrier::arrive(kNThreads, static_cast<int>(NamedBarriers::SReady));
    }

I printed the value of kNThreads, and it is 256 here.

The CUDA block size is also 256. The block is divided into 2 warp groups of 128 threads each.

My question is:

Is it OK to wait for 256 threads on a path that only 128 threads go through?

Doesn't that cause the CUDA block to hang?

I have the same question about the use of __syncthreads in a branch that only half of the threads enter.

@beginlner
Collaborator

The other warp group also goes through a path with the same barrier.
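
As a minimal sketch of this pairing (not the project's actual kernel), the example below has warp group 0 arrive at a named barrier while warp group 1 waits on the same barrier, so the two paths together supply the 256 expected threads. It uses the raw PTX `bar.arrive` and `bar.sync` instructions instead of the CUTLASS wrapper to stay self-contained. The helper names `named_arrive` and `named_sync`, the barrier id `kSReady = 1`, and the `producer_consumer` and `run_demo` functions are all hypothetical and chosen for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kNThreads = 256;       // threads per block, as reported in the issue
constexpr int kWarpGroupSize = 128;  // threads per warp group
constexpr int kSReady = 1;           // hypothetical barrier id (0 is used by __syncthreads)

// Signal arrival at named barrier `id` without waiting.
__device__ __forceinline__ void named_arrive(int id, int num_threads) {
    asm volatile("bar.arrive %0, %1;" : : "r"(id), "r"(num_threads) : "memory");
}

// Arrive at named barrier `id` and wait until `num_threads` threads have arrived.
__device__ __forceinline__ void named_sync(int id, int num_threads) {
    asm volatile("bar.sync %0, %1;" : : "r"(id), "r"(num_threads) : "memory");
}

__global__ void producer_consumer(int* out) {
    __shared__ int s_data[kWarpGroupSize];
    const int warp_group_idx = threadIdx.x / kWarpGroupSize;
    const int idx = threadIdx.x % kWarpGroupSize;

    if (warp_group_idx == 0) {
        // Producer path: 128 threads write, then arrive and continue immediately.
        s_data[idx] = 2 * idx;
        __threadfence_block();  // conservative: publish the write before arriving
        named_arrive(kSReady, kNThreads);
    } else {
        // Consumer path: 128 threads wait here. 128 arrivals + 128 waiters = 256.
        named_sync(kSReady, kNThreads);
        out[idx] = s_data[idx];
    }
}

// Returns the number of mismatched elements (0 on success), or -1 on a CUDA error.
int run_demo() {
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, kWarpGroupSize * sizeof(int)) != cudaSuccess) return -1;
    producer_consumer<<<1, kNThreads>>>(d_out);
    int h_out[kWarpGroupSize];
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;
    int mismatches = 0;
    for (int i = 0; i < kWarpGroupSize; ++i) mismatches += (h_out[i] != 2 * i);
    return mismatches;
}

int main() {
    printf("mismatches: %d\n", run_demo());
    return 0;
}
```

The thread count passed to the barrier is the total across both paths, not the number of threads in the branch that makes the call. Without the matching arrive from warp group 0, the 128 waiting threads would not reach the expected count of 256 on their own.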

@sleepwalker2017
Author

> The other warp group also goes through a path with the same barrier.

So the same applies to the __syncthreads function. This is a completely new understanding for me. Thank you!
