Question about kernel synchronization in a warp group #41

sleepwalker2017 · 2025-02-25T09:15:22Z

I see the code snippet here:

    if (warp_group_idx == 0) {
        // do computation...
        __syncthreads();
        cutlass::arch::NamedBarrier::arrive(kNThreads, static_cast<int>(NamedBarriers::SReady));
    }

I printed kNThreads; it is 256 here.

The CUDA block size is also 256, split into 2 warp groups of 128 threads each.

My question is:

Is it OK to wait for 256 threads on a path that only 128 threads take?

Wouldn't that cause the CUDA block to hang?

The same question applies to calling __syncthreads inside a branch that only half of the threads enter.

beginlner · 2025-02-25T15:24:40Z

The other warp group also goes through a path with the same barrier.
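
To make the counting concrete, here is a minimal sketch of the pairing, not the kernel from this repository. It assumes the CUTLASS NamedBarrier calls lower to the PTX bar.arrive and bar.sync instructions, and it uses those directly. The names producer_consumer, consumer_sees_producer_data, staged and the barrier id kSReady = 1 are hypothetical. Warp group 0 only arrives (128 non-blocking arrivals), warp group 1 syncs (128 blocking arrivals), and the barrier releases once the total reaches the 256 passed as the thread count. The __threadfence_block() before the arrive is a conservative choice of mine, not something taken from the original code.

```cuda
#include <cuda_runtime.h>

constexpr int kNThreads  = 256;  // whole block: 2 warp groups x 128 threads
constexpr int kGroupSize = 128;  // threads per warp group
constexpr int kSReady    = 1;    // hypothetical barrier id; id 0 is left to __syncthreads()

// Non-blocking: counts the calling threads as arrived and returns immediately.
__device__ __forceinline__ void barrier_arrive(int id, int num_threads) {
    asm volatile("bar.arrive %0, %1;" : : "r"(id), "r"(num_threads) : "memory");
}

// Blocking: counts the calling threads as arrived, then waits until
// num_threads arrivals in total have been counted on this barrier id.
__device__ __forceinline__ void barrier_sync(int id, int num_threads) {
    asm volatile("bar.sync %0, %1;" : : "r"(id), "r"(num_threads) : "memory");
}

__global__ void producer_consumer(int* out) {
    __shared__ int staged[kGroupSize];
    const int warp_group_idx = threadIdx.x / kGroupSize;
    const int lane = threadIdx.x % kGroupSize;

    if (warp_group_idx == 0) {
        staged[lane] = 2 * lane;             // stand-in for "do computation"
        __threadfence_block();               // conservative: publish the store before signalling
        barrier_arrive(kSReady, kNThreads);  // 128 arrivals, does not wait
    } else {
        barrier_sync(kSReady, kNThreads);    // 128 more arrivals, 256 in total: barrier releases
        out[lane] = staged[lane];            // safe to read what warp group 0 wrote
    }
}

// Host helper: launches one block of kNThreads threads and checks that
// warp group 1 observed every value written by warp group 0.
bool consumer_sees_producer_data() {
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, kGroupSize * sizeof(int)) != cudaSuccess) return false;
    producer_consumer<<<1, kNThreads>>>(d_out);
    int h_out[kGroupSize] = {};
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;
    for (int i = 0; i < kGroupSize; ++i) {
        if (h_out[i] != 2 * i) return false;
    }
    return true;
}
```

If the second branch were missing, only 128 of the 256 expected arrivals would ever be counted. In this sketch nothing waits on the arrive side, so the hang would show up in whichever threads later call a blocking sync on that barrier id.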

sleepwalker2017 · 2025-02-25T15:37:25Z

> The other warp group also goes through a path with the same barrier.

So the same applies to __syncthreads. This is a completely new understanding for me. Thank you!
