Implement of warpReduceMin and blockReduceMin function #31191

@JamesLim-sy JamesLim-sy commented Feb 24, 2021

PR types

Performance optimization

PR changes

OPs

Describe

A. Precondition:

  1. The variable being reduced has the same name in every thread.

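B. Purpose of the functions:

  1. Add warpReduceMin, which finds the minimum of the same-named per-thread variable when the number of participating threads equals warpSize.
  2. Add PartialWarpReduceMin, which finds the minimum of the same-named per-thread variable when the number of participating threads is less than warpSize.
  3. Add blockReduceMin, which finds the minimum of the same-named per-thread variable when the number of participating threads equals BlockDim.
  4. Add PartialBlockReduceMin, which finds the minimum of the same-named per-thread variable when the number of participating threads is less than BlockDim.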

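C. How it works:

  • How warpReduceMin works:
  1. Use __shfl_xor_sync() to compare values across the warp, halving the exchange distance at each step, until the minimum of the thread group is obtained.
  • How PartialWarpReduceMin works:
  1. Use __shfl_sync() to broadcast the variable carried by the first participating thread to the whole warp, which initializes every lane of the warp.
  2. Write the variables carried by the thread group into the initialized warp.
  3. Use __shfl_xor_sync() to compare values across the warp, halving the exchange distance at each step, until the minimum of the thread group is obtained.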
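
The steps above can be illustrated with a small host-side sketch. This is not the code from the PR: it is a CPU simulation in plain C++, written so it runs without a GPU. It assumes a warp width of 32, and the names `ShflXor`, `SimWarpReduceMin`, and `SimPartialWarpReduceMin` are hypothetical stand-ins. `ShflXor` only mimics the data movement of `__shfl_xor_sync()` (lane i reads from lane i ^ mask), and the fill with lane 0's value mimics a `__shfl_sync()` broadcast from lane 0.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

constexpr int kWarpSize = 32;                // assumed warp width
using Warp = std::array<float, kWarpSize>;   // one value per lane

// Host stand-in for __shfl_xor_sync: lane i receives the value of lane i ^ mask.
Warp ShflXor(const Warp& v, int mask) {
  Warp out{};
  for (int lane = 0; lane < kWarpSize; ++lane) {
    out[lane] = v[lane ^ mask];
  }
  return out;
}

// Butterfly reduction: the exchange distance halves each step (16, 8, 4, 2, 1).
// After the last step every lane holds the minimum of all 32 inputs.
Warp SimWarpReduceMin(Warp v) {
  for (int mask = kWarpSize / 2; mask > 0; mask /= 2) {
    const Warp other = ShflXor(v, mask);
    for (int lane = 0; lane < kWarpSize; ++lane) {
      v[lane] = std::min(v[lane], other[lane]);
    }
  }
  return v;
}

// Partial case: only the first `active` lanes carry valid data.
Warp SimPartialWarpReduceMin(const Warp& v, int active) {
  assert(active >= 1 && active <= kWarpSize);
  Warp init;
  init.fill(v[0]);                           // step 1: broadcast lane 0's value
  for (int lane = 0; lane < active; ++lane) {
    init[lane] = v[lane];                    // step 2: write the valid values
  }
  return SimWarpReduceMin(init);             // step 3: butterfly reduction
}
```

Initializing the idle lanes with a value taken from a participating thread means they can never produce a result smaller than the true minimum, so the full-warp butterfly can be reused unchanged for the partial case.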

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@Xreki Xreki changed the title Creating a CUDA OP to find the minimum value in warp or block Creating a CUDA function to find the minimum value in warp or block Feb 27, 2021
} else {
shared_last_idx = threadIdx.x >> 5;
shared_last_val = 1e10;
paddle::platform::CudaAtomicMin(&shared_last_val, val);

The Max implementation does not use atomic operations, so why does the Min implementation need them?

@Xreki Xreki left a comment

LGTM

@Xreki Xreki merged commit 8491ae9 into PaddlePaddle:develop Mar 5, 2021
@JamesLim-sy JamesLim-sy changed the title Creating a CUDA function to find the minimum value in warp or block Implement of warpReduceMin and blockReduceMin function Jun 15, 2021