Optimize Maxpooling forward/backward: Use a more appropriate workgroup size and use registers to optimize performance #1396

Merged
merged 2 commits into from
Feb 25, 2025


chunhuanMeng
Contributor

Use a more appropriate workgroup size and use registers to optimize the performance of the max pooling forward and backward kernels.

@chunhuanMeng
Contributor Author

Maxpooling3d

| dtype | Shape (N, C, H, W, D, oH, oW, oD) | channels_last | Before | With PR |
| --- | --- | --- | --- | --- |
| bf16 | 16, 32, 64, 64, 64, 32, 32, 32 | False | 801.072us | 818.840us |
| bf16 | 1, 4, 144, 144, 144, 72, 72, 72 | False | 91.756us | 69.788us |
| bf16 | 512, 512, 12, 12, 12, 6, 6, 6 | False | 16.053ms | 2.542ms |
| bf16 | 16, 32, 64, 64, 64, 32, 32, 32 | True | 2.342ms | 1.093ms |
| bf16 | 1, 4, 144, 144, 144, 72, 72, 72 | True | 112.492us | 81.068us |
| bf16 | 512, 512, 12, 12, 12, 6, 6, 6 | True | 33.218ms | 3.645ms |
| fp32 | 16, 32, 64, 64, 64, 32, 32, 32 | False | 818.772us | 809.988us |
| fp32 | 1, 4, 144, 144, 144, 72, 72, 72 | False | 90.272us | 66.144us |
| fp32 | 512, 512, 12, 12, 12, 6, 6, 6 | False | 15.779ms | 2.482ms |
| fp32 | 16, 32, 64, 64, 64, 32, 32, 32 | True | 2.433ms | 1.088ms |
| fp32 | 1, 4, 144, 144, 144, 72, 72, 72 | True | 118.156us | 77.212us |
| fp32 | 512, 512, 12, 12, 12, 6, 6, 6 | True | 32.778ms | 3.644ms |

Maxpooling3d_backward

| dtype | Shape (N, C, H, W, D, oH, oW, oD) | channels_last | Before | With PR |
| --- | --- | --- | --- | --- |
| bf16 | 16, 32, 64, 64, 64, 32, 32, 32 | False | 770.588us | 744.892us |
| bf16 | 1, 4, 144, 144, 144, 72, 72, 72 | False | 66.820us | 42.584us |
| bf16 | 512, 512, 12, 12, 12, 6, 6, 6 | False | 13.586ms | 2.366ms |
| bf16 | 16, 32, 64, 64, 64, 32, 32, 32 | True | 1.339ms | 825.320us |
| bf16 | 1, 4, 144, 144, 144, 72, 72, 72 | True | 94.536us | 51.188us |
| bf16 | 512, 512, 12, 12, 12, 6, 6, 6 | True | 23.022ms | 2.722ms |
| fp32 | 16, 32, 64, 64, 64, 32, 32, 32 | False | 1.241ms | 1.252ms |
| fp32 | 1, 4, 144, 144, 144, 72, 72, 72 | False | 37.060us | 20.936us |
| fp32 | 512, 512, 12, 12, 12, 6, 6, 6 | False | 12.595ms | 5.450ms |
| fp32 | 16, 32, 64, 64, 64, 32, 32, 32 | True | 2.860ms | 1.902ms |
| fp32 | 1, 4, 144, 144, 144, 72, 72, 72 | True | 57.360us | 20.672us |
| fp32 | 512, 512, 12, 12, 12, 6, 6, 6 | True | 21.412ms | 6.434ms |
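
The timings above mix `us` and `ms`, so the speedup per row is easy to misread. A minimal sketch for turning a few of the reported numbers into speedup ratios follows. The helper names `to_us` and `speedup` and the row labels are illustrative and not part of this PR; the timing strings are copied from the tables.

```python
# Hypothetical helpers for reading the benchmark tables; not part of the PR.
UNIT_TO_US = {"us": 1.0, "ms": 1000.0}


def to_us(timing: str) -> float:
    """Convert a timing string such as '825.320us' or '1.339ms' to microseconds."""
    value, unit = timing[:-2], timing[-2:]
    return float(value) * UNIT_TO_US[unit]


def speedup(before: str, after: str) -> float:
    """Return before/after; values above 1 mean the PR is faster."""
    return to_us(before) / to_us(after)


# (label, Before, With PR) copied from the tables above.
rows = [
    ("forward  bf16 512,512,12,12,12 contiguous", "16.053ms", "2.542ms"),
    ("forward  bf16 512,512,12,12,12 channels_last", "33.218ms", "3.645ms"),
    ("forward  bf16 16,32,64,64,64 contiguous", "801.072us", "818.840us"),
    ("backward bf16 16,32,64,64,64 channels_last", "1.339ms", "825.320us"),
    ("backward fp32 512,512,12,12,12 contiguous", "12.595ms", "5.450ms"),
]

for label, before, after in rows:
    print(f"{label}: {speedup(before, after):.2f}x")
```

Read this way, the largest gains are on the 512, 512, 12, 12, 12 shape (roughly 6x to 9x in the forward pass), while the contiguous 16, 32, 64, 64, 64 forward case in bf16 comes out marginally slower than before (ratio just under 1).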

@xytintel xytintel added this pull request to the merge queue Feb 25, 2025
Merged via the queue into main with commit 0f32aab Feb 25, 2025
8 of 16 checks passed
@xytintel xytintel deleted the meng_maxpool3d branch February 25, 2025 08:56