Support CL path for MaxUnpooling2d #1414

Open: wants to merge 2 commits into main

@chunhuanMeng (Contributor) commented on Feb 27, 2025:

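Previously, MaxUnpooling2d had no channels-last (CL) path, so a lot of time was spent converting inputs between channels-last and channels-first (CF) formats. This PR adds a CL path so that channels-last inputs are processed directly, avoiding the conversion and improving op performance.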
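To illustrate why a dedicated CL path removes the conversion, here is a minimal pure-Python sketch (not the actual kernel in this PR). It assumes PyTorch's convention that each unpooling index addresses a position inside its own H_out x W_out plane, as returned by `max_pool2d(..., return_indices=True)`. The helper names `unpool_nchw`, `unpool_nhwc` and `nchw_to_nhwc` are hypothetical.

```python
from typing import List


def unpool_nchw(values: List[float], indices: List[int], n: int, c: int,
                h_in: int, w_in: int, h_out: int, w_out: int) -> List[float]:
    """Hypothetical CF reference: values/indices are flat NCHW buffers."""
    out = [0.0] * (n * c * h_out * w_out)
    plane_in, plane_out = h_in * w_in, h_out * w_out
    for i, (v, idx) in enumerate(zip(values, indices)):
        nc = i // plane_in                  # flat (batch, channel) plane id
        out[nc * plane_out + idx] = v       # idx is a position inside that plane
    return out


def unpool_nhwc(values: List[float], indices: List[int], n: int, c: int,
                h_in: int, w_in: int, h_out: int, w_out: int) -> List[float]:
    """Hypothetical CL path: reads NHWC input and writes NHWC output directly."""
    out = [0.0] * (n * h_out * w_out * c)
    for i, (v, idx) in enumerate(zip(values, indices)):
        ch = i % c                          # channel is the fastest-varying dim
        b = i // (h_in * w_in * c)          # batch index
        oh, ow = divmod(idx, w_out)         # decode the spatial position
        out[((b * h_out + oh) * w_out + ow) * c + ch] = v
    return out


def nchw_to_nhwc(buf: List, n: int, c: int, h: int, w: int) -> List:
    """The explicit layout conversion that a CL path lets us skip."""
    return [buf[((b * c + ch) * h + y) * w + x]
            for b in range(n) for y in range(h) for x in range(w) for ch in range(c)]


# Usage: one image, two channels, 1x1 pooled input unpooled to 2x2.
cf = unpool_nchw([5.0, 7.0], [3, 1], 1, 2, 1, 1, 2, 2)
cl = unpool_nhwc([5.0, 7.0], [3, 1], 1, 2, 1, 1, 2, 2)
assert cl == nchw_to_nhwc(cf, 1, 2, 2, 2)   # same result, no conversion pass
```

Both functions do a single scatter pass; the only difference is the address arithmetic, which is why the CL variant can avoid materializing a converted copy of the input and output.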

@chunhuanMeng (Contributor, Author) commented:

Forward:

| dtype | input, indices, output_size | op before | kernel before | op with PR | kernel with PR |
|---|---|---|---|---|---|
| bf16 | ['bfloat16[4, 64, 2, 2]', 'int64[4, 64, 2, 2]', [128, 128]] | 54.688us | 43.716us | 43.792us | 32.212us |
| bf16 | ['bfloat16[4, 65, 1, 1]', 'int64[4, 65, 1, 1]', [128, 128]] | 55.800us | 44.456us | 45.064us | 33.080us |
| bf16 | ['bfloat16[8, 128, 1, 1]', 'int64[8, 128, 1, 1]', [128, 128]] | 145.432us | 126.584us | 108.292us | 88.904us |
| bf16_ChannelsLast | ['bfloat16[4, 64, 2, 2]', 'int64[4, 64, 2, 2]', [128, 128]] | 69.856us | 42.656us | 41.460us | 35.832us |
| bf16_ChannelsLast | ['bfloat16[4, 65, 1, 1]', 'int64[4, 65, 1, 1]', [128, 128]] | 72.764us | 34.700us | 41.564us | 35.868us |
| bf16_ChannelsLast | ['bfloat16[8, 128, 1, 1]', 'int64[8, 128, 1, 1]', [128, 128]] | 226.276us | 118.804us | 137.692us | 119.124us |
| fp16 | ['float16[4, 64, 2, 2]', 'int64[4, 64, 2, 2]', [128, 128]] | 55.468us | 44.088us | 43.288us | 31.768us |
| fp16 | ['float16[4, 65, 1, 1]', 'int64[4, 65, 1, 1]', [128, 128]] | 52.940us | 41.944us | 44.120us | 32.260us |
| fp16 | ['float16[8, 128, 1, 1]', 'int64[8, 128, 1, 1]', [128, 128]] | 146.588us | 127.560us | 108.132us | 89.688us |
| fp16_ChannelsLast | ['float16[4, 64, 2, 2]', 'int64[4, 64, 2, 2]', [128, 128]] | 69.004us | 33.612us | 43.384us | 37.904us |
| fp16_ChannelsLast | ['float16[4, 65, 1, 1]', 'int64[4, 65, 1, 1]', [128, 128]] | 73.116us | 34.976us | 43.904us | 38.424us |
| fp16_ChannelsLast | ['float16[8, 128, 1, 1]', 'int64[8, 128, 1, 1]', [128, 128]] | 228.096us | 118.688us | 131.080us | 118.856us |
| fp32 | ['float32[4, 64, 2, 2]', 'int64[4, 64, 2, 2]', [128, 128]] | 59.688us | 45.960us | 45.364us | 31.804us |
| fp32 | ['float32[4, 65, 1, 1]', 'int64[4, 65, 1, 1]', [128, 128]] | 60.064us | 46.200us | 45.360us | 31.884us |
| fp32 | ['float32[8, 128, 1, 1]', 'int64[8, 128, 1, 1]', [128, 128]] | 162.416us | 130.208us | 126.508us | 98.188us |
| fp32_ChannelsLast | ['float32[4, 64, 2, 2]', 'int64[4, 64, 2, 2]', [128, 128]] | 81.880us | 34.816us | 48.620us | 34.620us |
| fp32_ChannelsLast | ['float32[4, 65, 1, 1]', 'int64[4, 65, 1, 1]', [128, 128]] | 89.652us | 36.656us | 41.604us | 35.452us |
| fp32_ChannelsLast | ['float32[8, 128, 1, 1]', 'int64[8, 128, 1, 1]', [128, 128]] | 268.680us | 130.152us | 157.360us | 120.520us |
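To put the forward numbers in perspective, a short script can compute op-level speedups (op before / op with PR) for a few rows of the table above. The row selection, the `FORWARD_OP_US` dict and the `speedup` helper are illustrative only; the timings are copied from the table.

```python
from typing import Dict, Tuple

# (dtype/layout, input shape) -> (op before, op with PR), in microseconds,
# copied from the forward table.
FORWARD_OP_US: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("bf16", "[8, 128, 1, 1]"): (145.432, 108.292),
    ("bf16_ChannelsLast", "[8, 128, 1, 1]"): (226.276, 137.692),
    ("fp16_ChannelsLast", "[8, 128, 1, 1]"): (228.096, 131.080),
    ("fp32_ChannelsLast", "[8, 128, 1, 1]"): (268.680, 157.360),
}


def speedup(before_us: float, after_us: float) -> float:
    """Ratio of old to new time; a value above 1 means the PR is faster."""
    return before_us / after_us


for (dtype, shape), (before, after) in FORWARD_OP_US.items():
    print(f"{dtype:>18} {shape}: {speedup(before, after):.2f}x")
```

The channels-last rows show the largest op-level gains, which is consistent with the PR removing the CF/CL conversion from that path.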

Backward:

| dtype | shapes | kernel before | kernel with PR |
|---|---|---|---|
| bf16_backward | ['bfloat16[4, 64, 128, 128]', 'bfloat16[4, 64, 2, 2]', 'int64[4, 64, 2, 2]', [128, 128]] | 24.404us | 21.932us |
| bf16_backward | ['bfloat16[4, 65, 128, 128]', 'bfloat16[4, 65, 1, 1]', 'int64[4, 65, 1, 1]', [128, 128]] | 23.796us | 21.712us |
| bf16_backward | ['bfloat16[8, 128, 128, 128]', 'bfloat16[8, 128, 1, 1]', 'int64[8, 128, 1, 1]', [128, 128]] | 51.832us | 49.336us |
| bf16_backward_ChannelsLast | ['bfloat16[4, 64, 128, 128]', 'bfloat16[4, 64, 2, 2]', 'int64[4, 64, 2, 2]', [128, 128]] | 42.656us | 44.336us |
| bf16_backward_ChannelsLast | ['bfloat16[4, 65, 128, 128]', 'bfloat16[4, 65, 1, 1]', 'int64[4, 65, 1, 1]', [128, 128]] | 42.780us | 42.956us |
| bf16_backward_ChannelsLast | ['bfloat16[8, 128, 128, 128]', 'bfloat16[8, 128, 1, 1]', 'int64[8, 128, 1, 1]', [128, 128]] | 137.560us | 137.064us |
| fp16 | ['float16[4, 64, 2, 2]', 'int64[4, 64, 2, 2]', [128, 128]] | 23.580us | 21.600us |
| fp16 | ['float16[4, 65, 1, 1]', 'int64[4, 65, 1, 1]', [128, 128]] | 23.008us | 21.764us |
| fp16 | ['float16[8, 128, 1, 1]', 'int64[8, 128, 1, 1]', [128, 128]] | 52.324us | 49.952us |
| fp16_ChannelsLast | ['float16[4, 64, 2, 2]', 'int64[4, 64, 2, 2]', [128, 128]] | 42.424us | 42.724us |
| fp16_ChannelsLast | ['float16[4, 65, 1, 1]', 'int64[4, 65, 1, 1]', [128, 128]] | 42.852us | 43.008us |
| fp16_ChannelsLast | ['float16[8, 128, 1, 1]', 'int64[8, 128, 1, 1]', [128, 128]] | 137.312us | 137.548us |
| fp32_backward | ['float32[4, 64, 128, 128]', 'float32[4, 64, 2, 2]', 'int64[4, 64, 2, 2]', [128, 128]] | 21.704us | 22.756us |
| fp32_backward | ['float32[4, 65, 128, 128]', 'float32[4, 65, 1, 1]', 'int64[4, 65, 1, 1]', [128, 128]] | 21.752us | 22.828us |
| fp32_backward | ['float32[8, 128, 128, 128]', 'float32[8, 128, 1, 1]', 'int64[8, 128, 1, 1]', [128, 128]] | 61.292us | 50.636us |
| fp32_backward_ChannelsLast | ['float32[4, 64, 128, 128]', 'float32[4, 64, 2, 2]', 'int64[4, 64, 2, 2]', [128, 128]] | 41.496us | 41.684us |
| fp32_backward_ChannelsLast | ['float32[4, 65, 128, 128]', 'float32[4, 65, 1, 1]', 'int64[4, 65, 1, 1]', [128, 128]] | 42.996us | 41.604us |
| fp32_backward_ChannelsLast | ['float32[8, 128, 128, 128]', 'float32[8, 128, 1, 1]', 'int64[8, 128, 1, 1]', [128, 128]] | 147.556us | 138.736us |
