support multiple devices per backend #554

Merged
merged 7 commits into from
Jan 21, 2025


vchuravy
Member

Fixes #458

Minimal API proposal. Should we also add a query for whether a backend implements this?
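
For illustration only, here is a minimal sketch of what such a query could look like. It is not taken from this PR's diff: the names `ndevices`, `device`, `device!`, `supports_multiple_devices`, `ToyBackend`, and `SingleBackend` are all assumptions made for the example, and the `Backend` type is a local stand-in so the snippet runs without KernelAbstractions.

```julia
# Hypothetical sketch: every name below is an assumption for illustration,
# not the API introduced by this PR.
abstract type Backend end

# Fallbacks: a backend that does not opt in exposes exactly one device.
ndevices(::Backend) = 1
device(::Backend) = 1
function device!(backend::Backend, id::Integer)
    1 <= id <= ndevices(backend) ||
        throw(ArgumentError("device id $id out of bounds"))
    return nothing
end

# Hypothetical capability query: false unless a backend overrides it.
supports_multiple_devices(::Backend) = false

# Toy backend that tracks a currently selected device (1-based ordinal).
mutable struct ToyBackend <: Backend
    count::Int
    current::Int
end
ToyBackend(count) = ToyBackend(count, 1)

ndevices(b::ToyBackend) = b.count
device(b::ToyBackend) = b.current
supports_multiple_devices(::ToyBackend) = true
function device!(b::ToyBackend, id::Integer)
    1 <= id <= ndevices(b) ||
        throw(ArgumentError("device id $id out of bounds"))
    b.current = id
    return nothing
end

# Backend that relies entirely on the single-device fallbacks.
struct SingleBackend <: Backend end

# Usage: callers can branch on the query before switching devices.
backend = ToyBackend(4)
if supports_multiple_devices(backend)
    device!(backend, ndevices(backend))  # select the last device
end
```

With a trait-style query like this, generic code can decide up front whether to iterate over devices, instead of inferring it from `ndevices(backend) == 1`.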

Contributor

github-actions bot commented Jan 13, 2025

Benchmark Results

| Benchmark | main | 211a658... | main/211a658eb6ce3d... |
|---|---|---|---|
| saxpy/default/Float16/1024 | 0.736 ± 0.0073 μs | 0.731 ± 0.0069 μs | 1.01 |
| saxpy/default/Float16/1048576 | 0.174 ± 0.0078 ms | 0.181 ± 0.014 ms | 0.964 |
| saxpy/default/Float16/16384 | 3.34 ± 0.025 μs | 3.33 ± 0.046 μs | 1 |
| saxpy/default/Float16/2048 | 0.912 ± 0.012 μs | 0.905 ± 0.011 μs | 1.01 |
| saxpy/default/Float16/256 | 0.588 ± 0.0075 μs | 0.58 ± 0.0076 μs | 1.01 |
| saxpy/default/Float16/262144 | 0.0441 ± 0.00058 ms | 0.0445 ± 0.0012 ms | 0.991 |
| saxpy/default/Float16/32768 | 6.02 ± 0.05 μs | 6.02 ± 0.092 μs | 1 |
| saxpy/default/Float16/4096 | 1.31 ± 0.028 μs | 1.29 ± 0.026 μs | 1.01 |
| saxpy/default/Float16/512 | 0.649 ± 0.0076 μs | 0.641 ± 0.007 μs | 1.01 |
| saxpy/default/Float16/64 | 0.553 ± 0.005 μs | 0.546 ± 0.0057 μs | 1.01 |
| saxpy/default/Float16/65536 | 11.7 ± 0.11 μs | 11.6 ± 0.18 μs | 1 |
| saxpy/default/Float32/1024 | 0.652 ± 0.0097 μs | 0.639 ± 0.0098 μs | 1.02 |
| saxpy/default/Float32/1048576 | 0.231 ± 0.019 ms | 0.245 ± 0.034 ms | 0.943 |
| saxpy/default/Float32/16384 | 2.75 ± 0.16 μs | 2.79 ± 0.25 μs | 0.984 |
| saxpy/default/Float32/2048 | 0.757 ± 0.064 μs | 0.751 ± 0.015 μs | 1.01 |
| saxpy/default/Float32/256 | 0.578 ± 0.0063 μs | 0.57 ± 0.0054 μs | 1.01 |
| saxpy/default/Float32/262144 | 0.0557 ± 0.0086 ms | 0.0462 ± 0.0048 ms | 1.2 |
| saxpy/default/Float32/32768 | 5.24 ± 0.33 μs | 5.38 ± 1.3 μs | 0.973 |
| saxpy/default/Float32/4096 | 1.13 ± 0.082 μs | 1.11 ± 0.075 μs | 1.01 |
| saxpy/default/Float32/512 | 0.618 ± 0.0072 μs | 0.609 ± 0.007 μs | 1.01 |
| saxpy/default/Float32/64 | 0.565 ± 0.0055 μs | 0.561 ± 0.0052 μs | 1.01 |
| saxpy/default/Float32/65536 | 12.9 ± 1.4 μs | 12.6 ± 1.7 μs | 1.03 |
| saxpy/default/Float64/1024 | 0.757 ± 0.064 μs | 0.745 ± 0.022 μs | 1.02 |
| saxpy/default/Float64/1048576 | 0.503 ± 0.04 ms | 0.543 ± 0.057 ms | 0.926 |
| saxpy/default/Float64/16384 | 5.25 ± 0.33 μs | 5.48 ± 1.4 μs | 0.958 |
| saxpy/default/Float64/2048 | 1.13 ± 0.081 μs | 1.12 ± 0.074 μs | 1.01 |
| saxpy/default/Float64/256 | 0.579 ± 0.0063 μs | 0.577 ± 0.0067 μs | 1 |
| saxpy/default/Float64/262144 | 0.115 ± 0.01 ms | 0.116 ± 0.012 ms | 0.991 |
| saxpy/default/Float64/32768 | 12.6 ± 0.86 μs | 12.9 ± 1.4 μs | 0.976 |
| saxpy/default/Float64/4096 | 1.68 ± 0.1 μs | 1.74 ± 0.28 μs | 0.967 |
| saxpy/default/Float64/512 | 0.631 ± 0.0094 μs | 0.629 ± 0.011 μs | 1 |
| saxpy/default/Float64/64 | 0.557 ± 0.0055 μs | 0.557 ± 0.0069 μs | 1 |
| saxpy/default/Float64/65536 | 28.9 ± 2.4 μs | 28.9 ± 2.1 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/1024 | 2.18 ± 0.029 μs | 2.19 ± 0.03 μs | 0.992 |
| saxpy/static workgroup=(1024,)/Float16/1048576 | 0.165 ± 0.011 ms | 0.174 ± 0.017 ms | 0.951 |
| saxpy/static workgroup=(1024,)/Float16/16384 | 4.41 ± 0.052 μs | 4.41 ± 0.067 μs | 0.998 |
| saxpy/static workgroup=(1024,)/Float16/2048 | 2.35 ± 0.032 μs | 2.36 ± 0.033 μs | 0.996 |
| saxpy/static workgroup=(1024,)/Float16/256 | 2.81 ± 0.039 μs | 2.82 ± 0.045 μs | 0.994 |
| saxpy/static workgroup=(1024,)/Float16/262144 | 0.0441 ± 0.0034 ms | 0.0435 ± 0.0023 ms | 1.01 |
| saxpy/static workgroup=(1024,)/Float16/32768 | 6.83 ± 0.14 μs | 6.82 ± 0.15 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/4096 | 2.68 ± 0.041 μs | 2.68 ± 0.042 μs | 0.999 |
| saxpy/static workgroup=(1024,)/Float16/512 | 3.25 ± 0.041 μs | 3.27 ± 0.041 μs | 0.994 |
| saxpy/static workgroup=(1024,)/Float16/64 | 2.51 ± 0.22 μs | 2.52 ± 0.21 μs | 0.996 |
| saxpy/static workgroup=(1024,)/Float16/65536 | 12.4 ± 0.28 μs | 12.5 ± 0.33 μs | 0.995 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 2.21 ± 0.041 μs | 2.21 ± 0.041 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.239 ± 0.024 ms | 0.226 ± 0.03 ms | 1.06 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 4.34 ± 0.2 μs | 4.31 ± 0.19 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 2.37 ± 0.052 μs | 2.37 ± 0.05 μs | 0.999 |
| saxpy/static workgroup=(1024,)/Float32/256 | 2.69 ± 0.062 μs | 2.69 ± 0.048 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.0599 ± 0.0047 ms | 0.0593 ± 0.0041 ms | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 7.27 ± 0.34 μs | 7.31 ± 0.38 μs | 0.995 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 2.67 ± 0.077 μs | 2.64 ± 0.063 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/512 | 2.72 ± 0.081 μs | 2.73 ± 0.095 μs | 0.997 |
| saxpy/static workgroup=(1024,)/Float32/64 | 2.72 ± 5.4 μs | 2.71 ± 5.5 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 17 ± 1.6 μs | 16 ± 1.7 μs | 1.06 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 2.34 ± 0.073 μs | 2.37 ± 0.077 μs | 0.986 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.547 ± 0.075 ms | 0.582 ± 0.07 ms | 0.94 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 7.34 ± 0.35 μs | 7.31 ± 0.42 μs | 1 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 2.62 ± 0.082 μs | 2.64 ± 0.072 μs | 0.995 |
| saxpy/static workgroup=(1024,)/Float64/256 | 2.71 ± 0.081 μs | 2.72 ± 0.082 μs | 0.997 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.11 ± 0.017 ms | 0.119 ± 0.012 ms | 0.929 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 16.2 ± 1.7 μs | 15.8 ± 1.6 μs | 1.02 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 3.14 ± 0.093 μs | 3.16 ± 0.12 μs | 0.992 |
| saxpy/static workgroup=(1024,)/Float64/512 | 2.72 ± 0.088 μs | 2.71 ± 0.077 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float64/64 | 2.65 ± 0.078 μs | 2.65 ± 0.063 μs | 1 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 31.4 ± 2.4 μs | 0.0319 ± 0.0028 ms | 0.983 |
| time_to_load | 0.328 ± 0.00099 s | 0.33 ± 0.0033 s | 0.994 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

vchuravy marked this pull request as ready for review on January 21, 2025, 11:09
src/KernelAbstractions.jl (outdated, resolved)
src/KernelAbstractions.jl (outdated, resolved)
test/devices.jl (outdated, resolved)
test/devices.jl (outdated, resolved)
test/devices.jl (outdated, resolved)
src/KernelAbstractions.jl (outdated, resolved)
vchuravy merged commit e5ef261 into main on Jan 21, 2025
31 of 34 checks passed
vchuravy deleted the vc/multi_device branch on January 21, 2025, 13:53
Development

Successfully merging this pull request may close these issues.

Add Feature to Select Devices to Execute Kernels On