Fix Float16 sincos intrinsic #533
Merged
Your PR requires formatting changes to meet the project's style guidelines. The suggested changes:

```diff
diff --git a/test/device/intrinsics.jl b/test/device/intrinsics.jl
index b71f62b1..4c88fb67 100644
--- a/test/device/intrinsics.jl
+++ b/test/device/intrinsics.jl
@@ -257,9 +257,9 @@ end
 return nothing
 end
- Metal.@sync @metal threads = N intr_test3(bufferA, bufferB)
- @test Array(bufferA) ≈ sin.(arr)
- @test Array(bufferB) ≈ cos.(arr)
+ Metal.@sync @metal threads = N intr_test3(bufferA, bufferB)
+ @test Array(bufferA) ≈ sin.(arr)
+ @test Array(bufferB) ≈ cos.(arr)
 end
 let # clamp
```
Metal Benchmarks
| Benchmark suite | Current: 1981c5f | Previous: 924a130 | Ratio |
|---|---|---|---|
| private array/construct | 24989.583333333336 ns | 25257 ns | 0.99 |
| private array/broadcast | 463084 ns | 463958 ns | 1.00 |
| private array/random/randn/Float32 | 814208.5 ns | 760208 ns | 1.07 |
| private array/random/randn!/Float32 | 618500 ns | 624687.5 ns | 0.99 |
| private array/random/rand!/Int64 | 558250 ns | 563395.5 ns | 0.99 |
| private array/random/rand!/Float32 | 585541 ns | 587854 ns | 1.00 |
| private array/random/rand/Int64 | 785292 ns | 755083 ns | 1.04 |
| private array/random/rand/Float32 | 628667 ns | 611333 ns | 1.03 |
| private array/copyto!/gpu_to_gpu | 644250 ns | 650916.5 ns | 0.99 |
| private array/copyto!/cpu_to_gpu | 612062 ns | 820417 ns | 0.75 |
| private array/copyto!/gpu_to_cpu | 813583.5 ns | 686687.5 ns | 1.18 |
| private array/accumulate/1d | 1355104 ns | 1362854.5 ns | 0.99 |
| private array/accumulate/2d | 1391229 ns | 1404375 ns | 0.99 |
| private array/iteration/findall/int | 2101854.5 ns | 2111541.5 ns | 1.00 |
| private array/iteration/findall/bool | 1808708 ns | 1849958 ns | 0.98 |
| private array/iteration/findfirst/int | 1698167 ns | 1715916 ns | 0.99 |
| private array/iteration/findfirst/bool | 1664708 ns | 1673542 ns | 0.99 |
| private array/iteration/scalar | 3897542 ns | 3935958 ns | 0.99 |
| private array/iteration/logical | 3179583.5 ns | 3227791 ns | 0.99 |
| private array/iteration/findmin/1d | 1755396 ns | 1778916.5 ns | 0.99 |
| private array/iteration/findmin/2d | 1347875 ns | 1352041 ns | 1.00 |
| private array/reductions/reduce/1d | 1037291.5 ns | 1043458 ns | 0.99 |
| private array/reductions/reduce/2d | 663541 ns | 669709 ns | 0.99 |
| private array/reductions/mapreduce/1d | 1036208 ns | 1052375 ns | 0.98 |
| private array/reductions/mapreduce/2d | 663292 ns | 675854.5 ns | 0.98 |
| private array/permutedims/4d | 2510521 ns | 2557249.5 ns | 0.98 |
| private array/permutedims/2d | 1026000 ns | 1026042 ns | 1.00 |
| private array/permutedims/3d | 1577041.5 ns | 1594916 ns | 0.99 |
| private array/copy | 563875 ns | 569146 ns | 0.99 |
| latency/precompile | 8863922500 ns | 8851374292 ns | 1.00 |
| latency/ttfp | 3613381417 ns | 3620472000 ns | 1.00 |
| latency/import | 1231948875 ns | 1235528208 ns | 1.00 |
| integration/metaldevrt | 717292 ns | 717959 ns | 1.00 |
| integration/byval/slices=1 | 1643125 ns | 1539333 ns | 1.07 |
| integration/byval/slices=3 | 10184375 ns | 9117125 ns | 1.12 |
| integration/byval/reference | 1535499.5 ns | 1560292 ns | 0.98 |
| integration/byval/slices=2 | 2646437 ns | 2731458.5 ns | 0.97 |
| kernel/indexing | 483792 ns | 457750 ns | 1.06 |
| kernel/indexing_checked | 474125 ns | 455833 ns | 1.04 |
| kernel/launch | 8125 ns | 8166 ns | 0.99 |
| metal/synchronization/stream | 14666.5 ns | 14708 ns | 1.00 |
| metal/synchronization/context | 15166 ns | 14791 ns | 1.03 |
| shared array/construct | 24305.5 ns | 25149.333333333332 ns | 0.97 |
| shared array/broadcast | 459167 ns | 464959 ns | 0.99 |
| shared array/random/randn/Float32 | 812125.5 ns | 835792 ns | 0.97 |
| shared array/random/randn!/Float32 | 617583 ns | 635500 ns | 0.97 |
| shared array/random/rand!/Int64 | 559458 ns | 560542 ns | 1.00 |
| shared array/random/rand!/Float32 | 580709 ns | 598750 ns | 0.97 |
| shared array/random/rand/Int64 | 761250 ns | 765583 ns | 0.99 |
| shared array/random/rand/Float32 | 635708 ns | 604979 ns | 1.05 |
| shared array/copyto!/gpu_to_gpu | 84458 ns | 79167 ns | 1.07 |
| shared array/copyto!/cpu_to_gpu | 84042 ns | 82250 ns | 1.02 |
| shared array/copyto!/gpu_to_cpu | 83875 ns | 83000 ns | 1.01 |
| shared array/accumulate/1d | 1344584 ns | 1366937.5 ns | 0.98 |
| shared array/accumulate/2d | 1391083.5 ns | 1400458 ns | 0.99 |
| shared array/iteration/findall/int | 1781042 ns | 1854249.5 ns | 0.96 |
| shared array/iteration/findall/bool | 1584417 ns | 1619625 ns | 0.98 |
| shared array/iteration/findfirst/int | 1393417 ns | 1384771 ns | 1.01 |
| shared array/iteration/findfirst/bool | 1365979 ns | 1375770.5 ns | 0.99 |
| shared array/iteration/scalar | 157583 ns | 155000 ns | 1.02 |
| shared array/iteration/logical | 2972542 ns | 3012021 ns | 0.99 |
| shared array/iteration/findmin/1d | 1471542 ns | 1470417 ns | 1.00 |
| shared array/iteration/findmin/2d | 1372791 ns | 1355875 ns | 1.01 |
| shared array/reductions/reduce/1d | 726000 ns | 729437.5 ns | 1.00 |
| shared array/reductions/reduce/2d | 668083 ns | 674104.5 ns | 0.99 |
| shared array/reductions/mapreduce/1d | 739584 ns | 743833 ns | 0.99 |
| shared array/reductions/mapreduce/2d | 685542 ns | 658375 ns | 1.04 |
| shared array/permutedims/4d | 2523750 ns | 2575708 ns | 0.98 |
| shared array/permutedims/2d | 1019708.5 ns | 1011875 ns | 1.01 |
| shared array/permutedims/3d | 1579416 ns | 1573854 ns | 1.00 |
| shared array/copy | 244334 ns | 246417 ns | 0.99 |
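The Ratio column is consistent with dividing the Current time by the Previous time, so values below 1.00 mean the current commit ran faster. This reading is inferred from the numbers in the table rather than stated by the benchmark tool. It can be checked against the two `copyto!` rows with the largest swings:

$$
\begin{aligned}
\text{Ratio} &= \frac{t_{\text{current}}}{t_{\text{previous}}} \\
\texttt{private array/copyto!/cpu\_to\_gpu}:\quad \frac{612062\ \text{ns}}{820417\ \text{ns}} &\approx 0.7460 \;\to\; 0.75 \\
\texttt{private array/copyto!/gpu\_to\_cpu}:\quad \frac{813583.5\ \text{ns}}{686687.5\ \text{ns}} &\approx 1.1848 \;\to\; 1.18
\end{aligned}
$$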
This comment was automatically generated by workflow using github-action-benchmark.
maleadt approved these changes on Feb 5, 2025.
This also applies the same treatment to the Float32 intrinsics, since I don't know why they used to work.
Closes #530
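The diff above shows only the tail of the `intr_test3` test, so its kernel body is not visible here. The following is a hedged sketch of a similar check that exercises `sincos` on the device for both element types the PR touches. It assumes Metal.jl running on an Apple-silicon Mac. The kernel and helper names (`sincos_kernel!`, `check_sincos`) are hypothetical, and this is not the PR's actual test.

```julia
using Metal, Test

# Hypothetical kernel (not from the PR): computes sincos elementwise,
# writing the sine back into `s` and the cosine into `c`.
function sincos_kernel!(s, c)
    i = thread_position_in_grid_1d()
    if i <= length(s)
        s[i], c[i] = sincos(s[i])
    end
    return nothing
end

# Hypothetical helper: runs the kernel for element type T and compares
# the GPU results against the CPU's sin/cos.
function check_sincos(::Type{T}, N::Int=16) where {T}
    arr = rand(T, N)
    bufferA = MtlArray(arr)
    bufferB = similar(bufferA)
    Metal.@sync @metal threads = N sincos_kernel!(bufferA, bufferB)
    return Array(bufferA) ≈ sin.(arr) && Array(bufferB) ≈ cos.(arr)
end

# Float16 is the case fixed by the PR; Float32 received the same treatment.
@test check_sincos(Float16)
@test check_sincos(Float32)
```

Overwriting the input buffer with the sine mirrors the `bufferA`/`bufferB` pattern in the test diff, so one launch checks both outputs of `sincos`.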