Fix Float16 sincos intrinsic #533

Merged: 1 commit into main from the sincos branch, Feb 5, 2025

Contributor

@christiangnrd christiangnrd commented Feb 4, 2025

This also applies the same treatment to the Float32 intrinsics, since I don't know why they used to work.

Closes #530

Contributor

github-actions bot commented Feb 4, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic main) to apply these changes.

Suggested changes:
diff --git a/test/device/intrinsics.jl b/test/device/intrinsics.jl
index b71f62b1..4c88fb67 100644
--- a/test/device/intrinsics.jl
+++ b/test/device/intrinsics.jl
@@ -257,9 +257,9 @@ end
             return nothing
         end
 
-        Metal.@sync @metal threads = N intr_test3(bufferA, bufferB)
-        @test Array(bufferA) ≈ sin.(arr)
-        @test Array(bufferB) ≈ cos.(arr)
+            Metal.@sync @metal threads = N intr_test3(bufferA, bufferB)
+            @test Array(bufferA) ≈ sin.(arr)
+            @test Array(bufferB) ≈ cos.(arr)
     end
 
     let # clamp
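
For readers without the surrounding test file: the three reindented lines above launch a kernel and then compare its two output buffers against `sin.(arr)` and `cos.(arr)`. A minimal CPU-only sketch of that same property follows, assuming only Julia's `Base`. The helper name `check_sincos` is hypothetical, no Metal.jl call is used, and this does not reproduce the GPU compilation failure the PR addresses; it only illustrates what the assertions check for each element type.

```julia
# CPU-only sketch; `check_sincos` is a hypothetical helper, not part of Metal.jl.
function check_sincos(::Type{T}, N::Integer = 4) where {T <: AbstractFloat}
    arr = rand(T, N)          # random inputs in [0, 1), like the test's `arr`
    sins = similar(arr)
    coss = similar(arr)
    for idx in eachindex(arr)
        # `sincos` returns the sine and cosine together as a tuple
        sins[idx], coss[idx] = sincos(arr[idx])
    end
    # Same comparisons as the `@test` lines, evaluated on the CPU
    return sins ≈ sin.(arr) && coss ≈ cos.(arr)
end

check_sincos(Float16)
check_sincos(Float32)
```

Running the check for both `Float16` and `Float32` mirrors the PR's scope, which touches the intrinsics for both types.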

Contributor

@github-actions github-actions bot left a comment

Metal Benchmarks

| Benchmark suite | Current: 1981c5f | Previous: 924a130 | Ratio |
|---|---|---|---|
| private array/construct | 24989.583333333336 ns | 25257 ns | 0.99 |
| private array/broadcast | 463084 ns | 463958 ns | 1.00 |
| private array/random/randn/Float32 | 814208.5 ns | 760208 ns | 1.07 |
| private array/random/randn!/Float32 | 618500 ns | 624687.5 ns | 0.99 |
| private array/random/rand!/Int64 | 558250 ns | 563395.5 ns | 0.99 |
| private array/random/rand!/Float32 | 585541 ns | 587854 ns | 1.00 |
| private array/random/rand/Int64 | 785292 ns | 755083 ns | 1.04 |
| private array/random/rand/Float32 | 628667 ns | 611333 ns | 1.03 |
| private array/copyto!/gpu_to_gpu | 644250 ns | 650916.5 ns | 0.99 |
| private array/copyto!/cpu_to_gpu | 612062 ns | 820417 ns | 0.75 |
| private array/copyto!/gpu_to_cpu | 813583.5 ns | 686687.5 ns | 1.18 |
| private array/accumulate/1d | 1355104 ns | 1362854.5 ns | 0.99 |
| private array/accumulate/2d | 1391229 ns | 1404375 ns | 0.99 |
| private array/iteration/findall/int | 2101854.5 ns | 2111541.5 ns | 1.00 |
| private array/iteration/findall/bool | 1808708 ns | 1849958 ns | 0.98 |
| private array/iteration/findfirst/int | 1698167 ns | 1715916 ns | 0.99 |
| private array/iteration/findfirst/bool | 1664708 ns | 1673542 ns | 0.99 |
| private array/iteration/scalar | 3897542 ns | 3935958 ns | 0.99 |
| private array/iteration/logical | 3179583.5 ns | 3227791 ns | 0.99 |
| private array/iteration/findmin/1d | 1755396 ns | 1778916.5 ns | 0.99 |
| private array/iteration/findmin/2d | 1347875 ns | 1352041 ns | 1.00 |
| private array/reductions/reduce/1d | 1037291.5 ns | 1043458 ns | 0.99 |
| private array/reductions/reduce/2d | 663541 ns | 669709 ns | 0.99 |
| private array/reductions/mapreduce/1d | 1036208 ns | 1052375 ns | 0.98 |
| private array/reductions/mapreduce/2d | 663292 ns | 675854.5 ns | 0.98 |
| private array/permutedims/4d | 2510521 ns | 2557249.5 ns | 0.98 |
| private array/permutedims/2d | 1026000 ns | 1026042 ns | 1.00 |
| private array/permutedims/3d | 1577041.5 ns | 1594916 ns | 0.99 |
| private array/copy | 563875 ns | 569146 ns | 0.99 |
| latency/precompile | 8863922500 ns | 8851374292 ns | 1.00 |
| latency/ttfp | 3613381417 ns | 3620472000 ns | 1.00 |
| latency/import | 1231948875 ns | 1235528208 ns | 1.00 |
| integration/metaldevrt | 717292 ns | 717959 ns | 1.00 |
| integration/byval/slices=1 | 1643125 ns | 1539333 ns | 1.07 |
| integration/byval/slices=3 | 10184375 ns | 9117125 ns | 1.12 |
| integration/byval/reference | 1535499.5 ns | 1560292 ns | 0.98 |
| integration/byval/slices=2 | 2646437 ns | 2731458.5 ns | 0.97 |
| kernel/indexing | 483792 ns | 457750 ns | 1.06 |
| kernel/indexing_checked | 474125 ns | 455833 ns | 1.04 |
| kernel/launch | 8125 ns | 8166 ns | 0.99 |
| metal/synchronization/stream | 14666.5 ns | 14708 ns | 1.00 |
| metal/synchronization/context | 15166 ns | 14791 ns | 1.03 |
| shared array/construct | 24305.5 ns | 25149.333333333332 ns | 0.97 |
| shared array/broadcast | 459167 ns | 464959 ns | 0.99 |
| shared array/random/randn/Float32 | 812125.5 ns | 835792 ns | 0.97 |
| shared array/random/randn!/Float32 | 617583 ns | 635500 ns | 0.97 |
| shared array/random/rand!/Int64 | 559458 ns | 560542 ns | 1.00 |
| shared array/random/rand!/Float32 | 580709 ns | 598750 ns | 0.97 |
| shared array/random/rand/Int64 | 761250 ns | 765583 ns | 0.99 |
| shared array/random/rand/Float32 | 635708 ns | 604979 ns | 1.05 |
| shared array/copyto!/gpu_to_gpu | 84458 ns | 79167 ns | 1.07 |
| shared array/copyto!/cpu_to_gpu | 84042 ns | 82250 ns | 1.02 |
| shared array/copyto!/gpu_to_cpu | 83875 ns | 83000 ns | 1.01 |
| shared array/accumulate/1d | 1344584 ns | 1366937.5 ns | 0.98 |
| shared array/accumulate/2d | 1391083.5 ns | 1400458 ns | 0.99 |
| shared array/iteration/findall/int | 1781042 ns | 1854249.5 ns | 0.96 |
| shared array/iteration/findall/bool | 1584417 ns | 1619625 ns | 0.98 |
| shared array/iteration/findfirst/int | 1393417 ns | 1384771 ns | 1.01 |
| shared array/iteration/findfirst/bool | 1365979 ns | 1375770.5 ns | 0.99 |
| shared array/iteration/scalar | 157583 ns | 155000 ns | 1.02 |
| shared array/iteration/logical | 2972542 ns | 3012021 ns | 0.99 |
| shared array/iteration/findmin/1d | 1471542 ns | 1470417 ns | 1.00 |
| shared array/iteration/findmin/2d | 1372791 ns | 1355875 ns | 1.01 |
| shared array/reductions/reduce/1d | 726000 ns | 729437.5 ns | 1.00 |
| shared array/reductions/reduce/2d | 668083 ns | 674104.5 ns | 0.99 |
| shared array/reductions/mapreduce/1d | 739584 ns | 743833 ns | 0.99 |
| shared array/reductions/mapreduce/2d | 685542 ns | 658375 ns | 1.04 |
| shared array/permutedims/4d | 2523750 ns | 2575708 ns | 0.98 |
| shared array/permutedims/2d | 1019708.5 ns | 1011875 ns | 1.01 |
| shared array/permutedims/3d | 1579416 ns | 1573854 ns | 1.00 |
| shared array/copy | 244334 ns | 246417 ns | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

@maleadt maleadt merged commit e3d369e into main Feb 5, 2025
7 checks passed
@maleadt maleadt deleted the sincos branch February 5, 2025 09:15
Successfully merging this pull request may close these issues.

sincos intrinsic fails to compile with Float16
2 participants