* Sorting on GPU. Sorting is currently done on the CPU asynchronously at a lower framerate (~10 fps), which increases how often you'll see pops especially when the viewpoint changes quickly
@Yang-Xijie I have had some trouble using MLX sort/argSort, so I would advise against it.
Actually, MetalPerformanceShadersGraph provides some ML operators, such as sort and argSort, that we could use here. In my experience they perform better and are more stable than MLX.
I found the following sentence in the README at commit 8876629:
MetalSplatter/README.md, line 23 in 8876629
which was removed in commit 2213483:
MetalSplatter/README.md, line 13 in 2213483
However, it seems sorting still happens on the CPU side:
MetalSplatter/MetalSplatter/Sources/SplatRenderer.swift, line 608 in ec6d0f2
A comparison-based CPU sort takes O(N log N) time. Here are some timings (M2 Pro MacBook Pro, release build, random numbers):
* UInt32
    * 1,000,000 elements: 0.071 s
    * 10,000,000 elements: 0.759 s
* Float32
    * 1,000,000 elements: 0.084 s
    * 10,000,000 elements: 0.934 s
Sort times like these impede real-time rendering of 3D Gaussians.
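For anyone who wants to reproduce measurements of this kind, here is a minimal sketch of a timing harness. It is not the code that produced the numbers above; the function name `timeSort` and the use of `Date` for timing are assumptions made for illustration, and results will vary with hardware and build settings.

```swift
import Foundation

// Hypothetical helper: fills an array with `count` values from `makeValue`,
// sorts it in place with the standard library sort, and returns the elapsed seconds.
func timeSort<T: Comparable>(count: Int, makeValue: () -> T) -> Double {
    var values = (0..<count).map { _ in makeValue() }

    let start = Date()
    values.sort()
    let elapsed = Date().timeIntervalSince(start)

    // Sanity check so the sort cannot be optimized away unnoticed.
    precondition(zip(values, values.dropFirst()).allSatisfy { $0 <= $1 })
    return elapsed
}

// Hypothetical helper: prints one line per element count for both key types.
func reportSortTimings(counts: [Int]) {
    for count in counts {
        let uintSeconds = timeSort(count: count) { UInt32.random(in: UInt32.min...UInt32.max) }
        let floatSeconds = timeSort(count: count) { Float.random(in: 0..<1) }
        print("\(count): UInt32 \(uintSeconds) s, Float32 \(floatSeconds) s")
    }
}
```

Calling `reportSortTimings(counts: [1_000_000, 10_000_000])` in a release build should give figures comparable in kind to the list above. Debug builds are usually much slower and are not a fair comparison.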
It seems that there is no ready-made GPU radix sort for Metal. (https://developer.apple.com/forums/thread/105886)
I did find an implementation on Apple Silicon, but it is difficult for me to understand. (https://github.com/ShoYamanishi/AppleNumericalComputing/tree/main/05_radix_sort#54-metal-radix-sort-implementations)
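To make the algorithm itself easier to follow, here is a minimal CPU-side sketch of a least-significant-digit radix argsort in Swift. It is not taken from the linked repository or from MetalSplatter; the names `sortableKey` and `radixArgSort` are hypothetical, and the 8-bit digit size is an arbitrary choice. A GPU version generally parallelizes the same three steps per pass (histogram, prefix sum, scatter), which is where most of the complexity comes from.

```swift
// Hypothetical helper: maps a Float to a UInt32 whose unsigned ordering matches
// the float ordering. Negative values have all bits flipped; non-negative values
// only get the sign bit set. NaN handling is not considered here.
func sortableKey(_ value: Float) -> UInt32 {
    let bits = value.bitPattern
    return (bits & 0x8000_0000) != 0 ? ~bits : bits | 0x8000_0000
}

// Hypothetical helper: returns the indices that put `keys` in ascending order,
// using an LSD radix sort with 8-bit digits (4 passes over 32-bit keys).
func radixArgSort(_ keys: [UInt32]) -> [Int] {
    let radixBits = 8
    let bucketCount = 1 << radixBits
    let digitMask = UInt32(bucketCount - 1)

    var indices = Array(keys.indices)
    var scratch = [Int](repeating: 0, count: keys.count)

    for pass in 0..<(32 / radixBits) {
        let shift = UInt32(pass * radixBits)

        // Step 1: histogram of the current digit.
        var offsets = [Int](repeating: 0, count: bucketCount)
        for index in indices {
            offsets[Int((keys[index] >> shift) & digitMask)] += 1
        }

        // Step 2: exclusive prefix sum turns counts into start offsets.
        var running = 0
        for bucket in 0..<bucketCount {
            let count = offsets[bucket]
            offsets[bucket] = running
            running += count
        }

        // Step 3: stable scatter of indices into their buckets.
        for index in indices {
            let digit = Int((keys[index] >> shift) & digitMask)
            scratch[offsets[digit]] = index
            offsets[digit] += 1
        }
        swap(&indices, &scratch)
    }
    return indices
}
```

For depth sorting, the keys would be something like `depths.map(sortableKey)`, and the returned indices give the draw order. The work is O(N) per pass with a fixed number of passes, which is why radix sort is attractive compared with an O(N log N) comparison sort.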
Is anyone planning to implement sorting on the GPU to improve performance? Or is there a library we can adopt directly to accelerate it?