Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Any progress or hint on implementing GPU sorting? #39

Open
Yang-Xijie opened this issue Sep 30, 2024 · 2 comments
Open

Any progress or hint on implementing GPU sorting? #39

Yang-Xijie opened this issue Sep 30, 2024 · 2 comments

Comments

@Yang-Xijie
Copy link
Contributor

I find the following sentence in README in commit 8876629:

* Sorting on GPU. Sorting is currently done on the CPU asynchronously at a lower framerate (~10 fps), which increases how often you'll see pops especially when the viewpoint changes quickly

which was removed in commit 2213483:

### TODO

However, it seems sorting still happens on the CPU side:

orderAndDepthTempSort.sort { $0.depth > $1.depth }


The performance of CPU sort is N*log(N). Here are some performance report (on M2 Pro MacBook Pro, release mode, random numbers):

〇 UInt32
1000000: 0.071 s
10000000: 0.759 s
〇 Float32
1000000 0.084 s
10000000 0.934 s

impeding real-time rendering of 3D gaussians...


It seems that radix sort on GPU for Metal is missing. (https://developer.apple.com/forums/thread/105886)

I did find an implementation on Apple Silicon, but it is difficult for me to understand. (https://github.com/ShoYamanishi/AppleNumericalComputing/tree/main/05_radix_sort#54-metal-radix-sort-implementations)

Will there be someone implementing the sorting on GPU to improve the performance? Or is there some library we can directly adopt to accelerate?

@Yang-Xijie
Copy link
Contributor Author

Yang-Xijie commented Sep 30, 2024

@kemchenj
Copy link
Contributor

kemchenj commented Oct 1, 2024

@Yang-Xijie I have some trouble using MLX sort/argSort. So I would say don't.

And actually MetalPerformanceShaderGraph provide some ML operators like sort and argSort that we could use here. They have better performance and stability compared to MLX.

Here is a simple implementation based on MPSGraph https://gist.github.com/kemchenj/26e1dad40e5b89de2828bad36c81302f. Feel free to use it if you got interest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants