LIB: add support for ARM NEON #11
Conversation
Support for iOS/iPad/Apple Silicon
That is an interesting approach to supporting SIMD optimizations on ARM without changing the existing algorithms. I have a few comments:

As submitted, it doesn't actually optimize anything, because all the existing SSE code checks if
So at least these

As for any performance optimization, it is probably a good idea to check whether the performance is actually improved by the vectorized code. I cannot do this, because I don't have ARM hardware. For the NoiseDecoder, running

PandaResampler is special, because its code is in its own repository (https://github.com/swesterfeld/pandaresampler). So since this is a header-only library, you'd have to copy-paste the necessary

As for licensing, I've recently changed SpectMorph to LGPL v2.1 or later (c59dd65), so this would be needed for vectorizing
If you've implemented all the code yourself, it would be great if you'd allow it to be used under both licenses. I also see the possibility that your code could be used in Anklang (https://anklang.testbit.eu/), which is also MPL 2.0. If you copy-pasted your code from elsewhere, the original license would also need to be checked.
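For context, here is a minimal sketch of the kind of SSE gating pattern this refers to. The names `have_simd()` and `apply_gain_*()` are invented for illustration and are not SpectMorph's actual code; the point is that if the guard only tests for SSE, a NEON implementation never gets called on ARM until the guard (and the related `#ifdef`s) are extended as well:

```cpp
// Illustrative sketch, not SpectMorph's actual code: have_simd() and apply_gain_*()
// are invented names. It shows the gating pattern described above.

static bool
have_simd()
{
#if defined (__SSE__) || defined (__ARM_NEON)
  return true;   // a vectorized path is available (x86/SSE or ARM/NEON)
#else
  return false;  // only the scalar fallback will be used
#endif
}

static void
apply_gain_scalar (float *samples, int n, float gain)
{
  for (int i = 0; i < n; i++)
    samples[i] *= gain;
}

static void
apply_gain_simd (float *samples, int n, float gain)
{
  // stands in for the real SSE/NEON implementation
  apply_gain_scalar (samples, n, gain);
}

void
apply_gain (float *samples, int n, float gain)
{
  if (have_simd())
    apply_gain_simd (samples, n, gain);    // vectorized path
  else
    apply_gain_scalar (samples, n, gain);  // portable fallback
}
```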
…ia/spectmorph into feature/arm-support
Thanks for replying. This approach to supporting SIMD on ARM NEON is similar to that of the SIMDe library; the SIMD intrinsics are almost the same between NEON and SSE. Interestingly, there's not a huge difference in performance on the Apple M1. I have a subset of SpectMorph running in Xcode; I'm curious to see the timings on the slower Apple ARM processors used in the iPad, and I'll update you. Sorry, I haven't looked at PandaResampler.

License: I marked it MPL, but it might be MIT. The macros are basically from SIMDe without all their preprocessor flags.
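As a minimal sketch of that SIMDe-style mapping (the `simd_*` wrapper names below are chosen for illustration and are not the macros used in the PR): the same thin wrapper API is expressed either with NEON or with SSE intrinsics, so the algorithm code stays identical on both architectures.

```cpp
// Minimal sketch of the SIMDe-style idea: one wrapper API, two intrinsic backends.
// Wrapper names (simd_f32x4, simd_load, ...) are illustrative only.

#if defined (__ARM_NEON)
#include <arm_neon.h>

typedef float32x4_t simd_f32x4;

static inline simd_f32x4 simd_load  (const float *p)             { return vld1q_f32 (p); }
static inline void       simd_store (float *p, simd_f32x4 v)     { vst1q_f32 (p, v); }
static inline simd_f32x4 simd_splat (float x)                    { return vdupq_n_f32 (x); }
static inline simd_f32x4 simd_add   (simd_f32x4 a, simd_f32x4 b) { return vaddq_f32 (a, b); }
static inline simd_f32x4 simd_mul   (simd_f32x4 a, simd_f32x4 b) { return vmulq_f32 (a, b); }

#else /* x86 / SSE */
#include <xmmintrin.h>

typedef __m128 simd_f32x4;

static inline simd_f32x4 simd_load  (const float *p)             { return _mm_loadu_ps (p); }
static inline void       simd_store (float *p, simd_f32x4 v)     { _mm_storeu_ps (p, v); }
static inline simd_f32x4 simd_splat (float x)                    { return _mm_set1_ps (x); }
static inline simd_f32x4 simd_add   (simd_f32x4 a, simd_f32x4 b) { return _mm_add_ps (a, b); }
static inline simd_f32x4 simd_mul   (simd_f32x4 a, simd_f32x4 b) { return _mm_mul_ps (a, b); }

#endif

// Example use: y[i] += gain * x[i], four floats at a time (n assumed to be a multiple of 4).
void
scale_add (float *y, const float *x, float gain, int n)
{
  const simd_f32x4 g = simd_splat (gain);
  for (int i = 0; i < n; i += 4)
    simd_store (y + i, simd_add (simd_load (y + i), simd_mul (simd_load (x + i), g)));
}
```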
Support for ARM NEON from pull request #11 based on code from Peter Johnson <[email protected]>
I've tried to merge various bits and pieces from your PR into master. It is not a 1:1 merge; I made some changes while merging. Still, I hope you'll be able to build from master now, either without changes or with minimal changes. I'd be interested in your feedback on this.
Ok, finally I found some time to add NEON support based on your code to PandaResampler. You'll find it in the master branch, together with a new performance test. On my devel system (Ryzen 7), SSE is about 3 times faster.

I'd be interested in seeing the results of this test using NEON on your M1 system.
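Not the actual PandaResampler performance test, but a rough sketch of how such a scalar-vs-SIMD timing comparison can be set up; the `resample_*` functions below are placeholder workloads that would be replaced by the real scalar and SSE/NEON resampler paths:

```cpp
// Rough sketch of a scalar-vs-SIMD timing comparison, not PandaResampler's actual test.

#include <chrono>
#include <cstdio>
#include <vector>

static void
resample_scalar (std::vector<float>& buf)
{
  for (auto& x : buf)
    x = x * 0.5f + 0.25f;   // placeholder work
}

static void
resample_simd (std::vector<float>& buf)
{
  for (auto& x : buf)
    x = x * 0.5f + 0.25f;   // placeholder work (real code would use SSE/NEON)
}

static double
time_runs (void (*fn) (std::vector<float>&), std::vector<float>& buf, int runs)
{
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < runs; i++)
    fn (buf);
  return std::chrono::duration<double> (std::chrono::steady_clock::now() - start).count();
}

int
main()
{
  std::vector<float> buf (1 << 16, 0.f);
  const int runs = 2000;

  double t_scalar = time_runs (resample_scalar, buf, runs);
  double t_simd   = time_runs (resample_simd, buf, runs);

  printf ("scalar: %.3f s   simd: %.3f s   speedup: %.2fx\n",
          t_scalar, t_simd, t_scalar / t_simd);
  return 0;
}
```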
Apologies for the late reply. I get the following results.

MacBook Air (M1, 2020):

MacBook Pro (14-inch, 2021, M1 Pro):
Ok, the results for PandaResampler on M1 look really good. So, after all, your code and your suggestion to SIMDify things using NEON really do improve performance for the resampling code on M1. As far as I can see, everything is merged into master and tested now, so I'll close the PR. Thanks!