
LIB: add support for ARM NEON #11

Closed

Conversation

onereddogmedia
Contributor

No description provided.

0x1337c0d3 and others added 2 commits November 21, 2021 12:53
@swesterfeld
Owner

That is an interesting approach to supporting SIMD optimizations on ARM without changing the existing algorithms. I have a few comments:

As submitted, it doesn't actually optimize anything, because all the existing SSE code checks if __SSE__ is defined. You can see that there are three files containing code that could potentially run faster through vectorization:

$ grep -l __SSE__ */*hh */*cc
lib/smnoisedecoder.cc
lib/smpandaresampler.cc
lib/smmath.hh

So at least these __SSE__ checks need to be changed, maybe to something like #if defined (__SSE__) || defined (SM_ARM_SSE) or similar (a sketch of what I mean follows the list below). All in all we have three vectorized algorithms:

  • NoiseDecoder
  • fast_vector_sin
  • PandaResampler
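
To illustrate, a widened guard could look roughly like this (scale_block is a made-up example function, not actual SpectMorph code; on ARM the _mm_* names would have to come from the NEON mapping in this PR):

#include <stddef.h>
#ifdef __SSE__
#include <xmmintrin.h>
#endif

#if defined (__SSE__) || defined (SM_ARM_SSE)
/* vectorized path: processes four floats per iteration via _mm_* intrinsics */
static void
scale_block (float *samples, size_t n, float gain)
{
  const __m128 g = _mm_set1_ps (gain);
  size_t i = 0;
  for (; i + 4 <= n; i += 4)
    _mm_storeu_ps (samples + i, _mm_mul_ps (_mm_loadu_ps (samples + i), g));
  for (; i < n; i++)    /* scalar tail for the remaining samples */
    samples[i] *= gain;
}
#else
/* plain scalar fallback when neither SSE nor the NEON mapping is available */
static void
scale_block (float *samples, size_t n, float gain)
{
  for (size_t i = 0; i < n; i++)
    samples[i] *= gain;
}
#endif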

As with any performance optimization, it is probably a good idea to check whether the vectorized code actually improves performance. I cannot do this myself, because I don't have ARM hardware.

For the NoiseDecoder, running tests/testnoiseperf would be interesting; for fast_vector_sin, the same goes for tests/testfastsin perf with and without your ARM vectorization code.
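
Concretely, the two runs would simply be

$ tests/testnoiseperf
$ tests/testfastsin perf

built once with and once without the NEON definitions enabled, so the numbers can be compared directly.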

PandaResampler is special, because its code lives in its own repository (https://github.com/swesterfeld/pandaresampler). Since it is a header-only library, you'd have to copy-paste the necessary _mm_mul_ps and _mm_add_ps definitions directly into the lib/pandaresampler.cc file, then run tests/testresampler perf --fpu and compare it with tests/testresampler perf. Later, the PandaResampler copy in SpectMorph would have to be updated with the new pandaresampler.cc.
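
To sketch what that copy-paste could look like (just the rough idea in the spirit of this PR and SIMDe, not the exact code that would end up in pandaresampler.cc):

#include <arm_neon.h>

typedef float32x4_t __m128;   /* four packed floats, same width as SSE's __m128 */

static inline __m128 _mm_mul_ps (__m128 a, __m128 b) { return vmulq_f32 (a, b); }
static inline __m128 _mm_add_ps (__m128 a, __m128 b) { return vaddq_f32 (a, b); }

With definitions like these in place, the existing _mm_* code in PandaResampler should compile unchanged on ARM, and comparing tests/testresampler perf with tests/testresampler perf --fpu would show whether the vectorization actually pays off there.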

As for licensing, I've recently changed SpectMorph to LGPL v2.1 or later (c59dd65), so that license would be needed for the code vectorizing NoiseDecoder and fast_vector_sin. PandaResampler is MPL-2.0.

If you've implemented all the code yourself, it would be great if you'd allow it to be used under both licenses. I also see the possibility that your code could be used in Anklang (https://anklang.testbit.eu/), which is also MPL-2.0. If you copy-pasted your code from elsewhere, the original license would also need to be checked.

@onereddogmedia
Contributor Author

onereddogmedia commented Dec 8, 2021

Thanks for replying. This approach to supporting SIMD on ARM NEON is similar to that of the SIMDe library; the SIMD intrinsics are almost the same between NEON and SSE. Interestingly, there's not a huge difference in performance on the Apple M1. I have a subset of SpectMorph running in Xcode, and I'm curious to see the timings on the slower Apple ARM processors used in the iPad; I'll update you.

Sorry, I haven't looked at PandaResampler.

Regarding the license: I marked it MPL, but it might be MIT. The macros are basically from SIMDe, without all their preprocessor flags.

swesterfeld added a commit that referenced this pull request Dec 9, 2021
Support for ARM NEON from pull request

  #11

based on code from Peter Johnson <[email protected]>
swesterfeld added a commit that referenced this pull request Dec 12, 2021
  #11

based on code from Peter Johnson <[email protected]>
@swesterfeld
Owner

I've tried to merge various bits and pieces from your PR into master. It is not a 1:1 merge; I made some changes while merging. Still, you should be able to build from master now, either without changes or with minimal ones. I'd be interested in your feedback on this.

> Sorry, I haven't looked at PandaResampler.

Ok, finally I found some time to add NEON support based on your code to PandaResampler. You'll find it in the master branch, along with a new performance test. On my devel system (Ryzen 7), SSE is about 3 times faster than the FPU code:

$ tests/testpandaperf 
SSE: 5.57 ns/sample
FPU: 16.61 ns/sample

SSE/FPU speedup: 2.98

I'd be interested in seeing the results of this test using NEON on your M1 system.

@onereddogmedia
Contributor Author

Apologies for the late reply. I get the following results on a MacBook Air (M1, 2020):

$ ./testpandaperf
SSE: 3.76 ns/sample
FPU: 13.55 ns/sample

SSE/FPU speedup: 3.61

@onereddogmedia
Contributor Author

MacBook Pro (14-inch, 2021, M1 Pro):

$ ./tests/testpandaperf 
SSE: 3.65 ns/sample
FPU: 13.42 ns/sample

SSE/FPU speedup: 3.68

@swesterfeld
Owner

Ok, the results for PandaResampler on M1 look really good. So after all, your code and your suggestion to SIMDify things using NEON really do improve performance for the resampling code on M1.

As far as I can see, everything is merged into master and tested now, so I'll close the PR. Thanks!
