-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
crypto/internal/nistec: remove ppc64le assembly #52424
Comments
https://go.dev/cl/404174 is the promised ScalarBaseMult optimization, so it's possible that the assembly is now slower than the fiat-crypto code! |
Change https://go.dev/cl/404174 mentions this issue: |
name old time/op new time/op delta pkg:crypto/ecdsa goos:darwin goarch:amd64 Sign/P224-16 250µs ± 2% 91µs ± 2% -63.42% (p=0.000 n=10+9) Sign/P384-16 955µs ± 3% 311µs ± 2% -67.48% (p=0.000 n=10+10) Sign/P521-16 2.74ms ± 2% 0.82ms ± 2% -69.95% (p=0.000 n=10+10) Verify/P224-16 440µs ± 3% 282µs ± 5% -35.94% (p=0.000 n=9+10) Verify/P384-16 1.72ms ± 2% 1.07ms ± 1% -38.02% (p=0.000 n=10+9) Verify/P521-16 5.10ms ± 2% 3.18ms ± 3% -37.70% (p=0.000 n=10+10) GenerateKey/P224-16 225µs ± 3% 67µs ± 4% -70.42% (p=0.000 n=9+10) GenerateKey/P384-16 881µs ± 1% 241µs ± 2% -72.67% (p=0.000 n=10+10) GenerateKey/P521-16 2.62ms ± 3% 0.69ms ± 3% -73.78% (p=0.000 n=10+9) pkg:crypto/elliptic/internal/nistec goos:darwin goarch:amd64 ScalarMult/P224-16 219µs ± 4% 209µs ± 3% -4.57% (p=0.003 n=10+10) ScalarMult/P384-16 838µs ± 2% 823µs ± 1% -1.72% (p=0.004 n=10+9) ScalarMult/P521-16 2.48ms ± 2% 2.45ms ± 2% ~ (p=0.052 n=10+10) ScalarBaseMult/P224-16 214µs ± 4% 54µs ± 4% -74.88% (p=0.000 n=10+10) ScalarBaseMult/P384-16 828µs ± 2% 196µs ± 3% -76.38% (p=0.000 n=10+10) ScalarBaseMult/P521-16 2.50ms ± 3% 0.55ms ± 2% -77.96% (p=0.000 n=10+10) Updates #52424 For #52182 Change-Id: I2be3c2b8cdeead512063ef489e43805f4ee71d0f Reviewed-on: https://go-review.googlesource.com/c/go/+/404174 TryBot-Result: Gopher Robot <[email protected]> Run-TryBot: Filippo Valsorda <[email protected]> Reviewed-by: Fernando Lobato Meeser <[email protected]> Reviewed-by: Roland Shoemaker <[email protected]>
Here are comparisons using noasm vs. asm using latest:
No meaningful difference in the crypto/tls benchmarks. |
In #52182 (comment), @laboger reports that the fiat-crypto (#40171) code with @pmur's compiler improvements (https://go.dev/cl/393656) is within range of the assembly performance!
This is extremely impressive considering the fiat-crypto code also uses safer but slower complete formulas and a somewhat naive 4-bit scalar multiplication window.
The ScalarBaseMult benchmark is still significantly slower, because the assembly uses a large precomputed table, while the fiat-crypto code just runs ScalarMult. This is very much fixable.
I will land the ScalarBaseMult optimization in the fiat-crypto code, and then we can remove the ppc64le assembly entirely!
The text was updated successfully, but these errors were encountered: