Optimizing GHASH is a trivial change to the code; I just didn't know there was a use case for GHASH without GCM. It really was optimized for TLS, with its 13-byte additional authenticated data.
I can easily write a fix if there is demand.
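For context, "GHASH without GCM" here means authentication only: AES-GCM with an empty plaintext, so the entire message is fed to GHASH as additional authenticated data (this construction is GMAC). Below is a minimal sketch using the standard `crypto/cipher` API. `sign`, `verify` and `newAEAD` are hypothetical helper names, and the all-zero key and nonce are for demonstration only.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
)

// newAEAD wraps an AES key in the standard GCM mode.
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sign computes a GMAC-style tag: with a nil plaintext, Seal encrypts
// nothing and returns only the 16-byte tag over msg (passed as AAD).
func sign(aead cipher.AEAD, nonce, msg []byte) []byte {
	return aead.Seal(nil, nonce, nil, msg)
}

// verify reports whether tag authenticates msg under nonce.
func verify(aead cipher.AEAD, nonce, msg, tag []byte) bool {
	_, err := aead.Open(nil, nonce, tag, msg)
	return err == nil
}

func main() {
	aead, err := newAEAD(make([]byte, 16)) // all-zero key: demo only
	if err != nil {
		panic(err)
	}
	nonce := make([]byte, aead.NonceSize()) // demo only; never reuse a nonce with one key
	msg := make([]byte, 8192)               // 8 KB, all of it processed as AAD

	tag := sign(aead, nonce, msg)
	fmt.Println(len(tag), verify(aead, nonce, msg, tag)) // prints 16 true

	msg[0] ^= 1
	fmt.Println(verify(aead, nonce, msg, tag)) // prints false
}
```

A real caller must manage nonces carefully; this sketch ignores that entirely.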
Here is a diff for crypto/aes that adds a new benchmark, BenchmarkAESGCMSign8K, which is GMAC-like:
On my Core i5 Skylake laptop using amd64:
With the standard Go crypto routines, Seal8K runs at 4.0 GB/s, while Sign8K runs at 1.3 GB/s.
With BoringCrypto, Seal8K runs at 3.7 GB/s, while Sign8K runs at 6.0 GB/s. The 4.0 vs. 3.7 GB/s gap for Seal8K could be cgo overhead (or not), but the 1.3 vs. 6.0 GB/s gap for Sign8K is clearly more than that. Profiling shows that the Go code spends its time in gcmAesData, processing 16 bytes at a time, while the BoringCrypto code spends its time in gcm_ghash_avx, processing 256 bytes at a time, apparently for a 4X speed win.
Go should take the 4X speed win too.
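One common way to process many blocks per iteration is aggregation. Expanding the recurrence Y_i = (Y_{i-1} xor X_i)·H over four blocks gives Y' = (Y xor X1)·H^4 xor X2·H^3 xor X3·H^2 xor X4·H. With H^2 through H^4 precomputed, the four multiplications no longer depend on each other and can be overlapped. Assembly built on carry-less multiply instructions typically also defers the modular reduction to once per group. This sketch does not claim that gcm_ghash_avx uses exactly this block count. It only checks the algebra in pure Go, and each product is still reduced individually, so it is not faster. `ghashSerial` and `ghashAggregated4` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// fieldElement is a 16-byte GHASH block held as two big-endian 64-bit halves.
type fieldElement struct{ hi, lo uint64 }

func load(b []byte) fieldElement {
	return fieldElement{binary.BigEndian.Uint64(b[:8]), binary.BigEndian.Uint64(b[8:16])}
}

func (x fieldElement) xor(y fieldElement) fieldElement {
	return fieldElement{x.hi ^ y.hi, x.lo ^ y.lo}
}

// gfMul multiplies in GF(2^128) bit by bit (NIST SP 800-38D, Algorithm 1),
// reducing after every shift.
func gfMul(x, y fieldElement) fieldElement {
	var z fieldElement
	v := y
	for i := uint(0); i < 128; i++ {
		var bit uint64
		if i < 64 {
			bit = (x.hi >> (63 - i)) & 1
		} else {
			bit = (x.lo >> (127 - i)) & 1
		}
		if bit == 1 {
			z = z.xor(v)
		}
		carry := v.lo & 1
		v.lo = v.lo>>1 | v.hi<<63
		v.hi >>= 1
		if carry == 1 {
			v.hi ^= 0xe1 << 56
		}
	}
	return z
}

// ghashSerial absorbs one 16-byte block per multiplication.
// len(data) is assumed to be a multiple of 16.
func ghashSerial(h fieldElement, data []byte) fieldElement {
	var y fieldElement
	for i := 0; i+16 <= len(data); i += 16 {
		y = gfMul(y.xor(load(data[i:])), h)
	}
	return y
}

// ghashAggregated4 absorbs four blocks (64 bytes) per iteration using
// precomputed powers of H, then finishes leftover blocks serially.
func ghashAggregated4(h fieldElement, data []byte) fieldElement {
	h2 := gfMul(h, h)
	h3 := gfMul(h2, h)
	h4 := gfMul(h3, h)
	var y fieldElement
	i := 0
	for ; i+64 <= len(data); i += 64 {
		x1, x2, x3, x4 := load(data[i:]), load(data[i+16:]), load(data[i+32:]), load(data[i+48:])
		// The four products below are independent of each other.
		y = gfMul(y.xor(x1), h4).
			xor(gfMul(x2, h3)).
			xor(gfMul(x3, h2)).
			xor(gfMul(x4, h))
	}
	for ; i+16 <= len(data); i += 16 {
		y = gfMul(y.xor(load(data[i:])), h)
	}
	return y
}

func main() {
	var hb [16]byte
	for i := range hb {
		hb[i] = byte(0x5a + i)
	}
	h := load(hb[:])
	data := make([]byte, 8192)
	for i := range data {
		data[i] = byte(i*31 + 7)
	}
	fmt.Println(ghashSerial(h, data) == ghashAggregated4(h, data)) // prints true
}
```

The same identity extends to 8 or 16 blocks per iteration, at the cost of precomputing more powers of H.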
/cc @agl @vkrasnov