-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve AES performance #12
Conversation
AES encrypt and decrypt performance are almost maxed out, even though we are 10 times slower than standard crypto: 10ns/op vs 100ns/op in my dev environment. If we generate a cpu profile from BenchmarkAESEncrypt we can see that the cgo call and cgo boilerplate accounts for the 90% of the time, and we can't reduce the cgo calls more. $ go test -run=- -bench=BenchmarkAESEncrypt -memprofile mem.out -cpuprofile cpu.out
$ go tool pprof cpu.out
File: openssl.test
Build ID: 998359c183439631807bc0a0210fe2d55a6d6072
Type: cpu
Time: Feb 22, 2022 at 10:34am (UTC)
Duration: 1.40s, Total samples = 1.28s (91.18%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 1.27s, 99.22% of 1.28s total
Showing top 10 nodes out of 15
flat flat% sum% cum cum%
0.80s 62.50% 62.50% 1.23s 96.09% runtime.cgocall
0.15s 11.72% 74.22% 0.22s 17.19% runtime.exitsyscallfast
0.09s 7.03% 81.25% 0.11s 8.59% runtime.casgstatus
0.09s 7.03% 88.28% 0.42s 32.81% runtime.exitsyscall
0.06s 4.69% 92.97% 0.06s 4.69% runtime.wirep
0.02s 1.56% 94.53% 1.27s 99.22% github.com/microsoft/go-crypto-openssl/openssl.(*aesCipher).Encrypt
0.02s 1.56% 96.09% 1.25s 97.66% github.com/microsoft/go-crypto-openssl/openssl._Cfunc_go_openssl_EVP_EncryptUpdate_wrapper
0.02s 1.56% 97.66% 0.02s 1.56% runtime.nanotime (inline)
0.01s 0.78% 98.44% 1.28s 100% github.com/microsoft/go-crypto-openssl/openssl.BenchmarkAESEncrypt
0.01s 0.78% 99.22% 0.01s 0.78% runtime.exitsyscallfast_reacquired |
AES Open and Seal still have room for improvement if we batch 5 cgo calls into one, but it would mean moving logic from Go to C, which I would try to avoid for safety sake. |
We run ~7% faster on go1.18rc1 compared to go1.17.7, which is strange because it is a significant speed burst and go1.18rc1 does not contain any major improvement in linux-amd64 ABI or cgo boilerplate, at least that I know. |
5cd4fbd
to
6eb30c9
Compare
openssl/goopenssl.h
Outdated
@@ -65,11 +68,25 @@ FOR_ALL_OPENSSL_FUNCTIONS | |||
#undef DEFINEFUNC_RENAMED | |||
#undef DEFINEFUNC_FALLBACK | |||
|
|||
// This wrapper allocate out_len on the C stack, and check that it matches the expected | |||
// These wrappers allocate out_len on the C stack, and check that it matches the expected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do these wrappers actually check the value was what was expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I took the wrapping idea from the boring branch and just copied the comment as is.
We use these wrappers in functions where we can't know which is the expected output length, so I will reword the comment.
7b9f2b2
to
9196927
Compare
openssl/aes_test.go
Outdated
} | ||
|
||
func BenchmarkAESGCM(b *testing.B) { | ||
for _, length := range []int{64, 1350} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just random "small" and "large", or is it testing both sides of a threshold or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just random small and large, but I will remove the large benchmark because it does not add any information that is not already covered by the small case.
Co-authored-by: Davis Goodin <[email protected]>
744d9d9
to
3a8f831
Compare
This PR contains two complementary AES performance improvements:
Using AES wrappers
Commit: 99e3044
Several AES functions and methods were calling EVP_EncryptUpdate, EVP_DecryptUpdate and EVP_CipherUpdate without using the third parameter,
out_len
, which is an integer pointer filled by OpenSSL with the length of the output buffer.out_len
escapes to the heap, which puts unnecessary garbage collector pressure and degrades performance. To avoid that we are now allocatingout_len
in the C stack and just discarding the result.Declare pointer types in C
Commit: 9356eff
We are storing EVP_CIPHER_CTX and EVP_CIPHER objects as Go pointers even though these are opaque C pointers which Go doesn't need to know they are pointers as it is not creating them nor doing any special operation, everything happens in the C scope. If we store those objects as
void*
typedefs we will have more chances that Go escape analysis will keep them on the stack.A side-effect of this change is that we rely less on types defined in OpenSSL headers, which goes in the right direction. The "north star" is to eliminate the prerequisite of needing OpenSSL development headers in order to build this package.
As a side note, boringcrypto does not suffer so much from these allocations because BoringSSL does not use opaque pointers as function arguments but ordinary structs, which are are easier to allocate in the stack.
Results
AES encryption and decryption no longer allocates and has 70% more throughput.
AES seal and open allocates 75% less and has ~20% more throughput.