-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
crypto/cipher: NewCTR is too expensive for small runs #39365
Comments
CC @FiloSottile |
The AES-IGE mode is used in telegram: cipher, err := aes.NewCipher(key[:]) // allocation
if err != nil {
return nil, err
}
plaintext := make([]byte, len(data))
d := ige.NewIGEDecrypter(cipher, iv[:])
d.CryptBlocks(plaintext, data) Telegram uses different iv and key for each message. Should I create another issue for this? |
@FiloSottile Any chance you could consider this, now that the tree is open again? |
I think, that CTR mode is slow on small runs, because it precalculates 512 bytes: Can the size of precalculated buffer be set to BlockSize? Method |
I made an experiment: reduced diff --git a/src/crypto/cipher/ctr.go b/src/crypto/cipher/ctr.go
index eac8e266cf..9b93f43c69 100644
--- a/src/crypto/cipher/ctr.go
+++ b/src/crypto/cipher/ctr.go
@@ -25,7 +25,7 @@ type ctr struct {
outUsed int
}
-const streamBufferSize = 512
+const streamBufferSize = 32
// ctrAble is an interface implemented by ciphers that have a specific optimized
// implementation of CTR, like crypto/aes. NewCTR will check for this interface Comparison of benckmarks:
|
Now that 57d0551 has landed, it's fairly easy to work around the issue. Please see https://github.com/pion/srtp/blob/master/crypto.go. |
Change https://go.dev/cl/621958 mentions this issue: |
Change https://go.dev/cl/621957 mentions this issue: |
CFB and OFB are mostly unused, and not a performance target. Updates #39365 Updates #69445 Change-Id: Ice6441e4fee2112a9e72607c63e49dbc50441ba6 Reviewed-on: https://go-review.googlesource.com/c/go/+/621957 Reviewed-by: Roland Shoemaker <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> Reviewed-by: Daniel McCarney <[email protected]>
The implementation runs up to 8 AES instructions in different registers one after another in ASM code. Because CPU has instruction pipelining and the instructions do not depend on each other, they can run in parallel with this layout of code. This results in significant speedup compared to the regular implementation in which blocks are processed in the same registers so AES instructions do not run in parallel. GCM mode already utilizes the approach. The ASM implementation of ctrAble has most of its code in XORKeyStreamAt method which has an additional argument, offset. It allows to use it in a stateless way and to jump to any location in the stream. The method does not exist in pure Go and boringcrypto implementations. [ Mailed as CL 413594, then edited by filippo@ to manage the counter with bits.Add64, remove bounds checks, make the assembly interface more explicit, and to port the amd64 to Avo. Squeezed another -6.38% out. ] goos: linux goarch: amd64 pkg: crypto/cipher cpu: AMD Ryzen 7 PRO 8700GE w/ Radeon 780M Graphics │ 19df80d792 │ c8b0409d40 │ │ sec/op │ sec/op vs base │ AESCTR/50-8 64.68n ± 0% 26.89n ± 0% -58.42% (p=0.000 n=10) AESCTR/1K-8 1145.0n ± 0% 135.8n ± 0% -88.14% (p=0.000 n=10) AESCTR/8K-8 9145.0n ± 0% 917.5n ± 0% -89.97% (p=0.000 n=10) geomean 878.2n 149.6n -82.96% │ 19df80d792 │ c8b0409d40 │ │ B/s │ B/s vs base │ AESCTR/50-8 737.2Mi ± 0% 1773.3Mi ± 0% +140.54% (p=0.000 n=10) AESCTR/1K-8 848.5Mi ± 0% 7156.6Mi ± 0% +743.40% (p=0.000 n=10) AESCTR/8K-8 853.8Mi ± 0% 8509.9Mi ± 0% +896.70% (p=0.000 n=10) geomean 811.4Mi 4.651Gi +486.94% Fixes #20967 Updates #39365 Updates #26673 Co-authored-by: Filippo Valsorda <[email protected]> Change-Id: Iaeea29fb93a56456f2e54507bc25196edb31b84b Reviewed-on: https://go-review.googlesource.com/c/go/+/621958 Auto-Submit: Filippo Valsorda <[email protected]> Reviewed-by: Roland Shoemaker <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> Reviewed-by: Daniel McCarney <[email protected]>
Hi,
SRTP, the security mechanism used by WebRTC, encrypts each packet individually with a different IV. Current implementations use AES in CTR mode, with the IV derived from the packet sequence number, the track's identifier (SSRC), and the phase of the moon. The Pion WebRTC library implements this by calling
cipher.NewCTR
for each packet. This turns out to be one of he main sources of memory allocations in the library.Pull request pion/srtp#76 works around the problem by implementing a function that performs the equivalent of
cipher.NewCTR
followed byXORKeyStream
without allocating the Stream structure. However, the Pion maintainers appear to believe that the client library is not the right place to perform this kind of manipulation.I humbly suggest that the library should include a function
which performs the equivalent of
NewCTR
followed byXORKeyStream
. An alternative would be to add a functionwhich resets a
Stream
to its initial state without reallocating the buffer. Of course, due to backwards compatibility, this would not be added to the existingStream
interface, the client would need to check for it explicitly.The text was updated successfully, but these errors were encountered: