Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/cipher: NewCTR is too expensive for small runs #39365

Closed
jech opened this issue Jun 2, 2020 · 8 comments
Closed

crypto/cipher: NewCTR is too expensive for small runs #39365

jech opened this issue Jun 2, 2020 · 8 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@jech
Copy link

jech commented Jun 2, 2020

Hi,

SRTP, the security mechanism used by WebRTC, encrypts each packet individually with a different IV. Current implementations use AES in CTR mode, with the IV derived from the packet sequence number, the track's identifier (SSRC), and the phase of the moon. The Pion WebRTC library implements this by calling cipher.NewCTR for each packet. This turns out to be one of he main sources of memory allocations in the library.

Pull request pion/srtp#76 works around the problem by implementing a function that performs the equivalent of cipher.NewCTR followed by XORKeyStream without allocating the Stream structure. However, the Pion maintainers appear to believe that the client library is not the right place to perform this kind of manipulation.

I humbly suggest that the library should include a function

func findAGoodName(block cipher.Block, iv []byte, dst, src []byte)

which performs the equivalent of NewCTR followed by XORKeyStream. An alternative would be to add a function

func (ctr *ctr) Reset(block Block, iv []byte)

which resets a Stream to its initial state without reallocating the buffer. Of course, due to backwards compatibility, this would not be added to the existing Stream interface, the client would need to check for it explicitly.

@ianlancetaylor ianlancetaylor changed the title cipher.NewCTR is too expensive for small runs crypto/cipher: NewCTR is too expensive for small runs Jun 2, 2020
@ianlancetaylor
Copy link
Member

CC @FiloSottile

@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 2, 2020
@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Jun 2, 2020
@ernado
Copy link
Contributor

ernado commented Feb 8, 2021

The AES-IGE mode is used in telegram:

cipher, err := aes.NewCipher(key[:]) // allocation
if err != nil {
	return nil, err
}
plaintext := make([]byte, len(data))
d := ige.NewIGEDecrypter(cipher, iv[:])
d.CryptBlocks(plaintext, data)

Telegram uses different iv and key for each message.
We can't reuse Cipher because there are no Reset method.

Should I create another issue for this?

@jech
Copy link
Author

jech commented Aug 18, 2021

@FiloSottile Any chance you could consider this, now that the tree is open again?

@starius
Copy link
Contributor

starius commented Feb 8, 2024

I think, that CTR mode is slow on small runs, because it precalculates 512 bytes:
https://github.com/golang/go/blob/go1.22.0/src/crypto/cipher/ctr.go#L28

Can the size of precalculated buffer be set to BlockSize? Method refill() fills the buffer with BlockSize runs of Encrypt() method anyway. If streamBufferSize is always set to BlockSize, if would improve performance of small runs (e.g. 16 byte) significantly, but should not damage performance of large runs.

@starius
Copy link
Contributor

starius commented Feb 8, 2024

I made an experiment: reduced streamBufferSize from 512 to 32 bytes. It caused ~5% slowdown.

diff --git a/src/crypto/cipher/ctr.go b/src/crypto/cipher/ctr.go
index eac8e266cf..9b93f43c69 100644
--- a/src/crypto/cipher/ctr.go
+++ b/src/crypto/cipher/ctr.go
@@ -25,7 +25,7 @@ type ctr struct {
        outUsed int
 }
 
-const streamBufferSize = 512
+const streamBufferSize = 32
 
 // ctrAble is an interface implemented by ciphers that have a specific optimized
 // implementation of CTR, like crypto/aes. NewCTR will check for this interface

Comparison of benckmarks:

$ benchstat /tmp/old.txt /tmp/new.txt
goos: linux
goarch: amd64
pkg: crypto/cipher
cpu: AMD EPYC Processor (with IBPB)
           │ /tmp/old.txt │            /tmp/new.txt             │
           │    sec/op    │   sec/op     vs base                │
AESCTR1K-3    1.485µ ± 0%   1.559µ ± 1%  +4.98% (p=0.000 n=100)
AESCTR8K-3    11.84µ ± 0%   12.46µ ± 1%  +5.25% (p=0.000 n=100)
geomean       4.193µ        4.407µ       +5.11%

           │ /tmp/old.txt │         /tmp/new.txt         │
           │     B/s      │     B/s       vs base        │
AESCTR1K-3   654.2Mi ± 0%   623.4Mi ± 1%  -4.72% (n=100)
AESCTR8K-3   659.6Mi ± 0%   626.7Mi ± 1%  -4.99% (n=100)
geomean      656.9Mi        625.0Mi       -4.85%

@jech
Copy link
Author

jech commented Jun 11, 2024

Now that 57d0551 has landed, it's fairly easy to work around the issue. Please see https://github.com/pion/srtp/blob/master/crypto.go.

@jech jech closed this as completed Jun 11, 2024
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/621958 mentions this issue: crypto/aes: speedup CTR mode on AMD64 and ARM64

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/621957 mentions this issue: crypto/cipher: add small CTR benchmark, remove CFB/OFB benchmarks

gopherbot pushed a commit that referenced this issue Nov 18, 2024
CFB and OFB are mostly unused, and not a performance target.

Updates #39365
Updates #69445

Change-Id: Ice6441e4fee2112a9e72607c63e49dbc50441ba6
Reviewed-on: https://go-review.googlesource.com/c/go/+/621957
Reviewed-by: Roland Shoemaker <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Dmitri Shuralyov <[email protected]>
Reviewed-by: Daniel McCarney <[email protected]>
gopherbot pushed a commit that referenced this issue Nov 19, 2024
The implementation runs up to 8 AES instructions in different registers
one after another in ASM code. Because CPU has instruction pipelining
and the instructions do not depend on each other, they can run in
parallel with this layout of code. This results in significant speedup
compared to the regular implementation in which blocks are processed in
the same registers so AES instructions do not run in parallel.

GCM mode already utilizes the approach.

The ASM implementation of ctrAble has most of its code in XORKeyStreamAt
method which has an additional argument, offset. It allows to use it
in a stateless way and to jump to any location in the stream. The method
does not exist in pure Go and boringcrypto implementations.

[ Mailed as CL 413594, then edited by filippo@ to manage the counter
with bits.Add64, remove bounds checks, make the assembly interface more
explicit, and to port the amd64 to Avo. Squeezed another -6.38% out. ]

goos: linux
goarch: amd64
pkg: crypto/cipher
cpu: AMD Ryzen 7 PRO 8700GE w/ Radeon 780M Graphics
            │  19df80d792  │             c8b0409d40              │
            │    sec/op    │   sec/op     vs base                │
AESCTR/50-8    64.68n ± 0%   26.89n ± 0%  -58.42% (p=0.000 n=10)
AESCTR/1K-8   1145.0n ± 0%   135.8n ± 0%  -88.14% (p=0.000 n=10)
AESCTR/8K-8   9145.0n ± 0%   917.5n ± 0%  -89.97% (p=0.000 n=10)
geomean        878.2n        149.6n       -82.96%

            │  19df80d792  │               c8b0409d40               │
            │     B/s      │      B/s       vs base                 │
AESCTR/50-8   737.2Mi ± 0%   1773.3Mi ± 0%  +140.54% (p=0.000 n=10)
AESCTR/1K-8   848.5Mi ± 0%   7156.6Mi ± 0%  +743.40% (p=0.000 n=10)
AESCTR/8K-8   853.8Mi ± 0%   8509.9Mi ± 0%  +896.70% (p=0.000 n=10)
geomean       811.4Mi         4.651Gi       +486.94%

Fixes #20967
Updates #39365
Updates #26673

Co-authored-by: Filippo Valsorda <[email protected]>
Change-Id: Iaeea29fb93a56456f2e54507bc25196edb31b84b
Reviewed-on: https://go-review.googlesource.com/c/go/+/621958
Auto-Submit: Filippo Valsorda <[email protected]>
Reviewed-by: Roland Shoemaker <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Dmitri Shuralyov <[email protected]>
Reviewed-by: Daniel McCarney <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants