-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge BoringSSL through a905bbb52a7bac5099f2cbee008c6f3eae96218c #1663
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The assembly functions from OpenSSL vary in how the counter overflow works. The aarch64 implementation uses a mix of 32-bit and 64-bit counters. This is because, when packing a block into 64-bit general-purpose registers, it is easier to implement a 64-bit counter than a 32-bit one. Whereas, on 32-bit general-purpose registers, or when using vector registers with 32-bit lanes, it is easier to implement a 32-bit counter. Counters will never overflow with the AEAD, which sets the length limit so it never happens. (Failing to do so will reuse a key/nonce/counter triple.) RFC 8439 is silent on what happens on overflow, so at best one can say it is implicitly undefined behavior. This came about because pyca/cryptography reportedly exposed a ChaCha20 API which encouraged callers to randomize the starting counter. Wrapping with a randomized starting counter isn't inherently wrong, though it is pointless and goes against how the spec recommends using the initial counter value. Nonetheless, we would prefer our functions behave consistently across platforms, rather than silently give ill-defined output given some inputs. So, normalize the behavior to the wrapping version in CRYPTO_chacha_20 by dividing up into multiple ChaCha20_ctr32 calls as needed. Fixed: 614 Change-Id: I191461f25753b9f6b59064c6c08cd4299085e172 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60387 Commit-Queue: Adam Langley <[email protected]> Auto-Submit: David Benjamin <[email protected]> Reviewed-by: Adam Langley <[email protected]>
Did 29000 Curve25519 arbitrary point multiplication operations in 1026074us (28263.1 ops/sec) [+31.2%] Change-Id: I9c7d47a047dc68d37202b6cf40d7d12b5b4936f8 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60385 Reviewed-by: David Benjamin <[email protected]> Commit-Queue: David Benjamin <[email protected]>
Update-Note: Accessing the DSA struct directly is no longer supported. Use accessors instead. Bug: 325 Change-Id: I73dc20f2e275d48648ca84d2db7806fe155c567d Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60425 Reviewed-by: Adam Langley <[email protected]> Auto-Submit: David Benjamin <[email protected]> Commit-Queue: David Benjamin <[email protected]>
Did 75000 Ed25519 key generation operations in 1007110us (74470.5 ops/sec) [+26.9%] Did 72000 Ed25519 signing operations in 1011133us (71207.2 ops/sec) [+25.5%] Did 78000 Curve25519 base-point multiplication operations in 1006737us (77478.0 ops/sec) [+27.5%] Change-Id: I32ca2056f42f9b92af315d8381e1b72be69dd331 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60386 Commit-Queue: Andres Erbsen <[email protected]> Reviewed-by: David Benjamin <[email protected]> Commit-Queue: David Benjamin <[email protected]>
.type seems to be needed for the unwind tester to report symbols correctly. I assume it does other useful things. .hidden (.private_extern on macOS) keeps it from being exported out of shared libraries. I'm not positive what .size does, but if ELF wants to know the size of the function, we should probably tell it that. Change-Id: Iaa2e4f8a72e81092ec1d81ee2177504c7dc89c76 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60465 Commit-Queue: Adam Langley <[email protected]> Reviewed-by: Andres Erbsen <[email protected]> Reviewed-by: Adam Langley <[email protected]> Auto-Submit: David Benjamin <[email protected]>
OpenSSL 1.1.x renamed these functions with an OPENSSL_ prefix. Unfortunately, rust-openssl uses these, losing type-safety, rather than the type-safe macros. It currently expects the old, unprefixed names due to a different bug (sfackler/rust-openssl#1944), but to fix that, we'll need to align with the OpenSSL names. To keep the current version of rust-openssl working, I've preserved the old names that rust-openssl uses, but we should clear these out. Bug: 499 Change-Id: I3be56a54ef503620b92ce8154fafd46b2906ae63 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60505 Reviewed-by: Bob Beck <[email protected]> Auto-Submit: David Benjamin <[email protected]> Commit-Queue: David Benjamin <[email protected]> Commit-Queue: Bob Beck <[email protected]>
We have too many build systems and need to unify the source of truth between them. Change-Id: If8a71f400dd478f8890ff56062210295d5e6785e Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60545 Commit-Queue: Adam Langley <[email protected]> Auto-Submit: David Benjamin <[email protected]> Reviewed-by: Adam Langley <[email protected]>
At the public API, the comparison functions are a double pointer to match qsort. qsort comparators are passed pointers to the list elements, and our list elements are T*. qsort needs this indirection because it can sort values of arbitrary size. However, our type-erased versions have no such constraints. Since we know our all elements are pointers, we can skip the indirection. Change-Id: Ibb102b51a5aaf0a68a7318bf14ec8f4f9c7a3daf Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60506 Reviewed-by: Bob Beck <[email protected]> Commit-Queue: David Benjamin <[email protected]>
OpenSSL's API constraints are such that sk_FOO_sort must take a comparison function of type int (*cmp)(const FOO *const *a, const FOO *const *b) However qsort expects a comparison function of type int (*cmp)(const void *a, const void *b) In C, it is UB to cast a function pointer to a different type and call it, even if the underlying calling conventions are the same. Moreover, as qsort doesn't take a context parameter on its comparisons, we cannot do the usual convention with closures in C. qsort_r and qsort_s would avoid this, but they are unusable. Too many libcs don't have them, and those that do define them inconsistently. See https://stackoverflow.com/a/39561369 It seems this UB has finally hit a sanitizer in fxbug.dev/128274. Irritating as it is to not even have a working sort function, I think the easiest option is to just give up on qsort. As we did with bsearch in https://boringssl-review.googlesource.com/c/boringssl/+/35304, just implement an in-place heap sort ourselves. Bug: fxbug.dev/128274 Change-Id: I9de6b4018bf635da0d0c5a680bd7811d297b0bb3 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60507 Commit-Queue: David Benjamin <[email protected]> Reviewed-by: Bob Beck <[email protected]>
Also teach update_clang.py how to download Chromium's Clang on macOS. We don't currently use it in CI, but I've patched this in enough times for local testing, that having it seems convenient. This picks up -fsanitize=function, though it wouldn't have caught the qsort issue anyway. I'm guessing the libc must be built with UBSan too. Change-Id: I8c396a10b32e69fe7013283b1bb4320bc35756b6 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60547 Reviewed-by: Bob Beck <[email protected]> Commit-Queue: Bob Beck <[email protected]> Auto-Submit: David Benjamin <[email protected]>
This fixes one of the two build issues triggered by the curve25519 assembly. The new curve25519 assembly is one file that works for both ELF and Apple targets. This is handy, but breaks the naive concatenation logic for making the generated file lists. Fix the generation to deduplicate first. Convincing Starlark in Bazel to deduplicate is annoyingly difficult, so we'll just generate both lists. This results in some duplication in the generated files, but they're generated so this is fine. Hopefully we can, in time, just remove the per-platform lists when everyone is migrated over. To that end, this CL adds the combined output to GN so I can migrate Chromium. This CL is not sufficient to unbreak the Bazel build. The next change is also needed. Bug: 542 Change-Id: Ibe76ff557fc43f7b4d984ccdf298f13c20f3b50c Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60565 Commit-Queue: Adam Langley <[email protected]> Reviewed-by: Adam Langley <[email protected]> Auto-Submit: David Benjamin <[email protected]>
This fixes the generated Bazel build. Bazel is strict about having all dependencies declared, which includes files that are #included into other files. (It also is not particularly pleased about textual headers and wants them declared in a separate place.) The new fiat curve25519 assembly is currently split into a BoringSSL half, and a more generic fiat half. For now, just move the BoringSSL customizations directly into the fiat half. This isn't ideal, as we'd, long term, like something where the fiat code can be made standalone. But, to fix the build, just patch in the changes now and we can ponder how to do this better later. (Build tools and conventions for assembly are much less clear than C, sadly.) Also add the .note.GNU-stack bit at the end. Change-Id: I04aa733eabf8562dba42dee63a8fd25c86a59db9 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60566 Reviewed-by: Adam Langley <[email protected]> Commit-Queue: David Benjamin <[email protected]>
That NULL+0 is forbidden is still an awful language bug in C (fixed in C++), but this particular instance could have been written without pointer arithmetic. While I'm here, tidy pointers a bit: - No need to cast pointers to char* when we're writing to void* anyway - BIO_C_GET_BUF_MEM_PTR is technically a strict aliasing violation. The pointer points to a BUF_MEM*, not a char*, so we should not write to it as a char**. - C casts from void* freely and we've usually omitted the cast in that case. (Though if we ever move libcrypto to C++, that'll all have to change.) Bug: b:286384999 Change-Id: I16d7da675d61f726f259fc9a3cc4a6fce2d6d1fd Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60605 Reviewed-by: Bob Beck <[email protected]> Commit-Queue: Bob Beck <[email protected]> Auto-Submit: David Benjamin <[email protected]>
This probably needs a few iterations, but fix the stuff I noticed on a first pass: - I don't think it's useful to say this is the BoringSSL implementation of something. That's all implicit from this crate anyway. - Rust seems to prefer "Returns ..." rather than "Return ..." - Algorithm names like "hkdf" or "hmac" should be written "HKDF" or "HMAC" when referring to the algorithms. Also BoringSSL is styled "BoringSSL" rather than "boringssl". - Given Rust overall doesn't try to handle allocation failure, let's just write that we don't believe in allocation failure once in the README rather than marking which functions do and don't panic. Change-Id: I48501717dd0b063a2fa4106c4c140d76a7ef69a9 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60546 Reviewed-by: Nabil Wadih <[email protected]> Commit-Queue: Bob Beck <[email protected]> Reviewed-by: Bob Beck <[email protected]>
These bits need more work (and possibly some removal) as they're very, very far from thread-safe, but rust-openssl relies on them being const-correct when targetting OpenSSL 1.1.x. Change-Id: I60531c7e90dbdbcb79c09fc440bd7c6b474172df Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60607 Auto-Submit: David Benjamin <[email protected]> Commit-Queue: David Benjamin <[email protected]> Reviewed-by: Bob Beck <[email protected]> Commit-Queue: Bob Beck <[email protected]>
We added SSL_CIPHER_get_prf_nid to match the other SSL_CIPHER_get_*_nid functions, but OpenSSL went with returning the EVP_MD instead. rust-openssl uses this function, and all the callers of SSL_CIPHER_get_prf_nid then call EVP_get_digestbynid anyway, so this version is preferable. Update-Note: This change is backwards-compatible, but we should update the QUIC code to use this new function when OPENSSL_API_VERSION is high enough. It has the benefit of not pulling in random other hash functions like MD4. Change-Id: Ied66a6f0adbd5d7d86257d9349c40a2830e3c7e8 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60606 Commit-Queue: David Benjamin <[email protected]> Reviewed-by: Bob Beck <[email protected]> Auto-Submit: David Benjamin <[email protected]> Commit-Queue: Bob Beck <[email protected]>
We had to allow this when parsing certs to remain compatible with some misissued certificates, but there's no reason to allow it when making new values. Update-Note: ASN1_UTCTIME_set_string and ASN1_TIME_set_string will no longer accept times with timezone offsets, which is forbidden by RFC 5280. These functions are used when minting new certificates, rather than parsing them. The parsing behavior is unchanged by this CL. Change-Id: I0860deb44a49e99ce477f8cde847d20edfd29ed9 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60608 Auto-Submit: David Benjamin <[email protected]> Commit-Queue: Bob Beck <[email protected]> Reviewed-by: Bob Beck <[email protected]>
rust-openssl uses this function when targetting OpenSSL 1.1.x. Change-Id: Ifeb1b65be9976358f9ee636ed23c1a931e03b275 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60609 Auto-Submit: David Benjamin <[email protected]> Reviewed-by: Bob Beck <[email protected]> Commit-Queue: Bob Beck <[email protected]>
__builtin_ia32_addcarryx_u64 is, strictly speaking, an ADX intrinsic. GCC and newer Clang seem to actually implement it without ADX, but Clang 7 and older will actually try to generate ADX code with it. But since the caller is not marked target("adx"), this fails to build. Manually add ADX and BMI2 target attributes to all these functions. The compiler should be free to use those instructions as these functions all call into an ADX+BMI2 assembly function anyway. (Though it doesn't do much with this.) Note we cannot just annotate fiat_addcarryx_u64. Clang and GCC won't inline across incompatible targets, so if we tag fiat_addcarryx_u64, we need to tag the callers up the chain until we're willing to stop inlining. Change-Id: I855bb88fea666d92997984836e664292d90df5be Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60612 Reviewed-by: Adam Langley <[email protected]> Commit-Queue: Adam Langley <[email protected]> Auto-Submit: David Benjamin <[email protected]>
Three years of updating calling code are finally complete! Update-Note: Accessing the RSA struct directly is no longer supported. Use accessors instead. Bug: 316, 325 Change-Id: I27b4c9899cb96f5807075b8fe351eaf72a9a9d44 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60610 Reviewed-by: Adam Langley <[email protected]> Reviewed-by: Bob Beck <[email protected]> Auto-Submit: David Benjamin <[email protected]> Commit-Queue: Adam Langley <[email protected]>
We no longer need to define CRYPTO_MUTEX in public headers. This simplifies a pile of things. First, we can now use pthread_rwlock_t without any fuss, rather than trying to guess the size on glibc. As a result, CRYPTO_MUTEX and CRYPTO_STATIC_MUTEX can be merged into one type. We can almost do this to CRYPTO_refcount_t too. BIO is the one straggler remaining. Fixed: 325 Change-Id: Ie93c9f553c0f02ce594b959c041b00fc15ba51d2 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60611 Commit-Queue: David Benjamin <[email protected]> Reviewed-by: Bob Beck <[email protected]>
Bug: 285222580 Change-Id: I14715c50c3b5b0425443c191f4bf2e3ef7d665ae Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60266 Reviewed-by: Bob Beck <[email protected]> Commit-Queue: Bob Beck <[email protected]>
While EVP_PKEY_RSA, EVP_PKEY_DSA, and EVP_PKEY_EC have publicly-exposed internaly representations, other EVP_PKEY types to not. EVP_PKEY_assign should not allow callers to manipulate those representations. As part of this, teach EVP_PKEY_assign_RSA, etc. to find their method tables directly, rather than indirecting through an integer. This makes those EVP APIs static-linker-friendly. Bug: 618, 497 Change-Id: Ic45a7514e9a3adc505759f2327129f13faf03a65 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60645 Auto-Submit: David Benjamin <[email protected]> Reviewed-by: Bob Beck <[email protected]> Commit-Queue: Bob Beck <[email protected]>
Chromium actually has a checker for unexpected exported symbols, which caught this. Change-Id: If1af4c26a3326eea185725f9e84c17f5d9f0f58e Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60665 Reviewed-by: Adam Langley <[email protected]> Commit-Queue: Adam Langley <[email protected]> Auto-Submit: David Benjamin <[email protected]>
I forgot a CPU capability check in X25519Test.NeonABI. Change-Id: Ie2fa4a7b04a7eb152aa3b720687ec529e5dd5b0f Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60745 Reviewed-by: Adam Langley <[email protected]> Auto-Submit: David Benjamin <[email protected]> Commit-Queue: Adam Langley <[email protected]>
bn_mod_lshift_consttime currently calls bn_mod_lshift1_consttime in a loop, but between needing a temporary value and having to guard against some complications in our fixed-width BIGNUM convention, it's actually picking up a lot of overhead. This function is currently called to setup Montgomery contexts with secret moduli (RSA primes). The setup operation is not performance-sensitive in our benchmarks, because it is amortized away in RSA private key signing. However, as part of reducing thread contention with the RSA object, I'm planning to make RSA creation, which we do benchmark, eagerly fill in the Montgomery context. We do benchmark RSA parsing, so adding a slow Montgomery setup would show up in benchmarks. This distinction is mostly artificial. Work done on creation and work done on first use is still work done once per RSA key. However, work done on key creation may slow server startup, while work deferred to first use is amortized but less predictable. Either way, from this CL, and especially the one to follow it, we have plenty of low-hanging fruit in this function. As a bonus, this should help single-use RSA private keys, but that's not something we currently benchmark. Modulus sizes below chosen based on: - Common curve sizes (moot because we use a variable-time setup anyway) - Common RSA modulus sizes (also variable-time setup) - Half of common RSA modulus sizes (the secret primes involved) Of these, only the third category matters. The others can use the division-based path where it's faster anyway. However, by the end of this patch series, they'll get a bit closer, so I benchmarked them all to compare. (Though division still wins in the end.) Benchmarks on an M1 Max: Before: Did 528000 256-bit mont (constime) operations in 2000993us (263869.0 ops/sec) Did 312000 384-bit mont (constime) operations in 2001281us (155900.1 ops/sec) Did 246000 512-bit mont (constime) operations in 2001521us (122906.5 ops/sec) Did 191000 521-bit mont (constime) operations in 2006336us (95198.4 ops/sec) Did 98000 1024-bit mont (constime) operations in 2001438us (48964.8 ops/sec) Did 55000 1536-bit mont (constime) operations in 2025306us (27156.4 ops/sec) Did 35000 2048-bit mont (constime) operations in 2022714us (17303.5 ops/sec) Did 17640 3072-bit mont (constime) operations in 2028352us (8696.7 ops/sec) Did 10290 4096-bit mont (constime) operations in 2065529us (4981.8 ops/sec) After: Did 712000 256-bit mont (constime) operations in 2000454us (355919.2 ops/sec) [+34.9%] Did 440000 384-bit mont (constime) operations in 2001121us (219876.8 ops/sec) [+41.0%] Did 259000 512-bit mont (constime) operations in 2003709us (129260.3 ops/sec) [+5.2%] Did 212000 521-bit mont (constime) operations in 2007033us (105628.6 ops/sec) [+11.0%] Did 107000 1024-bit mont (constime) operations in 2018551us (53008.3 ops/sec) [+8.3%] Did 57000 1536-bit mont (constime) operations in 2001027us (28485.4 ops/sec) [+4.9%] Did 37000 2048-bit mont (constime) operations in 2039631us (18140.5 ops/sec) [+4.8%] Did 20000 3072-bit mont (constime) operations in 2041163us (9798.3 ops/sec) [+12.7%] Did 11760 4096-bit mont (constime) operations in 2007195us (5858.9 ops/sec) [+17.6%] Bug: 316 Change-Id: I06f4a065fdecc1aec3160fe32a41e200538d1ee3 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60685 Auto-Submit: David Benjamin <[email protected]> Reviewed-by: Adam Langley <[email protected]> Commit-Queue: Adam Langley <[email protected]>
Setting up Montgomery reduction requires computing RR, a larger power of 2 mod N. When N is secret (RSA primes), we currently start at 2^(n_bits-1), then iteratively double and reduce. Instead, once we reach 2R = 2^(r_bits+1) or higher, we can switch to a Montgomery square-and-multiply. (Montgomery reduction only needs n0. RR is just for conversion.) This takes some tuning because, at low powers of 2 (in Montgomery form), it is still more efficient to square by doubling. I ran benchmarks for 32-bit and 64-bit, x86 and Arm, on the machines I had available and picked a threshold that works decently well. (On the hardware I tested, it's the right threshold on all but the Pixel 5A. The 5A would ideally want a slightly higher threshold---it seems to be worse at multiplying or better at addition, but the gap isn't that large, and this operation isn't perf-sensitive anyway.) The result is dramatically faster than the old shift-based approach. That said, see I06f4a065fdecc1aec3160fe32a41e200538d1ee3 for discussion on this operation. These speedups are not expected to translate to increased RSA throughput. They just clear up some initialization work. This speedup is not quite enough to match the division-based variable-time one (perf-sensitive for RSA verification), so we'll keep both codepaths around. M1 Max Before: Did 712000 256-bit mont (constime) operations in 2000454us (355919.2 ops/sec) Did 440000 384-bit mont (constime) operations in 2001121us (219876.8 ops/sec) Did 259000 512-bit mont (constime) operations in 2003709us (129260.3 ops/sec) Did 212000 521-bit mont (constime) operations in 2007033us (105628.6 ops/sec) Did 107000 1024-bit mont (constime) operations in 2018551us (53008.3 ops/sec) Did 57000 1536-bit mont (constime) operations in 2001027us (28485.4 ops/sec) Did 37000 2048-bit mont (constime) operations in 2039631us (18140.5 ops/sec) Did 20000 3072-bit mont (constime) operations in 2041163us (9798.3 ops/sec) Did 11760 4096-bit mont (constime) operations in 2007195us (5858.9 ops/sec) After: Did 3996000 256-bit mont (constime) operations in 2000366us (1997634.4 ops/sec) [+461.3%] Did 2687000 384-bit mont (constime) operations in 2000464us (1343188.4 ops/sec) [+510.9%] Did 2615000 512-bit mont (constime) operations in 2000146us (1307404.6 ops/sec) [+911.5%] Did 1029000 521-bit mont (constime) operations in 2000944us (514257.3 ops/sec) [+386.9%] Did 1246000 1024-bit mont (constime) operations in 2000899us (622720.1 ops/sec) [+1074.8%] Did 688000 1536-bit mont (constime) operations in 2000579us (343900.4 ops/sec) [+1107.3%] Did 425000 2048-bit mont (constime) operations in 2003622us (212115.9 ops/sec) [+1069.3%] Did 212000 3072-bit mont (constime) operations in 2004430us (105765.7 ops/sec) [+979.4%] Did 125000 4096-bit mont (constime) operations in 2009677us (62199.0 ops/sec) [+961.6%] Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz Before: Did 781000 256-bit mont (constime) operations in 2000740us (390355.6 ops/sec) Did 414000 384-bit mont (constime) operations in 2000180us (206981.4 ops/sec) Did 258000 512-bit mont (constime) operations in 2001729us (128888.6 ops/sec) Did 194000 521-bit mont (constime) operations in 2008814us (96574.4 ops/sec) Did 79000 1024-bit mont (constime) operations in 2009309us (39317.0 ops/sec) Did 36000 1536-bit mont (constime) operations in 2003945us (17964.6 ops/sec) Did 21000 2048-bit mont (constime) operations in 2074987us (10120.5 ops/sec) Did 9040 3072-bit mont (constime) operations in 2003869us (4511.3 ops/sec) Did 5250 4096-bit mont (constime) operations in 2067796us (2538.9 ops/sec) After: Did 3496000 256-bit mont (constime) operations in 2000542us (1747526.4 ops/sec) [+347.7%] Did 2466000 384-bit mont (constime) operations in 2000327us (1232798.4 ops/sec) [+495.6%] Did 2392000 512-bit mont (constime) operations in 2000732us (1195562.4 ops/sec) [+827.6%] Did 908000 521-bit mont (constime) operations in 2001181us (453732.1 ops/sec) [+369.8%] Did 1054000 1024-bit mont (constime) operations in 2001429us (526623.7 ops/sec) [+1239.4%] Did 548000 1536-bit mont (constime) operations in 2002417us (273669.3 ops/sec) [+1423.4%] Did 339000 2048-bit mont (constime) operations in 2004127us (169151.0 ops/sec) [+1571.4%] Did 162000 3072-bit mont (constime) operations in 2008221us (80668.4 ops/sec) [+1688.2%] Did 94000 4096-bit mont (constime) operations in 2013848us (46676.8 ops/sec) [+1738.4%] Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz, 32-bit mode Before: Did 335000 256-bit mont (constime) operations in 2000006us (167499.5 ops/sec) Did 170000 384-bit mont (constime) operations in 2010398us (84560.4 ops/sec) Did 102000 512-bit mont (constime) operations in 2013510us (50657.8 ops/sec) Did 88000 521-bit mont (constime) operations in 2022909us (43501.7 ops/sec) Did 27000 1024-bit mont (constime) operations in 2063490us (13084.6 ops/sec) Did 11760 1536-bit mont (constime) operations in 2000600us (5878.2 ops/sec) Did 6825 2048-bit mont (constime) operations in 2069343us (3298.1 ops/sec) Did 2982 3072-bit mont (constime) operations in 2090651us (1426.3 ops/sec) Did 1680 4096-bit mont (constime) operations in 2074824us (809.7 ops/sec) After: Did 1559000 256-bit mont (constime) operations in 2000884us (779155.6 ops/sec) [+365.2%] Did 940000 384-bit mont (constime) operations in 2001511us (469645.2 ops/sec) [+455.4%] Did 608000 512-bit mont (constime) operations in 2000380us (303942.3 ops/sec) [+500.0%] Did 439000 521-bit mont (constime) operations in 2004282us (219031.1 ops/sec) [+403.5%] Did 180000 1024-bit mont (constime) operations in 2005427us (89756.4 ops/sec) [+586.0%] Did 85000 1536-bit mont (constime) operations in 2017009us (42141.6 ops/sec) [+616.9%] Did 49000 2048-bit mont (constime) operations in 2035401us (24073.9 ops/sec) [+629.9%] Did 22000 3072-bit mont (constime) operations in 2047404us (10745.3 ops/sec) [+653.3%] Did 12642 4096-bit mont (constime) operations in 2094210us (6036.6 ops/sec) [+645.5%] Pixel 5A: Before: Did 483000 256-bit mont (constime) operations in 2001460us (241323.8 ops/sec) Did 279000 384-bit mont (constime) operations in 2004682us (139174.2 ops/sec) Did 198000 512-bit mont (constime) operations in 2003995us (98802.6 ops/sec) Did 141000 521-bit mont (constime) operations in 2006305us (70278.4 ops/sec) Did 62000 1024-bit mont (constime) operations in 2022138us (30660.6 ops/sec) Did 29000 1536-bit mont (constime) operations in 2007150us (14448.3 ops/sec) Did 17376 2048-bit mont (constime) operations in 2044894us (8497.3 ops/sec) Did 7686 3072-bit mont (constime) operations in 2011537us (3821.0 ops/sec) Did 4620 4096-bit mont (constime) operations in 2048780us (2255.0 ops/sec) After: Did 1187000 256-bit mont (constime) operations in 2000099us (593470.6 ops/sec) [+145.9%] Did 794000 384-bit mont (constime) operations in 2002162us (396571.3 ops/sec) [+184.9%] Did 658000 512-bit mont (constime) operations in 2002808us (328538.7 ops/sec) [+232.5%] Did 373000 521-bit mont (constime) operations in 2005135us (186022.4 ops/sec) [+164.7%] Did 231000 1024-bit mont (constime) operations in 2008117us (115033.1 ops/sec) [+275.2%] Did 112000 1536-bit mont (constime) operations in 2003151us (55911.9 ops/sec) [+287.0%] Did 66000 2048-bit mont (constime) operations in 2022295us (32636.2 ops/sec) [+284.1%] Did 30000 3072-bit mont (constime) operations in 2006199us (14953.7 ops/sec) [+291.4%] Did 17182 4096-bit mont (constime) operations in 2017938us (8514.6 ops/sec) [+277.6%] Pixel 5A, 32-bit mode: Before: Did 124000 256-bit mont (constime) operations in 2013082us (61597.1 ops/sec) Did 66000 384-bit mont (constime) operations in 2024604us (32599.0 ops/sec) Did 40000 512-bit mont (constime) operations in 2018560us (19816.1 ops/sec) Did 38000 521-bit mont (constime) operations in 2043776us (18593.0 ops/sec) Did 11466 1024-bit mont (constime) operations in 2010767us (5702.3 ops/sec) Did 5481 1536-bit mont (constime) operations in 2061892us (2658.2 ops/sec) Did 3171 2048-bit mont (constime) operations in 2075359us (1527.9 ops/sec) Did 1407 3072-bit mont (constime) operations in 2032032us (692.4 ops/sec) Did 819 4096-bit mont (constime) operations in 2070367us (395.6 ops/sec) After: Did 718000 256-bit mont (constime) operations in 2000496us (358911.0 ops/sec) [+482.7%] Did 424000 384-bit mont (constime) operations in 2000523us (211944.6 ops/sec) [+550.2%] Did 401000 512-bit mont (constime) operations in 2000933us (200406.5 ops/sec) [+911.3%] Did 205000 521-bit mont (constime) operations in 2004212us (102284.6 ops/sec) [+450.1%] Did 153000 1024-bit mont (constime) operations in 2004644us (76322.8 ops/sec) [+1238.5%] Did 78000 1536-bit mont (constime) operations in 2007510us (38854.1 ops/sec) [+1361.6%] Did 47000 2048-bit mont (constime) operations in 2018015us (23290.2 ops/sec) [+1424.3%] Did 22848 3072-bit mont (constime) operations in 2079082us (10989.5 ops/sec) [+1487.1%] Did 13156 4096-bit mont (constime) operations in 2067424us (6363.5 ops/sec) [+1508.6%] Bug: 316 Change-Id: I402df85170cae780442225eaa879884e707ffa86 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60686 Reviewed-by: Adam Langley <[email protected]> Commit-Queue: David Benjamin <[email protected]>
RSA CRT is tiny bit messier when p < q. https://boringssl-review.googlesource.com/25263 solved this by normalizing to p > q. The cost was we sometimes had to compute a new iqmp. Modular inversion is expensive. We did it only once per key, but it's still a performance cliff in per-key costs. When later work moves freeze_private_key into RSA private key parsing, it will be a performance cliff in the private key parser. Instead, just handle p < q in the CRT function. The only difference is needing one extra reduction before the modular subtraction. Even using the fully general mod_montgomery function (as opposed to checking p < q, or using bn_reduce_once when num_bits(p) == num_bits(q)) was not measurable. In doing so, I noticed we didn't actually have tests that exercise the reduction step. I added one to evp_tests.txt, but it is only meaningful when blinding is disabled. (Another cost of blinding.) When blinding is enabled, the answers mod p and q are randomized and we hit this case with about 1.8% probability. See comment in evp_test.txt. I kept the optimization where we store iqmp in Montgomery form, not because the optimization matters, but because we need to store a corrected, fixed-width version of the value anyway, so we may as well store it in a more convenient form. M1 Max Before: Did 9048 RSA 2048 signing operations in 5033403us (1797.6 ops/sec) Did 1500 RSA 4096 signing operations in 5009288us (299.4 ops/sec) After: Did 9116 RSA 2048 signing operations in 5053802us (1803.8 ops/sec) [+0.3%] Did 1500 RSA 4096 signing operations in 5008283us (299.5 ops/sec) [+0.0%] Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz Before: Did 9282 RSA 2048 signing operations in 5019395us (1849.2 ops/sec) Did 1302 RSA 4096 signing operations in 5055011us (257.6 ops/sec) After: Did 9240 RSA 2048 signing operations in 5024845us (1838.9 ops/sec) [-0.6%] Did 1302 RSA 4096 signing operations in 5046157us (258.0 ops/sec) [+0.2%] Bug: 316 Change-Id: Icb90c7d5f5188f9b69a6d7bcc63db13d92ec26d5 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60705 Commit-Queue: David Benjamin <[email protected]> Reviewed-by: Adam Langley <[email protected]>
The compiler already knows that .S means assembler and should define __ASSEMBLER__ for us. At least on the CQ, it seems this is not necessary, and some builds are showing a warning about redefining __ASSEMBLER__, which may be caused by this. Change-Id: I390bfcb50cefef42df72cd61ead55159b1838771 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60785 Reviewed-by: Bob Beck <[email protected]> Auto-Submit: David Benjamin <[email protected]> Commit-Queue: Bob Beck <[email protected]>
This would have made debugging some cross-version test easier. Change-Id: I7b1bc160b5acf40ec02b9ed5ac2d836e3203cf9a Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60805 Commit-Queue: Bob Beck <[email protected]> Reviewed-by: Bob Beck <[email protected]> Auto-Submit: David Benjamin <[email protected]>
The Windows system RNG[1] lives in bcryptprimitives.dll which exports the function ProcessPrng[2] to supply random bytes from its internal generators. These are seeded and reseeded from the operating system using a device connection to \\Device\CNG which is opened when bcryptprimitives.dll is first loaded. After this CL boringssl calls ProcessPrng() directly. Before this CL boringssl got its system randomness (on non-UWP desktop Windows) from calls to RtlGenRandom[3]. This function is undocumented and unsupported, but has always been available by linking to SystemFunction036 in advadpi32.dll. In Windows 10 and later, this export simply forwards to cryptbase.dll!SystemFunction036 which calls ProcessPrng() directly. cryptbase!SystemFunction036 decompiled: ``` BOOLEAN SystemFunction036(PVOID RandomBuffer,ULONG RandomBufferLength) { BOOL retval; retval = ProcessPrng(RandomBuffer,RandomBufferLength); return retval != 0; } ``` Loading cryptbase.dll has the side effect of opening a device handle to \\Device\KsecDD which is not used by boringssl's random number wrappers. Calling ProcessPrng() directly allows sandboxed programs such as Chromium to avoid having this handle if they do not need it. ProcessPrng() also takes a size_t length rather than a u32 length, allowing some simplification of the calling code. After this CL we require bcryptprimitives to be loaded before the first call to CRYPTO_srand(). Applications using the library should either load the module themselves or call CRYPTO_pre_sandbox_init(). Before this CL boringssl required that advapi32, cryptbase and bcryptprimitives were all loaded so this should not represent a breaking change. [1] https://learn.microsoft.com/en-us/windows/win32/seccng/processprng [2] https://download.microsoft.com/download/1/c/9/1c9813b8-089c-4fef-b2ad-ad80e79403ba/Whitepaper%20-%20The%20Windows%2010%20random%20number%20generation%20infrastructure.pdf [3] https://docs.google.com/document/d/13n1t5ak0yofzcadQCF7Ew5TewSUkNfQ3n-IYodjeRYc/edit Bug: chromium:74242 Change-Id: Ifb1d6ef1a4539ff6e9a2c36cc119b7700ca2be8f Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60825 Commit-Queue: David Benjamin <[email protected]> Reviewed-by: David Benjamin <[email protected]>
Trying to migrate Chromium to the "link all the asm files together" strategy broke the aarch64 Android build because some of the ifdef'd out assembly files were missing the .note.gnu.property section for BTI. If we add support for IBT, that'll be another one. To fix this, introduce <openssl/asm_base.h>, which must be included at the start of every assembly file (before the target ifdefs). This does a couple things: - It emits BTI and noexecstack markers into every assembly file, even those that ifdef themselves out. - It resolves the MSan -> OPENSSL_NO_ASM logic, so we only need to do it once. - It defines the same OPENSSL_X86_64, etc., defines we set elsewhere, so we can ensure they're consistent. This required carving files up a bit. <openssl/base.h> has a lot of things, such that trying to guard everything in it on __ASSEMBLER__ would be tedious. Instead, I moved the target defines to a new <openssl/target.h>. Then <openssl/asm_base.h> is the new header that pulls in all those things. Bug: 542 Change-Id: I1682b4d929adea72908655fa1bb15765a6b3473b Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60765 Reviewed-by: Bob Beck <[email protected]> Commit-Queue: David Benjamin <[email protected]>
*ring* doesn't (intentionally) overflow any counters.
…+ADX running Linux Add the new code but don't plumb it in yet.
…-point multiplication. Add the code but don't plumb it in.
…rve25519 assembly
…ntgomery reduction.
This was referenced Sep 29, 2023
Codecov Report
@@ Coverage Diff @@
## main #1663 +/- ##
==========================================
+ Coverage 92.07% 92.14% +0.06%
==========================================
Files 132 132
Lines 18921 18909 -12
Branches 199 199
==========================================
+ Hits 17422 17424 +2
+ Misses 1463 1449 -14
Partials 36 36
... and 1 file with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
briansmith
force-pushed
the
b/merge-boringssl-17
branch
from
September 29, 2023 21:54
d5e335c
to
00da1cb
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.