[3.6] Reduce performance regression in RSA public operations #9281

mpg · 2024-06-18T11:03:19Z

Description

Partially fixes #9232 by picking the low-hanging fruit (which are also the changes that don't increase code size too much).

Status: work in progress.

TODO:

measure code size
measure perf on other platforms
Currently the window size is based on the limb size of E; should it use the actual bit size instead when E is public?
decide on API
propagate to RSA functions

PR checklist

changelog: provided
development: TODO
3.6 backport: this is it
2.28 backport: not required (we are fixing a regression in 3.6)
tests: partly provided, will be continued in [3.6] Rsapub additional tests #9493


mpg · 2024-06-18T11:04:57Z

Initial performance measurements on my laptop (64-bit Intel):

3.6.0

1024/1024-bit exp_mod, cold r: 0.961081 ms
1024/1024-bit exp_mod,  hot r: 1.050500 ms
  17/1024-bit exp_mod, cold r: 0.100309 ms
  17/1024-bit exp_mod,  hot r: 0.177438 ms

This PR

1024/1024-bit exp_mod, cold r: 1.070629 ms
1024/1024-bit exp_mod,  hot r: 1.062271 ms
  17/1024-bit exp_mod, cold r: 0.034083 ms
  17/1024-bit exp_mod,  hot r: 0.027739 ms

3.5.1

1024/1024-bit exp_mod, cold r: 1.071714 ms
1024/1024-bit exp_mod,  hot r: 1.083624 ms
  17/1024-bit exp_mod, cold r: 0.022744 ms
  17/1024-bit exp_mod,  hot r: 0.016366 ms

Edit: code used for benchmarking

yanesca

The approach looks good to me; this should eliminate all the overhead. One opinion about the API: it might be worth keeping the _optionally_safe() function static and having an _unsafe() version in the header instead.
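A minimal sketch of the API shape suggested here, using a toy 64-bit modular exponentiation in place of the multi-precision mbedtls_mpi code. All names (toy_exp_mod_optionally_safe(), toy_exp_mod(), toy_exp_mod_unsafe(), TOY_IS_PUBLIC, TOY_IS_SECRET) are hypothetical, and the "regular" path only illustrates the structure; it is not a vetted constant-time implementation. The mixed-leakiness core is static, only the two thin wrappers are exported, and any flag value other than the exact public constant takes the regular path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values. Only the exact "public" value selects the leaky
 * path; every other value falls back to the regular path. */
#define TOY_IS_PUBLIC 0x2a2a2a2a
#define TOY_IS_SECRET 0

/* These two prototypes would live in the header. */
uint64_t toy_exp_mod(uint64_t a, uint64_t e, uint64_t m);
uint64_t toy_exp_mod_unsafe(uint64_t a, uint64_t e, uint64_t m);

/* Mixed-leakiness core, kept static so that only the wrappers can reach it.
 * Requires 1 < m <= 2^32 - 1 so that every product fits in 64 bits. */
static uint64_t toy_exp_mod_optionally_safe(uint64_t a, uint64_t e,
                                            uint64_t m, int e_public)
{
    assert(m > 1 && m <= UINT32_MAX);
    uint64_t result = 1;
    a %= m;

    if (e_public == TOY_IS_PUBLIC) {
        /* Leaky path: the work depends on the bits of e and its real length. */
        for (; e != 0; e >>= 1) {
            if (e & 1) {
                result = (result * a) % m;
            }
            a = (a * a) % m;
        }
    } else {
        /* Regular path: always 64 iterations and always one multiplication,
         * with the result selected by a mask instead of a branch. */
        for (unsigned i = 0; i < 64; i++) {
            uint64_t mask = (uint64_t)0 - ((e >> i) & 1);
            uint64_t prod = (result * a) % m;
            result = (prod & mask) | (result & ~mask);
            a = (a * a) % m;
        }
    }
    return result;
}

uint64_t toy_exp_mod(uint64_t a, uint64_t e, uint64_t m)
{
    return toy_exp_mod_optionally_safe(a, e, m, TOY_IS_SECRET);
}

uint64_t toy_exp_mod_unsafe(uint64_t a, uint64_t e, uint64_t m)
{
    return toy_exp_mod_optionally_safe(a, e, m, TOY_IS_PUBLIC);
}
```

Both wrappers compute the same value; only the side-channel behaviour differs, and callers cannot reach the runtime flag directly.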

gilles-peskine-arm

Ok (with a few minor remarks) if we really have to have mixed-leakiness functions, but do we really have to do that? I find it harder to follow the code and reason about the function, and that's with a single function; I fear it won't scale to more complex algorithms.

gilles-peskine-arm · 2024-07-02T16:01:02Z

include/mbedtls/bignum.h

+ *  }
+ * not the other way round, in order to prevent misuse. (This is, if a value
+ * other than the two below is passed, default to the safe path.) */
+#define MBEDTLS_MPI_IS_PUBLIC  0x2a2a


Some constants can be loaded with fewer instructions than others. We should pick a constant that's short to load on Thumb. I think 0x2a2a can be improved on. This page has some guidance.

0x2a2a is a 16-bit value and can be loaded with a movw. That is supposed to be a single cycle; how can we improve on that?

@yanesca That's new in v7. And on second thoughts that blog post only covers 32-bit Arm instructions, but when thinking of code size, we care about Thumb instructions. So this is more relevant:

In Thumb instructions, constant can be:

any constant that can be produced by shifting an 8-bit value left by any number of bits within a 32-bit word

any constant of the form 0x00XY00XY

any constant of the form 0xXY00XY00

any constant of the form 0xXYXYXYXY.

That covers 0x002a002a and 0x2a2a2a2a and 0x0000002a and 0x00002a00 and more but not 0x00002a2a.
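The four rules quoted above can be checked mechanically. The helper below is hypothetical (thumb2_imm_encodable() is not an existing API) and encodes exactly the rules as quoted, not the full Thumb-2 encoding tables, so treat it as a sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if c matches one of the four quoted
 * Thumb-2 constant forms. */
static bool thumb2_imm_encodable(uint32_t c)
{
    /* Rule 1: an 8-bit value shifted left. Shifts above 24 add nothing new,
     * because any such value is also some 8-bit value shifted left by 24. */
    for (unsigned s = 0; s <= 24; s++) {
        uint32_t b = c >> s;
        if (b <= 0xFFu && (b << s) == c) {
            return true;
        }
    }

    uint32_t lo = c & 0xFFu;        /* XY taken from bits 0..7 */
    uint32_t hi = (c >> 8) & 0xFFu; /* XY taken from bits 8..15 */

    /* Rule 2: 0x00XY00XY */
    if (c == (lo | (lo << 16))) {
        return true;
    }
    /* Rule 3: 0xXY00XY00 */
    if (c == ((hi << 8) | (hi << 24))) {
        return true;
    }
    /* Rule 4: 0xXYXYXYXY */
    return c == lo * 0x01010101u;
}
```

Under these rules 0x2a2a2a2a, 0x002a002a, 0x0000002a and 0x00002a00 are accepted while 0x00002a2a is rejected.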

Just to confirm that for Thumb2 the comparisons are encoded as single instructions, e.g.

16: f1b1 3f2a cmp.w r1, #707406378 ; 0x2a2a2a2a

include/mbedtls/bignum.h

gilles-peskine-arm · 2024-07-05T11:04:53Z

include/mbedtls/bignum.h

+ * \return         Another negative error code on different kinds of failures.
+ *
+ */
+int mbedtls_mpi_exp_mod_optionally_safe(mbedtls_mpi *X, const mbedtls_mpi *A,


If we're going to have a single function for both cases, I would prefer a more specific naming convention than optionally_safe. I don't like the use of “safe”, which could refer to many different things. I'd prefer “leaky” or “public”.

gilles-peskine-arm · 2024-07-05T11:07:54Z

library/bignum_core.h

+ *                   It is up to the caller to zeroize \p T when it is no
+ *                   longer needed, and before freeing it if it was dynamically
+ *                   allocated.
+ * \param[in] E_public Set to MBEDTLS_MPI_IS_PUBLIC to gain some performance


I don't like the complexity of having functions whose security properties depend on a runtime argument. Neither P256-m nor BearSSL do that. I would prefer to have separate code paths for leaky and non-leaky, converging only on functions that have a single implementation either way. The only argument I can think of to merge the two is code size. Is the gain really worth it?

I am not happy about it either, but I think now that we are turning on USE_PSA_CRYPTO by default, we will be pretty desperate for code size savings for a while. I think the gain can be worth it and we should try.

mpg · 2024-07-22T11:05:31Z

Ok (with a few minor remarks) if we really have to have mixed-leakiness functions, but do we really have to do that?

No, I don't think we absolutely have to do that, and that's one of the main points on which I was hoping to get feedback. I just had to make a temporary choice for prototyping.

One opinion about the API: it might be worth keeping the _optionally_safe() function static and having an _unsafe() version in the header instead.

Yes, I think that would already be better.

I don't like the complexity of having functions whose security properties depend on a runtime argument.

Agreed. Do you think it's acceptable for static functions though or would you rather avoid it entirely?

The only argument I can think of to merge the two is code size. Is the gain really worth it?

That, and there can also be a maintenance cost if we end up duplicating source code. I can try some size measurements and check how easy or awkward it is to avoid duplication.

gilles-peskine-arm · 2024-07-22T11:08:00Z

I don't like the complexity of having functions whose security properties depend on a runtime argument.

Agreed. Do you think it's acceptable for static functions though or would you rather avoid it entirely?

I'd prefer to avoid it, but for smaller functions with a limited scope, it's not so bad.

jforissier · 2024-07-23T13:13:25Z

With this PR (and the necessary changes to make use of the E_public parameter), the OP-TEE test case runs faster, although still much slower than with v3.5.2. This is because the test doesn't only run RSA signature verification and decryption, it also includes RSA signing and encryption. Assuming OP-TEE switches to the more secure algorithm for secret stuff, we may need to disable this test in the CI and run it only during the release tests.
Here are the numbers (running time of xtest 4011 on the arm32 QEMU target):

v3.5.2 (optee_os master @ b339ffbd9): 14.5s
v3.6.0: 8m 26.6s
This PR (my branch: test-mbedtls-expmod-regression-fix-9281 @ 9dca1658f): 5m 58s

mpg · 2024-08-06T10:56:33Z

Just checking: @gilles-peskine-arm @yanesca will you be volunteering to review this once I've updated it, or should I look for another reviewer? (I'd appreciate if at least one of you could give at least a design review.)

gilles-peskine-arm · 2024-08-06T11:01:55Z

I intend to review if I'm around (I'll take about a week off around 15 August).

yanesca · 2024-08-06T11:36:08Z

I am happy to review it as well, but I will be taking some days off too (this Friday and next Wednesday).


These macros are not part of any public or internal API; ideally they would be defined in the source files. The reason for putting them in bignum_core.h is to avoid duplication, as macros for this purpose are needed in both bignum.c and bignum_core.c. Signed-off-by: Janos Follath <[email protected]>

In Thumb instructions, constant can be: - any constant that can be produced by shifting an 8-bit value left by any number of bits within a 32-bit word - any constant of the form 0x00XY00XY - any constant of the form 0xXY00XY00 - any constant of the form 0xXYXYXYXY. Signed-off-by: Janos Follath <[email protected]>

yanesca · 2024-08-12T19:22:21Z

There is still work needed; I'm just pushing what I have so far to get an early CI run overnight.


The new test hooks allow checking whether there was an unsafe call of an optionally safe function in the codepath. For the sake of simplicity, the MBEDTLS_MPI_IS_* macros are reused here for signalling safe/unsafe codepaths too. Signed-off-by: Janos Follath <[email protected]>
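A minimal sketch of how such a hook could work, assuming a single global variable that records whether a leaky call happened since the last reset. The names (toy_codepath, toy_codepath_reset(), toy_op_optionally_safe()) and the "sticky" semantics are hypothetical; in a real library the hook would only be compiled into test builds (for example behind MBEDTLS_TEST_HOOKS), and a plain global like this is not thread-safe.

```c
#include <assert.h>

/* Hypothetical flag values, reused to signal which path ran. */
#define TOY_IS_PUBLIC 0x2a2a2a2a
#define TOY_IS_SECRET 0
#define TOY_NOT_CALLED (-1) /* neither flag: no optionally safe call yet */

/* Test hook: TOY_IS_PUBLIC once any leaky call has happened since the last
 * reset, TOY_IS_SECRET if only safe calls happened. Not thread-safe. */
static int toy_codepath = TOY_NOT_CALLED;

static void toy_codepath_reset(void)
{
    toy_codepath = TOY_NOT_CALLED;
}

static int toy_op_optionally_safe(int x, int e_public)
{
    if (e_public == TOY_IS_PUBLIC) {
        toy_codepath = TOY_IS_PUBLIC; /* sticky: never downgraded */
        return 2 * x;                 /* stand-in for the fast path */
    }
    if (toy_codepath != TOY_IS_PUBLIC) {
        toy_codepath = TOY_IS_SECRET;
    }
    return 2 * x;                     /* stand-in for the safe path */
}

/* A caller that must never take the leaky path (e.g. a private-key op). */
static int toy_private_op(int x)
{
    return toy_op_optionally_safe(x, TOY_IS_SECRET);
}

/* A caller that may take it (e.g. a public-key op). */
static int toy_public_op(int x)
{
    return toy_op_optionally_safe(x, TOY_IS_PUBLIC);
}
```

A test for a private-key operation would reset the hook, run the operation, and then assert that the recorded value is not TOY_IS_PUBLIC.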

ChangeLog.d/fix-rsa-performance-regression.txt


yanesca · 2024-08-22T12:01:18Z

Reverted to 878af12.

library/rsa.c

library/bignum.c

gilles-peskine-arm

I approve the library changes for the release freeze today, after fixing the CI (plus the changelog entry while you're at it). I haven't reviewed the tests, but I'm satisfied that they're non-harmful.

This is conditional on doing the following as soon as possible, in any case before the end of the quarter:

Have the tests reviewed and approved (by me or someone else).
Improve the test hooks.
Make initialization robust

ChangeLog.d/fix-rsa-performance-regression.txt

library/bignum.c


gilles-peskine-arm

I approve the library changes at 5f31697 for the release freeze today. I haven't reviewed the tests, but I'm satisfied that they're non-harmful.

This is conditional on doing the following as soon as possible, in any case before the end of the quarter:

Have the tests reviewed and approved (by me or someone else).
Improve the test hooks.
Make initialization robust

ChangeLog.d/fix-rsa-performance-regression.txt

library/bignum_internal.h


tom-cosgrove-arm

LGTM

gilles-peskine-arm

LGTM with the reservations in #9281 (review)

gilles-peskine-arm · 2024-08-22T15:31:37Z

I've pressed the merge button so that we can get on with the release. We need a forward-port to development, but that doesn't go through the release process so it can wait a few days.

Add optionally unsafe variant of exp_mod for perf

75ed587

Attempt to partially solve the performance regression in 3.6.0 without adding too much code size. Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>

mpg self-assigned this Jun 18, 2024

mpg changed the base branch from development to mbedtls-3.6 June 18, 2024 11:03

mpg mentioned this pull request Jun 18, 2024

Performance regression in mbedtls_mpi_exp_mod() (v3.6.0) #9232

Closed

yanesca reviewed Jun 18, 2024


gilles-peskine-arm requested changes Jul 5, 2024


mpg added the priority-very-high (Highest priority - prioritise this over other review work), bug and needs-work labels Aug 9, 2024

mpg marked this pull request as ready for review August 9, 2024 10:28

mpg linked an issue Aug 9, 2024 that may be closed by this pull request

Performance regression in mbedtls_mpi_exp_mod() (v3.6.0) #9232

Closed

yanesca added 5 commits August 12, 2024 20:02

Improve documentation of MBEDTLS_MPI_IS_PUBLIC

e084964

Signed-off-by: Janos Follath <[email protected]>

Make _optionally_safe functions internal

38ff70e

The complexity of having functions whose security properties depend on a runtime argument can be dangerous. Limit misuse by making any such functions local. Signed-off-by: Janos Follath <[email protected]>

Move mixed security code to small local functions

bb3f295

The complexity of having functions whose security properties depend on a runtime argument can be dangerous. Limit risk by isolating such code in small functions with limited scope. Signed-off-by: Janos Follath <[email protected]>

yanesca self-assigned this Aug 12, 2024

yanesca requested a review from tom-cosgrove-arm August 12, 2024 19:17

Move _public parameters next to their target

a5fc8f3

It is easier to read if the parameter controlling constant timeness with respect to a parameter is next to that parameter. Signed-off-by: Janos Follath <[email protected]>

yanesca force-pushed the rsapub branch from a1a11df to a5fc8f3 August 13, 2024 06:42

yanesca added 2 commits August 13, 2024 07:53

Use actual exponent size for window calculation

020b9ab

The allocated size can be significantly larger than the actual size. In the unsafe case we can use the actual size and gain some performance. Signed-off-by: Janos Follath <[email protected]>
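To illustrate the point of this commit: a public exponent such as 65537 stored in an MPI allocated at the modulus size looks like a large exponent if only the allocation is considered, which leads to a large window (and a large table of precomputed powers), while its actual bit length calls for the smallest window. In the sketch below, toy_window_size(), toy_bitlen() and toy_exp_window() are hypothetical, and the thresholds are an assumption modeled on the heuristic Mbed TLS has historically used; the real code may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IS_PUBLIC 0x2a2a2a2a
#define TOY_IS_SECRET 0

/* Hypothetical window-size heuristic (thresholds assumed): larger exponents
 * get larger windows, which means more precomputed powers of the base. */
static size_t toy_window_size(size_t e_bits)
{
    return (e_bits > 671) ? 6 : (e_bits > 239) ? 5 : (e_bits > 79) ? 4 : 1;
}

/* Actual bit length of a little-endian array of 64-bit limbs. Its running
 * time depends on the value, so it is only acceptable for public data. */
static size_t toy_bitlen(const uint64_t *limbs, size_t n)
{
    while (n > 0 && limbs[n - 1] == 0) {
        n--;
    }
    if (n == 0) {
        return 0;
    }
    size_t bits = 64 * (n - 1);
    for (uint64_t top = limbs[n - 1]; top != 0; top >>= 1) {
        bits++;
    }
    return bits;
}

/* Secret exponents: only the allocated size (n limbs) may influence the
 * window. Public exponents: the real bit length can be used. */
static size_t toy_exp_window(const uint64_t *e, size_t n, int e_public)
{
    size_t e_bits = (e_public == TOY_IS_PUBLIC) ? toy_bitlen(e, n) : 64 * n;
    return toy_window_size(e_bits);
}
```

With a 16-limb (1024-bit) allocation and these assumed thresholds, the exponent 65537 gets a 6-bit window on the secret path but a 1-bit window on the public path.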

tom-cosgrove-arm reviewed Aug 22, 2024


ChangeLog.d/fix-rsa-performance-regression.txt (outdated)

yanesca force-pushed the rsapub branch from 9bf243e to 878af12 August 22, 2024 11:53

yanesca added 2 commits August 22, 2024 12:59

Add changelog

6c20869

Signed-off-by: Janos Follath <[email protected]>

Make mbedtls_mpi_exp_mod_unsafe internal

82976f3

Signed-off-by: Janos Follath <[email protected]>

yanesca removed the needs-work label Aug 22, 2024

tom-cosgrove-arm reviewed Aug 22, 2024


library/rsa.c (outdated)

tom-cosgrove-arm reviewed Aug 22, 2024


library/bignum.c

gilles-peskine-arm requested changes Aug 22, 2024


ChangeLog.d/fix-rsa-performance-regression.txt (outdated)

library/bignum.c

yanesca and others added 2 commits August 22, 2024 14:49

Improve ChangeLog

5d16334

Co-authored-by: Gilles Peskine <[email protected]> Signed-off-by: Janos Follath <[email protected]>

Add header for mbedtls_mpi_exp_mod_unsafe()

5f31697

To silence no previous prototype warnings. And this is the proper way to do it anyway. Signed-off-by: Janos Follath <[email protected]>

gilles-peskine-arm previously approved these changes Aug 22, 2024


tom-cosgrove-arm reviewed Aug 22, 2024


ChangeLog.d/fix-rsa-performance-regression.txt (outdated)

tom-cosgrove-arm reviewed Aug 22, 2024


library/bignum_internal.h

Fix Changelog formatting

4c857c4

Signed-off-by: Janos Follath <[email protected]>

yanesca dismissed gilles-peskine-arm’s stale review via 4c857c4 August 22, 2024 14:49

tom-cosgrove-arm approved these changes Aug 22, 2024


gilles-peskine-arm approved these changes Aug 22, 2024


gilles-peskine-arm enabled auto-merge August 22, 2024 15:30

gilles-peskine-arm added this pull request to the merge queue Aug 22, 2024

Merged via the queue into Mbed-TLS:mbedtls-3.6 with commit df0ef8a Aug 22, 2024
4 of 6 checks passed

gilles-peskine-arm mentioned this pull request Aug 29, 2024

ssl_client1 fails on TLS 1.3 #9072

Closed

mpg mentioned this pull request Sep 4, 2024

[dev] Rsapub performance fix #9536

Merged

5 tasks

jforissier mentioned this pull request Nov 19, 2024

Import/mbedtls 3.6.2 OP-TEE/optee_os#7135

Merged
