Make size of output in mpi_mul_hlp() explicit #5701

hanno-becker · 2022-04-06T06:02:35Z

Context: The helper mpi_mul_hlp() performs a multiply-accumulate operation d += s * b, where d,b are MPIs represented as plain integer arrays, and b is a scalar.

Previously, only the length of s was specified, while d was assumed to be 0-terminated of unspecified length.

This was leveraged at the end of the core multiplication steps computing the first limb(s) limbs of d + s*b: Namely, the routine would keep on adding the current carry to d until none was left. This can, in theory, run for an arbitrarily long time if d has a tail of 0xFFs, and hence the assumption of 0-termination.

This solution is both fragile and insecure -- the latter because the carry-loop depends on the result of the multiplication.

Changes: This PR ...

... changes the signature of mpi_mul_hlp() to receive the length of the output buffer, which must be greater or equal to the length of the input buffer.
... renames the function to mbedtls_mpi_core_mla(), exposed in bignum_internal.h for internal usage
... adjusts the p25519 modular reduction to use mbedtls_mpi_core_mla() -- this was not originally intended to be done here, but turned out to be necessary since said reduction routine called mbedtls_mpi_mul_int() with a stack-allocated MPI which caused difficulties with re-growing.

Regarding the new signature of mpi_mul_hlp(), note that it is not assumed that the output buffer is strictly larger than the input buffer -- instead, the routine will simply return any carry that's left. This will be useful in some applications of this function. It is the responsibility of the caller to either size the output appropriately so that no carry will be left, or to handle the carry.

Note: This PR tries to be as minimal as possible. There are a number of instances where one is tempted to try to simplify call-sites of mpi_mul_hlp() further, such as mbedtls_mpi_mul_int() or the conditional subtraction after mpi_montmul(), but this is deliberately not tackled in this PR.

Signed-off-by: Hanno Becker <[email protected]>

The helper `mpi_mul_hlp()` performs a multiply-accumulate operation `d += s * b`, where `d,b` are MPIs and `b` is a scalar. Previously, only the length of `s` was specified, while `d` was assumed to be 0-terminated of unspecified length. This was leveraged at the end of the core multiplication steps computingg the first `limb(s)` limbs of `d + s*b`: Namely, the routine would keep on adding the current carry to `d` until none was left. This can, in theory, run for an arbitrarily long time if `d` has a tail of `0xFF`s, and hence the assumption of `0`-termination. This solution is both fragile and insecure -- the latter because the carry-loop depends on the result of the multiplication. This commit changes the signature of `mpi_mul_hlp()` to receive the length of the output buffer, which must be greater or equal to the length of the input buffer. It is _not_ assumed that the output buffer is strictly larger than the input buffer -- instead, the routine will simply return any carry that's left. This will be useful in some applications of this function. It is the responsibility of the caller to either size the output appropriately so that no carry will be left, or to handle the carry. NOTE: The commit leaves the library in a state where it cannot be compiled since the call-sites of mpi_mul_hlp() have not yet been adjusted. This will be done in the subsequent commits. Signed-off-by: Hanno Becker <[email protected]>

The previous commit has changed the signature of mpi_mul_hlp(), making the length of the output explicit. This commit adjusts the call-site in mbedtls_mpi_mul_mpi() to this new signature. A notable change to the multiplication strategy had to be made: mbedtls_mpi_mul_mpi() performs a simple row-wise schoolbook multiplication, which however was so far computed iterating rows from top to bottom. This leads to the undesirable consequence that as lower rows are calculated and added to the temporary result, carry chains can grow. It is simpler and faster to iterate from bottom to top instead, as it is guaranteed that there will be no carry when adding the next row to the previous temporary result: The length of the output in each iteration can be fixed to len(B)+1. Signed-off-by: Hanno Becker <[email protected]>

A previous commit has changed the signature of mpi_mul_hlp(), making the length of the output explicit. This commit adjusts mbedtls_mpi_mul_int() to this change. Along the way, we make the code simpler and more secure by not calculating the minimal limb-size of A. A previous comment indicated that this was functionally necessary because of the implementation of mpi_mul_hlp() -- if it ever was, it isn't anymore. Signed-off-by: Hanno Becker <[email protected]>

A previous commit has changed the signature of mpi_mul_hlp, making the length of the output explicit. This commit adjusts mpi_montmul() accordingly. It also fixes a comment on the required size of the temporary value passed to mpi_montmul() (but does not change the call-sites). Signed-off-by: Hanno Becker <[email protected]>

hanno-becker · 2022-04-06T08:14:58Z

library/bignum.c

+        (void) mpi_mul_hlp( d, n + 2,
+                            N->p, n,
+                            u1 );
+        d++;
    }

    /* At this point, d is either the desired result or the desired result


The conditional subtraction can be simplified here to

borrow = mpi_sub_hlp( n, d, d, N->p ); borrow = ( d[n] < borrow ); (void) mpi_mul_hlp( d, n, N->p, n, borrow ); memcpy( A->p, d, n * ciL );

which would be an example of a call to mpi_mul_hlp() where you expect and process the carry.

(@yanesca @mpg we touched on this approach in the chat, so FYI).

This paves the way for the helper to be used from the ECP module Signed-off-by: Hanno Becker <[email protected]>

Signed-off-by: Hanno Becker <[email protected]>

hanno-becker · 2022-04-11T06:09:46Z

This PR broke the CI because it calls mbedtls_mpi_grow() in mbedtls_mpi_mul_int() which is called from within the p25519 modular reduction with a stack allocated MPI... a good example for why it will be useful to have a more transparent and leaner API for bignum internally.

To resolve, I removed the use of mbedtls_mpi_mul_int() from the x25519 modular reduction routine, using mpi_mul_hlp() directly instead. @mpg Can you please have a look? I changed the reduction routine slightly to avoid shifting, wrapping at 2^256 == 38 instead of 2^255 == 19.

(In this case, there's nothing to do anyway since we only do a quasi-reduction to N+1 limbs) Signed-off-by: Hanno Becker <[email protected]>

tom-cosgrove-arm · 2022-04-11T07:49:02Z

library/bignum.c

+    mbedtls_mpi_uint c = 0; /* carry */
+
+    /* Remember the excess of d over s for later */
+    d_len -= s_len;


why not just use a separate variable for this, and name it more appropriately?

Sounds good

tom-cosgrove-arm · 2022-04-11T07:49:48Z

library/bignum.c

- * \param i             The number of limbs of \p s.
+ * \param[in,out] d     The bignum to add to.
+ * \param d_len         The number of limbs of \p d. This must be
+ *                      at least \p s_len.


@mpg does Mbed TLS have a mechanism for assert()ing requirements like this? or is it considered not worth it, even from a documentation-in-code perspective?

From all I know, Mbed TLS does not yet have such a mechanism, though it would arguably be useful for debug builds.

No, we don't assert(), for several reasons. One reason is that the code must be able to run in environments that don't have assert(), and we don't want to maintain yet another compilation option HAVE_ASSERT. Another is that we want to zeroize confidential data even if (especially if!) something goes wrong. So if a requirement is violated, the only recourse is to return an error status. But that is not possible in low-level bignum functions that return void.

So far our general approach in bignum is to have public functions that check requirements and return error codes (also those functions may resize objects), and low-level functions that don't check requirements and return void (and those functions may not resize objects).

Another reason not to use assert is that it's typically disabled in production, which is where it's the most useful. If you want a debug-only assert, you can have extra code under MBEDTLS_TEST_HOOKS that should not change the execution of the function, but can mark the test as failed.

tom-cosgrove-arm · 2022-04-11T07:50:01Z

library/bignum.c

-                  const mbedtls_mpi_uint *s,
-                  mbedtls_mpi_uint *d,
-                  mbedtls_mpi_uint b )
+mbedtls_mpi_uint mbedtls_mpi_core_mla( mbedtls_mpi_uint *d, size_t d_len ,


trivial: space before comma

Signed-off-by: Hanno Becker <[email protected]>

tom-cosgrove-arm · 2022-04-11T08:52:04Z

library/ecp_curves.c

-    if( N->n < P255_WIDTH )
+    /* Helper references for top part of N */
+    mbedtls_mpi_uint * const NT_p = N->p + P255_WIDTH;
+    size_t const NT_n = N->n - P255_WIDTH;


trivial: if you're ever editing this file again, its way more common to put const first (this line and the one above)

At least in the line above, moving const to the front would change the semantics to a non-const pointer to constant memory -- to avoid this distinction, I usually put the const last.

If we were starting writing C in a vacuum, I'd agree that placing the const right next to the variable name is more consistent as it always means the same thing here - the variable is constant, regardless of whether it's a pointer or not. However, I agree with Tom that in the existing body of C already written, const size_t foo seems way more common. So, there's a conflict here, unfortunately, between your sensible argument, and existing usage.

That's fine, I'll change it.

The variable is a local variable for the i386 bignum assembly only; introduce it as part of the start/finish macros. It can be noted that the variable is initialize to 0 within MULADDC_INIT, so there are no data dependencies across blocks of MULADDC_INIT/CORE/STOP. Signed-off-by: Hanno Becker <[email protected]>

mpg

Releasing comments from a first pass. I'll make a second pass tomorrow as I'm running out of time for today, and this is non-trivial enough to deserve more attention.

library/bignum.c

library/bignum_internal.h

Signed-off-by: Hanno Becker <[email protected]>

The check was meant to precisely catch an underflow. Signed-off-by: Hanno Becker <[email protected]>

Signed-off-by: Hanno Becker <[email protected]>

This reverts commit 808e666. Signed-off-by: Hanno Becker <[email protected]>

Signed-off-by: Hanno Becker <[email protected]>

mpg · 2022-04-14T10:36:04Z

Ok, I tried running ecc-heap.sh (as extended by #5381) and bad news, it looks like total heap usage and number of allocations are going up after this PR, especially the number of allocations for Brainpool curves (that don't have a dedicated quasi-reduction routine), unless I messed up something in my measurements.

Before: (branch here)

  ECDSA-secp521r1          :    1184 sign/s             2776 heap bytes,    800 allocs
  ECDSA-brainpoolP512r1    :     156 sign/s             2664 heap bytes,  13550 allocs
  ECDSA-secp384r1          :    1668 sign/s             2120 heap bytes,    637 allocs
  ECDSA-brainpoolP384r1    :     394 sign/s             2168 heap bytes,  10141 allocs
  ECDSA-secp256r1          :    2906 sign/s             1560 heap bytes,    540 allocs
  ECDSA-secp256k1          :    2711 sign/s             1616 heap bytes,   2430 allocs
  ECDSA-brainpoolP256r1    :     715 sign/s             1576 heap bytes,   8281 allocs
  ECDSA-secp521r1          :     321 verify/s           8136 heap bytes,   2499 allocs
  ECDSA-brainpoolP512r1    :      40 verify/s           7648 heap bytes,  61774 allocs
  ECDSA-secp384r1          :     464 verify/s           6656 heap bytes,   2004 allocs
  ECDSA-brainpoolP384r1    :      73 verify/s           6264 heap bytes,  46585 allocs
  ECDSA-secp256r1          :     795 verify/s           5128 heap bytes,   1550 allocs
  ECDSA-secp256k1          :     751 verify/s           4744 heap bytes,   8687 allocs
  ECDSA-brainpoolP256r1    :     176 verify/s           5064 heap bytes,  32952 allocs
  ECDHE-secp521r1          :     170 full handshake/s   8736 heap bytes,   4939 allocs
  ECDHE-brainpoolP512r1    :      20 full handshake/s   8200 heap bytes, 123705 allocs
  ECDHE-secp384r1          :     257 full handshake/s   7264 heap bytes,   3937 allocs
  ECDHE-brainpoolP384r1    :      43 full handshake/s   6696 heap bytes,  93332 allocs
  ECDHE-secp256r1          :     449 full handshake/s   5792 heap bytes,   3036 allocs
  ECDHE-secp256k1          :     425 full handshake/s   5256 heap bytes,  17344 allocs
  ECDHE-brainpoolP256r1    :      93 full handshake/s   5280 heap bytes,  66053 allocs
  ECDHE-x25519             :     307 full handshake/s   2608 heap bytes,  12454 allocs
  ECDHE-x448               :     134 full handshake/s   3768 heap bytes,  21722 allocs

After: (branch here)

  ECDSA-secp521r1          :     895 sign/s             4456 heap bytes,    990 allocs
  ECDSA-brainpoolP512r1    :     104 sign/s             4136 heap bytes,  40909 allocs
  ECDSA-secp384r1          :    1365 sign/s             3408 heap bytes,    792 allocs
  ECDSA-brainpoolP384r1    :     272 sign/s             3272 heap bytes,  18772 allocs
  ECDSA-secp256r1          :    2022 sign/s             2440 heap bytes,    638 allocs
  ECDSA-secp256k1          :    2115 sign/s             2504 heap bytes,   2520 allocs
  ECDSA-brainpoolP256r1    :     472 sign/s             2592 heap bytes,  13415 allocs
  ECDSA-secp521r1          :     240 verify/s           8640 heap bytes,   2851 allocs
  ECDSA-brainpoolP512r1    :      25 verify/s           8192 heap bytes, 188211 allocs
  ECDSA-secp384r1          :     386 verify/s           6936 heap bytes,   2266 allocs
  ECDSA-brainpoolP384r1    :      66 verify/s           6624 heap bytes,  86864 allocs
  ECDSA-secp256r1          :     631 verify/s           5536 heap bytes,   1731 allocs
  ECDSA-secp256k1          :     643 verify/s           5184 heap bytes,   8839 allocs
  ECDSA-brainpoolP256r1    :     137 verify/s           5168 heap bytes,  53818 allocs
  ECDHE-secp521r1          :     128 full handshake/s   9792 heap bytes,   5574 allocs
  ECDHE-brainpoolP512r1    :      13 full handshake/s   9376 heap bytes, 377346 allocs
  ECDHE-secp384r1          :     211 full handshake/s   8048 heap bytes,   4404 allocs
  ECDHE-brainpoolP384r1    :      32 full handshake/s   7664 heap bytes, 174067 allocs
  ECDHE-secp256r1          :     343 full handshake/s   6360 heap bytes,   3356 allocs
  ECDHE-secp256k1          :     347 full handshake/s   5912 heap bytes,  17601 allocs
  ECDHE-brainpoolP256r1    :      71 full handshake/s   6072 heap bytes, 107908 allocs
  ECDHE-x25519             :     344 full handshake/s   2608 heap bytes,  12448 allocs
  ECDHE-x448               :     136 full handshake/s   3760 heap bytes,  21720 allocs

Removing the trimming has significant memory impact. While it is clearly what we want to do eventually for constant-time'ness, it should be fixed alongside a strategy to contain the ramifications on memory usage. Signed-off-by: Hanno Becker <[email protected]>

mpg

LGTM

mpg · 2022-04-14T10:59:45Z

Updated results:

  ECDSA-secp521r1          :    1213 sign/s             2776 heap bytes,    800 allocs
  ECDSA-brainpoolP512r1    :     195 sign/s             2664 heap bytes,  13555 allocs
  ECDSA-secp384r1          :    1821 sign/s             2120 heap bytes,    633 allocs
  ECDSA-brainpoolP384r1    :     414 sign/s             2168 heap bytes,  10141 allocs
  ECDSA-secp256r1          :    2778 sign/s             1560 heap bytes,    539 allocs
  ECDSA-secp256k1          :    2705 sign/s             1616 heap bytes,   2432 allocs
  ECDSA-brainpoolP256r1    :     733 sign/s             1576 heap bytes,   8281 allocs
  ECDSA-secp521r1          :     293 verify/s           8136 heap bytes,   2498 allocs
  ECDSA-brainpoolP512r1    :      33 verify/s           7640 heap bytes,  61773 allocs
  ECDSA-secp384r1          :     388 verify/s           6688 heap bytes,   2000 allocs
  ECDSA-brainpoolP384r1    :      87 verify/s           6392 heap bytes,  46587 allocs
  ECDSA-secp256r1          :     797 verify/s           5232 heap bytes,   1552 allocs
  ECDSA-secp256k1          :     767 verify/s           4816 heap bytes,   8690 allocs
  ECDSA-brainpoolP256r1    :     183 verify/s           5024 heap bytes,  32951 allocs
  ECDHE-secp521r1          :     153 full handshake/s   8736 heap bytes,   4938 allocs
  ECDHE-brainpoolP512r1    :      21 full handshake/s   8200 heap bytes, 123698 allocs
  ECDHE-secp384r1          :     245 full handshake/s   7264 heap bytes,   3937 allocs
  ECDHE-brainpoolP384r1    :      43 full handshake/s   6696 heap bytes,  93326 allocs
  ECDHE-secp256r1          :     431 full handshake/s   5792 heap bytes,   3038 allocs
  ECDHE-secp256k1          :     406 full handshake/s   5256 heap bytes,  17341 allocs
  ECDHE-brainpoolP256r1    :      90 full handshake/s   5280 heap bytes,  66055 allocs
  ECDHE-x25519             :     307 full handshake/s   2600 heap bytes,  12451 allocs
  ECDHE-x448               :     129 full handshake/s   3768 heap bytes,  21716 allocs

This fixed the problem!

Hanno Becker added 5 commits April 6, 2022 06:11

Remove unused variable in mpi_mul_hlp()

e7f14a3

Signed-off-by: Hanno Becker <[email protected]>

hanno-becker added component-crypto Crypto primitives and low-level interfaces needs-ci Needs to pass CI tests needs-reviewer This PR needs someone to pick it up for review labels Apr 6, 2022

hanno-becker requested review from tom-cosgrove-arm and mpg April 6, 2022 06:19

hanno-becker commented Apr 6, 2022

View reviewed changes

hanno-becker mentioned this pull request Apr 7, 2022

Simplify structure of bignum inline assembly #5706

Merged

hanno-becker mentioned this pull request Apr 10, 2022

DRAFT: Support double-granularity for Montgomery multiplication #5723

Closed

Hanno Becker added 2 commits April 11, 2022 07:03

Rename mpi_mul_hlp -> mbedtls_mpi_core_mla and expose internally

aef9cc4

This paves the way for the helper to be used from the ECP module Signed-off-by: Hanno Becker <[email protected]>

Simplify x25519 reduction using internal bignum MLA helper

25bb732

Signed-off-by: Hanno Becker <[email protected]>

Safeguard against calling p255 reduction with single-width MPI

6454993

(In this case, there's nothing to do anyway since we only do a quasi-reduction to N+1 limbs) Signed-off-by: Hanno Becker <[email protected]>

tom-cosgrove-arm reviewed Apr 11, 2022

View reviewed changes

Hanno Becker added 3 commits April 11, 2022 09:06

Use size_t for number of limbs

e9dd9a1

Signed-off-by: Hanno Becker <[email protected]>

Address review comments

284d778

Signed-off-by: Hanno Becker <[email protected]>

Remove const qualifier for mutable local variable in mpi_mul_hlp()

5d4ceeb

Signed-off-by: Hanno Becker <[email protected]>

tom-cosgrove-arm reviewed Apr 11, 2022

View reviewed changes

mpg reviewed Apr 11, 2022

View reviewed changes

library/bignum.c Outdated Show resolved Hide resolved

library/bignum.c Outdated Show resolved Hide resolved

library/bignum.c Outdated Show resolved Hide resolved

library/bignum_internal.h Outdated Show resolved Hide resolved

mpg self-requested a review April 11, 2022 11:02

mpg removed the needs-reviewer This PR needs someone to pick it up for review label Apr 11, 2022

Hanno Becker added 2 commits April 12, 2022 11:02

Reduce scope of local variables in mpi_montmul()

0235f75

Signed-off-by: Hanno Becker <[email protected]>

Fix size check in p25519 modular reduction

2ef0cff

The check was meant to precisely catch an underflow. Signed-off-by: Hanno Becker <[email protected]>

hanno-becker dismissed mpg’s stale review via 2ef0cff April 12, 2022 10:02

Hanno Becker added 2 commits April 12, 2022 11:10

Simplify check in p25519 quasi-reduction

d830feb

Signed-off-by: Hanno Becker <[email protected]>

Fix check in p25519 quasi-reduction

bb04cb9

Signed-off-by: Hanno Becker <[email protected]>

tom-cosgrove-arm previously approved these changes Apr 12, 2022

View reviewed changes

Fail gracefully upon unexpectedly large input to p25519 reduction

127fcab

Signed-off-by: Hanno Becker <[email protected]>

hanno-becker dismissed tom-cosgrove-arm’s stale review via 127fcab April 12, 2022 21:19

hanno-becker requested review from tom-cosgrove-arm and mpg April 13, 2022 06:00

Hanno Becker added 3 commits April 13, 2022 07:26

Revert "Don't trim MPIs to minimal size in mbedtls_mpi_mul_mpi()"

da763de

This reverts commit 808e666. Signed-off-by: Hanno Becker <[email protected]>

Reduce the scope of local variable in mbedtls_mpi_mul_mpi()

1772e05

Signed-off-by: Hanno Becker <[email protected]>

Remove unnecessary memory operations in p25519 quasireduction

0dbf04a

Signed-off-by: Hanno Becker <[email protected]>

hanno-becker force-pushed the mpi_mul_hlp branch from 2ede53d to 0dbf04a Compare April 13, 2022 06:27

mpg approved these changes Apr 14, 2022

View reviewed changes

tom-cosgrove-arm approved these changes Apr 14, 2022

View reviewed changes

mpg added approved Design and code approved - may be waiting for CI or backports and removed needs-review Every commit must be reviewed by at least two team members, needs-ci Needs to pass CI tests labels Apr 15, 2022

mpg merged commit 63ed7cb into Mbed-TLS:development Apr 15, 2022

tom-cosgrove-arm mentioned this pull request Apr 27, 2022

Improve inline assembly for Cortex-M + DSP #5360

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make size of output in mpi_mul_hlp() explicit #5701

Make size of output in mpi_mul_hlp() explicit #5701

hanno-becker commented Apr 6, 2022 •

edited

Loading

hanno-becker Apr 6, 2022

hanno-becker commented Apr 11, 2022

tom-cosgrove-arm Apr 11, 2022

hanno-becker Apr 11, 2022

tom-cosgrove-arm Apr 11, 2022

hanno-becker Apr 11, 2022

gilles-peskine-arm Apr 11, 2022

gilles-peskine-arm Apr 11, 2022

tom-cosgrove-arm Apr 11, 2022

hanno-becker Apr 11, 2022

tom-cosgrove-arm Apr 11, 2022 •

edited

Loading

hanno-becker Apr 11, 2022 •

edited

Loading

mpg Apr 11, 2022

hanno-becker Apr 11, 2022

mpg left a comment

mpg commented Apr 14, 2022

mpg left a comment

mpg commented Apr 14, 2022

Make size of output in mpi_mul_hlp() explicit #5701

Make size of output in mpi_mul_hlp() explicit #5701

Conversation

hanno-becker commented Apr 6, 2022 • edited Loading

Choose a reason for hiding this comment

hanno-becker commented Apr 11, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tom-cosgrove-arm Apr 11, 2022 • edited Loading

Choose a reason for hiding this comment

hanno-becker Apr 11, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mpg left a comment

Choose a reason for hiding this comment

mpg commented Apr 14, 2022

mpg left a comment

Choose a reason for hiding this comment

mpg commented Apr 14, 2022

hanno-becker commented Apr 6, 2022 •

edited

Loading

tom-cosgrove-arm Apr 11, 2022 •

edited

Loading

hanno-becker Apr 11, 2022 •

edited

Loading