-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable 64-bit bignum limbs and add optimized multiplication for Aarch64 #1964
Conversation
@Ko- Thank you for your contribution! In addition, unfortunately our policy is to not accept contributions, without a Contributor’s Licence Agreement (CLA) signed or authorised by yourself or your employer. If this is a personal contribution, the easiest way to do this is if you create an mbed account and accept this click through agreement. Alternatively, you can find a slightly different agreement to sign here, which can be signed and returned to us, and is applicable if you don't want to create an mbed account or alternatively if this is a corporate contribution. Thanks for your understanding and again, thanks for the contribution! |
I signed the CLA yesterday. |
@Ko- Thank you for signing the CLA! Did you sign it with your Mbed account? |
@Ko- Thank you for your your information. As mentioned, we can't link between github account. |
It's andrikos1 I believe |
include/mbedtls/bignum.h
Outdated
@@ -135,9 +135,8 @@ | |||
typedef unsigned int mbedtls_t_udbl __attribute__((mode(TI))); | |||
#define MBEDTLS_HAVE_UDBL | |||
#endif /* !MBEDTLS_NO_UDBL_DIVISION */ | |||
#elif defined(__ARMCC_VERSION) && defined(__aarch64__) | |||
#elif defined(__aarch64__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah maybe you would actually prefer to add __aarch64__
to the __GNUC__
list because of the __uint128_t
versus __attribute__((mode(TI)))
? I could change that. Although I believe that both GCC and Clang have defined __uint128_t
on 64-bit builds for ages, so I'm not sure if it would matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution. I have a few questions and comments.
include/mbedtls/bignum.h
Outdated
@@ -135,9 +135,8 @@ | |||
typedef unsigned int mbedtls_t_udbl __attribute__((mode(TI))); | |||
#define MBEDTLS_HAVE_UDBL | |||
#endif /* !MBEDTLS_NO_UDBL_DIVISION */ | |||
#elif defined(__ARMCC_VERSION) && defined(__aarch64__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not add __aarch64__
in the #elif defined(__GNUC__)
section with the other 64-bit architectures? GCC and ARMCC handle MBEDTLS_HAVE_UDBL
differently, so __aarch64__
should be added to the GCC section. GCC shouldn't use the ARMCC section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Changed.
include/mbedtls/bn_mul.h
Outdated
"str x5, [%1], #8 \n\t" | ||
|
||
#define MULADDC_STOP \ | ||
: "+r" (c), "=r" (d), "=r" (s) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need the "+r" (c)
when you already have it listed in the input list? Wouldn't =
be okay to use here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently yes. I first had =
but then the tests failed. Maybe it would be fine to only have the read-write output register but to not list it separately as an input register. I'm not sure if there's a difference in semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No difference that I'm aware of, although some old compilers might not support +
. Verying between architectures, probably based on who wrote it, we've used =
and repeated the register in both input and output lists, or used +
and avoided repeating the register in the input list.
include/mbedtls/bn_mul.h
Outdated
#define MULADDC_STOP \ | ||
: "+r" (c), "=r" (d), "=r" (s) \ | ||
: "r" (s), "r" (d), "r" (c), "r" (b) \ | ||
: "x4", "x5", "x6", "x7", "cc" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At which optimization levels does this inline assembly work properly at? The set of registers available for use when using -O0
should be enough, but wouldn't hurt to have checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was with -O2
. I now see that it fails with -O0
. I'll investigate this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I consulted the documentation of the extended asm syntax and pushed a fix. Now both debug and release builds pass the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Needs @yanesca review still
@yanesca Any idea when you have time to review this? |
@Ko- I have a quite long review queue at the moment, so I can't tell when can I get to this PR, but I will review it as soon as I can! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
retest |
This is a totally cool feature, why is it pending still? |
@lnksz Thank you for the reminder! |
Pass information via a key attribute structure rather than as separate parameters to psa_crypto_storage functions. This makes it easier to maintain the code when the metadata of a key evolves. This has negligible impact on code size (+4B with "gcc -Os" on x86_64).
Stored keys must contain lifetime information. The lifetime used to be implied by the location of the key, back when applications supplied the lifetime value when opening the key. Now that all keys' metadata are stored in a central location, this location needs to store the lifetime explicitly.
For a key in a secure element, persist the key slot. This is implemented in the nominal case. Failures may not be handled properly.
Store the persistent data of secure element drivers. This is fully implemented, but not at all tested.
Clarify the documented behavior and lifetime of key handles [v2]
Remove a redundant function call
Remove unused TG variable in mbedtls_mpi_gcd()
The crypto.c implementation of psa_inject_entropy() didn't match the declaration in crypto_extra.h. Use a const seed in both files.
Use the PSA_KEY_POLICY_INIT macro in the definition of PSA_CORE_KEY_ATTRIBUTES_INIT in order to avoid duplicating the key policy initializer.
A macro useful for initializing psa_key_id_t, whether MBEDTLS_PSA_CRYPTO_KEY_FILE_ID_ENCODES_OWNER is set or not. Without this macro, it is necessary to know if MBEDTLS_PSA_CRYPTO_KEY_FILE_ID_ENCODES_OWNER as with it the key ID is non-scalar and needs to be initialized with {0, 0}, and 0 otherwise when key ID is scalar.
Avoid compiler errors when MBEDTLS_PSA_CRYPTO_KEY_FILE_ID_ENCODES_OWNER is set by using the application ID type. [Error] psa_crypto_slot_management.c@175,9: used type 'psa_key_id_t' (aka 'psa_key_file_id_t') where arithmetic or pointer type is required
Please resolve conflicts and ensure testing passes, then we can merge. |
…-legacy-psa-key-derivation Remove legacy psa key derivation
Make fixes related to using Mbed Crypto as a service
Everest integration
According to SP800-90A, the DRBG seeding process should use a nonce of length `security_strength / 2` bits as part of the DRBG seed. It further notes that this nonce may be drawn from the same source of entropy that is used for the first `security_strength` bits of the DRBG seed. The present HMAC DRBG implementation does that, requesting `security_strength * 3 / 2` bits of entropy from the configured entropy source in total to form the initial part of the DRBG seed. However, some entropy sources may have thresholds in terms of how much entropy they can provide in a single call to their entropy gathering function which may be exceeded by the present HMAC DRBG implementation even if the threshold is not smaller than `security_strength` bits. Specifically, this is the case for our own entropy module implementation which only allows requesting at most 32 Bytes of entropy at a time in configurations disabling SHA-512, and this leads to runtime failure of HMAC DRBG when used with Mbed Crypto' own entropy callbacks in such configurations. This commit fixes this by splitting the seed entropy acquisition into two calls, one requesting `security_strength` bits first, and another one requesting `security_strength / 2` bits for the nonce. Fixes Mbed-TLS#237.
…mpatibility_fix-crypto HMAC DRBG: Split entropy-gathering requests to reduce request sizes
GCC and Clang do not define __ARMCC_VERSION when building for Aarch64. Yet they should also use 64-bit limbs for Aarch64 builds.
x0-x3 are skipped such that function parameters to not have to be moved. MULADDC_INIT and MULADDC_STOP are mostly empty because it is more efficient to keep everything in registers (and that should easily be possible). I considered a MULADDC_HUIT implementation, but could not think of something that would be more efficient than basically 8 consecutive MULADDC_CORE. You could combine the loads and stores, but it's probably more efficient to interleave them with arithmetic, depending on the specific microarchitecture. NEON allows to do a 64x64->128 bit multiplication (and optional accumulation) in one instruction, but is not great at handling carries.
This PR should now be made to mbed-crypto instead. I'll create a new one there. |
This fixes #1636 and greatly speeds up
bignum
operations on ARMv8-A / Aarch64 platforms. It still passes all tests.Benchmark (
$ ./programs/test/benchmark
) done on a Pixel XL:Similar improvements were measured on a big Qualcomm server. On a Raspberry Pi3 (Cortex-A53) only enabling 64-bit limbs improved performance.