Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SECLIB-667: Accelerate SHA-512 with A64 crypto extensions #5632

Conversation

tom-cosgrove-arm
Copy link
Contributor

@tom-cosgrove-arm tom-cosgrove-arm commented Mar 15, 2022

Description

Provide an additional pair of #defines, MBEDTLS_SHA512_USE_A64_CRYPTO_IF_PRESENT and MBEDTLS_SHA512_USE_A64_CRYPTO_ONLY. At most one of them may be specified. If used, it is necessary to compile with -march=armv8.2-a+sha3.

The MBEDTLS_SHA512_PROCESS_ALT and MBEDTLS_SHA512_ALT mechanisms continue to work, and are mutually exclusive with A64_CRYPTO.

There should be minimal code size impact if no A64_CRYPTO option is set.

Note that the Neon-accelerated implementation was originally written by Simon Tatham for PuTTY, under the MIT licence; it is dual-licensed as Apache 2 with his kind permission. I have restructured it to match the Mbed TLS implementation; any mistakes are mine, not his.

Status

READY

Requires Backporting

NO

Additional comments

The coding standards say "a preferred maximum line length of 80 characters" rather than "a maximum line length of 80 characters" - in a very few places I had exceeded 80 characters by a small amount to improve readability (especially where a sequence of consecutive lines are similar).

Steps to test or reproduce

Define either MBEDTLS_SHA512_USE_A64_CRYPTO_IF_PRESENT or MBEDTLS_SHA512_USE_A64_CRYPTO_ONLY in the configuration, and compile with -march=armv8.2-a+sha3. *IF_PRESENT will work on systems (such as Raspberry Pi) that don't have the crypto extensions; *ONLY will work on most (other) Aarch64 systems.

@tom-cosgrove-arm tom-cosgrove-arm added mbed TLS team needs-review Every commit must be reviewed by at least two team members, needs-ci Needs to pass CI tests labels Mar 15, 2022
@tom-cosgrove-arm tom-cosgrove-arm requested a review from xkqian March 15, 2022 11:06
@tom-cosgrove-arm tom-cosgrove-arm force-pushed the seclib-667-sha512-acceleration-mbedtls-internal branch 6 times, most recently from 59bd283 to f284765 Compare March 15, 2022 17:33
uint32x4_t sched1 = vld1q_u32( (const uint32_t *)( msg + 16 * 1 ) );
uint32x4_t sched2 = vld1q_u32( (const uint32_t *)( msg + 16 * 2 ) );
uint32x4_t sched3 = vld1q_u32( (const uint32_t *)( msg + 16 * 3 ) );
uint32x4_t sched0 = (uint32x4_t) vld1q_u8( msg + 16 * 0 );
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing it this way avoids increases required alignment when compiled with -Wcast-align

@sgtatham
Copy link

Note that the Neon-accelerated implementation was originally written by Simon Tatham for PuTTY, under the MIT licence; it is dual-licensed as Apache 2 with his kind permission.

For the record, yes, I confirm that I did give permission for that. Glad it's useful!

@tom-cosgrove-arm tom-cosgrove-arm force-pushed the seclib-667-sha512-acceleration-mbedtls-internal branch 3 times, most recently from 6132fa0 to 2785928 Compare March 16, 2022 15:23
@tom-cosgrove-arm
Copy link
Contributor Author

tom-cosgrove-arm commented Mar 16, 2022

For those wondering: my latest push removed a commit that added a build_neon_intrinsics component to all.sh. Since there aren't any images that could run this component (it needed aarch64 and a gcc >= 8 or clang >= 7), the builds would always fail.

Provide an additional pair of #defines, MBEDTLS_SHA512_USE_A64_CRYPTO_IF_PRESENT
and MBEDTLS_SHA512_USE_A64_CRYPTO_ONLY. At most one of them may be
specified. If used, it is necessary to compile with -march=armv8.2-a+sha3.

The MBEDTLS_SHA512_PROCESS_ALT and MBEDTLS_SHA512_ALT mechanisms
continue to work, and are mutually exclusive with SHA512_USE_A64_CRYPTO.

There should be minimal code size impact if no A64_CRYPTO option is set.

The SHA-512 implementation was originally written by Simon Tatham for PuTTY,
under the MIT licence; dual-licensed as Apache 2 with his kind permission.

Signed-off-by: Tom Cosgrove <[email protected]>
Was getting 'dd: unknown operand status'

Signed-off-by: Tom Cosgrove <[email protected]>
@tom-cosgrove-arm tom-cosgrove-arm force-pushed the seclib-667-sha512-acceleration-mbedtls-internal branch from 2785928 to 226aca1 Compare March 23, 2022 21:41
@tom-cosgrove-arm
Copy link
Contributor Author

tom-cosgrove-arm commented Mar 23, 2022

Minor update to the detection of which clang requires inline assembler rather than being able to use the intrinsics: from further testing, it seems that as well as < 13, it's only == 13.0.0. that has this problem - 13.0.1 (what you get when asking for 13.0 on Ubuntu) and 13.1 on macOS have the intrinsics and __ARM_FEATURE_SHA512

(Also rebased onto latest development - if you Compare, you'll just want to look at check_config.h and sha512.c)

@tom-daubney-arm tom-daubney-arm removed the needs-ci Needs to pass CI tests label Mar 24, 2022
Copy link
Contributor

@tom-daubney-arm tom-daubney-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this in the same manner as the SHA-256 PR (#5547).

My review has focused on the following areas:

Style and naming conventions --> all good
Self test --> passes
Unit tests --> passes
Compilation with different combinations of new config options --> Works as expected
Performance gains --> I compiled and ran on an Aarch64 machine and can report that the performance gain is around a 4.4x increase (for both new config options) when compared to not using the CE, which is worthwhile.

LGTM.

P.S. There were a couple of (IMO) missing full stops on some of the multi-line comments and I have highlighted this in my comments. This is so minor that I am not going to let it halt approval and I leave it to @tom-cosgrove-arm to fix if deemed worth it.

* an illegal instruction fault will occur).
*
* \note This allows builds with a smaller code size than with
* MBEDTLS_SHA512_USE_A64_CRYPTO_IF_PRESENT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing full stop. (To keep it consistent with how you have written the rest of the comments.)

#if defined(MBEDTLS_SHA512_USE_A64_CRYPTO_IF_PRESENT)
/*
* Capability detection code comes early, so we can disable
* MBEDTLS_SHA512_USE_A64_CRYPTO_IF_PRESENT if no detection mechanism found
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing full stop

Copy link
Contributor

@xkqian xkqian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tom-cosgrove-arm tom-cosgrove-arm added approved Design and code approved - may be waiting for CI or backports and removed needs-review Every commit must be reviewed by at least two team members, labels Mar 29, 2022
@daverodgman daverodgman merged commit 1c41501 into Mbed-TLS:development Mar 29, 2022
@daverodgman
Copy link
Contributor

Fixes #5029

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Design and code approved - may be waiting for CI or backports
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants