Poly1305 ARM32 NEON: add implementation #8344

Open. Wants to merge 1 commit into base: master


SparkiDev
Contributor

Description

Add assembly for Poly1305 using ARM32 NEON instruction set.

For Poly1305 ARM32 Base:
  Change name from poly1305_blocks_arm32_16 to poly1305_arm32_blocks_16
  Update location of fields in Poly1305 objects

poly1305.c:
  ARM32 NEON - buffer up to 4 blocks
  x86_64 - only calculate powers of r once after key is set.
test.c: poly1305 testing with multiple updates.
benchmark: chacha20-poly1305 now uses AAD
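
For reviewers, the "buffer up to 4 blocks" behaviour can be pictured with the rough sketch below. It is not the code in this PR: DemoCtx, demo_update and demo_process_blocks are hypothetical names, and the block routine is a stub that only counts bytes where the NEON assembly would run. It only illustrates the general idea of holding input until four 16-byte blocks are available and passing whole 64-byte batches to the block function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16   /* Poly1305 block size in bytes */
#define BUF_BLOCKS 4    /* blocks gathered before a batch call */

/* Hypothetical context for illustration; not wolfSSL's Poly1305 struct. */
typedef struct {
    unsigned char buffer[BUF_BLOCKS * BLOCK_SIZE];
    size_t leftover;         /* bytes currently held in buffer */
    size_t batches;          /* 64-byte batches handed to the block routine */
    size_t bytes_processed;  /* total bytes handed to the block routine */
} DemoCtx;

/* Stub standing in for an assembly routine; len is a multiple of 64. */
static void demo_process_blocks(DemoCtx* ctx, const unsigned char* m,
                                size_t len)
{
    (void)m;
    ctx->batches += len / sizeof(ctx->buffer);
    ctx->bytes_processed += len;
}

static void demo_init(DemoCtx* ctx)
{
    memset(ctx, 0, sizeof(*ctx));
}

static void demo_update(DemoCtx* ctx, const unsigned char* m, size_t len)
{
    assert(ctx != NULL);
    if (m == NULL || len == 0)
        return;

    /* Top up a partially filled buffer first. */
    if (ctx->leftover > 0) {
        size_t want = sizeof(ctx->buffer) - ctx->leftover;
        if (want > len)
            want = len;
        memcpy(ctx->buffer + ctx->leftover, m, want);
        ctx->leftover += want;
        m += want;
        len -= want;
        if (ctx->leftover < sizeof(ctx->buffer))
            return;  /* still fewer than 4 blocks: keep waiting */
        demo_process_blocks(ctx, ctx->buffer, sizeof(ctx->buffer));
        ctx->leftover = 0;
    }

    /* Hand whole 4-block batches over directly from the input. */
    if (len >= sizeof(ctx->buffer)) {
        size_t full = len - (len % sizeof(ctx->buffer));
        demo_process_blocks(ctx, m, full);
        m += full;
        len -= full;
    }

    /* Stash the tail for the next update (or for finalization). */
    if (len > 0) {
        memcpy(ctx->buffer, m, len);
        ctx->leftover = len;
    }
}
```

Splitting a message across several demo_update calls leaves the context in the same state as a single call with the whole message, which is the property a multiple-update test would exercise. Finalization of the remaining bytes is left out of the sketch.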

Testing

Tested Poly1305 for all ARM32 CPU types supported.

Checklist

  • added tests
  • updated/added doxygen
  • updated appropriate READMEs
  • updated manual and documentation

SparkiDev self-assigned this on Jan 9, 2025
SparkiDev force-pushed the poly1305_arm32_neon branch 2 times, most recently from f61525c to 66b9c3a, on January 9, 2025 02:33
SparkiDev assigned wolfSSL-Bot and unassigned SparkiDev on Jan 9, 2025
SparkiDev requested a review from wolfSSL-Bot on January 9, 2025 03:11
SparkiDev force-pushed the poly1305_arm32_neon branch from 66b9c3a to c2b610a on January 9, 2025 03:36
SparkiDev force-pushed the poly1305_arm32_neon branch from c2b610a to 2f605f5 on January 9, 2025 07:15
dgarske self-requested a review on January 9, 2025 16:30
Contributor

dgarske left a comment

Please confirm whether these fields are actually used and, if so, add an inline comment explaining them:

word32 r_21[10];
word32 r_43[10];

I ran benchmarks on the Pi 5 with --enable-armasm and saw no change in performance. I will also test this on an ARM32 board such as the STM32MP135.

word32 pad[4];
word32 leftover;
#if !defined(WOLFSSL_ARMASM_THUMB2) && !defined(WOLFSSL_ARMASM_NO_NEON)
unsigned char buffer[4*POLY1305_BLOCK_SIZE];
word32 r_21[10];
Contributor

What are these for, and are they actually used? If so, please add an inline code comment.

dgarske removed the request for review from wolfSSL-Bot on January 21, 2025 00:05