Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: do no use .cfi_negate_ra_state within the assembly on Arm64 #15784

Merged
merged 1 commit into from
Apr 15, 2024

Conversation

mcmilk
Copy link
Contributor

@mcmilk mcmilk commented Jan 16, 2024

Motivation and Context

Compiling openzfs on aarch64 with gcc-8 and gcc-9 is failing currently.
See issue #14965 for deeper context.

On platforms without pointer authentication, .cfi_negate_ra_state can be
defined to a no-op: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/aarch64-tdep.c#l1413

Closes: #14965

Description

How Has This Been Tested?

I have tested this on Arm64 FreeBSD 13.2 and AlmaLinux-8.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@mcmilk
Copy link
Contributor Author

mcmilk commented Jan 16, 2024

@lundman - could you check, if this works also on Apple?

I have tested this only on userspace on an Apple Mac mini (M1, 2020):

[mcmilk@m1 ~/src/BLAKE3-tests]$ ./blake3_test 
implementation               1k      4k     16k     64k    256k      1m      4m
generic                     377     557     563     617     617     717     773
sse2                        384    1301    1488    1530    1564    1584    1583
sse41                       426    1484    1658    1711    1712    1715    1714

@lundman
Copy link
Contributor

lundman commented Jan 17, 2024

Ah can't apply directly of course, but if we are just testing that the opcodes are understood, a patch like

assembly test

--- a/module/icp/asm-aarch64/blake3/b3_aarch64_sse2.S
+++ b/module/icp/asm-aarch64/blake3/b3_aarch64_sse2.S
@@ -49,14 +49,14 @@
        .text
        .globl  zfs_blake3_compress_in_place_sse2
        .p2align        2
+
 #ifndef __APPLE__
        .type   zfs_blake3_compress_in_place_sse2,@function
 #endif
 zfs_blake3_compress_in_place_sse2:
        .cfi_startproc
-       hint    #25
-       .cfi_negate_ra_state
        sub     sp, sp, #96
+       .cfi_def_cfa_offset 96
        stp     x29, x30, [sp, #64]
        add     x29, sp, #64
        str     x19, [sp, #80]
@@ -76,11 +76,15 @@ zfs_blake3_compress_in_place_sse2:
        ldp     q2, q3, [sp, #32]
        eor     v0.16b, v2.16b, v0.16b
        eor     v1.16b, v3.16b, v1.16b
+       .cfi_def_cfa wsp, 96
        ldp     x29, x30, [sp, #64]
        stp     q0, q1, [x19]
        ldr     x19, [sp, #80]
        add     sp, sp, #96
-       hint    #29
+       .cfi_def_cfa_offset 0
+       .cfi_restore w19
+       .cfi_restore w30
+       .cfi_restore w29
        ret
 .Lfunc_end0:
 #ifndef __APPLE__
@@ -92,7 +96,7 @@ zfs_blake3_compress_in_place_sse2:
 #ifndef __APPLE__
        .section        .rodata.cst16,"aM",@progbits,16
 #endif
-       .p2align        4
+       .p2align        4, 0x0
 .LCPI1_0:

I can confirm that compiles and links ok. Obviously can't run it, so in terms of syntax it is all go. I would perhaps
have gone with the asm_linkage.h header, and let that set CFI_NEGATIVE_RA_STATE to nothing for gcc, or whatever the issue is?

Ultimately, I think you can proceed with Linux/FreeBSD changes. We've pretty much given up on merging macOS, and will plan to arrange our repo and workflow to be permanently down-stream, and have own copies of files for easier merging.

@zxombie
Copy link
Contributor

zxombie commented Jan 17, 2024

This will break FreeBSD 15.0 on hardware with BTI. As far as I can tell all you need to do is remove .cfi_negate_ra_state from the generated code, e.g. when targeting an older compiler.

@ziggythehamster
Copy link

As far as I can tell all you need to do is remove .cfi_negate_ra_state from the generated code, e.g. when targeting an older compiler.

I thought that there is no way that it's this simple, but it's a no-op on platforms without pointer authentication:

http://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/aarch64-tdep.c;hb=5e35a3a8bb63c946c1c198feb180c697b06649fa#l1413

Without knowing anything about how binutils is structured, it looks like it first does the same thing as .cfi_window_save:
http://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gas/gen-sframe.c;hb=5e35a3a8bb63c946c1c198feb180c697b06649fa#l1247 and http://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=binutils/dwarf.c;hb=5e35a3a8bb63c946c1c198feb180c697b06649fa#l10087

So it seems to me like this could be the result of a simple ifdef, and on platforms without .cfi_negate_ra_state, it uses .cfi_window_save, and otherwise, it uses .cfi_negate_ra_state. (Strong assumption on my part that hint #foo is also available though.)

@mcmilk
Copy link
Contributor Author

mcmilk commented Jan 17, 2024

@ziggythehamster and @zxombie - Thanks for these findings - I will create a second and smaller approach
@lundman - Thanks a lot for the fast reply - and yes: the asm_linkage.h is really a need

Ultimately, I think you can proceed with Linux/FreeBSD changes. We've pretty much given up on merging macOS, and will plan to arrange our repo and workflow to be permanently down-stream, and have own copies of files for easier merging.

This sounds no so good :(

@ziggythehamster
Copy link

Another thought - any reason that SIMDE couldn't be integrated into an autoconf macro / codegen script so that everyone's assembly is decided at ./configure time? That would allow for greater optimization on more recent compilers, allow for platform-specific stuff (Apple Silicon PAC), and also open this up to architectures which currently don't get optimized versions. If this assembly is hand-optimized, I could see this being problematic, but AFAIK it isn't.

@mcmilk mcmilk force-pushed the unknown-pseudo-op branch from fa4c588 to 5d6625c Compare January 17, 2024 06:32
@mcmilk
Copy link
Contributor Author

mcmilk commented Jan 17, 2024

Another thought - any reason that SIMDE couldn't be integrated into an autoconf macro / codegen script so that everyone's assembly is decided at ./configure time? That would allow for greater optimization on more recent compilers, allow for platform-specific stuff (Apple Silicon PAC), and also open this up to architectures which currently don't get optimized versions. If this assembly is hand-optimized, I could see this being problematic, but AFAIK it isn't.

This would make the things a bit more complex I think.
The reason for choosing it this way was the huge difference between different compilers and versions... so we took the best and this assembly is used now.

Compiling openzfs on aarch64 with gcc-8 and gcc-9 is failing currently.
See issue openzfs#14965 for deeper context.

On platforms without pointer authentication, .cfi_negate_ra_state can be
defined to a no-op:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/aarch64-tdep.c#l1413

I have tested this on Arm64 FreeBSD 13.2 and AlmaLinux-8.

Signed-off-by: Tino Reichardt <[email protected]>
Closes: openzfs#14965
@mcmilk mcmilk force-pushed the unknown-pseudo-op branch from 5d6625c to 4d89693 Compare January 17, 2024 07:18
@mcmilk mcmilk marked this pull request as draft January 17, 2024 07:51
@andrewc12 andrewc12 mentioned this pull request Jan 17, 2024
13 tasks
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Jan 17, 2024
@ziggythehamster
Copy link

I've tested this on EC2 and the latest version of the patch seems to work just fine.

@mcmilk mcmilk marked this pull request as ready for review March 4, 2024 08:31
@q0rban
Copy link

q0rban commented Mar 7, 2024

This resolved my issue compiling on Apple M1 in Debian 10.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Apr 15, 2024
@behlendorf behlendorf merged commit 90ba19e into openzfs:master Apr 15, 2024
24 of 25 checks passed
tonyhutter pushed a commit that referenced this pull request May 2, 2024
Compiling openzfs on aarch64 with gcc-8 and gcc-9 is failing currently.
See issue #14965 for deeper context.

On platforms without pointer authentication, .cfi_negate_ra_state can be
defined to a no-op:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/aarch64-tdep.c#l1413

I have tested this on Arm64 FreeBSD 13.2 and AlmaLinux-8.

Reviewed-by: Andrew Turner <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes #14965
Closes #15784
@mcmilk mcmilk deleted the unknown-pseudo-op branch August 24, 2024 13:37
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
Compiling openzfs on aarch64 with gcc-8 and gcc-9 is failing currently.
See issue openzfs#14965 for deeper context.

On platforms without pointer authentication, .cfi_negate_ra_state can be
defined to a no-op:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/aarch64-tdep.c#l1413

I have tested this on Arm64 FreeBSD 13.2 and AlmaLinux-8.

Reviewed-by: Andrew Turner <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes openzfs#14965
Closes openzfs#15784
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unknown pseudo-op
6 participants