[AArch64] Split FeatureAES to FEAT_AES and FEAT_PMULL. #110816

Closed

labrinea
Collaborator

labrinea commented Oct 2, 2024

Currently, LLVM's FeatureAES models FEAT_AES and FEAT_PMULL lumped together. Similarly, FeatureSVE2AES means FEAT_SVE_AES plus FEAT_SVE_PMULL128. However, the architecture does not mandate that both be implemented at the same time. Splitting them will allow Function Multiversioning to enable backend support for 'aes' and 'sve2-aes'. I have added an override for the user-visible names of the new features to preserve the old semantics for backwards compatibility with the command line, target attributes, and assembler directives.
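
To illustrate the backwards-compatibility idea, here is a minimal sketch in C++. It is not code from the patch: `expandUserFeature` and its table are hypothetical, and the sketch assumes that the legacy user-visible names `aes` and `sve2-aes` keep their old meaning by turning on both of the split backend features (the backend feature names `pmull` and `sve2-pmull128` are taken from the test expectations in this PR).

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical model, not LLVM's API: map a user-visible extension name
// (as written on the command line) to the backend features it would enable
// once FeatureAES and FeatureSVE2AES are split.
std::set<std::string> expandUserFeature(const std::string &Name) {
  // Assumption: legacy names expand to both halves of the old combined feature.
  static const std::map<std::string, std::set<std::string>> Legacy = {
      {"aes", {"aes", "pmull"}},
      {"sve2-aes", {"sve2-aes", "sve2-pmull128"}},
  };
  auto It = Legacy.find(Name);
  if (It != Legacy.end())
    return It->second;
  // Any other name maps to itself in this sketch.
  return {Name};
}
```

With a mapping like this, existing command lines would behave as before, while Function Multiversioning could request the narrower backend feature directly.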

@llvmbot added the clang, backend:AArch64, clang:driver, clang:frontend, and mc labels Oct 2, 2024
@labrinea
Collaborator Author

labrinea commented Oct 2, 2024

Adding the GCC folks @andrewcarlotti and @Wilco1 for visibility.

@llvmbot
Member

llvmbot commented Oct 2, 2024

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-aarch64

Author: Alexandros Lamprineas (labrinea)


Patch is 203.49 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/110816.diff

89 Files Affected:

  • (modified) clang/lib/Basic/Targets/AArch64.cpp (+2-2)
  • (modified) clang/test/CodeGen/aarch64-fmv-dependencies.c (+7-6)
  • (modified) clang/test/CodeGen/aarch64-targetattr.c (+6-6)
  • (modified) clang/test/CodeGen/arm64_crypto.c (+1-1)
  • (modified) clang/test/CodeGen/attr-target-clones-aarch64.c (+4-4)
  • (modified) clang/test/CodeGen/attr-target-version.c (+56-54)
  • (modified) clang/test/CodeGen/neon-crypto.c (+2-2)
  • (modified) clang/test/CodeGenCXX/attr-target-version.cpp (+1-1)
  • (added) clang/test/Driver/aarch64-aes.c (+9)
  • (added) clang/test/Driver/aarch64-sve2-aes.c (+9)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-a64fx.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-ampere1.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-ampere1a.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-ampere1b.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-apple-a10.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-apple-a11.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-apple-a12.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-apple-a13.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-apple-a14.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-apple-a15.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-apple-a16.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-apple-a17.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-apple-a7.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-apple-m4.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-carmel.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a34.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a35.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a53.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a55.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a57.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a65.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a65ae.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a72.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a73.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a75.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a76.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a76ae.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a77.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a78.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a78ae.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a78c.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-x1.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-x1c.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-exynos-m3.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-exynos-m4.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-exynos-m5.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-falkor.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-kryo.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-neoverse-512tvb.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-neoverse-e1.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-neoverse-n1.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-neoverse-v1.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-oryon-1.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-saphira.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-thunderx.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-thunderx2t99.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-thunderx3t110.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-thunderxt81.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-thunderxt83.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-thunderxt88.c (+2-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-tsv110.c (+2-1)
  • (modified) clang/test/Driver/print-supported-extensions-aarch64.c (+2-2)
  • (modified) clang/test/Preprocessor/aarch64-target-features.c (+28-28)
  • (modified) llvm/lib/Target/AArch64/AArch64.td (+1-1)
  • (modified) llvm/lib/Target/AArch64/AArch64FMV.td (+4-4)
  • (modified) llvm/lib/Target/AArch64/AArch64Features.td (+15-4)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrFormats.td (+1-1)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+4)
  • (modified) llvm/lib/Target/AArch64/AArch64Processors.td (+40-40)
  • (modified) llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td (+3-1)
  • (modified) llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp (+2-2)
  • (modified) llvm/lib/TargetParser/AArch64TargetParser.cpp (+11-2)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-pmull2.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/arm64-neon-3vdiff.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/arm64-vmul.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/neon-vmull-high-p64.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/pmull-ldr-merge.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-intrinsics-polynomial-arithmetic-128.ll (+1-1)
  • (modified) llvm/test/MC/AArch64/SVE2/pmullb-128-diagnostics.s (+1-1)
  • (modified) llvm/test/MC/AArch64/SVE2/pmullb-128.s (+5-5)
  • (modified) llvm/test/MC/AArch64/SVE2/pmullt-128-diagnostics.s (+1-1)
  • (modified) llvm/test/MC/AArch64/SVE2/pmullt-128.s (+5-5)
  • (modified) llvm/test/MC/AArch64/arm64-diagno-predicate.s (+1-1)
  • (modified) llvm/test/MC/AArch64/directive-arch_extension-negative.s (+18-1)
  • (modified) llvm/test/tools/llvm-mca/AArch64/Cortex/A510-sve-instructions.s (+1-1)
  • (modified) llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s (+1-1)
  • (modified) llvm/test/tools/llvm-mca/AArch64/Neoverse/N3-sve-instructions.s (+1-1)
  • (modified) llvm/test/tools/llvm-mca/AArch64/Neoverse/V2-sve-instructions.s (+1-1)
  • (modified) llvm/unittests/TargetParser/TargetParserTest.cpp (+6-2)
diff --git a/clang/lib/Basic/Targets/AArch64.cpp b/clang/lib/Basic/Targets/AArch64.cpp
index 5f5dfcb722f9d4..9b0f744ac543bc 100644
--- a/clang/lib/Basic/Targets/AArch64.cpp
+++ b/clang/lib/Basic/Targets/AArch64.cpp
@@ -853,7 +853,7 @@ bool AArch64TargetInfo::handleTargetFeatures(std::vector<std::string> &Features,
       HasSVE2 = true;
       HasSVE2p1 = true;
     }
-    if (Feature == "+sve2-aes") {
+    if (Feature == "+sve2-aes" || Feature == "+sve2-pmull128") {
       FPU |= NeonMode;
       FPU |= SveMode;
       HasFullFP16 = true;
@@ -963,7 +963,7 @@ bool AArch64TargetInfo::handleTargetFeatures(std::vector<std::string> &Features,
       HasCRC = true;
     if (Feature == "+rcpc")
       HasRCPC = true;
-    if (Feature == "+aes") {
+    if (Feature == "+aes" || Feature == "+pmull") {
       FPU |= NeonMode;
       HasAES = true;
     }
diff --git a/clang/test/CodeGen/aarch64-fmv-dependencies.c b/clang/test/CodeGen/aarch64-fmv-dependencies.c
index 681f7e82634fa8..12d7ed34eaeaa7 100644
--- a/clang/test/CodeGen/aarch64-fmv-dependencies.c
+++ b/clang/test/CodeGen/aarch64-fmv-dependencies.c
@@ -3,7 +3,7 @@
 
 // RUN: %clang --target=aarch64-linux-gnu --rtlib=compiler-rt -emit-llvm -S -o - %s | FileCheck %s
 
-// CHECK: define dso_local i32 @fmv._Maes() #[[ATTR0:[0-9]+]] {
+// CHECK: define dso_local i32 @fmv._Maes() #[[aes:[0-9]+]] {
 __attribute__((target_version("aes"))) int fmv(void) { return 0; }
 
 // CHECK: define dso_local i32 @fmv._Mbf16() #[[bf16_ebf16:[0-9]+]] {
@@ -156,13 +156,13 @@ __attribute__((target_version("sve-i8mm"))) int fmv(void) { return 0; }
 // CHECK: define dso_local i32 @fmv._Msve2() #[[sve2:[0-9]+]] {
 __attribute__((target_version("sve2"))) int fmv(void) { return 0; }
 
-// CHECK: define dso_local i32 @fmv._Msve2-aes() #[[sve2_aes_sve2_pmull128:[0-9]+]] {
+// CHECK: define dso_local i32 @fmv._Msve2-aes() #[[sve2_aes:[0-9]+]] {
 __attribute__((target_version("sve2-aes"))) int fmv(void) { return 0; }
 
 // CHECK: define dso_local i32 @fmv._Msve2-bitperm() #[[sve2_bitperm:[0-9]+]] {
 __attribute__((target_version("sve2-bitperm"))) int fmv(void) { return 0; }
 
-// CHECK: define dso_local i32 @fmv._Msve2-pmull128() #[[sve2_aes_sve2_pmull128:[0-9]+]] {
+// CHECK: define dso_local i32 @fmv._Msve2-pmull128() #[[sve2_pmull128:[0-9]+]] {
 __attribute__((target_version("sve2-pmull128"))) int fmv(void) { return 0; }
 
 // CHECK: define dso_local i32 @fmv._Msve2-sha3() #[[sve2_sha3:[0-9]+]] {
@@ -183,7 +183,7 @@ int caller() {
   return fmv();
 }
 
-// CHECK: attributes #[[ATTR0]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[aes]] = { {{.*}} "target-features"="+aes,+fp-armv8,+neon,+outline-atomics,+v8a"
 // CHECK: attributes #[[bf16_ebf16]] = { {{.*}} "target-features"="+bf16,+fp-armv8,+neon,+outline-atomics,+v8a"
 // CHECK: attributes #[[bti]] = { {{.*}} "target-features"="+bti,+fp-armv8,+neon,+outline-atomics,+v8a"
 // CHECK: attributes #[[crc]] = { {{.*}} "target-features"="+crc,+fp-armv8,+neon,+outline-atomics,+v8a"
@@ -205,7 +205,7 @@ int caller() {
 // CHECK: attributes #[[lse]] = { {{.*}} "target-features"="+fp-armv8,+lse,+neon,+outline-atomics,+v8a"
 // CHECK: attributes #[[memtag2]] = { {{.*}} "target-features"="+fp-armv8,+mte,+neon,+outline-atomics,+v8a"
 // CHECK: attributes #[[mops]] = { {{.*}} "target-features"="+fp-armv8,+mops,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[pmull]] = { {{.*}} "target-features"="+aes,+fp-armv8,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[pmull]] = { {{.*}} "target-features"="+aes,+fp-armv8,+neon,+outline-atomics,+pmull,+v8a"
 // CHECK: attributes #[[predres]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+predres,+v8a"
 // CHECK: attributes #[[rcpc]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+rcpc,+v8a"
 // CHECK: attributes #[[rcpc3]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+rcpc,+rcpc3,+v8a"
@@ -224,8 +224,9 @@ int caller() {
 // CHECK: attributes #[[sve_bf16_ebf16]] = { {{.*}} "target-features"="+bf16,+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+v8a"
 // CHECK: attributes #[[sve_i8mm]] = { {{.*}} "target-features"="+fp-armv8,+fullfp16,+i8mm,+neon,+outline-atomics,+sve,+v8a"
 // CHECK: attributes #[[sve2]] = { {{.*}} "target-features"="+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+sve2,+v8a"
-// CHECK: attributes #[[sve2_aes_sve2_pmull128]] = { {{.*}} "target-features"="+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+sve2,+sve2-aes,+v8a"
+// CHECK: attributes #[[sve2_aes]] = { {{.*}} "target-features"="+aes,+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+sve2,+sve2-aes,+v8a"
 // CHECK: attributes #[[sve2_bitperm]] = { {{.*}} "target-features"="+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+sve2,+sve2-bitperm,+v8a"
+// CHECK: attributes #[[sve2_pmull128]] = { {{.*}} "target-features"="+aes,+fp-armv8,+fullfp16,+neon,+outline-atomics,+pmull,+sve,+sve2,+sve2-aes,+sve2-pmull128,+v8a"
 // CHECK: attributes #[[sve2_sha3]] = { {{.*}} "target-features"="+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+sve2,+sve2-sha3,+v8a"
 // CHECK: attributes #[[sve2_sm4]] = { {{.*}} "target-features"="+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+sve2,+sve2-sm4,+v8a"
 // CHECK: attributes #[[wfxt]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+v8a,+wfxt"
diff --git a/clang/test/CodeGen/aarch64-targetattr.c b/clang/test/CodeGen/aarch64-targetattr.c
index 1bc78a6e1f8c0f..ce77c6145156b0 100644
--- a/clang/test/CodeGen/aarch64-targetattr.c
+++ b/clang/test/CodeGen/aarch64-targetattr.c
@@ -208,17 +208,17 @@ void applem4() {}
 // CHECK: attributes #[[ATTR5]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "tune-cpu"="cortex-a710" }
 // CHECK: attributes #[[ATTR6]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+ete,+fp-armv8,+neon,+trbe,+v8a" }
 // CHECK: attributes #[[ATTR7]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "tune-cpu"="generic" }
-// CHECK: attributes #[[ATTR8]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="neoverse-n1" "target-features"="+aes,+crc,+dotprod,+fp-armv8,+fullfp16,+lse,+neon,+perfmon,+ras,+rcpc,+rdm,+sha2,+spe,+ssbs,+v8.1a,+v8.2a,+v8a" "tune-cpu"="cortex-a710" }
+// CHECK: attributes #[[ATTR8]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="neoverse-n1" "target-features"="+aes,+crc,+dotprod,+fp-armv8,+fullfp16,+lse,+neon,+perfmon,+pmull,+ras,+rcpc,+rdm,+sha2,+spe,+ssbs,+v8.1a,+v8.2a,+v8a" "tune-cpu"="cortex-a710" }
 // CHECK: attributes #[[ATTR9]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+fullfp16,+sve" "tune-cpu"="cortex-a710" }
-// CHECK: attributes #[[ATTR10]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="neoverse-v1" "target-features"="+aes,+bf16,+ccdp,+ccidx,+complxnum,+crc,+dit,+dotprod,+flagm,+fp-armv8,+fp16fml,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+perfmon,+rand,+ras,+rcpc,+rdm,+sha2,+sha3,+sm4,+spe,+ssbs,+sve,+sve2,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8a" }
-// CHECK: attributes #[[ATTR11]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="neoverse-v1" "target-features"="+aes,+bf16,+ccdp,+ccidx,+complxnum,+crc,+dit,+dotprod,+flagm,+fp-armv8,+fp16fml,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+perfmon,+rand,+ras,+rcpc,+rdm,+sha2,+sha3,+sm4,+spe,+ssbs,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8a,-sve" }
+// CHECK: attributes #[[ATTR10]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="neoverse-v1" "target-features"="+aes,+bf16,+ccdp,+ccidx,+complxnum,+crc,+dit,+dotprod,+flagm,+fp-armv8,+fp16fml,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+perfmon,+pmull,+rand,+ras,+rcpc,+rdm,+sha2,+sha3,+sm4,+spe,+ssbs,+sve,+sve2,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8a" }
+// CHECK: attributes #[[ATTR11]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="neoverse-v1" "target-features"="+aes,+bf16,+ccdp,+ccidx,+complxnum,+crc,+dit,+dotprod,+flagm,+fp-armv8,+fp16fml,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+perfmon,+pmull,+rand,+ras,+rcpc,+rdm,+sha2,+sha3,+sm4,+spe,+ssbs,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8a,-sve" }
 // CHECK: attributes #[[ATTR12]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+fullfp16,+sve" }
 // CHECK: attributes #[[ATTR13]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+fullfp16" }
-// CHECK: attributes #[[ATTR14]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="neoverse-n1" "target-features"="+aes,+bf16,+bti,+ccidx,+complxnum,+crc,+dit,+dotprod,+flagm,+fp-armv8,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+perfmon,+predres,+ras,+rcpc,+rdm,+sb,+sha2,+spe,+ssbs,+sve,+sve2,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8.6a,+v8a" "tune-cpu"="cortex-a710" }
-// CHECK: attributes #[[ATTR15]] = { noinline nounwind optnone "branch-target-enforcement" "guarded-control-stack" "no-trapping-math"="true" "sign-return-address"="non-leaf" "sign-return-address-key"="a_key" "stack-protector-buffer-size"="8" "target-cpu"="neoverse-n1" "target-features"="+aes,+bf16,+bti,+ccidx,+complxnum,+crc,+dit,+dotprod,+flagm,+fp-armv8,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+perfmon,+predres,+ras,+rcpc,+rdm,+sb,+sha2,+spe,+ssbs,+sve,+sve2,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8.6a,+v8a" "tune-cpu"="cortex-a710" }
+// CHECK: attributes #[[ATTR14]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="neoverse-n1" "target-features"="+aes,+bf16,+bti,+ccidx,+complxnum,+crc,+dit,+dotprod,+flagm,+fp-armv8,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+perfmon,+pmull,+predres,+ras,+rcpc,+rdm,+sb,+sha2,+spe,+ssbs,+sve,+sve2,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8.6a,+v8a" "tune-cpu"="cortex-a710" }
+// CHECK: attributes #[[ATTR15]] = { noinline nounwind optnone "branch-target-enforcement" "guarded-control-stack" "no-trapping-math"="true" "sign-return-address"="non-leaf" "sign-return-address-key"="a_key" "stack-protector-buffer-size"="8" "target-cpu"="neoverse-n1" "target-features"="+aes,+bf16,+bti,+ccidx,+complxnum,+crc,+dit,+dotprod,+flagm,+fp-armv8,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+perfmon,+pmull,+predres,+ras,+rcpc,+rdm,+sb,+sha2,+spe,+ssbs,+sve,+sve2,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8.6a,+v8a" "tune-cpu"="cortex-a710" }
 // CHECK: attributes #[[ATTR16]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
 // CHECK: attributes #[[ATTR17]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-v9.3a" }
-// CHECK: attributes #[[ATTR18]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="apple-m4" "target-features"="+aes,+bf16,+bti,+ccidx,+complxnum,+crc,+dit,+dotprod,+flagm,+fp-armv8,+fp16fml,+fpac,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+perfmon,+predres,+ras,+rcpc,+rdm,+sb,+sha2,+sha3,+sme,+sme-f64f64,+sme-i16i64,+sme2,+ssbs,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8.6a,+v8.7a,+v8a,+wfxt" }
+// CHECK: attributes #[[ATTR18]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="apple-m4" "target-features"="+aes,+bf16,+bti,+ccidx,+complxnum,+crc,+dit,+dotprod,+flagm,+fp-armv8,+fp16fml,+fpac,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+perfmon,+pmull,+predres,+ras,+rcpc,+rdm,+sb,+sha2,+sha3,+sme,+sme-f64f64,+sme-i16i64,+sme2,+ssbs,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8.6a,+v8.7a,+v8a,+wfxt" }
 //.
 // CHECK: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
 // CHECK: [[META1:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
diff --git a/clang/test/CodeGen/arm64_crypto.c b/clang/test/CodeGen/arm64_crypto.c
index da6597be85bc80..5e6d59c490294f 100644
--- a/clang/test/CodeGen/arm64_crypto.c
+++ b/clang/test/CodeGen/arm64_crypto.c
@@ -1,4 +1,4 @@
-// RUN: %clang_cc1 -triple arm64-apple-ios7.0 -target-feature +neon -target-feature +aes -target-feature +sha2 -ffreestanding -Os -S -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple arm64-apple-ios7.0 -target-feature +neon -target-feature +aes -target-feature +pmull -target-feature +sha2 -ffreestanding -Os -S -o - %s | FileCheck %s
 
 // REQUIRES: aarch64-registered-target
 
diff --git a/clang/test/CodeGen/attr-target-clones-aarch64.c b/clang/test/CodeGen/attr-target-clones-aarch64.c
index 274e05de594b8e..ba3ffd5749ea68 100644
--- a/clang/test/CodeGen/attr-target-clones-aarch64.c
+++ b/clang/test/CodeGen/attr-target-clones-aarch64.c
@@ -824,7 +824,7 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
 // CHECK-MTE-BTI-NEXT:    ret ptr @ftc_inline3.default
 //
 //.
-// CHECK: attributes #[[ATTR0:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+lse,+neon" }
+// CHECK: attributes #[[ATTR0:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+aes,+fp-armv8,+lse,+neon" }
 // CHECK: attributes #[[ATTR1:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+fullfp16,+neon,+sve,+sve2" }
 // CHECK: attributes #[[ATTR2:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+neon,+sha2" }
 // CHECK: attributes #[[ATTR3:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+mte,+neon,+sha2" }
@@ -837,13 +837,13 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
 // CHECK: attributes #[[ATTR10:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+complxnum,+fp-armv8,+fullfp16,+neon,+sve,+sve2,+sve2-bitperm" }
 // CHECK: attributes #[[ATTR11:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+neon,+rand" }
 // CHECK: attributes #[[ATTR12:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+predres,+rcpc" }
-// CHECK: attributes #[[ATTR13:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+fullfp16,+neon,+sve,+sve2,+sve2-aes,+wfxt" }
+// CHECK: attributes #[[ATTR13:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+aes,+fp-armv8,+fullfp16,+neon,+sve,+sve2,+sve2-aes,+wfxt" }
 // CHECK: attributes #[[ATTR14:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+fullfp16,+neon,+sb,+sve" }
 //.
 // CHECK-NOFMV: attributes #[[ATTR0:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-fmv" }
 // CHECK-NOFMV: attributes #[[ATTR1:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-fmv" }
 //.
-// CHECK-MTE-BTI: attributes #[[ATTR0:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bti,+fp-armv8,+lse,+mte,+neon" }
+// CHECK-MTE-BTI: attributes #[[ATTR0:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+aes,+bti,+fp-armv8,+lse,+mte,+neon" }
 // CHECK-MTE-BTI: attributes #[[ATTR1:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bti,+fp-armv8,+fullfp16,+mte,+neon,+sve,+sve2" }
 // CHECK-MTE-BTI: attributes #[[ATTR2:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bti,+fp-armv8,+mte,+neon,+sha2" }
 // CHECK-MTE-BTI: attributes #[[ATTR3:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bti,+fp-armv8,+mte,+neon" }
@@ -853,7 +853,7 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
 // CHECK-MTE-BTI: attributes #[[ATTR7:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bti,+complxnum,+fp-armv8,+fullfp16,+mte,+neon,+sve,+sve2,+sve2-bitperm" }
 // CHECK-MTE-BTI: attributes #[[ATTR8:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bti,+fp-armv8,+mte,+neon,+rand" }
 // CHECK-MTE-BTI: attributes #[[ATTR9:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bti,+mte,+predres,+rcpc" }
-// CHECK-MTE-BTI: attributes #[[ATTR10:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bti,+fp-armv8,+fullfp16,+mte,+neon,+sve,+sve2,+sve2-aes,+wfxt" }
+// CHECK-MTE-BTI: attributes #[[ATTR10:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+aes,+bti,+fp-armv8,+fullfp16,+mte,+neon,+sve,+sve2,+sve2-aes,+wfxt" }
 // CHECK-MTE-BTI: attributes #[[ATTR11:[0-9]+]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bti,+fp-armv8,+fullfp16,+mte,+neon,+sb,+sve" }
 //.
 // CHECK: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
diff --git a/clang/test/CodeGen/attr-target-version.c b/clang/test/CodeGen/attr-target-version.c
index 228435a0494c3e..f3314bb5c32173 100644
--- a/clang/test/CodeGen/attr-target-version.c
+++ b/clang/test/CodeGen/attr-target-version.c
@@ -242,14 +242,14 @@ int caller(void) { return used_def_without_default_decl() + used_decl_without_de
 //
 // CHECK: Function Attrs: noinline nounwind optnone
 // CHECK-LABEL: define {{[^@]+}}@fmv_two._Mfp
-// CHECK-SAME: () #[[ATTR5]] {
+// CHECK-SAME: () #[[ATTR12:[0-9]+]] {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    ret i32 1
 //
 //
 // CHECK: Function Attrs: noinline nounwind optnone
 // CHECK-LABEL: define {{[^@]+}}@fmv_two._Msimd
-// CHECK-SAME: () #[[ATTR5]] {
+// CHECK-SAME: () #[[ATTR12]] {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    ret i32 2
 //
@@ -263,7 +263,7 @@ int caller(void) { return used_def_without_default_decl() + used_decl_without_de
 //
 // CHECK: Function Attrs: noinline nounwind optnone
 // CHECK-LABEL: define {{[^@]+}}@fmv_two._Mfp16Msimd
-// CHECK-SAME: () #[[ATTR12:[0-9]+]] {
+// CHECK-SAME: () #[[ATTR13:[0-9]+]] {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    ret i32 4
 //
@@ -354,14 +354,14 @@ int caller(void) { return used_def_without_default_decl() + used_decl_without_de
 //
 // CHECK: Function Attrs: noinline nounwind optnone
 // CHECK-LABEL: define {{[^@]+}}@unused_with_forward_default_decl._Mmops
-// CHECK-SAME: () #[[ATTR14:[0-9]+]] {
+// CHECK-SAME: () #[[ATTR15:[0-9]+]] {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    ret i32 0
 //
 //
 // CHECK: Function Attrs: noinline nounwind optnone
 // CHECK-LABEL: define {{[^@]+}}@unused_with_implicit_extern_forward_default_decl._Mdotprod
-// CHECK-SAME: () #[[ATTR15:[0-9]+]] {
+// CHECK-SAME: () #[[ATTR16:[0-9]+]] {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    ret i32 0
 //
@@ -375,7 +375,7 @@ int caller(void) { return used_def_without_default_decl() + used_decl_without_de
 //
 // CHECK: Function Attrs: noinline nounwind optnone
 // CHECK-LABEL: define {{[^@]+}}@unused_with_default_def._Msve
-// CHECK-SAME: () #[[ATTR16:[0-9]+]] {
+// CHECK-SAME: () #[[ATTR17:[0-9]+]] {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    ret i32 0
 //
@@ -389,7 +389,7 @@ int caller(void) { return used_def_without_default_decl() + used_decl_without_de
 //
 // CHECK: Function Attrs: noinline nounwind optnone
 // CHECK-LABEL: define {{[^@]+}}@unused_with_implicit_default_def._Mfp16
-// CHECK-SAME: () #[[ATTR12]] {
+// CHECK-SAM...
[truncated]

// CHECK: attributes #[[ATTR35]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+neon,+sm4" }
// CHECK: attributes #[[ATTR36]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+lse,+neon,+rdm" }
// CHECK: attributes #[[ATTR37:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+neon,+rdm" }
// CHECK: attributes #[[ATTR12]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+neon" }
Collaborator Author

These attributes are autogenerated from update_cc_tests_check.py and are of no interest. Unfortunately I can't make the script not generate them so please ignore this part of the diff.

 def : FMVExtension<"sve2-bitperm", "FEAT_SVE_BITPERM", "+sve2,+sve,+sve2-bitperm,+fullfp16,+fp-armv8,+neon", 400>;
-def : FMVExtension<"sve2-pmull128", "FEAT_SVE_PMULL128", "+sve2,+sve,+sve2-aes,+fullfp16,+fp-armv8,+neon", 390>;
+def : FMVExtension<"sve2-pmull128", "FEAT_SVE_PMULL128", "+pmull,+aes,+sve2,+sve,+sve2-pmull128,+sve2-aes,+fullfp16,+fp-armv8,+neon", 390>;
Collaborator Author

Handwriting these dependencies is error-prone. Once we have reviewed the remaining FMV features, I am planning to autogenerate them from tablegen. For example, the two lines below are already wrong: sve2-sha3 should enable +sha3 and sve2-sm4 should enable +sm4.
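
One way to avoid hand-maintained lists is to derive each feature string from a table of direct dependencies. The following is a minimal C++ sketch of that idea, not the TableGen-based generator mentioned above (which is not part of this PR). The names `DepTable`, `collect`, `featureString`, and `exampleDeps` are hypothetical, and the dependency table is illustrative only; it is not LLVM's real dependency graph.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical table type: feature name -> its direct dependencies.
using DepTable = std::map<std::string, std::vector<std::string>>;

// Add Feature and everything it transitively depends on to Out.
void collect(const DepTable &Deps, const std::string &Feature,
             std::set<std::string> &Out) {
  if (!Out.insert(Feature).second)
    return; // already visited
  auto It = Deps.find(Feature);
  if (It == Deps.end())
    return; // no recorded dependencies
  for (const std::string &Dep : It->second)
    collect(Deps, Dep, Out);
}

// Render the closure as a sorted "+a,+b,..." string.
std::string featureString(const DepTable &Deps, const std::string &Feature) {
  std::set<std::string> All;
  collect(Deps, Feature, All);
  std::string Result;
  for (const std::string &F : All) {
    if (!Result.empty())
      Result += ',';
    Result += '+' + F;
  }
  return Result;
}

// Illustrative data only (an assumption, not LLVM's real graph).
DepTable exampleDeps() {
  return {
      {"sve2-sha3", {"sve2", "sha3"}},
      {"sve2", {"sve"}},
      {"sve", {"fullfp16"}},
      {"fullfp16", {"fp-armv8"}},
  };
}
```

Because each dependency is stated once, a missing edge such as `sve2-sha3` to `sha3` would be fixed in a single place rather than in every hand-written string.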

@andrewcarlotti

> However the architecture does not mandate that both need to be implemented at the same time.

This premise is incorrect. For FEAT_SVE_AES and FEAT_SVE_PMULL128, the latest version of the Arm ARM (DDI 0487K.a) includes the following in the definition of ID_AA64ZFR0_EL1.AES:

> FEAT_SVE_AES implements the functionality identified by the value 0b0001.
> FEAT_SVE_PMULL128 implements the functionality identified by the value 0b0010.
> The permitted values are 0b0000 and 0b0010.

So it isn't permitted to implement just one of FEAT_SVE_AES and FEAT_SVE_PMULL128.

Similarly, for FEAT_AES and FEAT_PMULL, the previous version of the Arm ARM (DDI 0487J.a) includes the following in the definition of ID_AA64ISAR0_EL1.AES:

> FEAT_AES implements the functionality identified by the value 0b0001.
> FEAT_PMULL implements the functionality identified by the value 0b0010.
> From Armv8, the permitted values are 0b0000 and 0b0010.

This last line was deleted in the latest Arm ARM (DDI 0487K.a), but it appears to be a mistake that was not intended to relax the architecture constraints. I've reported this discrepancy internally.
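
To make the encoding concrete, here is a rough sketch of how the 4-bit AES field could be decoded from a raw register value. It assumes the field sits in bits [7:4] (true of both ID_AA64ISAR0_EL1 and ID_AA64ZFR0_EL1) and that a higher value includes the functionality of the lower one. The names `AesField`, `extractAesField` and `decodeAesField` are made up for illustration, and how the raw value is obtained is out of scope.

```cpp
#include <cstdint>

// Decoded view of a 4-bit AES ID field. (Illustrative type, not from LLVM.)
struct AesField {
  bool aes;        // functionality identified by the value 0b0001 is present
  bool pmull;      // functionality identified by the value 0b0010 is present
  bool permitted;  // value is one of the permitted encodings, 0b0000 or 0b0010
};

// Extract bits [7:4] of the raw ID register value.
constexpr unsigned extractAesField(std::uint64_t reg) {
  return static_cast<unsigned>((reg >> 4) & 0xF);
}

// Interpret a field value, assuming larger values include the smaller ones.
constexpr AesField decodeAesValue(unsigned v) {
  return AesField{v >= 1, v >= 2, v == 0 || v == 2};
}

constexpr AesField decodeAesField(std::uint64_t reg) {
  return decodeAesValue(extractAesField(reg));
}
```

With these definitions a field value of 0b0001 decodes as AES without PMULL, which is exactly the combination the quoted text rules out, so `permitted` is false for it.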

labrinea added a commit to labrinea/acle that referenced this pull request Oct 3, 2024
I originally tried splitting these features (see relevant pull request
llvm/llvm-project#110816), but the following came
to my attention:

According to https://developer.arm.com/documentation/ddi0487/latest
Arm Architecture Reference Manual for A-profile architecture:

D23.2.83 ID_AA64ZFR0_EL1, SVE Feature ID Register 0

ID_AA64ZFR0_EL1.AES, bits [7:4]

> FEAT_SVE_AES implements the functionality identified by the value 0b0001.
> FEAT_SVE_PMULL128 implements the functionality identified by the value 0b0010.
> The permitted values are 0b0000 and 0b0010.

Andrew Carlotti suggests that the same applies for ID_AA64ISAR0_EL1.AES
(llvm/llvm-project#110816 (comment))

D19.2.61 ID_AA64ISAR0_EL1, AArch64 Instruction Set Attribute Register 0

ID_AA64ISAR0_EL1.AES, bits [7:4]

> FEAT_AES implements the functionality identified by the value 0b0001.
> FEAT_PMULL implements the functionality identified by the value 0b0010.
> From Armv8, the permitted values are 0b0000 and 0b0010.

This was removed from the latest release of the Arm Architecture Reference Manual,
but it appears to be a mistake that was not intended to relax the architecture
constraints. The discrepancy has been reported.
@labrinea labrinea closed this Oct 23, 2024
labrinea added a commit to labrinea/acle that referenced this pull request Oct 25, 2024
vhscampos pushed a commit to ARM-software/acle that referenced this pull request Oct 25, 2024
I originally tried splitting these features (see relevant pull request
llvm/llvm-project#110816), but the following
came to my attention:

According to https://developer.arm.com/documentation/ddi0487/latest Arm
Architecture Reference Manual for A-profile architecture:

D23.2.83 ID_AA64ZFR0_EL1, SVE Feature ID Register 0

ID_AA64ZFR0_EL1.AES, bits [7:4]

> FEAT_SVE_AES implements the functionality identified by the value 0b0001.
> FEAT_SVE_PMULL128 implements the functionality identified by the value 0b0010.
> The permitted values are 0b0000 and 0b0010.

Andrew Carlotti suggests that the same applies for ID_AA64ISAR0_EL1.AES
(llvm/llvm-project#110816 (comment))

D19.2.61 ID_AA64ISAR0_EL1, AArch64 Instruction Set Attribute Register 0

ID_AA64ISAR0_EL1.AES, bits [7:4]

> FEAT_AES implements the functionality identified by the value 0b0001.
> FEAT_PMULL implements the functionality identified by the value 0b0010.
> From Armv8, the permitted values are 0b0000 and 0b0010.

This was removed from the latest release of the Arm Architecture
Reference Manual, but it appears to be a mistake that was not intended
to relax the architecture constraints. The discrepancy has been
reported.
@labrinea labrinea deleted the split-aes-pmull branch October 30, 2024 12:58