[AArch64] NEON, SVE2 and SME2 instruction support with tests #439

FinnWilkinson · 2024-11-04T18:11:15Z

This PR adds a wide range of NEON, SVE2, and SME2 instructions with regression tests. These facilitate a subset of some internal SME-based GEMM and GEMV codes.

There is some prototype BF16 instruction support, which is disabled by default (using a new build option and an if statement in each appropriate switch statement case). It is disabled because it uses __bf16, which is not compiler agnostic, relies on some hacky usage of memcpy to re-interpret uint16_t, and lacks regression tests for the BF16 instructions in question.

These BF16 instructions can be enabled through a new CMake option, -DSIMENG_ENABLE_BF16=ON. I have deliberately not included this in the documentation, given the possible instability of the BF16 implementation and to keep it for (mainly) internal usage only.
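For readers unfamiliar with the memcpy re-interpretation mentioned above, here is a minimal sketch of the general technique. It is not the code in this PR: the helper names `bf16BitsToFloat` and `floatToBf16BitsTruncate` are hypothetical, and the narrowing direction simply truncates, whereas real hardware rounds. It relies only on the fact that a BF16 value has the same layout as the upper 16 bits of an IEEE-754 binary32 value.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen raw BF16 bits (held in a uint16_t) to a float.
// BF16 shares its sign, exponent and top 7 mantissa bits with binary32,
// so the bits are placed in the upper half of a 32-bit word.
inline float bf16BitsToFloat(uint16_t bits) {
  uint32_t widened = static_cast<uint32_t>(bits) << 16;
  float out;
  // memcpy avoids the undefined behaviour of a pointer-cast type pun.
  std::memcpy(&out, &widened, sizeof(out));
  return out;
}

// Hypothetical helper: narrow a float to raw BF16 bits by truncation.
// Assumption: no rounding is applied, which differs from hardware behaviour.
inline uint16_t floatToBf16BitsTruncate(float value) {
  uint32_t raw;
  std::memcpy(&raw, &value, sizeof(raw));
  return static_cast<uint16_t>(raw >> 16);
}
```

This approach avoids any dependence on the compiler-specific __bf16 type, at the cost of doing all arithmetic in float.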

This branch is based on sme2-support (PR #429) and so should be merged after that branch has been merged into dev.

Some SME2 instructions which use multi-vector operands can be non-trivial to read or understand. Please ask for clarification and suggest any additional comments that may help future understanding.

ABenC377

Only a few comments

CMakeLists.txt

src/include/simeng/arch/aarch64/Instruction.hh

src/include/simeng/arch/aarch64/helpers/sve.hh

src/include/simeng/arch/aarch64/helpers/neon.hh

src/include/simeng/arch/aarch64/operandContainer.hh

jj16791 · 2024-12-14T11:24:47Z

src/lib/arch/aarch64/Instruction_decode.cc

@@ -548,7 +549,7 @@ void Instruction::decode() {
      } else if (metadata_.operands[0].is_vreg) {
        setInstructionType(InsnType::isVectorData);
      } else if ((metadata_.operands[0].reg >= AARCH64_REG_ZAB0 &&
-                  metadata_.operands[0].reg <= AARCH64_REG_ZT0) ||
+                  metadata_.operands[0].reg < AARCH64_REG_ZT0) ||


Can ZT0 be used in a SVE context?

ZT0 is enabled / disabled in the same way as Z0 but has a fixed width of 512-bits, with the logic for detecting whether a ZT0 related instruction can / can't be executed done in instruction_execute as with all other SME instructions.

Regarding where in a core/implementation ZT0-based instructions are executed, there is no fixed rule in the spec as far as I can tell. Given its fixed width, it seems more SVE-like than SME-like to me, hence the grouping seen here. And given we don't have co-processor SME support, there's no offload / separate chip logic to come into play yet.
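The effect of changing `<=` to `<` in the hunk above can be illustrated with a small sketch. The register IDs below are hypothetical placeholders (the real values are defined outside SimEng and are not shown in this thread), and the function names are invented for illustration. The only point is that an exclusive upper bound leaves ZT0 outside the range check.

```cpp
// Hypothetical register IDs, chosen only so that ZT0 is the upper bound.
enum SketchReg : int {
  SKETCH_REG_ZAB0 = 100,  // first register in the checked range
  SKETCH_REG_ZAD7 = 107,  // some register strictly inside the range
  SKETCH_REG_ZT0 = 108    // lookup table register, the upper bound
};

// Old form of the check: the upper bound is inclusive, so ZT0 matches.
constexpr bool inRangeInclusive(int reg) {
  return reg >= SKETCH_REG_ZAB0 && reg <= SKETCH_REG_ZT0;
}

// New form of the check: the upper bound is exclusive, so ZT0 does not match.
constexpr bool inRangeExclusive(int reg) {
  return reg >= SKETCH_REG_ZAB0 && reg < SKETCH_REG_ZT0;
}
```

With the exclusive form, an instruction whose first operand is ZT0 falls through to whichever branch follows in the decode logic.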

jj16791 · 2024-12-14T11:27:30Z

src/lib/arch/aarch64/Instruction_execute.cc

+                                            // zm.h
+        // SME
+        // BF16 -- EXPERIMENTAL
+        if (std::string(SIMENG_ENABLE_BF16) == "OFF") return executionNYI();


I think this would be better implemented through a preprocessor directive for the entire case
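A rough sketch of what a compile-time guard around a whole case might look like is given below. The macro `BF16_SKETCH_ENABLED`, the `SketchOpcode` enum and the `SketchResult` type are all hypothetical stand-ins, not names from SimEng. The assumption is that the build system would pass the macro as 0 or 1.

```cpp
// Hypothetical macro; assumed to be supplied by the build system as 0 or 1.
#ifndef BF16_SKETCH_ENABLED
#define BF16_SKETCH_ENABLED 0
#endif

// Hypothetical stand-ins for the real opcode and result types.
enum class SketchOpcode { Add, BfDot };
enum class SketchResult { Executed, NotYetImplemented };

inline SketchResult executeSketch(SketchOpcode op) {
  switch (op) {
    case SketchOpcode::Add:
      return SketchResult::Executed;
#if BF16_SKETCH_ENABLED
    // The whole case is compiled out when BF16 support is disabled,
    // so no runtime string comparison is needed.
    case SketchOpcode::BfDot:
      return SketchResult::Executed;
#endif
    default:
      // Disabled or unknown opcodes fall through to the NYI path.
      return SketchResult::NotYetImplemented;
  }
}
```

Compared with the runtime `std::string` comparison, this also means code using __bf16 is never seen by compilers that do not support it.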

jj16791 · 2024-12-14T11:28:38Z

test/regression/aarch64/AArch64RegressionTest.hh

@@ -190,6 +190,24 @@ inline std::vector<std::tuple<CoreType, std::string>> genCoreTypeSVLPairs(
    checkMatrixRegisterCol<type>(tag, index, __VA_ARGS__); \
  }

+/** Check each element of the Lookup Table register ZT0 against expected values.
+ *
+ * The `tag` argument is the register index (must be 0), and the `type` argument


If tag must always be 0, then why not hardcode it as such?

jj16791 · 2024-12-14T11:36:37Z

test/regression/aarch64/Exception.cc

@@ -151,7 +151,6 @@ TEST_P(Exception, unmapped_sys_reg) {
  EXPECT_EQ(stdout_.substr(0, strlen(err)), err);
 }

-#if SIMENG_LLVM_VERSION >= 14


Is this in response to the SVE vs SVE2 identification issue or something else? I feel like we can keep this sort of checking in

Yes - in response to the SVE / SVE2 and SME / SME2 checks (i.e. it is not trivial). This one can stay though, yes

jj16791 · 2024-12-14T11:38:11Z

src/lib/arch/aarch64/Instruction_execute.cc

@@ -486,6 +520,66 @@ void Instruction::execute() {
        branchAddress_ = instructionAddress_ + metadata_.operands[0].imm;
        break;
      }
+      case Opcode::AArch64_BF16DOTlanev8bf16: {  // bfdot vd.4s, vn.8h,


Do the bf16 instructions have test cases?

No, I've kept them as experimental implementations and undocumented, including how to enable them. You would need to look through the source code and CMake files to know they are there.

jj16791 · 2024-12-14T11:40:38Z

src/include/simeng/arch/aarch64/Instruction.hh

@@ -283,6 +283,43 @@ enum class InsnType : uint32_t {
  isBranch = 1 << 14
 };

+/** Predefined shift values for converting pred-as-counter to pred-as-mask. */
+const uint64_t predCountShiftVals[9] = {0, 1, 2, 0, 3, 0, 0, 0, 4};


Unless I've missed something, this is used in one location. Why is the data defined as a variable outside of all function scopes?

Removed, as it can be calculated automatically.
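One way such a table could be replaced by a computation is sketched here. This is an assumption about the approach, not the code that landed in the PR, and the name `predCountShift` is hypothetical. It relies only on the observation that the table entries at indices 1, 2, 4 and 8 are 1, 2, 3 and 4, which is the bit width of the index.

```cpp
#include <cstdint>

// Hypothetical replacement for the removed lookup table.
// Returns the number of bits needed to represent n (0 for n == 0),
// which matches the table at indices 0, 1, 2, 4 and 8.
// Note: for other indices the table held 0, whereas this returns a
// non-zero value, so it is only equivalent for those five inputs.
constexpr uint64_t predCountShift(uint64_t n) {
  uint64_t shift = 0;
  while (n != 0) {
    ++shift;
    n >>= 1;
  }
  return shift;
}
```

Because the function is constexpr, the value can still be resolved at compile time when the argument is a constant.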

jj16791 · 2024-12-14T11:41:55Z

CMakeLists.txt

-    endif()
-    if (${LLVM_PACKAGE_VERSION} VERSION_LESS "18.0")
-      message(STATUS "LLVM version does not support AArch64 extensions SME2. These test suites will be skipped.")
+      message(STATUS "LLVM version does not support AArch64 extensions SVE2, SVE2.1, SME, or SME2. Related tests will fail.")


Why can't we place preprocessor directives around the SME tests? I thought it was just an SVE vs SVE2 problem?

There is a similar problem with SME and SME2


FinnWilkinson added the enhancement New feature or request label Nov 4, 2024

FinnWilkinson requested review from dANW34V3R, jj16791, JosephMoore25 and ABenC377 November 4, 2024 18:11

FinnWilkinson self-assigned this Nov 4, 2024

FinnWilkinson changed the base branch from dev to sme2-support November 4, 2024 18:12

FinnWilkinson force-pushed the sme2-support branch from ec02455 to e7d34e1 Compare November 6, 2024 16:45

FinnWilkinson force-pushed the sme-loops-support branch 2 times, most recently from f9a759f to f2b86fa Compare November 7, 2024 19:58

FinnWilkinson force-pushed the sme-loops-support branch from f2b86fa to 796b99e Compare November 14, 2024 10:23

ABenC377 reviewed Dec 6, 2024


FinnWilkinson force-pushed the sme-loops-support branch from 796b99e to 5ff6446 Compare December 13, 2024 16:00

jj16791 requested changes Dec 14, 2024


ABenC377 approved these changes Dec 17, 2024


ABenC377 previously approved these changes Dec 17, 2024


FinnWilkinson force-pushed the sme2-support branch from bc91dcd to fc308db Compare December 17, 2024 17:47

FinnWilkinson force-pushed the sme-loops-support branch 2 times, most recently from 393dd26 to b027f73 Compare December 18, 2024 15:07

FinnWilkinson changed the base branch from sme2-support to dev December 20, 2024 10:01

FinnWilkinson added 9 commits December 20, 2024 10:05

Fixed execution logic for UMINP and UMAXP neon instructions.

51ade58

Implemented ldrsb (32-bit, Post) instruction with test.

6a11d7d

Fixed implementation of NEON CMHS instruction.

520324c

Implemented UCVTF (fixed-point to float) instruction with test.

2b4a886

Implemented UCVTF (fixed-point to float) helper function.

e43ada7

Implemented UDOT (by element) NEON instructions with tests.

4773af8

Implemented LD1 (NEON 8h x2, post index) instruction with tests.

50a8a20

Implemented NEON UMLAL (32 to 64 bit) instruction with tests.

6696d5f

Implemented NEON UMLAL2 (32 to 64 bit) instruction with tests.

bb5096a

FinnWilkinson added 26 commits December 20, 2024 10:05

Implemented ST4W (imm offset) SVE instruction with tests.

68038b7

Implemented LD1W (4 vec, scalar offset) SVE2 instruction with tests.

4a8f3f6

Implemented FMLA (float, VGx4) SME instruction with tests.

3d5b288

Implemented MOVA (array to vecs, 2 registers) SME instruction with tests.

b9dcabe

Implemented FADD (float, vgx2) SME instruction with tests.

b988e01

Implemented LD1D (4 vec, scalar offset) SVE2 instruction with tests.

4f75ffe

Implemented FMLA (double, VGx4) SME instruction with tests.

f35472b

Implemented FADD (double, vgx2) SME instruction with tests.

1bf3306

Implemented LD1H (Single vec, imm offset) SVE instruction with tests.

4effde4

Added SVE bf16 DOT (indexed) instruction execution logic.

40bba12

Implemented LD1H (two vec, imm and scalar offset) SVE instruction with tests.

3932360

Implemented BFMOPA (widening) SME instruction.

5aad523

Minor UMAXP fix.

430c775

Fixed function comment.

a01c2fc

Updated BF16 comment.

9790c6e

Implemented NEON UDOT (by vector) instruction with tests.

5bc9330

Implemented SVE UDOT (by vector, 4-way) instruction with tests.

1fd130c

Implemented SVE ST4W (scalar offset) instruction with tests, and changed address generation logic for ST2W and ST4W.

81ddba7

Implemented LD1B (4 vec, scalar offset) SVE2 instruction with tests.

4c99a0f

Implemented UDOT (4-way, VGx4 8-bit to 32-bit widening) SME instruction with tests.

0d74234

Implemented ADD (uint32, vgx2, vectors and ZA), SME instruction with tests.

40a0fa4

Implemented ZIP (4 vectors) SVE2 instruction with tests.

950de41

Attended PR comments.

03a95e7

Minor bug fixes.

6729363

Attended PR comments.

850b741

Updated multi-vector load logic.

1d04096

FinnWilkinson force-pushed the sme-loops-support branch from b027f73 to 1d04096 Compare December 20, 2024 10:08

FinnWilkinson added 2 commits December 20, 2024 11:06

CI CD fixes.

246d39a

CI CD fixes pt2.

0ec0b8d

ABenC377 approved these changes Dec 20, 2024


