Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FP8 ACLE specification #323

Merged
merged 2 commits into from
Sep 23, 2024
Merged

Conversation

momchil-velikov
Copy link
Contributor


name: Pull request
about: Technical issues, document format problems, bugs in scripts or feature proposal.


Thank you for submitting a pull request!

If this PR is about a bugfix:

Please use the bugfix label and make sure to go through the checklist below.

If this PR is about a proposal:

We are looking forward to evaluate your proposal, and if possible to
make it part of the Arm C Language Extension (ACLE) specifications.

We would like to encourage you reading through the contribution
guidelines
, in particular the section on submitting
a proposal
.

Please use the proposal label.

As for any pull request, please make sure to go through the below
checklist.

Checklist: (mark with X those which apply)

  • If an issue reporting the bug exists, I have mentioned it in the
    PR (do not bother creating the issue if all you want to do is
    fixing the bug yourself).
  • I have added/updated the SPDX-FileCopyrightText lines on top
    of any file I have edited. Format is SPDX-FileCopyrightText: Copyright {year} {entity or name} <{contact informations}>
    (Please update existing copyright lines if applicable. You can
    specify year ranges with hyphen , as in 2017-2019, and use
    commas to separate gaps, as in 2018-2020, 2022).
  • I have updated the Copyright section of the sources of the
    specification I have edited (this will show up in the text
    rendered in the PDF and other output format supported). The
    format is the same described in the previous item.
  • I have run the CI scripts (if applicable, as they might be
    tricky to set up on non-*nix machines). The sequence can be
    found in the contribution
    guidelines
    . Don't
    worry if you cannot run these scripts on your machine, your
    patch will be automatically checked in the Actions of the pull
    request.
  • I have added an item that describes the changes I have
    introduced in this PR in the section Changes for next
    release
    of the section Change Control/Document history
    of the document. Create Changes for next release if it does
    not exist. Notice that changes that are not modifying the
    content and rendering of the specifications (both HTML and PDF)
    do not need to be listed.
  • When modifying content and/or its rendering, I have checked the
    correctness of the result in the PDF output (please refer to the
    instructions on how to build the PDFs
    locally
    ).
  • The variable draftversion is set to true in the YAML header
    of the sources of the specifications I have modified.
  • Please DO NOT add my GitHub profile to the list of contributors
    in the README page of the project.

main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
@andrewcarlotti
Copy link
Contributor

I'd prefer slightly different naming for the intrinsics and new types. Specifically:

  1. Can we call the new types floatm8_t, floatm8x16_t, svfloatm8_t, etc.? This would be more consistent with existing type names while still preserving the "modal" distinction. It also makes the type name more easily distinguishable from FPMR values (which use fpm_t).

  2. Can we drop the _fpm suffix from all the intrinsic names, and instead represent the modality in the type suffix (by replacing _f8 with _fm8 wherever it appears)?

Combining these, my proposal is to replace, for example,
float16x4_t vdot_lane_f16_f8_fpm(float16x4_t vd, fpm8x8_t vn, fpm8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
with
float16x4_t vdot_lane_f16_fm8(float16x4_t vd, floatm8x8_t vn, floatm8x8_t vm, __builtin_constant_p(lane), fpm_t fpm).

@momchil-velikov
Copy link
Contributor Author

  1. Can we call the new types floatm8_t, floatm8x16_t, svfloatm8_t, etc.?
  2. Can we drop the _fpm suffix from all the intrinsic names, and instead represent the modality in the type suffix (by replacing _f8 with _fm8 wherever it appears)?

I am in favour of both proposals.

@rsandifo-arm
Copy link
Contributor

This might have been mentioned already, but the new vector types should also be added to svset_neonq, svget_neonq and svdup_neonq.

CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Jul 1, 2024
ARM ACLE PR#323[1] adds new modal types for 8-bit floating point intrinsic.

From the PR#323:
```
ACLE defines the `__fpm8` type, which can be used for the E5M2 and E4M3
8-bit floating-point formats. It is a storage and interchange only type
with no arithmetic operations other than intrinsic calls.
````

The type should be an opaque type and its format in undefined in Clang.
Only defined in the backend by a status/format register, for AArch64 the FPMR.

This patch is an attempt to the add the fpm8_t scalar type.
It has a parser and codegen for the new scalar type.

The patch it is lowering to and 8bit unsigned as it has no format.
But maybe we should add another opaque type.

[1]  ARM-software/acle#323
@momchil-velikov
Copy link
Contributor Author

  1. Can we call the new types floatm8_t, floatm8x16_t, svfloatm8_t, etc.?
  2. Can we drop the _fpm suffix from all the intrinsic names, and instead represent the modality in the type suffix (by replacing _f8 with _fm8 wherever it appears)?

I am in favour of both proposals.

Coming up Soon(tm).

@momchil-velikov
Copy link
Contributor Author

This might have been mentioned already, but the new vector types should also be added to svset_neonq, svget_neonq and svdup_neonq.

My next step is to add intrinsics for the untyped SVE/SME instructions, that would include these too.

CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Jul 22, 2024
This patch adds these new vector sizes for neon:
fpm8x16_t and fpm8x8_t

According to the ARM ACLE PR#323[1].

[1] ARM-software/acle#323
@momchil-velikov
Copy link
Contributor Author

I'd prefer slightly different naming for the intrinsics and new types. Specifically:

  1. Can we call the new types floatm8_t, floatm8x16_t, svfloatm8_t, etc.? This would be more consistent with existing type names while still preserving the "modal" distinction. It also makes the type name more easily distinguishable from FPMR values (which use fpm_t).

This part done.

main/acle.md Outdated Show resolved Hide resolved
@paulwalker-arm
Copy link

  1. Can we call the new types floatm8_t, floatm8x16_t, svfloatm8_t, etc.? This would be more consistent with existing type names while still preserving the "modal" distinction. It also makes the type name more easily distinguishable from FPMR values (which use fpm_t).

As an amendment that follows the scheme used when going from float16 -> bfloat16 what about mfloat8_t, mfloat8x16_t, svmfloat8_t with "m" meaning "modal"?

@momchil-velikov
Copy link
Contributor Author

momchil-velikov commented Jul 30, 2024

Things renamed according to the above naming scheme.

Copy link
Contributor

@rsandifo-arm rsandifo-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new feature macros should also be listed in the “Summary of predefined macros” section.

main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
tools/intrinsic_db/advsimd.csv Outdated Show resolved Hide resolved
tools/intrinsic_db/advsimd.csv Outdated Show resolved Hide resolved
tools/intrinsic_db/advsimd.csv Outdated Show resolved Hide resolved
tools/intrinsic_db/advsimd.csv Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Jul 31, 2024
 `mfloat8_t`     | equivalent to `__mfp8` |

According to ACLE[1] proposal
[1] ARM-software/acle#323
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Jul 31, 2024
 `mfloat8_t`     | equivalent to `__mfp8` |

According to ACLE[1] proposal
[1] ARM-software/acle#323
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Jul 31, 2024
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Jul 31, 2024
and fpm8x16_t to mfloat8x16_t

According to the ACLE[1]

[1]ARM-software/acle#323
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Aug 2, 2024
ARM ACLE PR#323[1] adds new modal types for 8-bit floating point intrinsic.

From the PR#323:
```
ACLE defines the `__mfp8` type, which can be used for the E5M2 and E4M3
8-bit floating-point formats. It is a storage and interchange only type
with no arithmetic operations other than intrinsic calls.
 `mfloat8_t`     | equivalent to `__mfp8` |
````

The type should be an opaque type and its format in undefined in Clang.
Only defined in the backend by a status/format register, for AArch64 the FPMR.

This patch is an attempt to the add the fpm8_t scalar type.
It has a parser and codegen for the new scalar type.

The patch it is lowering to and 8bit unsigned as it has no format.
But maybe we should add another opaque type.

According to ACLE[1] proposal
[1] ARM-software/acle#323
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Aug 2, 2024
This patch adds these new vector sizes for neon:
mfloat8x16_t and mfloat8x8_t

According to the ARM ACLE PR#323[1].

[1] ARM-software/acle#323
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Aug 2, 2024
This patch adds these new vector sizes for sve:
svmfloat8_t

According to the ARM ACLE PR#323[1].

[1] ARM-software/acle#323
bricknerb pushed a commit to bricknerb/llvm-project that referenced this pull request Oct 17, 2024
This patch adds these new vector sizes for sve:
    svmfloat8_t

According to the ARM ACLE PR#323[1].

[1] ARM-software/acle#323
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Oct 22, 2024
ARM ACLE PR#323[1] adds new modal types for 8-bit floating point intrinsic.

From the PR#323:
```
ACLE defines the `__mfp8` type, which can be used for the E5M2 and E4M3
8-bit floating-point formats. It is a storage and interchange only type
with no arithmetic operations other than intrinsic calls.
````

The type should be an opaque type and its format in undefined in Clang.
Only defined in the backend by a status/format register, for AArch64 the FPMR.

This patch is an attempt to the add the MFloat8_t scalar type.
It has a parser and codegen for the new scalar type.

The patch it is lowering to and 8bit unsigned as it has no format.
But maybe we should add another opaque type.

[1]  ARM-software/acle#323
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Oct 22, 2024
This patch adds these new vector sizes for neon:
mfloat8x16_t and mfloat8x8_t

According to the ARM ACLE PR#323[1].

 `mfloat8_t`     | equivalent to `__mfp8` |

[1]ARM-software/acle#323
EricWF pushed a commit to efcs/llvm-project that referenced this pull request Oct 22, 2024
This patch adds these new vector sizes for sve:
    svmfloat8_t

According to the ARM ACLE PR#323[1].

[1] ARM-software/acle#323
CarolineConcatto added a commit to llvm/llvm-project that referenced this pull request Oct 23, 2024
This patch adds these new vector sizes for neon:
   mfloat8x16_t and mfloat8x8_t

    According to the ARM ACLE PR#323[1].

    [1] ARM-software/acle#323
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Oct 23, 2024
ARM ACLE PR#323[1] adds new modal types for 8-bit floating point intrinsic.

From the PR#323:
```
ACLE defines the `__mfp8` type, which can be used for the E5M2 and E4M3
8-bit floating-point formats. It is a storage and interchange only type
with no arithmetic operations other than intrinsic calls.
````

The type should be an opaque type and its format in undefined in Clang.
Only defined in the backend by a status/format register, for AArch64 the FPMR.

This patch is an attempt to the add the MFloat8_t scalar type.
It has a parser and codegen for the new scalar type.

The patch it is lowering to and 8bit unsigned as it has no format.
But maybe we should add another opaque type.

[1]  ARM-software/acle#323
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Oct 24, 2024
ARM ACLE PR#323[1] adds new modal types for 8-bit floating point intrinsic.

From the PR#323:
```
ACLE defines the `__mfp8` type, which can be used for the E5M2 and E4M3
8-bit floating-point formats. It is a storage and interchange only type
with no arithmetic operations other than intrinsic calls.
````

The type should be an opaque type and its format in undefined in Clang.
Only defined in the backend by a status/format register, for AArch64 the FPMR.

This patch is an attempt to the add the MFloat8_t scalar type.
It has a parser and codegen for the new scalar type.

The patch it is lowering to and 8bit unsigned as it has no format.
But maybe we should add another opaque type.

[1]  ARM-software/acle#323
CarolineConcatto added a commit to llvm/llvm-project that referenced this pull request Oct 25, 2024
ARM ACLE PR#323[1] adds new modal types for 8-bit floating point
intrinsic.

From the PR#323:
```
ACLE defines the `__mfp8` type, which can be used for the E5M2 and E4M3
8-bit floating-point formats. It is a storage and interchange only type
with no arithmetic operations other than intrinsic calls.
````

The type should be an opaque type and its format in undefined in Clang.
Only defined in the backend by a status/format register, for AArch64 the
FPMR.

This patch is an attempt to the add the mfloat8_t scalar type. It has a
parser and codegen for the new scalar type.

The patch it is lowering to and 8bit unsigned as it has no format. But
maybe we should add another opaque type.

[1]  ARM-software/acle#323
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
This patch adds these new vector sizes for neon:
   mfloat8x16_t and mfloat8x8_t

    According to the ARM ACLE PR#323[1].

    [1] ARM-software/acle#323
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
…#97277)

ARM ACLE PR#323[1] adds new modal types for 8-bit floating point
intrinsic.

From the PR#323:
```
ACLE defines the `__mfp8` type, which can be used for the E5M2 and E4M3
8-bit floating-point formats. It is a storage and interchange only type
with no arithmetic operations other than intrinsic calls.
````

The type should be an opaque type and its format in undefined in Clang.
Only defined in the backend by a status/format register, for AArch64 the
FPMR.

This patch is an attempt to the add the mfloat8_t scalar type. It has a
parser and codegen for the new scalar type.

The patch it is lowering to and 8bit unsigned as it has no format. But
maybe we should add another opaque type.

[1]  ARM-software/acle#323
SpencerAbson added a commit to llvm/llvm-project that referenced this pull request Nov 28, 2024
…116959)

This patch implements the following intrinsics:

8-bit floating-point convert to deinterleaved half-precision or
BFloat16.
``` c
  // Variant is also available for: _bf16[_mf8]_x2
  svfloat16x2_t svcvtl1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
  svfloat16x2_t svcvtl2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
```

Defined in ARM-software/acle#323

Co-authored-by: Caroline Concatto [email protected]
Co-authored-by: Marian Lukac [email protected]
SpencerAbson added a commit to llvm/llvm-project that referenced this pull request Dec 8, 2024
…CVT (#118027)

This patch implements the following intrinsics:

8-bit floating-point convert to half-precision or BFloat16 (in-order).
``` c
  // Variant is also available for: _bf16[_mf8]_x2
  svfloat16x2_t svcvt1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
  svfloat16x2_t svcvt2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
```

In accordance with ARM-software/acle#323.

Co-authored-by: Marin Lukac [email protected]
Co-authored-by: Caroline Concatto [email protected]
SpencerAbson added a commit to llvm/llvm-project that referenced this pull request Dec 9, 2024
This patch implements the following intrinsics:

8-bit floating-point sum of outer products and accumulate.
``` c
  // Only if __ARM_FEATURE_SME_F8F16 != 0
    void svmopa_za16[_mf8]_m_fpm(uint64_t tile, svbool_t pn, svbool_t pm,
                                 svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm)
                                 __arm_streaming __arm_inout("za");

  // Only if __ARM_FEATURE_SME_F8F32 != 0
    void svmopa_za32[_mf8]_m_fpm(uint64_t tile, svbool_t pn, svbool_t pm,
                                 svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm)
                                 __arm_streaming __arm_inout("za");
```

In accordance with: ARM-software/acle#323

Co-authored-by: Momchil Velikov [email protected]
Co-authored-by: Marian Lukac [email protected]
broxigarchen pushed a commit to broxigarchen/llvm-project that referenced this pull request Dec 10, 2024
…CVT (llvm#118027)

This patch implements the following intrinsics:

8-bit floating-point convert to half-precision or BFloat16 (in-order).
``` c
  // Variant is also available for: _bf16[_mf8]_x2
  svfloat16x2_t svcvt1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
  svfloat16x2_t svcvt2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
```

In accordance with ARM-software/acle#323.

Co-authored-by: Marin Lukac [email protected]
Co-authored-by: Caroline Concatto [email protected]
broxigarchen pushed a commit to broxigarchen/llvm-project that referenced this pull request Dec 10, 2024
This patch implements the following intrinsics:

8-bit floating-point sum of outer products and accumulate.
``` c
  // Only if __ARM_FEATURE_SME_F8F16 != 0
    void svmopa_za16[_mf8]_m_fpm(uint64_t tile, svbool_t pn, svbool_t pm,
                                 svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm)
                                 __arm_streaming __arm_inout("za");

  // Only if __ARM_FEATURE_SME_F8F32 != 0
    void svmopa_za32[_mf8]_m_fpm(uint64_t tile, svbool_t pn, svbool_t pm,
                                 svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm)
                                 __arm_streaming __arm_inout("za");
```

In accordance with: ARM-software/acle#323

Co-authored-by: Momchil Velikov [email protected]
Co-authored-by: Marian Lukac [email protected]
SpencerAbson added a commit to llvm/llvm-project that referenced this pull request Dec 11, 2024
This patch implements the following intrinsics:

Convert to packed 8-bit floating-point format.
``` c
  // Variants are also available for: _mf8[_bf16_x2] and _mf8[_f32_x4]
  svmfloat8_t svcvt_mf8[_f16_x2]_fpm(svfloat16x2_t zn, fpm_t fpm) __arm_streaming;
```
Convert to interleaved 8-bit floating-point format.
``` c
  svmfloat8_t svcvtn_mf8[_f32_x4]_fpm(svfloat32x4_t zn, fpm_t fpm) __arm_streaming;
```
In accordance with ARM-software/acle#323.

Co-authored-by: Marin Lukac [email protected]
Co-authored-by: Caroline Concatto [email protected]
jthackray added a commit to llvm/llvm-project that referenced this pull request Dec 16, 2024
…ns (#119845)

Add support for the following SME 8 bit floating-point dot-product intrinsics:

```
// Only if __ARM_FEATURE_SME_F8F16 != 0
void svdot[_single]_za16[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                         svmfloat8_t zm,
                                         fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot[_single]_za16[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                         svmfloat8_t zm,
                                         fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot_za16[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                svmfloat8x2_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot_za16[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                svmfloat8x4_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

// Only if __ARM_FEATURE_SME_F8F32 != 0
void svdot[_single]_za32[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                         svmfloat8_t zm,
                                         fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot[_single]_za32[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                         svmfloat8_t zm,
                                         fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot_za32[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                svmfloat8x2_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot_za32[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                svmfloat8x4_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");
```

These intrinsics are extracted from:
ARM-software/acle#323

Co-authored-by: Momchil Velikov <[email protected]>
Co-authored-by: Marian Lukac <[email protected]>
SpencerAbson added a commit to llvm/llvm-project that referenced this pull request Dec 16, 2024
…18549)

This patch implements the following intrinsics:

Multi-vector 8-bit floating-point multiply-add long.
``` c
  // Only if __ARM_FEATURE_SME_F8F16 != 0
  void svmla_lane_za16[_mf8]_vg2x1_fpm(uint32_t slice, svmfloat8_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm)  __arm_streaming __arm_inout("za");

  void svmla_lane_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm)  __arm_streaming __arm_inout("za");

  void svmla_lane_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx
                                       fpm_t fpm) __arm_streaming __arm_inout("za");

// Only if __ARM_FEATURE_SME_F8F32 != 0
  void svmla_lane_za32[_mf8]_vg4x1_fpm(uint32_t slice, svmfloat8_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm)__arm_streaming __arm_inout("za");

  void svmla_lane_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm)__arm_streaming __arm_inout("za");

  void svmla_lane_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm)__arm_streaming __arm_inout("za");
```
In accordance with: ARM-software/acle#323
SpencerAbson added a commit to llvm/llvm-project that referenced this pull request Dec 17, 2024
Multi-vector 8-bit floating-point multiply-add long (single)
```c
// Only if __ARM_FEATURE_SME_F8F16 != 0
void svmla[_single]_za16[_mf8]_vg2x1_fpm(uint32_t slice, svmfloat8_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");

void svmla[_single]_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");

void svmla[_single]_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");
// Only if __ARM_FEATURE_SME_F8F32 != 0
void svmla[_single]_za32[_mf8]_vg4x1_fpm(uint32_t slice, svmfloat8_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");

void svmla[_single]_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");

void svmla[_single]_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");
 ```
 In accordance with ARM-software/acle#323.
 
Co-authored-by: Momchil Velikov [email protected]
SpencerAbson added a commit to llvm/llvm-project that referenced this pull request Dec 17, 2024
)

This patch implements the following intrinsics:

Multi-vector 8-bit floating-point multiply-add long (multiple vectors).

``` c
// Only if __ARM_FEATURE_SME_F8F16 != 0
void svmla_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

void svmla_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");
// Only if __ARM_FEATURE_SME_F8F32 != 0
void svmla_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

void svmla_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");                              
```

In accordance with ARM-software/acle#323
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.