Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Arm64: Add S.P.CoreLib instrinsics for AddAcross #27663

Merged

Conversation

TamarChristinaArm
Copy link

This implements the AddAcross intrinsics for Arm64.

AddAcross is a sum reduction operation.

See dotnet/corefx#26581 and is split of #26769
CoreFX reference assembly update at dotnet/corefx#42359

/CC @tannergooding @CarolEidt @echesakovMSFT

Thanks,
Tamar

@tannergooding
Copy link
Member

There are some merge conflicts that need to be resolved here.

@TamarChristinaArm TamarChristinaArm force-pushed the implement-arm64-addacross branch from ed00c5a to 1bec747 Compare November 4, 2019 18:57
@TamarChristinaArm
Copy link
Author

Rebased :)

@TamarChristinaArm TamarChristinaArm force-pushed the implement-arm64-addacross branch 2 times, most recently from 6a6c1c8 to d9056bb Compare November 5, 2019 12:35
Copy link

@echesakov echesakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TamarChristinaArm Is there a floating-point version of the intrinsic?

@maryamariyan
Copy link
Member

Thank you for your contribution. As announced in dotnet/coreclr#27549 this repository will be moving to dotnet/runtime on November 13. If you would like to continue working on this PR after this date, the easiest way to move the change to dotnet/runtime is:

  1. In your coreclr repository clone, create patch by running git format-patch origin
  2. In your runtime repository clone, apply the patch by running git apply --directory src/coreclr <path to the patch created in step 1>

@TamarChristinaArm TamarChristinaArm force-pushed the implement-arm64-addacross branch from d9056bb to a639bd6 Compare November 7, 2019 11:05
@TamarChristinaArm
Copy link
Author

@TamarChristinaArm Is there a floating-point version of the intrinsic?

Not for AdvSimd only SVE. In AdvSimd you have FADDP that'll do pairwise addition, which will work but will do extra additions and you need multiple of them for vectors of more than 2 numbers. So it's not equivalent and so is a different intrinsic.

@tannergooding
Copy link
Member

tannergooding commented Nov 7, 2019

@TamarChristinaArm, this does a sum of all elements where-as FADDP is a pairwise addition (equivalent to Horizontal Add on x86) and also has integer equivalents, correct?

/// int32_t vaddvq_s32(int32x4_t a)
/// A64: ADDV Sd, Vn.4S
/// </summary>
public static int AddAcross(Vector128<int> value) { throw new PlatformNotSupportedException(); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no Vector64<int> overload because it is equivalent to a AddPairwise, correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(so the manual marks size = 10, Q = 0 and size = 11, Q = 1 as reserved; with size = 11, Q = 0 as reserved because it would be a nop).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, vector64<int> would be an ADDP, but I couldn't see anyway to encode that in the intrinsics table in hwintrinsiclistarm64.h as it assumes you have the same instruction for all sizes of the same type unless I'm mistaken? It doesn't seem to allow overloading on the maximum vector size if the name stays the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem to allow overloading on the maximum vector size if the name stays the same.

Right, you would need to have special-handling in codegen to ensure the right instruction was selected.

Given that the underlying instruction encoding on ADDV is blocked and users have the same functionality via the explicit AddPairwise, I think not exposing it is the right decision.

/// uint32_t vaddvq_u32(uint32x4_t a)
/// A64: ADDV Sd, Vn.4S
/// </summary>
public static uint AddAcross(Vector128<uint> value) => AddAcross(value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the behavior for overflow, if any?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is none, in this case you wouldn't know about it, and this instruction doesn't have a saturated version either.

@TamarChristinaArm
Copy link
Author

@TamarChristinaArm, this does a sum of all elements where-as FADDP is a pairwise addition (equivalent to Horizontal Add on x86) and also has integer equivalents, correct?

Yup that's correct.

@tannergooding
Copy link
Member

This can be merged after @echesakovMSFT or @CarolEidt also sign-off.

Copy link

@echesakov echesakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

@echesakov echesakov merged commit 12b9ed4 into dotnet:master Nov 7, 2019
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corefx that referenced this pull request Nov 7, 2019
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/mono that referenced this pull request Nov 7, 2019
Anipik pushed a commit to dotnet/corefx that referenced this pull request Nov 7, 2019
marek-safar pushed a commit to mono/mono that referenced this pull request Nov 7, 2019
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corert that referenced this pull request Nov 8, 2019
@TamarChristinaArm TamarChristinaArm deleted the implement-arm64-addacross branch November 8, 2019 10:38
jkotas pushed a commit to dotnet/corert that referenced this pull request Nov 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants