Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle case where falseValue is contained #101515

Merged
merged 8 commits into from
Apr 26, 2024
Merged

Conversation

kunalspathak
Copy link
Member

I just realized that for Abs(a), when we wrap it in CndSel(mask, Abs(a), zero) , the falseValue is contained and we do not assign register to it. Fix it here. Not sure why this was not caught in the test run, perhaps I was using outdated binary or something.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 24, 2024
@@ -439,7 +439,7 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node)
{
case 1:
assert(!instrIsRMW);
if (targetReg != falseReg)
if ((targetReg != falseReg) && (falseReg != REG_NA))
Copy link
Member

@tannergooding tannergooding Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right fix.

abs (predicated) only supports <Pg>/M (merging). We explicitly want to emit movprfx (predicated) using movprfx targetReg, maskReg/Z, targetReg when falseReg is REG_NA (and therefore a contained zero constant)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also realized we may be missing some asserts...

movprfx in general has a lot of subtle requirements: https://developer.arm.com/documentation/ddi0602/2023-03/SVE-Instructions/MOVPRFX--predicated---Move-prefix--predicated--?lang=en

Although the operation of the instruction is defined as a simple predicated vector copy, it is required that the prefixed instruction at PC+4 must be an SVE destructive binary or ternary instruction encoding, or a unary operation with merging predication, but excluding other MOVPRFX instructions. The prefixed instruction must specify the same predicate register, and have the same maximum element size (ignoring a fixed 64-bit "wide vector" operand), and the same destination vector as the MOVPRFX instruction. The prefixed instruction must not use the destination register in any other operand position, even if they have different names but refer to the same architectural register state. Any other use is UNPREDICTABLE.

So I think we probably want something in emitarm64.cpp that validates that movprfx doesn't violate this (we can't do peepholes if it would break these invariants, we can't differ predicate registers, we can't change element sizes, we can't change destination registers, we can't reuse targetReg for op1/op2/op3 in the subsequent instruction, etc).

CC. @TamarChristinaArm are there any other considerations we might want to cover in the assert? -- Also to double check, "The prefixed instruction must not use the destination register in any other operand position" applies to something like add (predicated) prefixed by movprfx, and not to movprfx (predicated) itself and not to add (predicated) if it is not preceded by movprfx, correct?

So add reg1, msk/M, reg1 is fine and movprfx reg1, msk/Z, reg1; add reg1, msk/M, reg2 is fine. It's only movprfx reg1, msk/Z, reg*; add reg1, msk/M, reg1 that is not allowed, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MOVPRFX list that @TamarChristinaArm pasted in #100743 (comment)

Yes, there is an item for that in #93095 and I have updated it to include these checkings.

We explicitly want to emit movprfx (predicated) using movprfx targetReg, maskReg/Z, targetReg when falseReg is REG_NA (and therefore a contained zero constant)

Not sure I understand why we need this. maskReg is AllTrueMask in this case, so aren't we just moving targetReg into targetReg if falseValue is a zero constant?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MOVPRFX list that @TamarChristinaArm pasted in #100743 (comment)

👍, I think this list is missing the destination register cannot be used as an input register restriction the manual calls out, which is what I was questioning and trying to get a confirmation around.

Not sure I understand why we need this. maskReg is AllTrueMask in this case, so aren't we just moving targetReg into targetReg if falseValue is a zero constant?

In the case we know we have maskReg == AllTrueMask, there is no need to emit a movprfx ever, but this code path also looks to be hittable when the contents of maskReg are unknown, in which case we would need movprfx and still want to handle a contained zero. So simply checking for falseReg != REG_NA isn't sufficient since we want to execute this for cndsel(mask, abs(op1), zero) (where zero is contained and therefore falseReg == REG_NA)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I was just looking at the scenario where we convert abs(op1) -> cndsel(AllTrueMask, abs(op1), zero) and for that we do not need movprfx. But yes, if user types in cndsel(mask, abs(op1), zero), I need to handle that case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the cases. Their handling goes something like this:

; 1. op1 = cndsel(mask, abs(op1), op1)
abs op1, mask/M, op1

; 2. dst = cndsel(mask, abs(op1), op1)
movprfx dst, op1
abs dst, mask/M, op1

; 3. dst = cndsel(mask, abs(op1), op2)
movprfx dst, op2
abs dst, mask/M, op1

; 4. op1 = cndsel(mask, abs(op1), op2)
abs op1, mask/M, op1
sel op1, mask/M, op1, op2

; 5. op2 = cndsel(mask, abs(op1), op2)
abs op2, mask/M, op1

; 6. op1 = cndsel(mask, abs(op1), zero)
movprfx op1, mask/Z, op1
abs op1, mask/M, op1

; 7. dst = cndsel(mask, abs(op1), zero)
movprfx dst, mask/Z, op1
abs op1, mask/M, op1

; 8. dst = cndsel(allTrue, abs(op1), op2)
abs dst, mask/M op1

@kunalspathak
Copy link
Member Author

src/coreclr/jit/gentree.h Outdated Show resolved Hide resolved
@kunalspathak kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Apr 26, 2024
@kunalspathak
Copy link
Member Author

superpmi-diff failure is unrelated.

@kunalspathak kunalspathak merged commit a0beed4 into dotnet:main Apr 26, 2024
105 of 107 checks passed
matouskozak pushed a commit to matouskozak/runtime that referenced this pull request Apr 30, 2024
* Handle case where falseValue is contained

* Handle cases where Abs() is wrapped in conditional with AllBitsSet mask

* Add a missing case for Abs() handling

* jit format

* Review comments

* Review feedback

* Another review feedback
michaelgsharp pushed a commit to michaelgsharp/runtime that referenced this pull request May 9, 2024
* Handle case where falseValue is contained

* Handle cases where Abs() is wrapped in conditional with AllBitsSet mask

* Add a missing case for Abs() handling

* jit format

* Review comments

* Review feedback

* Another review feedback
@github-actions github-actions bot locked and limited conversation to collaborators May 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI arm-sve Work related to arm64 SVE/SVE2 support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants