Handle case where falseValue is contained #101515

kunalspathak · 2024-04-24T19:39:27Z

I just realized that for Abs(a), when we wrap it in CndSel(mask, Abs(a), zero) , the falseValue is contained and we do not assign register to it. Fix it here. Not sure why this was not caught in the test run, perhaps I was using outdated binary or something.

tannergooding · 2024-04-24T19:54:55Z

src/coreclr/jit/hwintrinsiccodegenarm64.cpp

@@ -439,7 +439,7 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node)
            {
                case 1:
                    assert(!instrIsRMW);
-                    if (targetReg != falseReg)
+                    if ((targetReg != falseReg) && (falseReg != REG_NA))


I don't think this is the right fix.

abs (predicated) only supports <Pg>/M (merging). We explicitly want to emit movprfx (predicated) using movprfx targetReg, maskReg/Z, targetReg when falseReg is REG_NA (and therefore a contained zero constant)

I also realized we may be missing some asserts...

movprfx in general has a lot of subtle requirements: https://developer.arm.com/documentation/ddi0602/2023-03/SVE-Instructions/MOVPRFX--predicated---Move-prefix--predicated--?lang=en

Although the operation of the instruction is defined as a simple predicated vector copy, it is required that the prefixed instruction at PC+4 must be an SVE destructive binary or ternary instruction encoding, or a unary operation with merging predication, but excluding other MOVPRFX instructions. The prefixed instruction must specify the same predicate register, and have the same maximum element size (ignoring a fixed 64-bit "wide vector" operand), and the same destination vector as the MOVPRFX instruction. The prefixed instruction must not use the destination register in any other operand position, even if they have different names but refer to the same architectural register state. Any other use is UNPREDICTABLE.

So I think we probably want something in emitarm64.cpp that validates that movprfx doesn't violate this (we can't do peepholes if it would break these invariants, we can't differ predicate registers, we can't change element sizes, we can't change destination registers, we can't reuse targetReg for op1/op2/op3 in the subsequent instruction, etc).

CC. @TamarChristinaArm are there any other considerations we might want to cover in the assert? -- Also to double check, "The prefixed instruction must not use the destination register in any other operand position" applies to something like add (predicated) prefixed by movprfx, and not to movprfx (predicated) itself and not to add (predicated) if it is not preceded by movprfx, correct?

So add reg1, msk/M, reg1 is fine and movprfx reg1, msk/Z, reg1; add reg1, msk/M, reg2 is fine. It's only movprfx reg1, msk/Z, reg*; add reg1, msk/M, reg1 that is not allowed, correct?

The MOVPRFX list that @TamarChristinaArm pasted in #100743 (comment)

Yes, there is an item for that in #93095 and I have updated it to include these checkings.

We explicitly want to emit movprfx (predicated) using movprfx targetReg, maskReg/Z, targetReg when falseReg is REG_NA (and therefore a contained zero constant)

Not sure I understand why we need this. maskReg is AllTrueMask in this case, so aren't we just moving targetReg into targetReg if falseValue is a zero constant?

The MOVPRFX list that @TamarChristinaArm pasted in #100743 (comment)

👍, I think this list is missing the destination register cannot be used as an input register restriction the manual calls out, which is what I was questioning and trying to get a confirmation around.

Not sure I understand why we need this. maskReg is AllTrueMask in this case, so aren't we just moving targetReg into targetReg if falseValue is a zero constant?

In the case we know we have maskReg == AllTrueMask, there is no need to emit a movprfx ever, but this code path also looks to be hittable when the contents of maskReg are unknown, in which case we would need movprfx and still want to handle a contained zero. So simply checking for falseReg != REG_NA isn't sufficient since we want to execute this for cndsel(mask, abs(op1), zero) (where zero is contained and therefore falseReg == REG_NA)

yes, I was just looking at the scenario where we convert abs(op1) -> cndsel(AllTrueMask, abs(op1), zero) and for that we do not need movprfx. But yes, if user types in cndsel(mask, abs(op1), zero), I need to handle that case.

Fixed the cases. Their handling goes something like this:

; 1. op1 = cndsel(mask, abs(op1), op1) abs op1, mask/M, op1 ; 2. dst = cndsel(mask, abs(op1), op1) movprfx dst, op1 abs dst, mask/M, op1 ; 3. dst = cndsel(mask, abs(op1), op2) movprfx dst, op2 abs dst, mask/M, op1 ; 4. op1 = cndsel(mask, abs(op1), op2) abs op1, mask/M, op1 sel op1, mask/M, op1, op2 ; 5. op2 = cndsel(mask, abs(op1), op2) abs op2, mask/M, op1 ; 6. op1 = cndsel(mask, abs(op1), zero) movprfx op1, mask/Z, op1 abs op1, mask/M, op1 ; 7. dst = cndsel(mask, abs(op1), zero) movprfx dst, mask/Z, op1 abs op1, mask/M, op1 ; 8. dst = cndsel(allTrue, abs(op1), op2) abs dst, mask/M op1

src/coreclr/jit/gentree.h

src/coreclr/jit/hwintrinsiccodegenarm64.cpp

kunalspathak · 2024-04-25T22:28:32Z

All tests are passing: https://gist.github.com/kunalspathak/73e0cfd469a70fd4b5466bada45df2eb

src/coreclr/jit/hwintrinsiccodegenarm64.cpp

src/coreclr/jit/gentree.h

kunalspathak · 2024-04-26T20:54:35Z

superpmi-diff failure is unrelated.

* Handle case where falseValue is contained * Handle cases where Abs() is wrapped in conditional with AllBitsSet mask * Add a missing case for Abs() handling * jit format * Review comments * Review feedback * Another review feedback

Handle case where falseValue is contained

05dc3b1

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 24, 2024

kunalspathak requested a review from tannergooding April 24, 2024 19:39

dotnet-policy-service bot assigned kunalspathak Apr 24, 2024

tannergooding reviewed Apr 24, 2024

View reviewed changes

build-analysis bot mentioned this pull request Apr 24, 2024

iOS test failing with Keychain 'signing-certs.keychain-db' was not found (disabled device) #101502

Closed

kunalspathak added 3 commits April 24, 2024 18:56

Handle cases where Abs() is wrapped in conditional with AllBitsSet mask

fb5a4ef

Add a missing case for Abs() handling

cba3fc4

jit format

4f81a4d

build-analysis bot mentioned this pull request Apr 25, 2024

[ios-arm64] Keychain 'signing-certs.keychain-db' was not found #101551

Closed