-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle case where falseValue is contained #101515
Conversation
@@ -439,7 +439,7 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node) | |||
{ | |||
case 1: | |||
assert(!instrIsRMW); | |||
if (targetReg != falseReg) | |||
if ((targetReg != falseReg) && (falseReg != REG_NA)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the right fix.
abs (predicated)
only supports <Pg>/M
(merging). We explicitly want to emit movprfx (predicated)
using movprfx targetReg, maskReg/Z, targetReg
when falseReg
is REG_NA
(and therefore a contained zero constant)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also realized we may be missing some asserts...
movprfx
in general has a lot of subtle requirements: https://developer.arm.com/documentation/ddi0602/2023-03/SVE-Instructions/MOVPRFX--predicated---Move-prefix--predicated--?lang=en
Although the operation of the instruction is defined as a simple predicated vector copy, it is required that the prefixed instruction at PC+4 must be an SVE destructive binary or ternary instruction encoding, or a unary operation with merging predication, but excluding other MOVPRFX instructions. The prefixed instruction must specify the same predicate register, and have the same maximum element size (ignoring a fixed 64-bit "wide vector" operand), and the same destination vector as the MOVPRFX instruction. The prefixed instruction must not use the destination register in any other operand position, even if they have different names but refer to the same architectural register state. Any other use is UNPREDICTABLE.
So I think we probably want something in emitarm64.cpp
that validates that movprfx
doesn't violate this (we can't do peepholes if it would break these invariants, we can't differ predicate registers, we can't change element sizes, we can't change destination registers, we can't reuse targetReg
for op1/op2/op3
in the subsequent instruction, etc).
CC. @TamarChristinaArm are there any other considerations we might want to cover in the assert? -- Also to double check, "The prefixed instruction must not use the destination register in any other operand position" applies to something like add (predicated)
prefixed by movprfx
, and not to movprfx (predicated)
itself and not to add (predicated)
if it is not preceded by movprfx
, correct?
So add reg1, msk/M, reg1
is fine and movprfx reg1, msk/Z, reg1; add reg1, msk/M, reg2
is fine. It's only movprfx reg1, msk/Z, reg*; add reg1, msk/M, reg1
that is not allowed, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MOVPRFX list that @TamarChristinaArm pasted in #100743 (comment)
Yes, there is an item for that in #93095 and I have updated it to include these checkings.
We explicitly want to emit
movprfx (predicated)
usingmovprfx targetReg, maskReg/Z, targetReg
whenfalseReg
isREG_NA
(and therefore a contained zero constant)
Not sure I understand why we need this. maskReg
is AllTrueMask in this case, so aren't we just moving targetReg
into targetReg
if falseValue
is a zero constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MOVPRFX list that @TamarChristinaArm pasted in #100743 (comment)
👍, I think this list is missing the destination register cannot be used as an input register restriction the manual calls out, which is what I was questioning and trying to get a confirmation around.
Not sure I understand why we need this. maskReg is AllTrueMask in this case, so aren't we just moving targetReg into targetReg if falseValue is a zero constant?
In the case we know we have maskReg == AllTrueMask
, there is no need to emit a movprfx
ever, but this code path also looks to be hittable when the contents of maskReg
are unknown, in which case we would need movprfx
and still want to handle a contained zero. So simply checking for falseReg != REG_NA
isn't sufficient since we want to execute this for cndsel(mask, abs(op1), zero)
(where zero
is contained and therefore falseReg == REG_NA
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I was just looking at the scenario where we convert abs(op1)
-> cndsel(AllTrueMask, abs(op1), zero)
and for that we do not need movprfx
. But yes, if user types in cndsel(mask, abs(op1), zero)
, I need to handle that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the cases. Their handling goes something like this:
; 1. op1 = cndsel(mask, abs(op1), op1)
abs op1, mask/M, op1
; 2. dst = cndsel(mask, abs(op1), op1)
movprfx dst, op1
abs dst, mask/M, op1
; 3. dst = cndsel(mask, abs(op1), op2)
movprfx dst, op2
abs dst, mask/M, op1
; 4. op1 = cndsel(mask, abs(op1), op2)
abs op1, mask/M, op1
sel op1, mask/M, op1, op2
; 5. op2 = cndsel(mask, abs(op1), op2)
abs op2, mask/M, op1
; 6. op1 = cndsel(mask, abs(op1), zero)
movprfx op1, mask/Z, op1
abs op1, mask/M, op1
; 7. dst = cndsel(mask, abs(op1), zero)
movprfx dst, mask/Z, op1
abs op1, mask/M, op1
; 8. dst = cndsel(allTrue, abs(op1), op2)
abs dst, mask/M op1
All tests are passing: https://gist.github.com/kunalspathak/73e0cfd469a70fd4b5466bada45df2eb |
superpmi-diff failure is unrelated. |
* Handle case where falseValue is contained * Handle cases where Abs() is wrapped in conditional with AllBitsSet mask * Add a missing case for Abs() handling * jit format * Review comments * Review feedback * Another review feedback
* Handle case where falseValue is contained * Handle cases where Abs() is wrapped in conditional with AllBitsSet mask * Add a missing case for Abs() handling * jit format * Review comments * Review feedback * Another review feedback
I just realized that for
Abs(a)
, when we wrap it inCndSel(mask, Abs(a), zero)
, the falseValue is contained and we do not assign register to it. Fix it here. Not sure why this was not caught in the test run, perhaps I was using outdated binary or something.