-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x64: Peephole optimization for x < 0
#4625
x64: Peephole optimization for x < 0
#4625
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Two followup thoughts:
- I don't think the Wasm frontend uses
B1
at all, even for temporaries, though I could be wrong. If not, could we do the same for whatever ops it generates for the Wasm compare opcodes that produce ani32
0/1 flag? - I think we can get the opposite case (
x >= 0
) as well with a negation; that might be worth doing?
(If you want to merge as-is that's fine too; just writing down some ideas that came to mind!) |
@@ -99,9 +99,108 @@ block0(v0: i32): | |||
; pushq %rbp | |||
; movq %rsp, %rbp | |||
; block0: | |||
; shrl $31, %edi, %edi | |||
; sarl $31, %edi, %edi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, why it uses signed right shift now? It should be an unsigned shift. Or -1 / 0 is equivalent to true / false. Sorry for the possibly simple questions. I'm not familiar with the generally accepted rules of codegen specifically for cranelift
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was with an eye towards better supporting our boolean representations, as mentioned in #3205, as sar
will do sign extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that it's not clear to me that we can use the wider boolean types with icmp
, I've switched back to shr
with a type guard on the result. Sorry about the back and forth here!
; sarl $31, %edi, %edi | ||
; notl %edi, %edi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct? I guess it should be:
notl %edi, %edi
shrl $31, %edi, %edi
https://godbolt.org/z/s1fccc5Ex
However if codegen uses -1
/ 0
instead 1
/ 0
for true
/ false
. It should be ok.
Also I'm wondering how ordering not / shr may affect on micro/macro/instruction fusion on x64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've inverted it and restricted this optimization to boolean-producing instructions. Also this shouldn't affect the test/jmp
fusion, as this optimization will only fire for icmp
instructions whose results are used directly, not as the input to a branch decision. Was there another fusion case you were thinking this might invalidate?
;; Peephole optimization for `0 > x`, when x is a signed 64 bit value | ||
(rule (lower (icmp (IntCC.SignedGreaterThan) (u64_from_iconst 0) x @ (value_type $I64))) | ||
(x64_sar $I64 x (Imm8Reg.Imm8 63))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't ISLE and cranelift ir always canonicalize constants on the right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not currently, hence the symmetric cases for this optimization. This is something we might look into after the mid-end optimization work has landed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Fixes #4607
Add a peephole optimization for
icmp slt x, (iconst 0)
for 32 and 64 bit values, turning them into a right shift by 31 or 63 bits respectively. This optimization only applies to the case where the result of theicmp
is used directly, cases likebrz (icmp ..)
will be left alone.