-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Legalize b{and,or,xor}_not
into component instructions
#5709
Legalize b{and,or,xor}_not
into component instructions
#5709
Conversation
Alternatively instead of modifying clif one possibility would be to split apart band_not during either legalization or during egraph optimizations and then have the backend pattern match both band_not and (band (bnot ..)) |
The general tack we've taken is to decompose ops, so yes, removing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know whether we'll merge this, given that the question of whether these instructions should exist at all is an excellent one. But in case we do, there are a couple of things we should fix first.
This commit legalizes the `band_not` instruction into `band`-of-`bnot`, or two instructions. This is intended to assist with egraph-based optimizations where the `band_not` instruction doesn't have to be specifically included in other bit-operation-patterns. Lowerings of the `band_not` instruction have been moved to a specialization of the `band` instruction.
Same as prior commit, but for the `bor_not` instruction.
Same as prior commits. I think this also ended up fixing a bug in the s390x backend where `bxor_not x y` was actually translated as `bnot (bxor x y)` by accident given the test update changes.
Looks like some delegated-to rules have special-cases for "if this feature is enabled use the fused instruction" so move the clause for testing the feature up to the lowering phase to help trigger other rules if the feature isn't enabled. This should make the riscv64 backend more consistent with how other backends are implemented.
These shouldn't ever reach egraphs now that they're legalized away.
This adds a simplification node to translate xor-against-minus-1 to a `bnot` instruction. This helps trigger various other optimizations in the egraph implementation and also various backend lowering rules for instructions. This is chiefly useful as wasm doesn't have a `bnot` equivalent, so it's encoded as `x^-1`.
Test that end-to-end various optimizations are being applied for input wasm modules.
1e7ac10
to
a49de07
Compare
bnot
and band_not
b{and,or,xor}_not
into component instructions
I forget why this was here originally, but this is failing on Windows CI. In general there's no need to update rustup, so leave it as-is.
Ok I've pushed up an alternate strategy of reaching my goal:
A single egraph-rule is retained for One shortcoming remaining, however, is that the backends currently only pattern match |
Previously a 32/64 split was necessary due to the `ALUOp` being different but that's been refactored away no so there's no longer any need for duplicate rules.
This previously made more sense when it was `band_not` and rarely used, but be more specific in the type-filter on this rule that it's only applicable to SIMD types with lanes.
No need to have the commutative version since constants are already shuffled right for egraphs
That seems like a reasonable thing to try, yup! |
Use some more rules in the egraph algebraic optimizations to canonicalize band/bor/bxor with a `bnot` operand to put the operand on the right. That way the lowerings in the backends only have to list the rule once, with the operand on the right, to optimize both styles of input.
Ok I've pushed up a commit which does that and it seems to work well for at least the small unit tests. |
I wouldn't expect a "move I think the backends need to match both patterns instead. Which is annoying, since the patterns overlap; we have to give them different priorities. I also think we need to work on helping ISLE understand commutative operators. It could generate every unique version of a rule with operands swapped, given a |
Ok! I've removed the egraph bits for that and added more lowering rules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Just one small whitespace fix.
Co-authored-by: Jamey Sharp <[email protected]>
Isn't this the same, given that |
I originally thought that s390x's original lowering in bytecodealliance#5709, but as was rightfully pointed out `(bnot (bxor x y))` is equivalent to `(bxor x (bnot y))` so the special lowering for one should apply as a special lowering for the other. For the s390x and aarch64 backend that have already have a fused lowering of the bxor/bnot add a lowering additionally for the bnot/bxor combination.
Ah yes indeed! |
* Add (bnot (bxor x y)) lowerings for s390x/aarch64 I originally thought that s390x's original lowering in #5709, but as was rightfully pointed out `(bnot (bxor x y))` is equivalent to `(bxor x (bnot y))` so the special lowering for one should apply as a special lowering for the other. For the s390x and aarch64 backend that have already have a fused lowering of the bxor/bnot add a lowering additionally for the bnot/bxor combination. * Add bnot(bxor(..)) tests for s390x 128-bit sizes
I was curious to see the effect of #5701 on the compilation of spidermonkey.wasm and much to my dismay I saw that the
andn
instruction was never generated. It turns out, however, that WebAssembly does not have a "bnot" equivalent instruction fori32
ori64
, nor a "band_not" instruction. To trigger the optimization added there I added a few egraph simplification rules to pattern match xor against -1 tobnot
.I then additionally noticed that the lowering added in #5701 was for the clif
band_not
instruction, but this was never actually used by wasm. Additionally egraphs didn't fuseband
-of-bnot
into one instruction, so the lowering still never fired. To address this issue I've added legalizations now fromband_not
intoband
-of-bnot
in addition tobor_not
andbxor_not
. This involved updating all backends with fused lowerings to do more pattern-matching on the top-level operation instead of matching onb*_not
since those now never reach the backends.After all this I've added a
*.wat
test to assert thatbnot
shows up and at least for x86_64 the specialized lowering ofband_not
is triggered.Along the way I noticed a few things:
band_not
unconditionally usedgen_andn
but internallygen_andn
tested for codegen features. Instead the lowering of(band ... (bnot ..))
now explicitly tests for codegen features and falls back to normal rules if the feature forandn
isn't available.bxor_not
instruction was buggy on s390x int that it was calculating(bnot (bxor x y))
instead of(bxor x (bnot y))
. The tests got updated as a result. I'm not so certain that the remaining lowerings are correct since the instructions are calledNotXor32
, for example, but I haven't tested.