Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Legalize b{and,or,xor}_not into component instructions #5709

Merged
merged 15 commits into from
Feb 6, 2023

Conversation

alexcrichton
Copy link
Member

@alexcrichton alexcrichton commented Feb 4, 2023

I was curious to see the effect of #5701 on the compilation of spidermonkey.wasm and much to my dismay I saw that the andn instruction was never generated. It turns out, however, that WebAssembly does not have a "bnot" equivalent instruction for i32 or i64, nor a "band_not" instruction. To trigger the optimization added there I added a few egraph simplification rules to pattern match xor against -1 to bnot.

I then additionally noticed that the lowering added in #5701 was for the clif band_not instruction, but this was never actually used by wasm. Additionally egraphs didn't fuse band-of-bnot into one instruction, so the lowering still never fired. To address this issue I've added legalizations now from band_not into band-of-bnot in addition to bor_not and bxor_not. This involved updating all backends with fused lowerings to do more pattern-matching on the top-level operation instead of matching on b*_not since those now never reach the backends.

After all this I've added a *.wat test to assert that bnot shows up and at least for x86_64 the specialized lowering of band_not is triggered.

Along the way I noticed a few things:

  • I cleaned up some riscv64 bits where band_not unconditionally used gen_andn but internally gen_andn tested for codegen features. Instead the lowering of (band ... (bnot ..)) now explicitly tests for codegen features and falls back to normal rules if the feature for andn isn't available.
  • I think the bxor_not instruction was buggy on s390x int that it was calculating (bnot (bxor x y)) instead of (bxor x (bnot y)). The tests got updated as a result. I'm not so certain that the remaining lowerings are correct since the instructions are called NotXor32, for example, but I haven't tested.

@alexcrichton
Copy link
Member Author

Alternatively instead of modifying clif one possibility would be to split apart band_not during either legalization or during egraph optimizations and then have the backend pattern match both band_not and (band (bnot ..))

@cfallin
Copy link
Member

cfallin commented Feb 4, 2023

The general tack we've taken is to decompose ops, so yes, removing the _not variants and making backends match when able to use andn instructions seems like the better approach here. In general the more "normalized" our IR is, the easier it'll be to write rules that apply appropriately. cc @elliottt -- something we can think about next week...

@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Feb 4, 2023
Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know whether we'll merge this, given that the question of whether these instructions should exist at all is an excellent one. But in case we do, there are a couple of things we should fix first.

cranelift/codegen/src/opts/algebraic.isle Outdated Show resolved Hide resolved
cranelift/codegen/src/opts/algebraic.isle Outdated Show resolved Hide resolved
This commit legalizes the `band_not` instruction into `band`-of-`bnot`,
or two instructions. This is intended to assist with egraph-based
optimizations where the `band_not` instruction doesn't have to be
specifically included in other bit-operation-patterns.

Lowerings of the `band_not` instruction have been moved to a
specialization of the `band` instruction.
Same as prior commit, but for the `bor_not` instruction.
Same as prior commits. I think this also ended up fixing a bug in the
s390x backend where `bxor_not x y` was actually translated as `bnot
(bxor x y)` by accident given the test update changes.
Looks like some delegated-to rules have special-cases for "if this
feature is enabled use the fused instruction" so move the clause for
testing the feature up to the lowering phase to help trigger other rules
if the feature isn't enabled. This should make the riscv64 backend more
consistent with how other backends are implemented.
These shouldn't ever reach egraphs now that they're legalized away.
This adds a simplification node to translate xor-against-minus-1 to a
`bnot` instruction. This helps trigger various other optimizations in
the egraph implementation and also various backend lowering rules for
instructions. This is chiefly useful as wasm doesn't have a `bnot`
equivalent, so it's encoded as `x^-1`.
Test that end-to-end various optimizations are being applied for input
wasm modules.
@alexcrichton alexcrichton changed the title Add egraph rules to generate bnot and band_not Legalize b{and,or,xor}_not into component instructions Feb 4, 2023
I forget why this was here originally, but this is failing on Windows
CI. In general there's no need to update rustup, so leave it as-is.
@alexcrichton
Copy link
Member Author

Ok I've pushed up an alternate strategy of reaching my goal:

  • Legalization has been added to "break apart" the band_not and other related instructions
  • All backend lowerings for band_not and related instructions are moved up to pattern-matching

A single egraph-rule is retained for x^-1 being translated to ~x, which is enough to trigger wasm modules to use andn on x86_64, for example.

One shortcoming remaining, however, is that the backends currently only pattern match (band x (bnot y)), not (band (bnot y) x), for example. This means that if the order of the operands in the *.wat test I added are swapped then the andn instruction isn't generated. Is this something that should be fixed with an egraph rule? I thought I remembered reading there's a "move constants to the right" rule so should there similarly be a "move bnot to the right" rule? Or should the backends all have duplicate patterns for each order?

@github-actions github-actions bot added cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen labels Feb 4, 2023
Previously a 32/64 split was necessary due to the `ALUOp` being different
but that's been refactored away no so there's no longer any need for
duplicate rules.
This previously made more sense when it was `band_not` and rarely used,
but be more specific in the type-filter on this rule that it's only
applicable to SIMD types with lanes.
No need to have the commutative version since constants are already
shuffled right for egraphs
@cfallin
Copy link
Member

cfallin commented Feb 4, 2023

One shortcoming remaining, however, is that the backends currently only pattern match (band x (bnot y)), not (band (bnot y) x), for example. This means that if the order of the operands in the *.wat test I added are swapped then the andn instruction isn't generated. Is this something that should be fixed with an egraph rule? I thought I remembered reading there's a "move constants to the right" rule so should there similarly be a "move bnot to the right" rule?

That seems like a reasonable thing to try, yup!

Use some more rules in the egraph algebraic optimizations to
canonicalize band/bor/bxor with a `bnot` operand to put the operand on
the right. That way the lowerings in the backends only have to list the
rule once, with the operand on the right, to optimize both styles of
input.
@alexcrichton
Copy link
Member Author

Ok I've pushed up a commit which does that and it seems to work well for at least the small unit tests.

@jameysharp
Copy link
Contributor

I wouldn't expect a "move bnot to the right" egraph rule to work reliably for backend patterns. It correctly declares that they're equivalent expressions, but then during elaboration we pick one expression as our final choice to hand to the backend, and there's no guarantee we'll pick the one with bnot on the right, since the two alternatives have equal cost. I assume it's working now because elaboration happens to be picking the last-added option in the e-class in case of ties, or something like that.

I think the backends need to match both patterns instead. Which is annoying, since the patterns overlap; we have to give them different priorities.

I also think we need to work on helping ISLE understand commutative operators. It could generate every unique version of a rule with operands swapped, given a commutative flag on terms or something. We'd have to put some thought into how that interacts with overlap checking though.

@alexcrichton
Copy link
Member Author

Ok! I've removed the egraph bits for that and added more lowering rules

Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Just one small whitespace fix.

cranelift/codegen/src/isa/x64/lower.isle Outdated Show resolved Hide resolved
@alexcrichton alexcrichton merged commit de0e0be into bytecodealliance:main Feb 6, 2023
@alexcrichton alexcrichton deleted the wasm-bandnot branch February 6, 2023 19:53
@uweigand
Copy link
Member

  • I think the bxor_not instruction was buggy on s390x int that it was calculating (bnot (bxor x y)) instead of (bxor x (bnot y)).

Isn't this the same, given that bnot is XOR with -1, and XOR is associative?

alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Feb 10, 2023
I originally thought that s390x's original lowering in bytecodealliance#5709, but as was
rightfully pointed out `(bnot (bxor x y))` is equivalent to
`(bxor x (bnot y))` so the special lowering for one should apply as a
special lowering for the other. For the s390x and aarch64 backend that
have already have a fused lowering of the bxor/bnot add a lowering
additionally for the bnot/bxor combination.
@alexcrichton
Copy link
Member Author

Ah yes indeed!

alexcrichton added a commit that referenced this pull request Feb 13, 2023
* Add (bnot (bxor x y)) lowerings for s390x/aarch64

I originally thought that s390x's original lowering in #5709, but as was
rightfully pointed out `(bnot (bxor x y))` is equivalent to
`(bxor x (bnot y))` so the special lowering for one should apply as a
special lowering for the other. For the s390x and aarch64 backend that
have already have a fused lowering of the bxor/bnot add a lowering
additionally for the bnot/bxor combination.

* Add bnot(bxor(..)) tests for s390x 128-bit sizes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants