Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SVE] Support scalable vectors in LoopVectorizer #16782

Merged
merged 3 commits into from
Apr 9, 2024

Conversation

ekalda
Copy link
Contributor

@ekalda ekalda commented Mar 25, 2024

This patch add support for turning loops marked for vectorizing into scalable vectors if the extent of the loop is a vscale dependent expression in a correct form.

The testing for both scalable and fixed length vectors in test_tir_transform.py has been extended and most of the tests have been converted to TVMScript based testing against expected output.

@ekalda
Copy link
Contributor Author

ekalda commented Mar 25, 2024

cc the usual suspects @lhutton1 @leandron @Anndrey24 @tqchen @Lunderberg

@Lunderberg
Copy link
Contributor

The implementation looks reasonable, though I have one main question for it: What is the behavior of the updated pass for a target that doesn't support SVE? Prior SVE-commits enabled the functionality, but didn't produce SVE in any of the default lowering passes.

From this line, versions of LLVM before 11.0 do not support SVE, nor from my brief reading of the CUDA codegen here does CUDA.

Since VectorizeLoop occurs after the BindTarget pass, we can check the function attribute to know which target will be executing each function. I think we should have the loop vectorization apply only to fixed-extent loops by default, but enable the scalable vectorization for targets that support it.

Copy link
Contributor

@lhutton1 lhutton1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, there are plenty of non-trivial changes here! Thanks for tidying up the tests as well :) The comments are largely just nitpicks 😅

src/tir/ir/expr.cc Outdated Show resolved Hide resolved
src/tir/transforms/vectorize_loop.cc Show resolved Hide resolved
tests/python/tir-transform/test_tir_transform_vectorize.py Outdated Show resolved Hide resolved
src/tir/transforms/vectorize_loop.cc Outdated Show resolved Hide resolved
src/tir/transforms/vectorize_loop.cc Outdated Show resolved Hide resolved
src/tir/transforms/vectorize_loop.cc Outdated Show resolved Hide resolved
src/tir/transforms/vectorize_loop.cc Outdated Show resolved Hide resolved
src/tir/transforms/vectorize_loop.cc Outdated Show resolved Hide resolved
src/tir/transforms/vectorize_loop.cc Outdated Show resolved Hide resolved
src/tir/transforms/vectorize_loop.cc Outdated Show resolved Hide resolved
@ekalda
Copy link
Contributor Author

ekalda commented Mar 26, 2024

Thank you for your feedback @Lunderberg, much appreciated!

The implementation looks reasonable, though I have one main question for it: What is the behavior of the updated pass for a target that doesn't support SVE? Prior SVE-commits enabled the functionality, but didn't produce SVE in any of the default lowering passes.

From this line, versions of LLVM before 11.0 do not support SVE, nor from my brief reading of the CUDA codegen here does CUDA.

When it comes to targets that don't support SVE, I'd expect these targets to not trigger the creation of scalable vectors. In the current plan the creation of scalable vectors has to be intentional, i.e. it comes from splitting an axis by a vscale dependent expression in the (target dependent) schedules and vectorizing the resulting axis. If the LoopVectorizer is trying to create scalable vectors for target that doesn't support it, something has gone wrong and the compilation will fall over at some point:

  • If by some mistake a schedule that doesn't support VLA programming contains vscale, it will fall over latest in a target dependent codegen
  • If there is an attempt to vectorize loops with non-int extent that doesn't contain vscale, the "scalable ramp" creation will error since it expects the PrimExpr lanes in a form vscale * int. I realize though that this is a weird deviation from a current behaviour of
      if (!extent_as_int || extent_as_int->value < 1) {
        LOG(FATAL) << "Failed to vectorize loop with extent " << op->extent;
      }

so I'll modify the patch such that it checks for a target and fails as before if the extent is not an int.

Since VectorizeLoop occurs after the BindTarget pass, we can check the function attribute to know which target will be executing each function. I think we should have the loop vectorization apply only to fixed-extent loops by default, but enable the scalable vectorization for targets that support it.

In principle I'm not against making vectorizing for scalable vectors functionality more explicitly target specific, but it is not obvious to me what that would mean in terms of code? ICHECKs for the appropriate targets at the places where scalable vectors are created?

@cbalint13
Copy link
Contributor

Trying to reason loudly on this,

When it comes to targets that don't support SVE, I'd expect these targets to not trigger the creation of scalable vectors. In the current plan the creation of scalable vectors has to be intentional, i.e. it comes from splitting an axis by a vscale dependent expression in the (target dependent) schedules and vectorizing the resulting axis.

See, so the schedules are already target dependent but the underlaying execution of such schedule is not.
Well, in this case there would be no need for more additional extra checks I think, but correct me please if I got this wrong.

@Lunderberg
Copy link
Contributor

That's a good point. This only applies for cases that would currently produce an error, where a ForKind::kVectorized is applied to a loop with dynamic extent.

I'm convinced, and agree that this change doesn't introduce any errors that would not already have been errors from an upstream pass. We may still want to have validation at this step, to avoid making errors be more complicated downstream, but that can be added later as needed.

ekalda and others added 3 commits April 4, 2024 11:35
This patch add support for turning loops marked for vectorizing into
scalable vectors if the extent of the loop is a vscale dependent
expression in a correct form.

The testing for both scalable and fixed length vectors in
test_tir_transform.py has been extended and most of the tests
have been converted to TVMScript based testing against expected
output.

Co-authored-by: Luke Hutton <[email protected]>
Co-authored-by: Neil Hickey <[email protected]>
@ekalda ekalda force-pushed the p5-sve-loopvectorizer branch from e2fd300 to 477b89a Compare April 4, 2024 12:13
@ekalda
Copy link
Contributor Author

ekalda commented Apr 4, 2024

Thanks for all the reviews and discussion! The latest version now includes changes as a response to @lhutton1 review. I also looked into using the target info in the LoopVectorizer and discovered several issues, which I think will need fixes in separate patches:

  • Due to rather complex nesting of pass pipelines in the build process, VectorizeLoop gets called several times, first from Build: FoldConstant: FoldConstantExpr: ConstEvaluate: LowerTE: LowerTensorExpr and then from Build: FoldConstant: FoldConstantExpr: ConstEvaluate and then again in Build: GraphExecutorCodegen: LowerTE: LowerTensorExpr (this is all for TVMC -> graph executor codegen)
  • BindTarget is not run before VectorizeLoop in Build: FoldConstant: FoldConstantExpr: ConstEvaluate: LowerTE: LowerTensorExpr invocation, so the attr won't have target into in VectorizeLoop
  • For the cases when BindTarget has run before VectorizeLoop, it does not seem to be binding the target correctly (it's using the default "x86_64-pc-linux-gnu" instead of the user specified "aarch64-linux-gnu")
  • Another option is to fetch the "global" target through Target::Current(), but for a reason I've yet to track down it doesn't correctly resolve the target in all of the invocations of the VectorizeLoop and sometimes uses the default target again

So a bit of a can of worms there. I agree with @Lunderberg that in its current form the pass would not introduce errors that would have not been errors in other places of the stack, so I suggest we deal with the target detection issues separately and solidify VectorizeLoop with the target info at a later stage.

Copy link
Contributor

@lhutton1 lhutton1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @ekalda, LGTM!

@lhutton1 lhutton1 merged commit 4d4f050 into apache:main Apr 9, 2024
28 checks passed
@lhutton1
Copy link
Contributor

lhutton1 commented Apr 9, 2024

Thanks @ekalda @Lunderberg @cbalint13! Let's investigate adding the target dependent error message in a follow-up

@ekalda ekalda deleted the p5-sve-loopvectorizer branch April 16, 2024 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants