[SVE] Support scalable vectors in LoopVectorizer #16782

ekalda · 2024-03-25T15:23:20Z

This patch add support for turning loops marked for vectorizing into scalable vectors if the extent of the loop is a vscale dependent expression in a correct form.

The testing for both scalable and fixed length vectors in test_tir_transform.py has been extended and most of the tests have been converted to TVMScript based testing against expected output.

ekalda · 2024-03-25T15:30:13Z

cc the usual suspects @lhutton1 @leandron @Anndrey24 @tqchen @Lunderberg

Lunderberg · 2024-03-26T14:12:02Z

The implementation looks reasonable, though I have one main question for it: What is the behavior of the updated pass for a target that doesn't support SVE? Prior SVE-commits enabled the functionality, but didn't produce SVE in any of the default lowering passes.

From this line, versions of LLVM before 11.0 do not support SVE, nor from my brief reading of the CUDA codegen here does CUDA.

Since VectorizeLoop occurs after the BindTarget pass, we can check the function attribute to know which target will be executing each function. I think we should have the loop vectorization apply only to fixed-extent loops by default, but enable the scalable vectorization for targets that support it.

lhutton1

Nice work, there are plenty of non-trivial changes here! Thanks for tidying up the tests as well :) The comments are largely just nitpicks 😅

src/tir/ir/expr.cc

src/tir/transforms/vectorize_loop.cc

tests/python/tir-transform/test_tir_transform_vectorize.py

src/tir/transforms/vectorize_loop.cc

ekalda · 2024-03-26T16:56:31Z

Thank you for your feedback @Lunderberg, much appreciated!

The implementation looks reasonable, though I have one main question for it: What is the behavior of the updated pass for a target that doesn't support SVE? Prior SVE-commits enabled the functionality, but didn't produce SVE in any of the default lowering passes.

From this line, versions of LLVM before 11.0 do not support SVE, nor from my brief reading of the CUDA codegen here does CUDA.

When it comes to targets that don't support SVE, I'd expect these targets to not trigger the creation of scalable vectors. In the current plan the creation of scalable vectors has to be intentional, i.e. it comes from splitting an axis by a vscale dependent expression in the (target dependent) schedules and vectorizing the resulting axis. If the LoopVectorizer is trying to create scalable vectors for target that doesn't support it, something has gone wrong and the compilation will fall over at some point:

If by some mistake a schedule that doesn't support VLA programming contains vscale, it will fall over latest in a target dependent codegen
If there is an attempt to vectorize loops with non-int extent that doesn't contain vscale, the "scalable ramp" creation will error since it expects the PrimExpr lanes in a form vscale * int. I realize though that this is a weird deviation from a current behaviour of

      if (!extent_as_int || extent_as_int->value < 1) {
        LOG(FATAL) << "Failed to vectorize loop with extent " << op->extent;
      }

so I'll modify the patch such that it checks for a target and fails as before if the extent is not an int.

Since VectorizeLoop occurs after the BindTarget pass, we can check the function attribute to know which target will be executing each function. I think we should have the loop vectorization apply only to fixed-extent loops by default, but enable the scalable vectorization for targets that support it.

In principle I'm not against making vectorizing for scalable vectors functionality more explicitly target specific, but it is not obvious to me what that would mean in terms of code? ICHECKs for the appropriate targets at the places where scalable vectors are created?

cbalint13 · 2024-03-29T22:28:42Z

Trying to reason loudly on this,

When it comes to targets that don't support SVE, I'd expect these targets to not trigger the creation of scalable vectors. In the current plan the creation of scalable vectors has to be intentional, i.e. it comes from splitting an axis by a vscale dependent expression in the (target dependent) schedules and vectorizing the resulting axis.

See, so the schedules are already target dependent but the underlaying execution of such schedule is not.
Well, in this case there would be no need for more additional extra checks I think, but correct me please if I got this wrong.

Lunderberg · 2024-03-31T14:37:22Z

That's a good point. This only applies for cases that would currently produce an error, where a ForKind::kVectorized is applied to a loop with dynamic extent.

I'm convinced, and agree that this change doesn't introduce any errors that would not already have been errors from an upstream pass. We may still want to have validation at this step, to avoid making errors be more complicated downstream, but that can be added later as needed.

This patch add support for turning loops marked for vectorizing into scalable vectors if the extent of the loop is a vscale dependent expression in a correct form. The testing for both scalable and fixed length vectors in test_tir_transform.py has been extended and most of the tests have been converted to TVMScript based testing against expected output. Co-authored-by: Luke Hutton <[email protected]> Co-authored-by: Neil Hickey <[email protected]>

ekalda · 2024-04-04T12:20:06Z

Thanks for all the reviews and discussion! The latest version now includes changes as a response to @lhutton1 review. I also looked into using the target info in the LoopVectorizer and discovered several issues, which I think will need fixes in separate patches:

Due to rather complex nesting of pass pipelines in the build process, VectorizeLoop gets called several times, first from Build: FoldConstant: FoldConstantExpr: ConstEvaluate: LowerTE: LowerTensorExpr and then from Build: FoldConstant: FoldConstantExpr: ConstEvaluate and then again in Build: GraphExecutorCodegen: LowerTE: LowerTensorExpr (this is all for TVMC -> graph executor codegen)
BindTarget is not run before VectorizeLoop in Build: FoldConstant: FoldConstantExpr: ConstEvaluate: LowerTE: LowerTensorExpr invocation, so the attr won't have target into in VectorizeLoop
For the cases when BindTarget has run before VectorizeLoop, it does not seem to be binding the target correctly (it's using the default "x86_64-pc-linux-gnu" instead of the user specified "aarch64-linux-gnu")
Another option is to fetch the "global" target through Target::Current(), but for a reason I've yet to track down it doesn't correctly resolve the target in all of the invocations of the VectorizeLoop and sometimes uses the default target again

So a bit of a can of worms there. I agree with @Lunderberg that in its current form the pass would not introduce errors that would have not been errors in other places of the stack, so I suggest we deal with the target detection issues separately and solidify VectorizeLoop with the target info at a later stage.

lhutton1

Thanks for the updates @ekalda, LGTM!

lhutton1 · 2024-04-09T13:38:35Z

Thanks @ekalda @Lunderberg @cbalint13! Let's investigate adding the target dependent error message in a follow-up

ekalda mentioned this pull request Mar 25, 2024

[Tracking Issue] Scalable Vector Extension (SVE) upstreaming #16455

Open

12 tasks

lhutton1 reviewed Mar 26, 2024

View reviewed changes

ekalda and others added 3 commits April 4, 2024 11:35

Linting...

bd59c5d

Respond to review and add one more tests

477b89a

ekalda force-pushed the p5-sve-loopvectorizer branch from e2fd300 to 477b89a Compare April 4, 2024 12:13

lhutton1 approved these changes Apr 8, 2024

View reviewed changes

lhutton1 merged commit 4d4f050 into apache:main Apr 9, 2024
28 checks passed

ekalda mentioned this pull request Apr 16, 2024

[SVE] Check for SVE target in VectorizeLoop #16893

Merged

ekalda deleted the p5-sve-loopvectorizer branch April 16, 2024 15:56

ysh329 mentioned this pull request Apr 21, 2024

[Release] v0.16.0 Release Candidate Notes #16911

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SVE] Support scalable vectors in LoopVectorizer #16782

[SVE] Support scalable vectors in LoopVectorizer #16782

ekalda commented Mar 25, 2024

ekalda commented Mar 25, 2024

Lunderberg commented Mar 26, 2024

lhutton1 left a comment

ekalda commented Mar 26, 2024

cbalint13 commented Mar 29, 2024

Lunderberg commented Mar 31, 2024

ekalda commented Apr 4, 2024

lhutton1 left a comment

lhutton1 commented Apr 9, 2024

[SVE] Support scalable vectors in LoopVectorizer #16782

[SVE] Support scalable vectors in LoopVectorizer #16782

Conversation

ekalda commented Mar 25, 2024

ekalda commented Mar 25, 2024

Lunderberg commented Mar 26, 2024

lhutton1 left a comment

Choose a reason for hiding this comment

ekalda commented Mar 26, 2024

cbalint13 commented Mar 29, 2024

Lunderberg commented Mar 31, 2024

ekalda commented Apr 4, 2024

lhutton1 left a comment

Choose a reason for hiding this comment

lhutton1 commented Apr 9, 2024