Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for multi-dim vectorization #4873

Merged
merged 69 commits into from
Sep 21, 2020
Merged

Support for multi-dim vectorization #4873

merged 69 commits into from
Sep 21, 2020

Conversation

vksnk
Copy link
Member

@vksnk vksnk commented Apr 24, 2020

This PR adds support for multi-dimensional vectorization as described in #4616.

There are two main pieces here:

  • nested ramps/broadcasts:
    • Ramp and Broadcast can be nested now, i.e. it is allowed for them to have a vector base, stride and value;
    • updates simplifier rules to account for that + extended fuzz_simplify to test it.
    • changes to Deinterleave, so it supports extract_lane and others on nested vector expressions;
    • smaller changes in some other place.
  • actual multi-dimensional vectorization:
    • this is the place where nested ramps/broadcasts can be introduced;
    • I have added a number of tests, which exercise most of the code branches inside of the VectorizeLoops to ensure it works correctly;
    • extra pass in the end of the lowering which flattens nested ramps/broadcasts, so existing codegens can work without changes

Notes/questions:

  • I updated every simplifier rule which involves ramp/broadcast, such that they are only applied when types of the arguments are matching. For some rules, which explicitly use lanes (for example, trying to prove ranges of the ramps), I've added an extra check if argument is scalar to make sure it's only applied in the case when lanes == type.lanes(). I think in order to make it general, we'd need to turn lanes into wildcard.
  • In some cases lanes argument is passed to the result of the rewriting rule, even though it's not present in the input. I've removed it and it works everywhere except Simplify_LT, I don't quite understand why, so would greatly appreciate any suggestions.
  • fuzz_simplify is failing for the type UInt(1) when vector width is more than 2 even before the changes, so I've disabled it for now. The error message is "Integer constant 2 will be implicitly coerced to type uint1, which changes its value to (uint1)0.. I am going to look into it next, but figured I ask in case this is known or anyone has any suggestions.
  • I am not sure how to trigger VectorizeLoops::scalarize(Expr), so put an assert if it's called from nested vectorization.

Otherwise, I think, it should be ready for review. Thanks!

vksnk added 26 commits March 31, 2020 15:25
…such that they are only applied to the Expr with same number of lanes.
* fuzz_simpify generates nested ramps/broadcasts
* changes is_same_lane_num to is_same_type
* some rules are disabled for now
* added is_scalar
* fixed ramp lane count
* enabled back all rules by guarding some which use 'lanes' with is_scalar
@steven-johnson
Copy link
Contributor

Where do we stand on this -- are we still pending other PRs?

@vksnk
Copy link
Member Author

vksnk commented Jul 8, 2020

I already merged in within-vector reductions PR, so it should be good as soon as tests failing on build bots are fixed. I was going to take a look at it today.

@steven-johnson
Copy link
Contributor

Cool -- no rush, just checking :-)

src/Deinterleave.cpp Outdated Show resolved Hide resolved
@@ -191,7 +191,7 @@ struct InterleavedRamp {
int lanes, repetitions;
};

bool is_interleaved_ramp(const Expr &e, const Scope<Expr> &scope, InterleavedRamp *result) {
bool is_interleaved_ramp(const Expr &e, const std::map<string, Expr> &widened_vars, InterleavedRamp *result) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this changed from a Scope to a map?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the case when there are multiple vectorized loops, it's possible that the same let variable may be vectorized differently depending on the loop level it's used and the exact form may not be known when we first see it (for example, if it's used inside of the inner vectorized loop), so map is used to keep track of all possible variant which were used.

Copy link
Member

@abadams abadams Jul 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused because a Scope is a map where you can have multiple values per key, and the most recently added value for the key hides the older ones. So a Scope should be strictly more general than a map.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I guess I misunderstood the way it works; based on name and some usages I saw, I assumed that it's more like named stack and push/pop should go in last-in-first-out order, but in this case push and pop may happen in different functions (and maybe in different orders with other lets), so I thought it won't work correctly. However, if it's map of stacks instead, I guess this shouldn't be an issue. I can change map to scope and track widened names for Let/LetStmt separately, would this be preferable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason to use a Scope instead is if the same name might show up multiple times in a way that a Scope would just handle correctly. Currently vectorization just happens on code with unique var names, but maybe it would be a useful to someone to be able to run it on stmts that don't obey that constraint. But if you're pushing/popping in non-stack order, then maybe that wouldn't work transparently. I need to figure out why you're pushing/popping these in non-stack order though to make sure I understand this code. I added a question about it elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, I agree this is a good reason to use Scope! The reason I did it this way is because widened value of Let may change in the case when there is a vectorized loop inside of the Let body and this Let is used in there, so instead of mutating it in advance, it's mutated (sort of) lazily only when needed inside of the visit(const Variable *op) (see get_mutated_var_from_scope() function) to make sure that we've seen all of the vectorized loops. Maybe, as an alternative, it would be better to update all variables in scope every time we encounter a new vectorized loop, replace mutated variables in the scope, mutate loop body, and on return from the mutate(body), create widened variables if needed. That might be quite a bit of a change, but sounds like it's worth doing. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated it to use Scope again using approach described above, and I think the code is better now (there are fewer changes for sure), thanks for the the suggestions!

src/VectorizeLoops.cpp Outdated Show resolved Hide resolved
src/VectorizeLoops.cpp Outdated Show resolved Hide resolved
@steven-johnson
Copy link
Contributor

Periodic ping to see where this PR stands

@abadams
Copy link
Member

abadams commented Jul 15, 2020

I believe it's stalled on me continuing my review. Sorry for the delay, I'm currently overloaded with things I have told people I will do.

@steven-johnson
Copy link
Contributor

No worries, just making sure there isn't stuff that needed something trivial to unblock :-)

@vksnk
Copy link
Member Author

vksnk commented Sep 17, 2020

All tests are green again.

src/Lower.cpp Outdated Show resolved Hide resolved
src/VectorizeLoops.cpp Outdated Show resolved Hide resolved
@abadams
Copy link
Member

abadams commented Sep 19, 2020

lgtm apart from my two minor comments.

@vksnk
Copy link
Member Author

vksnk commented Sep 21, 2020

Thank you for reviewing!

@vksnk vksnk merged commit f5a764f into master Sep 21, 2020
@vksnk vksnk deleted the vksnk/vector-ramp branch October 8, 2020 02:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants