[microNPU][1] Add affine analysis structures for the cascader #9458

mbaret · 2021-11-05T11:00:12Z

The cascader relies heavily on being able to determine data dependencies between operators. This is so that it can calculate how stripes should be propagated through a cascade.

To do this, two data structures are defined: StripeConfig and Propagator. StripeConfig stores information for how a tensor should be broken up into stripes and executed. Propagator transforms a StripeConfig using an affine transform matrix, allowing an input StripeConfig for an operator to be determined by 'propagating' the output StripeConfig.

By chaining together Propagators, we can analyse how data dependencies vary throughout a cascade and therefore calculate the memory requirements (and approximate the performance).
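To make the propagation idea concrete, here is a minimal sketch in plain Python. It is not the API added by this PR: the `propagate` helper, the homogeneous-matrix layout, and the `CONV3X3` example are all assumptions chosen for illustration. It treats an operator as an affine map from output coordinates to input coordinates and chains two such maps to see how the required stripe size grows through a cascade.

```python
import math

def propagate(transform, shape, strides):
    """Hypothetical helper: map an output stripe's shape and strides to the input side.

    `transform` is an affine matrix in homogeneous form, so the last column
    holds the constant term and the last row is [0, ..., 0, 1].
    """
    n_out = len(shape)
    new_shape, new_strides = [], []
    for row in transform[:-1]:  # skip the homogeneous row
        # The shape uses the full affine map (linear part plus constant).
        linear = sum(row[j] * shape[j] for j in range(n_out))
        new_shape.append(math.ceil(linear + row[n_out]))
        # Strides are displacements, so only the linear part applies.
        new_strides.append(sum(row[j] * strides[j] for j in range(n_out)))
    return new_shape, new_strides

# Assumed example: a 3x3 'valid' convolution with stride 1 needs 2 extra
# rows and columns of input for every output stripe.
CONV3X3 = [
    [1, 0, 2],
    [0, 1, 2],
    [0, 0, 1],
]

out_shape, out_strides = [4, 4], [4, 4]
# Chain two propagations, as for two convolutions in a cascade.
mid_shape, mid_strides = propagate(CONV3X3, out_shape, out_strides)
in_shape, in_strides = propagate(CONV3X3, mid_shape, mid_strides)
print(mid_shape, mid_strides)  # stripe needed from the intermediate tensor
print(in_shape, in_strides)    # stripe needed from the cascade input
```

In this sketch the stripe shape grows at each step while the strides stay the same, so the stripes overlap further up the cascade. That overlap is the kind of data dependency the memory estimate has to account for.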

mbaret · 2021-11-05T11:01:15Z

cc @mbs-octoml @junrushao1994 @jacobbohlin @manupa-arm @NicolaLancellotti

mbaret · 2021-11-09T14:59:14Z

also cc @csullivan

tqchen · 2021-11-09T15:02:25Z

minor note: not needed for now, but it may be helpful to take a look at https://github.com/apache/tvm/blob/main/include/tvm/arith/iter_affine_map.h to see if there are any utils that can be reused

mbaret · 2021-11-09T15:12:34Z

> minor note: not needed for now, but it may be helpful to take a look at https://github.com/apache/tvm/blob/main/include/tvm/arith/iter_affine_map.h to see if there are any utils that can be reused

I did spend a little bit of time looking at this after @junrushao1994 made me aware of it. It looks like it'd make for a good integration point for a 'v2' - especially once we upgrade to TensorIR. However, I think there are a few representational issues that would make it hard to directly leverage in the current design.

csullivan

Great stuff @mbaret, this is only a partial review but I had a couple of questions and comments. I'll hopefully be able to wrap the review up on Monday. Apologies for the slow turnaround, but thanks for the great contribution!

csullivan · 2021-11-20T00:14:04Z

src/contrib/ethosu/cascader/stripe_config.h

+   * error will compound when we need to multiply the strides by the number of
+   * stripes along a given axis.
+   */
+  inline std::vector<float> GetStrides() const { return strides_; }


Is there an example that we can write or link to in this doc string to illustrate the error accumulation that occurs from using ceildiv-style rounding? I didn't quite follow the current description. By fractional striding, are you trying to describe the case where the stripe shape dims are not divisors of the input shape dims?

mbaret · 2021-11-25

The best example of this is an upscale operation. If we have a 2x2 upscale and choose to stripe the output in 3x3 stripes, the input stripes will get a fractional stride of 3/2. We can't just round this up with ceil; otherwise, as we multiply the stride by more and more stripes, the result gets further and further from the truth. I'll add this to the docs.

Side note: I have considered re-expressing this as a rational number rather than a float, but I think that can be a future improvement for now.
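The compounding error can be seen in a short sketch. This is illustrative only: the `input_start` helper and the constants are assumptions, and `fractions.Fraction` stands in for the rational representation mentioned in the side note rather than anything in the PR. It compares where each input stripe starts along one axis when using the exact stride of 3/2 versus a ceil-rounded stride of 2.

```python
import math
from fractions import Fraction

UPSCALE = 2      # 2x upscale along this axis
OUT_STRIDE = 3   # the output is striped in steps of 3

# Exact input stride is 3/2; rounding it up once gives 2.
exact_stride = Fraction(OUT_STRIDE, UPSCALE)
rounded_stride = math.ceil(exact_stride)

def input_start(stripe_index, stride):
    """Hypothetical helper: first input row read by the given stripe."""
    return math.floor(stripe_index * stride)

exact_starts = [input_start(k, exact_stride) for k in range(6)]
rounded_starts = [input_start(k, rounded_stride) for k in range(6)]
errors = [r - e for r, e in zip(rounded_starts, exact_starts)]

print(exact_starts)    # positions using the fractional stride
print(rounded_starts)  # positions using the pre-rounded stride
print(errors)          # the gap widens as the stripe index grows
```

Keeping the stride fractional and rounding only after multiplying by the stripe index keeps the error bounded, whereas rounding the stride first lets it grow with every stripe.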

csullivan · 2021-11-20T00:24:24Z

src/contrib/ethosu/cascader/stripe_config.h

+ *
+ * The size of that stripe in each axis is the 'shape'. The strides is how far
+ * you should move between stripes, so also (4, 4) for a simple non-overlapping
+ * tiling. However, we explore some overlapping scheduling options so shape != strides


Ping on the earlier comment ⬆️: an example like one of these where the stride value is non-integral would be nice.

mbaret · 2021-11-25

I've added an example based on a 2x2 upscale.
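For readers following the doc string above, the relationship between 'shape' and 'strides' can be sketched as follows. `StripeConfigSketch` is a hypothetical stand-in with only a few of the fields, not the class from `stripe_config.h`, and the (6, 6) overlapping case is an assumed example.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class StripeConfigSketch:
    """Hypothetical, reduced stand-in for a stripe configuration."""
    shape: tuple    # size of one stripe along each axis
    strides: tuple  # distance between consecutive stripe origins
    stripes: tuple  # number of stripes along each axis
    offset: tuple   # origin of the first stripe

    def origins(self):
        """Return the origin of every stripe, enumerated axis by axis."""
        axes = [
            [o + k * s for k in range(n)]
            for o, s, n in zip(self.offset, self.strides, self.stripes)
        ]
        return list(product(*axes))

    def overlap(self):
        """Rows/columns shared by neighbouring stripes (0 for a plain tiling)."""
        return tuple(sh - st for sh, st in zip(self.shape, self.strides))

# Simple non-overlapping tiling: shape == strides.
tiled = StripeConfigSketch(shape=(4, 4), strides=(4, 4), stripes=(2, 2), offset=(0, 0))
# Overlapping stripes: same origins, but each stripe is larger than the stride.
overlapping = StripeConfigSketch(shape=(6, 6), strides=(4, 4), stripes=(2, 2), offset=(0, 0))

print(tiled.origins(), tiled.overlap())
print(overlapping.origins(), overlapping.overlap())
```

Both configurations visit the same origins; only the amount of data each stripe covers differs, which is why shape and strides are stored separately.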

csullivan · 2021-11-20T00:32:45Z

src/contrib/ethosu/cascader/stripe_config.h

+ *
+ * Finally, the 'offset' tells us where to start the first stripe. In this simple
+ * case the offset is just (0, 0), but in something like a padding operation we
+ * may want to start from a negative index, which is captured by the offset.


A note about how negative indexing is handled for an operation would be helpful. For example, the stripe config between two operations Op1 and Op2 will depend on the padding needed by Op2, but will influence where Op1 writes its memory. Is that correct?

mbaret · 2021-11-25

I've updated the doc to use slice as the example here because I think that's easier to follow. Regarding the padding case, it's a bit of a challenge to explain without a diagram, but I'll give it a go here.

Let's say we have an op A that represents a symmetric pad by 1 and two tensors T_in and T_out such that T_in -> A -> T_out. If T_in has shape (4, 4) then T_out will have shape (6, 6) after the padding. Now, we choose a StripeConfig for T_out which is equivalent to (2, 2) tiling:

StripeConfig T_out = { shape=[2, 2], extent=[6, 6], strides=[2, 2], order=[1, 2], stripes=[3, 3], offset=[0, 0], }

The question then is, what StripeConfig will need to be produced when we propagate to T_in? Well, if we state that any part of a stripe that lies outside of the tensor bounds can be ignored, then the input StripeConfig should be this:

StripeConfig T_in = { shape=[2, 2], extent=[6, 6], strides=[2, 2], order=[1, 2], stripes=[3, 3], offset=[-1, -1], }

This will effectively 'overlay' the output StripeConfig on the (4, 4) input tensor T_in, but such that the outer 1-wide padding margin is always out-of-bounds (either <0 or >=4 in either axis). This way, reads will never be generated for the padding, but they will be for the 'interior'.

If this is still unclear - which I appreciate it might be - I can potentially produce a quick diagram.
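In lieu of a diagram, the overlay can be sketched along a single axis. This is only an illustration of the "ignore anything out of bounds" rule described above: `clipped_stripes` and its `tensor_size` parameter are hypothetical names, not part of the cascader.

```python
def clipped_stripes(offset, stride, shape, stripes, tensor_size):
    """Hypothetical helper: list the (start, end) range each stripe reads
    along one axis, after discarding anything outside [0, tensor_size)."""
    regions = []
    for k in range(stripes):
        start = offset + k * stride
        end = start + shape
        # Clip to the real tensor bounds; the clipped-off part is padding.
        regions.append((max(start, 0), min(end, tensor_size)))
    return regions

# One axis of the T_in config above: offset=-1, strides=2, shape=2,
# stripes=3, laid over an input tensor of size 4.
rows = clipped_stripes(offset=-1, stride=2, shape=2, stripes=3, tensor_size=4)
read_sizes = [end - start for start, end in rows]

print(rows)        # in-bounds range read by each stripe
print(read_sizes)  # the first and last stripes read less than the middle one
```

Under these assumptions the three stripes together read exactly the four valid rows, and the 1-wide margin at either end never produces a read.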

manupak

Mostly data structure introductions at this point! LGTM.


mbaret · 2021-11-30T17:12:31Z

ping @csullivan

NicolaLancellotti

LGTM

csullivan

LGTM, thanks @mbaret.

leandron · 2021-12-01T13:01:28Z

This is merged now, thanks @csullivan @mbaret @NicolaLancellotti @manupa-arm and @tqchen.

…#9458)

* [ETHOSU][1] Add affine analysis structures for the cascader

The cascader relies heavily on being able to determine data dependencies between operators. This is so that it can calculate how stripes should be propagated through a cascade.

To do this, two data structures are defined: StripeConfig and Propagator. StripeConfig stores information for how a tensor should be broken up into stripes and executed. Propagator transforms a StripeConfig using an affine transform matrix, allowing an input StripeConfig for an operator to be determined by 'propagating' the output StripeConfig.

By chaining together Propagators, we can analyse how data dependencies vary throughout a cascade and therefore calculate the memory requirements (and approximate the performance).

Change-Id: If7176fea961c631be4a6c195303da536030d957b

* Add test guards

Change-Id: I1d7633e20daab33642fa5c4a12e474a4def4d8b8

* Address review comments

Change-Id: Iff5f1effa08e0628de91f5577487d0cecebec824

* Improve docs

Change-Id: I508809d8c1a08d231e3a9b0fd9b3f2639cc2f0e3

mbaret requested review from areusch, comaniac, icemelon, jroesch, junrushao, merrymercy, tqchen, yzhliu and zhiics as code owners November 5, 2021 11:00

mbaret mentioned this pull request Nov 5, 2021

[Tracking Issue] Arm(R) Ethos(TM)-U Cascading Scheduler #9429

Closed

12 tasks

NicolaLancellotti reviewed Nov 5, 2021

src/contrib/ethosu/cascader/stripe_config.h (outdated)

src/contrib/ethosu/cascader/propagator.cc (outdated)

mbaret mentioned this pull request Nov 8, 2021

[microNPU][2a] Add CascaderGraph for cascading analysis #9469

Merged

mbaret changed the title ~~[ETHOSU][1] Add affine analysis structures for the cascader~~ [microNPU][1] Add affine analysis structures for the cascader Nov 8, 2021

mbaret force-pushed the ethosu-cascader-1 branch 2 times, most recently from b878cb5 to c71c421 Compare November 18, 2021 13:40

csullivan reviewed Nov 20, 2021

mbaret force-pushed the ethosu-cascader-1 branch from c71c421 to ce0d276 Compare November 25, 2021 11:38

manupak approved these changes Nov 30, 2021

mbaret added 4 commits November 30, 2021 17:11

Add test guards

6c93b3a

Change-Id: I1d7633e20daab33642fa5c4a12e474a4def4d8b8

Address review comments

9121141

Change-Id: Iff5f1effa08e0628de91f5577487d0cecebec824

Improve docs

b6dd634

Change-Id: I508809d8c1a08d231e3a9b0fd9b3f2639cc2f0e3

mbaret force-pushed the ethosu-cascader-1 branch from ce0d276 to b6dd634 Compare November 30, 2021 17:12

NicolaLancellotti approved these changes Nov 30, 2021

csullivan approved these changes Nov 30, 2021

leandron merged commit 2275359 into apache:main Dec 1, 2021

driazati mentioned this pull request Jul 14, 2022

TVM v0.9.0.rc0 Release Candidate Notes #12102

Closed


