[microNPU][1] Add affine analysis structures for the cascader #9458
also cc @csullivan
minor note: not needed for now, but it may be helpful to take a look at https://github.com/apache/tvm/blob/main/include/tvm/arith/iter_affine_map.h to see if there are any utils that can be reused
I did spend a little bit of time looking at this after @junrushao1994 made me aware of it. It looks like it'd make for a good integration point for a 'v2' - especially once we upgrade to TensorIR. However, I think there are a few representational issues that would make it hard to directly leverage in the current design.
b878cb5 to c71c421
Great stuff @mbaret, this is only a partial review but I had a couple of questions and comments. I'll hopefully be able to wrap the review up on Monday. Apologies for the slow turnaround, but thanks for the great contribution!
 * error will compound when we need to multiply the strides by the number of
 * stripes along a given axis.
 */
inline std::vector<float> GetStrides() const { return strides_; }
Is there an example that we can write or link to in this doc string to illustrate the error accumulation that occurs from using ceildiv-style rounding? I didn't quite follow the current description. By fractional striding, are you trying to describe the case when the stripe shape dims are not divisors of the input shape dims?
The best example of this is an upscale operation. If we have a 2x2 upscale and choose to stripe the output in 3x3 stripes, the input stripes will get a fractional stride of 3/2. We can't just ceil round this, otherwise as we increase the striding it'll get further and further from the truth. I'll add this to the docs.
Side note: I have considered re-expressing this as a rational number rather than a float, but I think that can be left as a future improvement.
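To make the accumulation concrete, here is a minimal sketch of the 2x2 upscale case along one axis. It assumes a nearest-neighbour upscale, so output index `o` reads input index `floor(o / 2)`; the helper names are hypothetical and are not part of the PR.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not TVM API): start of the n-th input stripe along one
// axis when the stride is kept fractional and only the final position is
// rounded down.
inline int StartWithFractionalStride(float stride, int n) {
  return static_cast<int>(std::floor(stride * n));
}

// Hypothetical helper: the same start position if the stride is ceil-rounded
// to an integer before it is multiplied by the stripe index.
inline int StartWithCeiledStride(float stride, int n) {
  return static_cast<int>(std::ceil(stride)) * n;
}

// Difference between the two start positions for stripes 0..num_stripes-1.
inline std::vector<int> StrideDrift(float stride, int num_stripes) {
  std::vector<int> drift;
  for (int n = 0; n < num_stripes; ++n) {
    drift.push_back(StartWithCeiledStride(stride, n) -
                    StartWithFractionalStride(stride, n));
  }
  return drift;
}
```

With a stride of 3/2, `StrideDrift(1.5f, 5)` gives `{0, 1, 1, 2, 2}`: the ceil-rounded start drifts one element further from the true position every two stripes, which is the compounding error the doc string refers to.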
 *
 * The size of that stripe in each axis is the 'shape'. The strides is how far
 * you should move between stripes, so also (4, 4) for a simple non-overlapping
 * tiling. However, we explore some overlapping scheduling options so shape != strides
Ping on earlier comment ⬆️: an example like one of these, for when the stride value is non-integral, would be nice.
I've added an example based on 2x2 upscale.
 *
 * Finally, the 'offset' tells us where to start the first stripe. In this simple
 * case the offset is just (0, 0), but in something like a padding operation we
 * may want to start from a negative index, which is captured by the offset.
A note about how negative indexing is handled for an operation would be helpful. For example, the stripe config between two operations Op1 and Op2 will depend on the padding needed by Op2, but will influence where Op1 writes its memory. Is that correct?
I've updated the doc to use slice as the example here because I think that's easier to follow. Regarding the padding case, it's a bit of a challenge to explain without a diagram but I'll give it a go here.
Let's say we have an op A that represents a symmetric pad by 1 and two tensors T_in and T_out such that T_in -> A -> T_out. If T_in has shape (4, 4) then T_out will have shape (6, 6) after the padding. Now, we choose a StripeConfig for T_out which is equivalent to (2, 2) tiling:
StripeConfig T_out =
{
    shape=[2, 2],
    extent=[6, 6],
    strides=[2, 2],
    order=[1, 2],
    stripes=[3, 3],
    offset=[0, 0],
}
The question then is, what StripeConfig will need to be produced when we propagate to T_in? Well, if we state that any part of a stripe that lies outside of the tensor bounds can be ignored, then the input StripeConfig should be this:
StripeConfig T_in =
{
    shape=[2, 2],
    extent=[6, 6],
    strides=[2, 2],
    order=[1, 2],
    stripes=[3, 3],
    offset=[-1, -1],
}
This will effectively 'overlay' the output StripeConfig on the (4, 4) input tensor T_in, but such that the outer 1-wide padding margin is always out-of-bounds (either <0 or >=4 in either axis). This way, reads will never be generated for the padding but they will for the 'interior'.
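The clipping rule can be sketched for a single axis as follows. This is only an illustration of "ignore any part of a stripe outside the tensor bounds", not the code in the PR; `Interval` and `ClipStripe` are hypothetical names.

```cpp
#include <algorithm>

// Half-open interval [begin, end) along one axis.
struct Interval {
  int begin;
  int end;
  bool empty() const { return begin >= end; }
};

// Hypothetical helper (not TVM API): the region covered by the n-th stripe
// along one axis, clipped to the tensor bounds [0, tensor_size). A stripe that
// lies entirely out of bounds is returned as the empty interval {0, 0}.
inline Interval ClipStripe(int offset, int stride, int shape, int n,
                           int tensor_size) {
  int begin = offset + n * stride;
  int end = begin + shape;
  begin = std::max(begin, 0);
  end = std::min(end, tensor_size);
  if (begin >= end) return Interval{0, 0};
  return Interval{begin, end};
}
```

For the T_in config above (offset -1, stride 2, shape 2, tensor size 4), the three stripes along an axis clip to [0, 1), [1, 3) and [3, 4). Together they cover exactly the 4-element interior and never touch the padding margin.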
If this is still unclear - which I appreciate it might be - I can potentially produce a quick diagram.
c71c421 to ce0d276
Mostly data structure introductions at this point! LGTM.
The cascader relies heavily on being able to determine data dependencies between operators. This is so that it can calculate how stripes should be propagated through a cascade.
To do this, two data structures are defined: StripeConfig and Propagator. StripeConfig stores information for how a tensor should be broken up into stripes and executed. Propagator transforms a StripeConfig using an affine transform matrix, allowing an input StripeConfig for an operator to be determined by 'propagating' the output StripeConfig.
By chaining together Propagators, we can analyse how data dependencies vary throughout a cascade and therefore calculate the memory requirements (and approximate the performance).
Change-Id: If7176fea961c631be4a6c195303da536030d957b
Change-Id: I1d7633e20daab33642fa5c4a12e474a4def4d8b8
Change-Id: Iff5f1effa08e0628de91f5577487d0cecebec824
Change-Id: I508809d8c1a08d231e3a9b0fd9b3f2639cc2f0e3
ce0d276 to b6dd634
ping @csullivan
LGTM
LGTM, thanks @mbaret.
This is merged now, thanks @csullivan @mbaret @NicolaLancellotti @manupa-arm and @tqchen.
…#9458)
* [ETHOSU][1] Add affine analysis structures for the cascader
The cascader relies heavily on being able to determine data dependencies between operators. This is so that it can calculate how stripes should be propagated through a cascade.
To do this, two data structures are defined: StripeConfig and Propagator. StripeConfig stores information for how a tensor should be broken up into stripes and executed. Propagator transforms a StripeConfig using an affine transform matrix, allowing an input StripeConfig for an operator to be determined by 'propagating' the output StripeConfig.
By chaining together Propagators, we can analyse how data dependencies vary throughout a cascade and therefore calculate the memory requirements (and approximate the performance).
Change-Id: If7176fea961c631be4a6c195303da536030d957b
* Add test guards
Change-Id: I1d7633e20daab33642fa5c4a12e474a4def4d8b8
* Address review comments
Change-Id: Iff5f1effa08e0628de91f5577487d0cecebec824
* Improve docs
Change-Id: I508809d8c1a08d231e3a9b0fd9b3f2639cc2f0e3
RFC: apache/tvm-rfcs#37
Issue: #9429
The cascader relies heavily on being able to determine data dependencies between operators. This is so that it can calculate how stripes should be propagated through a cascade.
To do this, two data structures are defined: StripeConfig and Propagator. StripeConfig stores information for how a tensor should be broken up into stripes and executed. Propagator transforms a StripeConfig using an affine transform matrix, allowing an input StripeConfig for an operator to be determined by 'propagating' the output StripeConfig.
By chaining together Propagators, we can analyse how data dependencies vary throughout a cascade and therefore calculate the memory requirements (and approximate the performance).
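As a rough illustration of the propagation idea, the sketch below applies an affine transform in homogeneous coordinates to a stripe shape only. The Propagator described above transforms a whole StripeConfig, so this covers just one field. `PropagateShape` and `Conv3x3Valid` are hypothetical names, and the matrix for an unpadded, stride-1 3x3 convolution is an assumption made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Hypothetical helper (not TVM API): map an output stripe shape to the input
// stripe shape it depends on. The shape is extended with a homogeneous 1,
// multiplied by the transform, and each result is rounded up. The last row of
// the transform is the homogeneous row (0, ..., 0, 1) and is skipped. Each row
// is assumed to have shape.size() + 1 columns.
inline std::vector<int> PropagateShape(const Matrix& transform,
                                       const std::vector<int>& shape) {
  std::vector<float> v(shape.begin(), shape.end());
  v.push_back(1.0f);
  std::vector<int> result;
  for (std::size_t row = 0; row + 1 < transform.size(); ++row) {
    float acc = 0.0f;
    for (std::size_t col = 0; col < v.size(); ++col) {
      acc += transform[row][col] * v[col];
    }
    result.push_back(static_cast<int>(std::ceil(acc)));
  }
  return result;
}

// Assumed transform for a 3x3, stride-1, unpadded convolution over (h, w):
// each output stripe needs an input stripe that is 2 elements larger per axis.
inline Matrix Conv3x3Valid() {
  return {{1.0f, 0.0f, 2.0f}, {0.0f, 1.0f, 2.0f}, {0.0f, 0.0f, 1.0f}};
}
```

Chaining two such transforms shows how the dependency grows through a cascade: a (2, 2) output stripe needs a (4, 4) stripe of the intermediate tensor, which in turn needs a (6, 6) stripe of the first input. Buffer sizes can then be estimated from these per-tensor stripe shapes.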