[microNPU] Add relu6 relu_n1_to_1 test cases for Ethos-U #13645
Conversation
Force-pushed from 2a4b651 to 26d7a78
Thanks @alexey-yazev for the PR! It looks all good in terms of functionality; my only concern is that this PR adds more than 800 FVP-based codegen tests, which are notoriously slow and would probably make upstream CI take significantly longer. The RELUs are all quite simple activation functions, so I don't think we get much benefit from testing them with every setting of shape, padding, dilation, etc. of e.g. conv2d or depthwise2d. I think it would be better to have separate codegen tests for the activation functions that operate on preceding ops with fixed attributes.
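As a rough illustration of that suggestion, a single conv2d with fixed attributes could be parametrized over the activations. This is only a sketch: the test name, the random weights and the final compile/verify step are placeholders rather than the actual test infrastructure used in the suite.

```python
import numpy as np
import pytest
import tensorflow as tf


@pytest.mark.parametrize(
    "activation",
    [tf.nn.relu, tf.nn.relu6, lambda x: tf.math.maximum(-1.0, tf.math.minimum(x, 1.0))],
)
def test_tflite_activation_codegen(activation):
    """Codegen test for an activation fused with a conv2d with fixed attributes."""

    @tf.function
    def conv2d_with_activation(x):
        weights = tf.constant(np.random.uniform(size=(3, 3, 4, 8)), dtype=tf.float32)
        op = tf.nn.conv2d(x, weights, strides=(1, 1), padding="SAME", dilations=(1, 1))
        return activation(op)

    # ...convert to quantized TFLite, compile for the Ethos-U target and
    # compare against the FVP reference using the shared test helpers.
```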
def relu_n1_to_1(x):
    """
    The specific pattern will be replaced into RELU_N1_TO_1 by tflite.
    """
    return tf.math.maximum(-1.0, tf.math.minimum(x, 1.0))
Out of interest, what Relay does this get lowered to? I always thought all the different kinds of RELUs were lowered to the same Relay clip with just different min and max, and then this clip was pattern matched as a part of all the different op pattern matchers and added to the primitive op's clip_min and clip_max, but as we don't pattern match for standalone clip and this test here passes, I suppose it gets lowered to something different?
All relu operations are converted to clip operations in the TFLite frontend. It seems these tests are unnecessary; perhaps it's worth adding a test for the maximum operation when there are different scales, so that it is offloaded to the NPU not as one operation, as in the other cases, but as two.
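For reference, the RELU variants all end up as the same Relay clip call with different bounds. A rough float-domain sketch of the resulting expressions (illustrative only; the actual frontend output works on the quantized values):

```python
import numpy as np
from tvm import relay

x = relay.var("x", shape=(1, 8, 8, 4), dtype="float32")

relu = relay.clip(x, a_min=0.0, a_max=np.finfo("float32").max)  # RELU
relu6 = relay.clip(x, a_min=0.0, a_max=6.0)  # RELU6
relu_n1_to_1 = relay.clip(x, a_min=-1.0, a_max=1.0)  # RELU_N1_TO_1
```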
Tests are extended with cases for the relu6 and relu_n1_to_1 activations. Min and max operations are not fused with requantize if the scales differ, as that is not supported on the NPU.
Keep relu_n1_to_1 for testing the case when the activation cannot be fused with the previous operation
Force-pushed from 007e2cb to f692260
Sorry for the delay on this, I took another look and I'm still quite confused about what is going on, so maybe you can elaborate more in the commit body? Two things:
(1) We should still check that TFLite's RELU6 is correctly executed on the NPU - some basic conv2d + RELU6 tests should do
(2) I'm confused about the min/max changes - is it an unrelated bugfix, or is it necessary for the RELU_N1_TO_1 implementation?
def minimum_clip_requantize_pattern() -> tvm.relay.dataflow_pattern.DFPattern:
    """
    This function creates the pattern for minimum with fused RELU activation.
    """
    pattern = is_op("minimum")(wildcard(), wildcard())
    pattern = is_op("clip")(pattern)
    pattern = is_op("qnn.requantize")(
        pattern, is_constant(), is_constant(), is_constant(), is_constant()
    )
    return pattern
What's the motivation behind having two patterns instead of one with an optional requantize, if we use the same params extractor (MinParams) with them?
There are two patterns to support two cases:
a) when the scales are the same, we can offload minimum + clip + qnn.requantize to the NPU as one operation, ethosu_binary_elementwise
b) when the scales are different, we can offload minimum + clip + qnn.requantize to the NPU as two operations, ethosu_binary_elementwise + ethosu_identity
Due to a hardware limitation, we cannot perform a min or max operation fused with requantize when the scales differ (see NPU_SET_OFM_SCALE, https://developer.arm.com/documentation/102420/0200/Programmers-model/Command-stream/cmd1-commands-).
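To make the split concrete, here is a minimal sketch of what the second, requantize-free pattern could look like; the name minimum_clip_pattern and its exact structure are illustrative, not the code in this PR:

```python
from tvm.relay.dataflow_pattern import DFPattern, is_op, wildcard


def minimum_clip_pattern() -> DFPattern:
    """
    Sketch: match minimum with an optional fused clip only, leaving a
    qnn.requantize with mismatching scales to be offloaded separately
    (e.g. as an ethosu_identity).
    """
    pattern = is_op("minimum")(wildcard(), wildcard())
    pattern = pattern.optional(is_op("clip"))
    return pattern
```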
Ok, I see how that works now :) I think so far the approach has been to match all the variations of the operator and then output the right combination of Ethos-U ops, e.g. Resize2d. If there is a reason it can't be done that way, it would be good to document it there since it is not immediately obvious from the code why we need two patterns.
I think it also needs some legalization tests to check that:
(1) min/max with matching scales -> ethosu_binary_elementwise
(2) min/max with different scales -> ethosu_binary_elementwise + ethosu_identity
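A sketch of the kind of check such legalization tests could make. It assumes legalized_func is a Relay function that has already been run through the Ethos-U legalization passes, and that the ops are registered under the contrib.ethosu.* names; both are assumptions here rather than something verified against this PR.

```python
import tvm
from tvm import relay


def count_ethosu_ops(legalized_func):
    """Count selected Ethos-U ops in an already-legalized Relay function."""
    counts = {"contrib.ethosu.binary_elementwise": 0, "contrib.ethosu.identity": 0}

    def visit(node):
        if isinstance(node, relay.Call) and isinstance(node.op, tvm.ir.Op):
            if node.op.name in counts:
                counts[node.op.name] += 1

    relay.analysis.post_order_visit(legalized_func, visit)
    return counts


# Matching scales:  expect one binary_elementwise and no identity.
# Different scales: expect one binary_elementwise followed by one identity.
```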
In my opinion it will be enough to add separate tests for RELU6 and RELU_N1_TO_1, since RELU, RELU6 and RELU_N1_TO_1 only add output range restrictions according to the hardware.
We don't have a pattern matcher for a solitary Relay CLIP operation, so I suspect that if the graph consists of only a RELU and nothing else, it doesn't get offloaded to the NPU at all. That's why I was suggesting it is tested in conjunction with another operator that matches an optional RELU, but I agree that there is no need to test it with all of them. But since RELU6 and RELU_N1_TO_1 are both distinct TFLite operators, it would be good to have some testing for them to claim that they are supported.
To me this looks like fixing what was an oversight in the initial implementation of binary elementwise and is not related to adding support for two new activation functions (which is what the title of this PR claims), so in general it should be a separate PR, but I don't think it's worth the trouble to split up this PR since it is quite small.
…naryElementwise operation
in this case the requantize operation can be offloaded to the NPU
The reason two patterns are used is that we can only tell whether this pattern is suitable after checking that the scales match, which is done in the is_valid function.
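A minimal sketch of the kind of scale check that is_valid performs to decide whether the requantize can be fused; the helper name and parameters are illustrative, not the actual MinParams implementation:

```python
def can_fuse_requantize(ifm_scale, ifm_zero_point, ofm_scale, ofm_zero_point):
    """Sketch: the NPU can only fuse min/max with a requantize when the input
    and output quantization parameters match, i.e. the requantize is a no-op."""
    return ifm_scale == ofm_scale and ifm_zero_point == ofm_zero_point
```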
I left only the tests for RELU6 and RELU_N1_TO_1 in this PR, and I'm going to create another PR with the changes to the binary elementwise implementation.
Thanks @alexey-yazev, I think the patch is nearly ready; I've got one question about the operation.
op = tf.pad(
    x,
    [[0, 0], [padding[0], padding[2]], [padding[1], padding[3]], [0, 0]],
    "CONSTANT",
)
Why the preceding pad?
There is no specific reason, it can be removed.
Thanks @alexey-yazev, LGTM!
Tests are extended with cases for the relu6 and relu_n1_to_1 activations. The test cases contain a conv2d operation + activation because a standalone activation is not offloaded to the NPU.
cc @leandron, @ekalda, @lhutton1