
[microNPU] Add relu6 relu_n1_to_1 test cases for Ethos-U #13645

Merged

Conversation

@Aleksei-grovety (Contributor) commented Dec 19, 2022

The tests are extended with cases for the relu6 and relu_n1_to_1 activations. The test cases combine a conv2d operation with the activation, because a standalone activation is not offloaded to the NPU.

cc @leandron, @ekalda, @lhutton1

@tvm-bot (Collaborator) commented Dec 19, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

github-actions bot requested a review from leandron on December 19, 2022 12:34
@Aleksei-grovety force-pushed the add-relu6-and-relu_n1_to_1-to-ethosu branch from 2a4b651 to 26d7a78 on December 21, 2022 13:23
@ekalda (Contributor) left a comment

Thanks @alexey-yazev for the PR! It all looks good in terms of functionality. My only concern is that this PR adds more than 800 FVP-based codegen tests, which are notoriously slow and would probably make upstream CI take significantly longer. The RELUs are all quite simple activation functions, so I don't think we get much benefit from testing them with every setting of shape, padding, dilation, etc. of, e.g., conv2d or depthwise2d. I think it would be better to have separate codegen tests for the activation functions that operate on preceding ops with fixed attributes.

Comment on lines 1136 to 1140
def relu_n1_to_1(x):
    """
    The specific pattern will be replaced into RELU_N1_TO_1 by tflite.
    """
    return tf.math.maximum(-1.0, tf.math.minimum(x, 1.0))
Contributor:

Out of interest, what Relay does this get lowered to? I always thought all the different kinds of RELUs were lowered to the same Relay clip, just with different min and max, and that this clip was then pattern matched as part of all the different op pattern matchers and added to the primitive op's clip_min and clip_max. But since we don't pattern match for a standalone clip and this test passes, I suppose it gets lowered to something different?

Contributor Author:

All relu operations are converted to clip operations in the tflite frontend. It seems these tests are unnecessary; perhaps it's worth adding a test for the maximum operation where there are different scales and it is offloaded to the NPU not as one operation (as in the other cases) but as two.
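
For context, here is a minimal sketch (an illustration only, not the frontend's exact code) of what these activations look like once expressed as Relay clip operations with different bounds; the variable shape and dtype are assumptions:

from tvm import relay

x = relay.var("x", shape=(1, 8, 8, 3), dtype="float32")

# RELU6 and RELU_N1_TO_1 both become a single clip; only the bounds differ.
relu6 = relay.clip(x, a_min=0.0, a_max=6.0)
relu_n1_to_1 = relay.clip(x, a_min=-1.0, a_max=1.0)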

Tests are extended with cases for the relu6 and relu_n1_to_1 activations.
Do not fuse min and max operations with requantize if there are different scales, as that is not supported on the NPU.
Keep relu_n1_to_1 for testing the case when the activation cannot be fused with the previous operation.
@Aleksei-grovety force-pushed the add-relu6-and-relu_n1_to_1-to-ethosu branch from 007e2cb to f692260 on January 9, 2023 08:23
@ekalda (Contributor) left a comment

Sorry for the delay on this. I took another look and I'm still quite confused about what is going on, so maybe you can elaborate more in the commit body? Two things:
(1) We should still check whether TFLite's RELU6 is correctly executed on NPU, some basic conv2d + RELU6 tests should do
(2) I'm confused about the min/max changes - is it an unrelated bugfix or necessary for RELU_N1_TO_1 implementation?

Comment on lines 945 to 954
def minimum_clip_requantize_pattern() -> tvm.relay.dataflow_pattern.DFPattern:
    """
    This function creates the pattern for minimum with fused RELU activation.
    """
    pattern = is_op("minimum")(wildcard(), wildcard())
    pattern = is_op("clip")(pattern)
    pattern = is_op("qnn.requantize")(
        pattern, is_constant(), is_constant(), is_constant(), is_constant()
    )
    return pattern
Contributor:

What's the motivation behind having two patterns instead of one with an optional requantize, if we use the same params extractor (MinParams) with them?
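
For reference, the single-pattern alternative mentioned here could be sketched roughly as follows with an optional requantize (a sketch only, using tvm.relay.dataflow_pattern; the PR keeps two separate patterns for the reason given in the reply below):

from tvm.relay.dataflow_pattern import is_constant, is_op, wildcard

def minimum_with_optional_requantize_pattern():
    # minimum followed by clip, with the trailing qnn.requantize optional.
    pattern = is_op("minimum")(wildcard(), wildcard())
    pattern = is_op("clip")(pattern)
    pattern = pattern.optional(
        lambda x: is_op("qnn.requantize")(
            x, is_constant(), is_constant(), is_constant(), is_constant()
        )
    )
    return pattern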

Contributor Author:

There are two patterns to support two cases:
a) when we can offload minimum + clip + qnn.requantize to the NPU as one operation (ethosu_binary_elementwise) if the scales are the same
b) when we can offload minimum + clip + qnn.requantize to the NPU as two operations (ethosu_binary_elementwise + ethosu_identity) if the scales are different

Since there is a hardware limitation, we cannot perform a min or max operation fused with requantize when the scales differ (please look at NPU_SET_OFM_SCALE, https://developer.arm.com/documentation/102420/0200/Programmers-model/Command-stream/cmd1-commands-).

Contributor:

Ok, I see how that works now :) I think so far the approach has been to match all the variations of the operator and then output the right combination of Ethos-U ops, e.g. Resize2d. If there is a reason it can't be done that way, it would be good to document it there since it is not immediately obvious from the code why we need two patterns.
I think it also needs some legalization tests to check that:
(1) min/max with matching scales -> ethosu_binary_elementwise
(2) min/max with different scales -> ethosu_binary_elementwise + ethosu_identity

@Aleksei-grovety (Contributor Author):

(1) We should still check whether TFLite's RELU6 is correctly executed on NPU, some basic conv2d + RELU6 tests should do

In my opinion it will be enough to add separate tests for RELU6 and RELU_N1_TO_1, since the RELU, RELU6, and RELU_N1_TO_1 activations are all converted to the clip operation in the tflite frontend (https://github.com/apache/tvm/blob/main/python/tvm/relay/frontend/tflite.py#L534) and the other cases with basic operations will be covered by the cases with the relu activation. Or is it necessary to add tests for RELU6 and RELU_N1_TO_1 with other operations?
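
As an illustration, a standalone conv2d + RELU6 case of the kind proposed here could be expressed in TensorFlow roughly like this (a sketch under assumptions: the shapes, filter size, and function name are illustrative, not the exact test added by the PR):

import numpy as np
import tensorflow as tf

@tf.function
def conv2d_relu6(x):
    # conv2d gives the activation a preceding op that the Ethos-U
    # pattern matcher can fuse the resulting clip into.
    weights = tf.constant(
        np.random.uniform(-1, 1, size=(3, 3, 3, 8)), dtype=tf.float32
    )
    op = tf.nn.conv2d(x, weights, strides=(1, 1), padding="SAME")
    return tf.nn.relu6(op)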

(2) I'm confused about the min/max changes - is it an unrelated bugfix or necessary for RELU_N1_TO_1 implementation?

This adds restrictions according to the hardware limitations.

@ekalda (Contributor) commented Jan 10, 2023

(1) We should still check whether TFLite's RELU6 is correctly executed on NPU, some basic conv2d + RELU6 tests should do

In my opinion it will be enough to add separate tests for RELU6 and RELU_N1_TO_1, since the RELU, RELU6, and RELU_N1_TO_1 activations are all converted to the clip operation in the tflite frontend (https://github.com/apache/tvm/blob/main/python/tvm/relay/frontend/tflite.py#L534) and the other cases with basic operations will be covered by the cases with the relu activation. Or is it necessary to add tests for RELU6 and RELU_N1_TO_1 with other operations?

We don't have a pattern matcher for a solitary Relay CLIP operation, so I suspect that if the graph consists of only a RELU and nothing else, it doesn't get offloaded to the NPU at all. That's why I was suggesting it is tested in conjunction with another operator that matches an optional RELU, but I agree that there is no need to test it with all of them. However, since both RELU6 and RELU_N1_TO_1 are distinct TFLite operators, it would be good to have some testing for them to claim that they are supported.

(2) I'm confused about the min/max changes - is it an unrelated bugfix or necessary for RELU_N1_TO_1 implementation?

This adds restrictions according to the hardware limitations.

To me this looks like fixing an oversight in the initial implementation of binary elementwise; it is not related to adding support for two new activation functions (which is what the title of this PR claims) and would in general warrant a separate PR. That said, I don't think it is worth the trouble to split this PR up since it is quite small.

@Aleksei-grovety (Contributor Author):

so I suspect that if the graph consists of only a RELU and nothing else, it doesn't get offloaded to the NPU at all.

In this case the requantize operation can be offloaded to the NPU.

I think so far the approach has been to match all the variations of the operator and then output the right combination of Ethos-U ops, e.g. Resize2d. If there is a reason it can't be done that way, it would be good to document it there since it is not immediately obvious from the code why we need two patterns.

The reason two patterns are used is that we can only tell whether a pattern is suitable after the scale-match check, which is performed in the is_valid function.
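
To illustrate the kind of check meant here, a hypothetical helper (names and fields are assumptions, not the upstream MinParams code) might look like:

def requantize_matches_input(ifm_scale, ifm_zero_point, ofm_scale, ofm_zero_point):
    # The min/max can only be fused with the trailing requantize when the
    # input and output quantization parameters match; otherwise the
    # requantize is legalized into a separate ethosu_identity.
    return ifm_scale == ofm_scale and ifm_zero_point == ofm_zero_point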

@Aleksei-grovety (Contributor Author) commented Jan 11, 2023

To me this looks like fixing an oversight in the initial implementation of binary elementwise; it is not related to adding support for two new activation functions (which is what the title of this PR claims) and would in general warrant a separate PR. That said, I don't think it is worth the trouble to split this PR up since it is quite small.

I have left only the addition of the tests for RELU6 and RELU_N1_TO_1 in this PR, and I'm going to create another PR with the changes to the binary elementwise implementation.

@ekalda (Contributor) left a comment

Thanks @alexey-yazev, I think the patch is nearly ready; I've got one question about the operation.

Comment on lines 1124 to 1128
op = tf.pad(
    x,
    [[0, 0], [padding[0], padding[2]], [padding[1], padding[3]], [0, 0]],
    "CONSTANT",
)
Contributor:

Why the preceding pad?

Contributor Author:

There is no specific reason, it can be removed.

@ekalda (Contributor) left a comment

Thanks @alexey-yazev, LGTM!

@ekalda merged commit e0c2181 into apache:main on Jan 12, 2023
fzi-peccia pushed a commit to fzi-peccia/tvm that referenced this pull request Mar 27, 2023
The tests are extended with cases for the relu6 and relu_n1_to_1 activations. The test cases combine a conv2d operation with the activation, because a standalone activation is not offloaded to the NPU.