[Hexagon][QNN] Improve performance wo QNN canonicalization #13734

Merged
merged 1 commit into from
Jan 11, 2023

Conversation

Contributor

@ibsidorenko ibsidorenko commented Jan 9, 2023

This commit improves the performance of various models tuned with MetaSchedule for the Hexagon target without QNN canonicalization.

Benchmark results for several models on Snapdragon 8 Gen 1, tuned with MetaSchedule:

| Model           | QNN canon enabled, ms | QNN canon disabled, ms | Speedup |
|-----------------|-----------------------|------------------------|---------|
| ResNet, int8    | 50                    | 48                     | +4.2%   |
| Inception, int8 | 103                   | 106                    | -2.8%   |
| SRGAN, int8     | 348                   | 431                    | -19.3%  |
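The speedup column can be reproduced from the two timings; a minimal sketch, assuming the percentage is taken relative to the canonicalization-disabled time (an inference from the reported numbers, not stated in the PR):

```python
# Reproduces the "Speedup" column of the benchmark table: positive means
# disabling QNN canonicalization made the model faster. The percentage is
# computed relative to the canonicalization-disabled time (an assumption
# inferred from the reported numbers).
def speedup_pct(enabled_ms: float, disabled_ms: float) -> float:
    return round((enabled_ms - disabled_ms) / disabled_ms * 100, 1)

print(speedup_pct(50, 48))    # ResNet, int8:    4.2
print(speedup_pct(103, 106))  # Inception, int8: -2.8
print(speedup_pct(348, 431))  # SRGAN, int8:     -19.3
```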

What was done:

  1. Added two new passes, QnnLegalize and QnnCanonicalize. These are thin wrappers around Legalize("FTVMQnnLegalize") and Legalize("FTVMQnnCanonicalize").

  2. Added the ability to disable inlining for specific blocks in the MetaSchedule AutoInline rule. For example, this can be done with T.block_attr({"meta_schedule.inline_rule": "disable"}).

  3. Implemented compute, alter-op, and legalization functions for the qnn.conv2d operation (for the Hexagon target).
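A hypothetical sketch of the opt-out described in item 2: an auto-inline rule consults a block annotation before inlining. The helper name and the dict-based block representation below are illustrative only, not TVM's real data structures:

```python
# Hypothetical sketch (not TVM's actual implementation) of how an
# auto-inline rule can consult a block annotation before inlining.
# Blocks are represented here as plain dicts of annotations.
def inline_allowed(block_attrs: dict) -> bool:
    # A block annotated with {"meta_schedule.inline_rule": "disable"}
    # is skipped by the AutoInline rule.
    return block_attrs.get("meta_schedule.inline_rule") != "disable"

print(inline_allowed({}))                                        # True
print(inline_allowed({"meta_schedule.inline_rule": "disable"}))  # False
```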

@tvm-bot
Collaborator

tvm-bot commented Jan 9, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@ibsidorenko ibsidorenko marked this pull request as draft January 9, 2023 14:02
@@ -124,7 +125,8 @@ class ScheduleRule : public runtime::ObjectRef {
bool disallow_if_then_else, //
bool require_injective, //
bool require_ordered, //
Optional<Array<String>> disallow_op);
Optional<Array<String>> disallow_op,
Optional<Array<String>> disallow_block = {});
Member

It is less stable to specify blocks by their names. Instead, please consider other approaches to suppress inlining, for example, T.block_attr.

Contributor Author

Hmm... interesting idea, I didn't think of it before. I need to check.

Contributor Author

Done. Now it can be disabled through T.block_attr({"meta_schedule.inline_rule": "disable"}).

if (inst->outputs.size() == outputs.size()) {
TranslateAddOutputRVs(inst->outputs, outputs, &rv_map);
} else if (inst->outputs.size() < outputs.size()) {
// We want to allow a trace generated for a single conv2d block to be applied to
Member

MetaSchedule requires structural stability; that is to say, it does not support cases where the number of children varies. The workaround is probably too hacky to get it to work... Shall we consider other alternatives?

Contributor Author

I have reverted all my changes here. In any case, this "hack" was introduced before my changes.

Member

Yeah, this is necessary since get_child_blocks, get_consumers, etc. can return a different number of results depending on the post ops, even if the anchor op is the same.

@ibsidorenko ibsidorenko force-pushed the hexagon-qnn-improve-perf branch from 6934947 to 5723ca1 Compare January 10, 2023 18:21
@ibsidorenko ibsidorenko changed the title Do not review [Hexagon][QNN] Improve performance wo QNN canonicalization Jan 10, 2023
@ibsidorenko ibsidorenko marked this pull request as ready for review January 10, 2023 18:30
@ibsidorenko
Contributor Author

cc @junrushao @masahi

@junrushao junrushao dismissed their stale review January 10, 2023 20:51

Thanks for the update! No objection from me now :-)

Member

@masahi masahi left a comment

Interesting to see that we can directly tune QNN models and that performance is reasonable, or even better than with canonicalization enabled.

"""Legalize qnn.conv2d op for vrmpy tensorization.

If the inputs are signed or unsigned int8 and data/kernel layouts are NCHW/OIHW, then the input
and output channels are padded to be a multiple of 4 and 32 respectively.
Member

Is it ok not to legalize dtype as well?

Contributor Author

Agreed... dtype legalization will be the next step here.
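The padding rule quoted in the docstring above (input channels padded to a multiple of 4, output channels to a multiple of 32, as needed for vrmpy tensorization) can be sketched as follows. The helper below is illustrative only, not the PR's actual code:

```python
# Illustrative sketch of the channel padding described in the qnn.conv2d
# legalization docstring: pad input channels to a multiple of 4 and
# output channels to a multiple of 32 for vrmpy tensorization.
# This is not the PR's actual implementation.
def round_up(value: int, multiple: int) -> int:
    return ((value + multiple - 1) // multiple) * multiple

def padded_channels(in_channels: int, out_channels: int) -> tuple:
    return round_up(in_channels, 4), round_up(out_channels, 32)

print(padded_channels(3, 30))   # (4, 32)
print(padded_channels(64, 64))  # (64, 64) -- already aligned, unchanged
```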

@masahi masahi merged commit 15e185d into apache:main Jan 11, 2023
fzi-peccia pushed a commit to fzi-peccia/tvm that referenced this pull request Mar 27, 2023
@ibsidorenko ibsidorenko deleted the hexagon-qnn-improve-perf branch March 29, 2023 06:23