[Hexagon][QNN] Improve performance wo QNN canonicalization #13734
@@ -124,7 +125,8 @@ class ScheduleRule : public runtime::ObjectRef {
   bool disallow_if_then_else,  //
   bool require_injective,  //
   bool require_ordered,  //
-  Optional<Array<String>> disallow_op);
+  Optional<Array<String>> disallow_op,
+  Optional<Array<String>> disallow_block = {});
Specifying blocks by name is less stable. Please consider another approach to suppress inlining, for example T.block_attr.
hmm... Interesting idea, I didn't think of it before. Need to check
Done. Inlining can now be disabled for a block through T.block_attr({"meta_schedule.inline_rule": "disable"}).
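The opt-out described in this thread can be illustrated with a toy model. The sketch below is plain Python and does not use TVM: `Block`, `should_auto_inline`, and the block names are hypothetical, and only the annotation key and value are taken from the comment above. It is not the AutoInline implementation.

```python
from dataclasses import dataclass, field

# Annotation key and value quoted in the thread; everything else is hypothetical.
INLINE_RULE_KEY = "meta_schedule.inline_rule"
DISABLE = "disable"


@dataclass
class Block:
    """Toy stand-in for a scheduled block that carries annotations."""
    name: str
    annotations: dict = field(default_factory=dict)


def should_auto_inline(block: Block) -> bool:
    """Return False when the block opts out of inlining via its annotation."""
    return block.annotations.get(INLINE_RULE_KEY) != DISABLE


blocks = [
    Block("requantize"),
    Block("conv2d_pad", {INLINE_RULE_KEY: DISABLE}),
]
# Only blocks without the opt-out annotation remain candidates for inlining.
inlinable = [b.name for b in blocks if should_auto_inline(b)]
print(inlinable)
```

Keying the decision on an annotation rather than on the block name means the opt-out travels with the block even if the block is renamed, which is the stability concern raised in the review.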
src/meta_schedule/trace_apply.cc
Outdated
if (inst->outputs.size() == outputs.size()) {
  TranslateAddOutputRVs(inst->outputs, outputs, &rv_map);
} else if (inst->outputs.size() < outputs.size()) {
  // We want to allow a trace generated for a single conv2d block to be applied to
MetaSchedule requires structural stability; that is, it does not support cases where the number of child blocks varies. The workaround is probably too hacky to make this work. Shall we consider other alternatives?
I have reverted all my changes here. Note, though, that this "hack" was introduced before my changes.
Yeah, this is necessary since get_child_blocks, get_consumers, etc. can return a different number of results depending on the post ops, even if the anchor op is the same.
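To make the mismatch concrete, here is a toy sketch in plain Python (no TVM). The excerpt above does not show how the extra outputs are handled, so the prefix-matching rule used here is an assumption, and `translate_outputs` and the variable names are hypothetical.

```python
from typing import Dict, List


def translate_outputs(old_outputs: List[str], new_outputs: List[str],
                      rv_map: Dict[str, str]) -> None:
    """Map recorded output names to new ones, tolerating extra new outputs."""
    if len(old_outputs) > len(new_outputs):
        raise ValueError("fewer new outputs than recorded outputs")
    # Assumption: the first len(old_outputs) new outputs correspond to the
    # recorded ones; zip stops at the shorter list, so the rest are ignored.
    for old, new in zip(old_outputs, new_outputs):
        rv_map[old] = new


rv_map: Dict[str, str] = {}
# The recorded trace saw one child block; the new workload also has a post op.
translate_outputs(["b0"], ["conv2d", "add"], rv_map)
print(rv_map)
```

The sketch only works if the shared anchor block appears first in both lists, which is exactly the kind of structural assumption the reviewer questions.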
This commit improves the performance of several models tuned with MetaSchedule for the Hexagon target without QNN canonicalization.

Benchmarking of several models tuned with MetaSchedule on Snapdragon 8 Gen 1:

model           | QNN canon enabled, ms | QNN canon disabled, ms | speedup |
----------------|-----------------------|------------------------|---------|
ResNet, int8    | 50                    | 48                     | +4.2%   |
Inception, int8 | 103                   | 106                    | -2.8%   |
SRGAN, int8     | 348                   | 431                    | -19.3%  |

What was done:
1) Added 2 new passes: QnnLegalize and QnnCanonicalize. These are just wrappers for Legalize("FTVMQnnLegalize") and Legalize("FTVMQnnCanonicalize").
2) Added the ability to disable inlining for specific blocks in the MetaSchedule AutoInline rule. For example, this can be done through T.block_attr({"meta_schedule.inline_rule": "disable"}).
3) Implemented compute, alter op and legalization functions for the qnn.conv2d operation (for the Hexagon target).
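The commit message does not say how the speedup column was computed. The sketch below assumes it is the latency difference relative to the canonicalization-disabled time, an inference made only because that formula reproduces all three reported values; `speedup_percent` is a hypothetical helper.

```python
def speedup_percent(enabled_ms: float, disabled_ms: float) -> float:
    """Relative gain of disabling canonicalization, in percent (assumed formula)."""
    return (enabled_ms - disabled_ms) / disabled_ms * 100.0


# Latencies in ms, taken from the benchmark in the commit message:
# (QNN canonicalization enabled, QNN canonicalization disabled).
results = {
    "ResNet, int8": (50, 48),
    "Inception, int8": (103, 106),
    "SRGAN, int8": (348, 431),
}

for model, (enabled_ms, disabled_ms) in results.items():
    print(f"{model}: {speedup_percent(enabled_ms, disabled_ms):+.1f}%")
```

Under this reading, a positive value means the model runs faster with canonicalization disabled, and a negative value means it runs slower.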
6934947 to 5723ca1
Thanks for the update! No objection from me now :-)
Interesting to see that we can tune QNN models directly and that performance is reasonable, or even better than with canonicalization enabled.
"""Legalize qnn.conv2d op for vrmpy tensorization.

If the inputs are signed or unsigned int8 and data/kernel layouts are NCHW/OIHW, then the input
and output channels are padded to be a multiple of 4 and 32 respectively.
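The padding rule in this docstring comes down to rounding each channel count up to a multiple. The following is a minimal plain-Python sketch of that arithmetic only; `round_up` and `padded_channels` are hypothetical helpers, not the functions added in this PR.

```python
def round_up(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple (ceiling division, then scale)."""
    return -(-value // multiple) * multiple


def padded_channels(in_channels: int, out_channels: int,
                    in_multiple: int = 4, out_multiple: int = 32):
    """Return (input, output) channel counts after padding.

    The defaults of 4 and 32 are the multiples named in the docstring above.
    """
    return round_up(in_channels, in_multiple), round_up(out_channels, out_multiple)


# An RGB input with 64 filters needs one extra input channel and no output padding.
print(padded_channels(3, 64))
# Both dimensions need padding here.
print(padded_channels(30, 100))
```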
Is it ok not to legalize dtype as well?
agree... dtype legalization will be the next step here.