[Hexagon][QNN] Improve performance wo QNN canonicalization #13734

Merged
merged 1 commit into from
Jan 11, 2023

Conversation

Contributor

@ibsidorenko ibsidorenko commented Jan 9, 2023

This commit improves the performance of various models tuned with MetaSchedule for the Hexagon target without QNN canonicalization.

Benchmark results for several models on Snapdragon 8 Gen 1, tuned with MetaSchedule:

| Model           | QNN canon enabled, ms | QNN canon disabled, ms | Speedup |
|-----------------|-----------------------|------------------------|---------|
| ResNet, int8    | 50                    | 48                     | +4.2%   |
| Inception, int8 | 103                   | 106                    | -2.8%   |
| SRGAN, int8     | 348                   | 431                    | -19.3%  |
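The speedup column can be reproduced from the two timings; a minimal sketch, assuming the percentage is taken relative to the canonicalization-disabled time (an inference from the reported numbers, not stated in the PR):

```python
# Reproduces the "Speedup" column of the benchmark table: positive means
# disabling QNN canonicalization made the model faster. The percentage is
# computed relative to the canonicalization-disabled time (an assumption
# inferred from the reported numbers).
def speedup_pct(enabled_ms: float, disabled_ms: float) -> float:
    return round((enabled_ms - disabled_ms) / disabled_ms * 100, 1)

print(speedup_pct(50, 48))    # ResNet, int8:    4.2
print(speedup_pct(103, 106))  # Inception, int8: -2.8
print(speedup_pct(348, 431))  # SRGAN, int8:     -19.3
```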

What was done:

  1. Added two new passes, QnnLegalize and QnnCanonicalize. These are thin wrappers around Legalize("FTVMQnnLegalize") and Legalize("FTVMQnnCanonicalize").

  2. Added the ability to disable inlining for specific blocks in the MetaSchedule AutoInline rule. For example, this can be done with T.block_attr({"meta_schedule.inline_rule": "disable"}).

  3. Implemented compute, alter-op, and legalization functions for the qnn.conv2d operation (for the Hexagon target).
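A hypothetical sketch of the opt-out described in item 2: an auto-inline rule consults a block annotation before inlining. The helper name and the dict-based block representation below are illustrative only, not TVM's real data structures:

```python
# Hypothetical sketch (not TVM's actual implementation) of how an
# auto-inline rule can consult a block annotation before inlining.
# Blocks are represented here as plain dicts of annotations.
def inline_allowed(block_attrs: dict) -> bool:
    # A block annotated with {"meta_schedule.inline_rule": "disable"}
    # is skipped by the AutoInline rule.
    return block_attrs.get("meta_schedule.inline_rule") != "disable"

print(inline_allowed({}))                                        # True
print(inline_allowed({"meta_schedule.inline_rule": "disable"}))  # False
```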

@tvm-bot
Collaborator

tvm-bot commented Jan 9, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@ibsidorenko ibsidorenko marked this pull request as draft January 9, 2023 14:02
@@ -124,7 +125,8 @@ class ScheduleRule : public runtime::ObjectRef {
bool disallow_if_then_else, //
bool require_injective, //
bool require_ordered, //
Optional<Array<String>> disallow_op);
Optional<Array<String>> disallow_op,
Optional<Array<String>> disallow_block = {});
Member

It is less stable to specify blocks by their names. Instead, please consider other approaches to suppress inlining, for example, T.block_attr.

Contributor Author

Hmm... interesting idea, I didn't think of it before. I need to check.

Contributor Author

Done. Now it can be disabled through T.block_attr({"meta_schedule.inline_rule": "disable"}).

if (inst->outputs.size() == outputs.size()) {
TranslateAddOutputRVs(inst->outputs, outputs, &rv_map);
} else if (inst->outputs.size() < outputs.size()) {
// We want to allow a trace generated for a single conv2d block to be applied to
Member

MetaSchedule requires structural stability; that is to say, it does not support cases where the number of children varies. The workaround is probably too hacky to get it to work... Shall we consider other alternatives?

Contributor Author

I have reverted all my changes here. In any case, this "hack" was introduced before my changes.

Member

Yeah, this is necessary since get_child_blocks, get_consumers, etc. can return a different number of results depending on the post ops, even if the anchor op is the same.

@ibsidorenko ibsidorenko force-pushed the hexagon-qnn-improve-perf branch from 6934947 to 5723ca1 Compare January 10, 2023 18:21
@ibsidorenko ibsidorenko changed the title Do not review [Hexagon][QNN] Improve performance wo QNN canonicalization Jan 10, 2023
@ibsidorenko ibsidorenko marked this pull request as ready for review January 10, 2023 18:30
@ibsidorenko
Contributor Author

cc @junrushao @masahi

@junrushao junrushao dismissed their stale review January 10, 2023 20:51

Thanks for the update! No objection from me now :-)

Member

@masahi masahi left a comment

Interesting to see that we can directly tune QNN models and that performance is reasonable, or even better than with canonicalization enabled.

"""Legalize qnn.conv2d op for vrmpy tensorization.

If the inputs are signed or unsigned int8 and data/kernel layouts are NCHW/OIHW, then the input
and output channels are padded to be a multiple of 4 and 32 respectively.
Member

Is it ok not to legalize dtype as well?

Contributor Author

Agreed... dtype legalization will be the next step here.
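The padding rule quoted in the docstring above (input channels padded to a multiple of 4, output channels to a multiple of 32, as needed for vrmpy tensorization) can be sketched as follows. The helper below is illustrative only, not the PR's actual code:

```python
# Illustrative sketch of the channel padding described in the qnn.conv2d
# legalization docstring: pad input channels to a multiple of 4 and
# output channels to a multiple of 32 for vrmpy tensorization.
# This is not the PR's actual implementation.
def round_up(value: int, multiple: int) -> int:
    return ((value + multiple - 1) // multiple) * multiple

def padded_channels(in_channels: int, out_channels: int) -> tuple:
    return round_up(in_channels, 4), round_up(out_channels, 32)

print(padded_channels(3, 30))   # (4, 32)
print(padded_channels(64, 64))  # (64, 64) -- already aligned, unchanged
```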

@masahi masahi merged commit 15e185d into apache:main Jan 11, 2023
fzi-peccia pushed a commit to fzi-peccia/tvm that referenced this pull request Mar 27, 2023
@ibsidorenko ibsidorenko deleted the hexagon-qnn-improve-perf branch March 29, 2023 06:23