[fix][relay][qnn] Bug fix for 8-bit quantized mul #14286
Merged
When attempting to run inference with an 8-bit quantized version of EfficientNet (PyTorch implementation), I found that the quantization process crashed, which you can reproduce with this gist.
Upon closer inspection, I believe that the issue is related to the "Squeeze-and-Excitation block", where we multiply the output of a sigmoid with an earlier output.
Sample IR:
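(The original IR dump isn't reproduced here; as a stand-in, the following minimal sketch, with hypothetical shapes, builds and prints IR of roughly the shape in question.)

```python
import tvm
from tvm import relay

# Squeeze-and-Excitation-style gating: a sigmoid-activated branch is
# multiplied back onto the feature map it was derived from.
data = relay.var("data", shape=(1, 32, 56, 56), dtype="float32")
pooled = relay.nn.global_avg_pool2d(data)
gate = relay.sigmoid(pooled)
out = relay.multiply(data, gate)  # mul of a sigmoid output with an earlier output

mod = tvm.IRModule.from_expr(relay.Function([data], out))
print(mod)
```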
However, this fails when we attempt to quantize, because the quantization rewrite for the `mul` operation does not cover this case (where `lhs_cond` is False but `rhs_cond` is True). I've updated the relevant files to cover this case, and with this fix the model compiles successfully.
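In rough shape, the fix makes the rewrite symmetric in its operands. Below is a simplified sketch of that shape, not the actual diff: `attach_quantize`, `QKind`, and `forward_op` are hypothetical stand-ins for the annotator's real helpers.

```python
from enum import Enum

class QKind(Enum):
    INPUT = 1

def attach_quantize(expr, kind):
    # Hypothetical stand-in for attaching a simulated-quantize node.
    return ("simulated_quantize", expr, kind)

def forward_op(lhs, rhs):
    # Hypothetical stand-in for re-emitting the original mul.
    return ("mul", lhs, rhs)

def multiply_rewrite(lhs, rhs, lhs_cond, rhs_cond):
    if lhs_cond and rhs_cond:
        # Both operands already quantized: forward as-is.
        return forward_op(lhs, rhs)
    if lhs_cond and not rhs_cond:
        # Previously the only asymmetric case handled: quantize the rhs.
        return forward_op(lhs, attach_quantize(rhs, QKind.INPUT))
    if rhs_cond and not lhs_cond:
        # The missing mirror case this PR adds: quantize the lhs instead.
        return forward_op(attach_quantize(lhs, QKind.INPUT), rhs)
    return None  # Neither operand quantized: leave the op in float.
```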
Looking at the quantization code, this is not the only place where assumptions about LHS and RHS are being made.
However, I think it's only "general-purpose" ops like `mul` and `add` where we need to be agnostic about which operand is which.
Looking around, I don't see any obvious cases we aren't covering right now, but perhaps there are some tests that could be added.
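For example, a regression test along these lines could exercise the fixed path (a sketch only; the shapes, names, and qconfig settings are assumptions, though the APIs are TVM's public ones):

```python
import numpy as np
import tvm
from tvm import relay

def test_quantize_mul_with_sigmoid_rhs():
    # mul whose rhs comes through a sigmoid, as in Squeeze-and-Excitation.
    data = relay.var("data", shape=(1, 16, 8, 8), dtype="float32")
    weight = relay.var("weight", shape=(16, 16, 3, 3), dtype="float32")
    conv = relay.nn.conv2d(data, weight, kernel_size=(3, 3), padding=(1, 1))
    gate = relay.sigmoid(relay.nn.global_avg_pool2d(conv))
    out = relay.multiply(conv, gate)
    mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))

    params = {
        "weight": tvm.nd.array(
            np.random.uniform(-1, 1, (16, 16, 3, 3)).astype("float32")),
    }
    with relay.quantize.qconfig(calibrate_mode="global_scale", global_scale=8.0):
        qmod = relay.quantize.quantize(mod, params=params)
    assert qmod is not None  # previously crashed during quantization
```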
Potential reviewers (listed as having quantization familiarity): @zhiics, @jwfromm, @anijain2305.