add map_matmul and fc_act_fuse passes to quant2_int8_mkldnn_pass #38023
Thanks for your contribution!
LGTM.
LGTM
LGTM
@baoachun @Aganlengzi Could you please merge this PR?
LGTM
…dlePaddle#38023)
* add map_matmul passes to quant2_int8_mkldnn_pass
* fix fc+act fuse (activation scale)
* ci fix, c++17 structured bindings not available
* fix ci static check
PR types
Performance optimization
PR changes
Others
Describe
Fixed mkldnn fc with activation.
Also, by adding the map_matmul passes, all matmul_v2 ops are converted to either matmul or mul. Afterwards, mul+elementwise_add pairs are fused into fc by fc_fuse_pass.
Adding those 3 map passes fixes a crash introduced by putting matmul_v2_transpose_reshape_fuse_pass in this file. They also improve performance. Together with matmul_v2_transpose_reshape_fuse_pass, those passes change accuracy from 0.5478 to 0.5383; however, the accuracy before this change was higher than the reference obtained from tnewst_quant_model.
I also added the fc_act_mkldnn_fuse pass, which fuses fc+gelu. It improves performance.
Accuracy results are reported here: #36962 (comment)
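To illustrate why the pass order described above matters, here is a minimal toy sketch in plain Python. It is not Paddle's pass API: the graph is modelled as a flat list of op dictionaries, and the names `map_matmul_v2`, `fuse_adjacent`, `run_pipeline`, and the `y_is_2d_weight` flag are hypothetical, introduced only for illustration. The real passes match patterns on an IR graph and check op attributes and shapes.

```python
# Toy model only: a "graph" is a linear list of op dicts, not Paddle's IR graph.

def map_matmul_v2(ops):
    """Rewrite every matmul_v2 as mul or matmul.

    Assumption: a hypothetical 'y_is_2d_weight' flag stands in for the
    attribute and shape checks the real map_matmul passes perform.
    """
    out = []
    for op in ops:
        if op["type"] == "matmul_v2":
            new_type = "mul" if op.get("y_is_2d_weight") else "matmul"
            out.append({"type": new_type})
        else:
            out.append(op)
    return out


def fuse_adjacent(ops, first, second, make_fused):
    """Replace each adjacent (first, second) op pair with one fused op."""
    out, i = [], 0
    while i < len(ops):
        is_pair = (
            i + 1 < len(ops)
            and ops[i]["type"] == first
            and ops[i + 1]["type"] == second
        )
        if is_pair:
            out.append(make_fused(ops[i], ops[i + 1]))
            i += 2
        else:
            out.append(ops[i])
            i += 1
    return out


def run_pipeline(ops):
    # 1. matmul_v2 -> mul / matmul, so the fc fusion below has something to match.
    ops = map_matmul_v2(ops)
    # 2. mul + elementwise_add -> fc (the role fc_fuse_pass plays in the PR).
    ops = fuse_adjacent(ops, "mul", "elementwise_add", lambda a, b: {"type": "fc"})
    # 3. fc + gelu -> fc with a fused activation (the role of the fc+act fuse).
    ops = fuse_adjacent(
        ops, "fc", "gelu", lambda a, b: {"type": "fc", "activation": "gelu"}
    )
    return ops


graph = [
    {"type": "matmul_v2", "y_is_2d_weight": True},
    {"type": "elementwise_add"},
    {"type": "gelu"},
    {"type": "matmul_v2", "y_is_2d_weight": False},
]
result = run_pipeline(graph)
print(result)
```

In this sketch, skipping step 1 would leave `matmul_v2` in place, so neither fusion would find a pair to match. That is the dependency the description relies on when it says the map passes enable `fc_fuse_pass`.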