Add ernie-3.0 mkldnn fp32 and int8 support #2468
Conversation
Hi @yeliang2258, please review and merge this PR. With this PR, both fp32 and int8 work without using save_quant_model.py. Thanks.

Paddle: Ernie-3.0 FP32 mkldnn, 1 thread on ICX: 65.45 QPS
Ernie-3.0 INT8 mkldnn, 1 thread on ICX: 153.77 QPS
LGTM
Hi @ZeyuChen, could you please merge this PR? @yeliang2258 has approved.
LGTM
@lidanqing-intel Thanks for your contributions!
PR types
Performance optimization
PR changes
Others
Description
Add ernie-3.0 fp32 and int8 mkldnn support. Paddle needs to include the Ernie-3.0 int8 fix #43297 ([Bug fix] Do not quantize weights Y when matmul inputs X and Y are both outputs of other ops).
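For context, a minimal sketch of how mkldnn fp32 and int8 inference can be enabled for an exported Ernie-3.0 model through the Paddle Inference Python API. The model paths and the `create_config` helper are hypothetical and not taken from this PR; `enable_mkldnn()` and `enable_mkldnn_int8()` are existing `paddle.inference.Config` methods (the latter requires a Paddle build with the #43297 fix for quantized Ernie-3.0 models to run correctly):

```python
# Sketch: enabling mkldnn (oneDNN) fp32 / int8 CPU inference with
# Paddle Inference. Paths and helper name are placeholders.
import paddle.inference as paddle_infer

def create_config(model_file, params_file, use_int8=False):
    config = paddle_infer.Config(model_file, params_file)
    config.disable_gpu()
    # Single CPU thread, matching the 1-thread ICX benchmark above.
    config.set_cpu_math_library_num_threads(1)
    config.enable_mkldnn()  # fp32 mkldnn kernels
    if use_int8:
        # int8 mkldnn execution; with this PR the quantized model runs
        # directly, without a save_quant_model.py preprocessing step.
        config.enable_mkldnn_int8()
    return config

# Usage (model paths are hypothetical):
config = create_config("ernie3.0/model.pdmodel",
                       "ernie3.0/model.pdiparams",
                       use_int8=True)
predictor = paddle_infer.create_predictor(config)
```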