This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Optimize Inference Performance on CPU #1035

Open
carter54 opened this issue Dec 4, 2019 · 6 comments
Labels
enhancement New feature or request

Comments


carter54 commented Dec 4, 2019

Description

The release notes at https://github.com/dmlc/gluon-nlp/releases/tag/v0.8.1 mention that BERT int8 quantization is presented in this blog post:
https://medium.com/apache-mxnet/optimization-for-bert-inference-performance-on-cpu-3bb2413d376c
But the blog post only shows some results of the BERT quantization tests:

The work on low precision deployment is still ongoing and involves un-released SW, the reproduction instructions will be available later.

When will this work be released, and can we apply this quantization method to GPT2?

Thanks a lot for the great work!
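(Editor's note: for readers unfamiliar with the technique under discussion, int8 quantization maps float32 tensors to 8-bit integers using a scale factor, which is what enables the CPU speedups described in the blog post. Below is a minimal NumPy sketch of symmetric per-tensor quantization; it is purely illustrative and not the GluonNLP/MKL-DNN implementation referenced in the thread.)

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]."""
    scale = np.max(np.abs(x)) / 127.0  # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 values from the int8 codes."""
    return q.astype(np.float32) * scale

x = np.array([-1.5, -0.2, 0.0, 0.7, 3.1], dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
# Rounding error is bounded by half the quantization step (scale / 2).
print(np.max(np.abs(x - x_hat)))
```

In real deployments the scale is usually calibrated on representative data rather than taken from a single tensor, and matrix multiplies are carried out directly in int8 before dequantizing the result.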

carter54 added the "enhancement (New feature or request)" label on Dec 4, 2019
leezu (Contributor) commented Dec 4, 2019

@TaoLv

TaoLv (Member) commented Dec 9, 2019

Sorry for missing the message. We're working on cleaning up the code and the solution, and hope to have a PR soon. I'm not familiar with the status of GPT2 in GluonNLP. Could you please point me to the scripts and tell me whether it can be exported as a static model?

leezu (Contributor) commented Dec 9, 2019

Yes, the static GPT2 model was recently added: #1010

carter54 (Author) commented

Thanks for the replies, @leezu @TaoLv!
Looking forward to trying int8 BERT and GPT2 soon~

TaoLv (Member) commented Feb 2, 2020

@carter54 FYI, here is the PR for BERT quantization: #1080

carter54 (Author) commented Feb 13, 2020

@TaoLv Thanks for the work. Can this method be applied to the GPT2 model?

3 participants