Add model compression API #2777
Conversation
Leave some comments
    AutoModelForQuestionAnswering,
)
from compress_trainer import CompressConfig, PTQConfig
from paddlenlp.transformers import AutoTokenizer, AutoModelForQuestionAnswering
from paddlenlp.utils.log import logger
from datasets import load_metric, load_dataset

sys.path.append("../ernie-1.0/finetune")
Why is there an ernie-1.0 path here?
    DataArguments,
    ModelArguments,
)
from question_answering import QuestionAnsweringTrainer, CrossEntropyLossForSQuAD, prepare_train_features, prepare_validation_features
A question here: why hasn't a downstream-task Trainer like QuestionAnsweringTrainer gone into the framework, instead of sitting under the model_zoo/ernie-1.0 directory? @wawltor
Moving it into the framework could be considered. The main reason it wasn't placed there is that we followed HuggingFace's approach. The likely considerations at the time:
- It gives users examples of how to adapt the Trainer code themselves.
- It was unclear whether it would be broadly reusable.
model_zoo/ernie-3.0/compress_qa.py
Outdated
)
from question_answering import QuestionAnsweringTrainer, CrossEntropyLossForSQuAD, prepare_train_features, prepare_validation_features
from utils import ALL_DATASETS, DataArguments, ModelArguments
from compress_trainer import AutoCompressConfig
Per the earlier discussion, my understanding is that the implementation in the compress_trainer.py script should go into the paddlenlp framework, so that a compress_trainer.py script doesn't need to be maintained in every scenario?
It has been moved into paddlenlp, and a new file ofa_utils.py has been added under paddlenlp/transformers/. That file is essentially a copy of https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/nas/ofa/utils/nlp_utils.py. The reason for copying it almost verbatim into paddlenlp is that the PaddleSlim file ends with these lines:

nn.MultiHeadAttention.forward = _mha_forward
nn.MultiHeadAttention._prepare_qkv = _prepare_qkv
nn.TransformerEncoder.forward = _encoder_forward
nn.TransformerEncoderLayer.forward = _encoder_layer_forward

We don't recommend this style. First, merely importing nlp_utils.py changes the forward functions of all the classes above, and PaddleSlim offers no interface to restore them, which can lead to unpredictable errors. Second, replacing the classes' forward methods means our current model_outputs feature may fail to cover these patches and break. The second point came out of a discussion with @guoshengCS, where we traced the cause.
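To make the concern concrete, here is a minimal sketch (not PaddleSlim's or paddlenlp's actual code) of a scoped alternative to module-level patching: the replacement methods are installed only inside a context manager and the originals are restored on exit, so importing the module alone changes nothing.

```python
import contextlib

import paddle.nn as nn


@contextlib.contextmanager
def patched_methods(patches):
    """Temporarily replace class methods; restore the originals on exit."""
    originals = [(cls, name, getattr(cls, name)) for cls, name, _ in patches]
    try:
        for cls, name, fn in patches:
            setattr(cls, name, fn)
        yield
    finally:
        for cls, name, fn in originals:
            setattr(cls, name, fn)


# Hypothetical usage with the patched forwards mentioned above; outside the
# `with` block, nn.MultiHeadAttention.forward is the stock implementation,
# so features like model_outputs never see the patch.
# with patched_methods([(nn.MultiHeadAttention, "forward", _mha_forward)]):
#     ...  # run the OFA-dependent code here
```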
model_zoo/ernie-3.0/compress_qa.py
Outdated
)
from question_answering import QuestionAnsweringTrainer, CrossEntropyLossForSQuAD, prepare_train_features, prepare_validation_features
from utils import ALL_DATASETS, DataArguments, ModelArguments
from compress_trainer import AutoCompressConfig
From a user's perspective, wouldn't renaming AutoCompressConfig to CompressConfig be more concise? The Auto prefix mostly reads, from an RD perspective, as staying consistent with the AutoModel idiom?
Thanks for pointing this out. Since the usage of Auto here differs from AutoModel, I've removed the Auto prefix for now.
Force-pushed bb0fa60 to ec09090
Leave some comments
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    criterion=criterion)  # Stratedy`dynabert` needs arguments `criterion`
Typo: Stratedy
Thanks. Done.
# Example 2: ptq
# configs = AutoCompressConfig("ptq")
# configs.set_config(width_mult_list=[0.75, 2 / 3],
Why does the width_mult_list parameter need to be configured when doing PTQ quantization alone?
This is a typo. Thanks. Done.
# Supports 'dynabert+ptq', 'dynabert' and 'ptq' now.
# Example 1: dynabert+ptq
configs = AutoCompressConfig()
configs.set_config(width_mult_list=[0.75, 2 / 3], batch_size_list=[4, 8])
Why doesn't the input_dir parameter need to be configured here?
The input to the compression API is a dygraph model, which is already passed in through the Trainer's existing model_name_or_path argument.
# configs = AutoCompressConfig("ptq")
# configs.set_config(width_mult_list=[0.75, 2 / 3],
#                    batch_size_list=[4, 8],
#                    input_dir=os.path.join(model_args.model_name_or_path,
input_dir is a strategy-agnostic parameter. Would it be easier to understand if it were pulled out into the trainer.compress() interface so that it pairs with output_dir? Or alternatively, move output_dir into the config as well so that trainer.compress() accepts a single config argument. Which option is more reasonable?
Confirmed again: input_dir is not needed, even when the user only does quantization. The compression API's input model is a dygraph model, and the model path is already passed in via the Trainer API's model_name_or_path.
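A minimal sketch of that point (the import is real PaddleNLP usage, but the checkpoint name is just an example value): the dygraph model is loaded from model_name_or_path, so the compression config itself carries no input path.

```python
from paddlenlp.transformers import AutoModelForQuestionAnswering

# model_name_or_path would normally arrive via the Trainer's CLI arguments;
# "ernie-3.0-medium-zh" is only an illustrative checkpoint name.
model_name_or_path = "ernie-3.0-medium-zh"
model = AutoModelForQuestionAnswering.from_pretrained(model_name_or_path)
# `model` is the dygraph network the compression API starts from; pruned and
# quantized artifacts are derived from it under output_dir, not input_dir.
```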
# "compress", str(2/3))) | ||
|
||
# Example 3: dynabert | ||
# configs = AutoCompressConfig("dynabert") |
Doesn't the config for the "dynabert" strategy need to be set explicitly here?
If the user doesn't set it, the defaults are used. After startup, the program reports which parameters can be customized and prints the final runtime config.
else:
    pass

self.stratedy = stratedy
Typo
Thanks:-)
self.stratedy = stratedy
self.config_dict = {}
for each_stratedy in stratedy.split("+"):
If the strategy parameter is defined as a str, parsing on the + sign is unavoidable. What about defining strategy as a list?
The idea here is to make 'dynabert+ptq' a fixed recipe, packaged as a strategy on a par with 'dynabert' and 'ptq'.
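For comparison, a small illustrative sketch (not the actual implementation) of the two designs discussed here:

```python
def parse_strategy_str(strategy: str) -> list:
    # Current design: 'dynabert+ptq' is one fixed recipe encoded as a
    # string, so it has to be split on '+'.
    return strategy.split("+")


def parse_strategy_list(strategies: list) -> list:
    # Reviewer's alternative: a list of strategy names needs no parsing.
    return list(strategies)


assert parse_strategy_str("dynabert+ptq") == ["dynabert", "ptq"]
assert parse_strategy_list(["dynabert", "ptq"]) == ["dynabert", "ptq"]
```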
    paddle.version.commit))
for strategy in self.config_dict:
    logger.info('{}:'.format(strategy))
    for a in self.config_dict[strategy]:
- Why not iterate over the dict here with for config_key, config_value in self.config_dict[strategy].items()?
- Going forward, please mind naming conventions and avoid unclear variable names like a.
Got it, thanks for the reminder.
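Spelled out, the reviewer's suggestion looks like this (the shape and values of config_dict are assumed from the snippet above):

```python
from paddlenlp.utils.log import logger

config_dict = {
    "dynabert": {"width_mult_list": [0.75, 2 / 3]},  # assumed example values
    "ptq": {"batch_size_list": [4, 8]},
}

for strategy in config_dict:
    logger.info('{}:'.format(strategy))
    # .items() yields key and value together, and the names say what they are.
    for config_key, config_value in config_dict[strategy].items():
        logger.info('{}: {}'.format(config_key, config_value))
```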
output_dir_width = os.path.join(output_dir, str(width_mult))
self.quant(output_dir_width, output_dir_width,
The quant interface takes output_dir_width twice? Does that mean the output path of the quantized model is still output_dir_width?
Yes. The first argument is the input path and the second is the output path. This is quantization after pruning: the former is a path built from the user-supplied output_dir and the pruning width, and the quantized model also goes into a subdirectory of that directory. For example, if the user passes output_dir as best_models/CLUEWSC2020/compress, the pruned model ends up at best_models/CLUEWSC2020/compress/width_mult_0.75/float32, and the quantized model at best_models/CLUEWSC2020/compress/width_mult_0.75/hist16/int8.pdmodel.
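A short sketch of that layout (directory names follow this comment's example; the exact naming is decided by the implementation):

```python
import os

output_dir = "best_models/CLUEWSC2020/compress"  # user-supplied output_dir
width_mult = 0.75

# Path derived from output_dir and the pruning width; used both as the
# input and the output location of the post-pruning quantization step.
output_dir_width = os.path.join(output_dir, "width_mult_" + str(width_mult))

pruned_model_dir = os.path.join(output_dir_width, "float32")
quantized_model = os.path.join(output_dir_width, "hist16", "int8.pdmodel")

print(pruned_model_dir)  # best_models/CLUEWSC2020/compress/width_mult_0.75/float32
print(quantized_model)   # best_models/CLUEWSC2020/compress/width_mult_0.75/hist16/int8.pdmodel
```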
@@ -763,5 +733,4 @@ def soft_cross_entropy(inp, target):

 Trainer.compress = compress
-Trainer.prune = prune
 Trainer.quant = quant
Why was prune removed while quant was kept?
Because it's not certain that, if other pruning methods are hooked in later, merging them with dynabert would save code; for quantization that probably works. Also, this quant interface is not intended to be exposed publicly.
Force-pushed ec09090 to bfc3723
Force-pushed bfc3723 to 1e4f76c
Force-pushed 1e4f76c to 3500071
The Usage section of the PR description doesn't appear to have been updated?
# Calling `set_config` is not necessary
# configs.set_config(batch_size_list=[4, 8])
The first line says set_config isn't needed, but the second line sets the config?
The intent was to show that you can set it this way or skip it; it's commented out here, so the code won't run. Clearer usage probably needs to be spelled out in the docs.
Right, use the comment to convey your intent clearly so others aren't misled or confused.
Force-pushed 08c0783 to 0bb8600
Leave some comments
    ModelArguments,
)
from question_answering import QuestionAnsweringTrainer, CrossEntropyLossForSQuAD, prepare_train_features, prepare_validation_features
from utils import ALL_DATASETS, DataArguments, ModelArguments
A question: why haven't these three data types ALL_DATASETS, DataArguments, ModelArguments gone into the Trainer framework, instead of sitting in ernie1.0/finetune/utils? @ZHUI
These are custom, user-defined things.
They are closely tied to the data and the task type. Here the ernie-3.0 and ernie-1.0 tasks are fairly similar, so they are shared; for other models they may not necessarily apply.
OK, that's clear.
 if data_args.dataset in ALL_DATASETS:
     # if you custom you hyper-parameters in yaml config, it will overwrite all args.
     config = ALL_DATASETS[data_args.dataset]
-    for args in (model_args, data_args, training_args):
+    for args in (model_args, data_args, compression_args):
         for arg in vars(args):
             if arg in config.keys():
                 setattr(args, arg, config[arg])
Could this block of logic get an explanatory comment?
There's a comment at line 43: if a custom yaml config is provided, it overrides the parameters passed in via args.
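For readers of this thread, a self-contained sketch of that override logic (the ALL_DATASETS content and attribute names are made up for illustration):

```python
class Args:
    """Stand-in for the parsed argument dataclasses."""
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


# Per-dataset hyper-parameters, as they would be loaded from the yaml config.
ALL_DATASETS = {"clue cluewsc2020": {"learning_rate": 1e-5, "batch_size": 16}}

data_args = Args(dataset="clue cluewsc2020", batch_size=32)
model_args = Args(learning_rate=5e-5)

config = ALL_DATASETS[data_args.dataset]
for args in (model_args, data_args):
    for arg in vars(args):        # every attribute the args object carries
        if arg in config:         # the yaml value overwrites the CLI value
            setattr(args, arg, config[arg])

assert data_args.batch_size == 16 and model_args.learning_rate == 1e-5
```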
    # We use this argument because the texts in our dataset are lists of words (with a label for each word).
    is_split_into_words=True,
    return_length=True)
label_ids = example['ner_tags']
Is ner_tags common across different NER tasks?
Not necessarily; it depends on the structure of the dataset itself. Both the msra_ner and conll2002 datasets have the ner_tags key, and this example presumably just uses msra_ner.
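A quick way to check (assumes the HuggingFace datasets package and network access; the column list in the comment is what msra_ner is expected to contain):

```python
from datasets import load_dataset

example = load_dataset("msra_ner", split="train[:1]")[0]
print(sorted(example.keys()))    # expected: ['id', 'ner_tags', 'tokens']
label_ids = example["ner_tags"]  # per-token label ids for this dataset;
                                 # other NER datasets may name the column differently
```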
""" | ||
Supports DynaBERT strategy now. | ||
Supports pruning dynabert and post-training quantization. If both are | ||
needed, pruning dynabertwould be performed before quantizaton. |
- Missing space: dynabertwould -> dynabert would
- In code comments, should dynabert be written uniformly as DynaBERT, the proper noun from the original paper?
Thanks for pointing that out; changed everywhere to DynaBERT as in the original paper.
Force-pushed 0bb8600 to 986bcce
Force-pushed 986bcce to 1ec2ca0
LGTM
PR types
Function optimization
PR changes
APIs
Description
Update compression API
Done:
- Parameters can be passed via `--` command-line flags or via a Yaml file; there is no need to distinguish training from compression parameters, just pass in whatever is needed.
TODO:
Usage:
If conf.yaml is used, … needs to be configured into it; the command-line form can also be used:
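The concrete command and yaml example appear to have been elided from the description. As a hedged sketch of the idea (field names are taken from this PR's snippets; the loader is a plain illustration, not the project's actual entry point), the same fields can arrive from either source:

```python
from dataclasses import dataclass, field, fields

import yaml  # pyyaml


@dataclass
class CompressionArguments:
    # Example fields seen in this PR; defaults stand in for the real ones.
    width_mult_list: list = field(default_factory=lambda: [0.75])
    batch_size_list: list = field(default_factory=lambda: [4, 8])


def load_from_yaml(path: str) -> CompressionArguments:
    """conf.yaml form: keys in the file must match the dataclass fields."""
    with open(path) as f:
        raw = yaml.safe_load(f) or {}
    names = {f.name for f in fields(CompressionArguments)}
    return CompressionArguments(**{k: v for k, v in raw.items() if k in names})

# Command-line form: the same fields arrive as `--width_mult_list ...`
# flags parsed by the script's argument parser instead of the yaml file.
```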