text classification bug fix & support ernie m #3184
Conversation
@@ -376,7 +384,7 @@ python prune.py \
  * `per_device_eval_batch_size`: Batch size for evaluation on the dev set; adjust to fit GPU memory, and lower it if out-of-memory errors occur. Default: 32.
  * `learning_rate`: Maximum learning rate for training. Default: 3e-5.
  * `num_train_epochs`: Number of training epochs; 100 can be used together with early stopping. Default: 10.
- * `logging_steps`: Interval, in steps, between log prints during training. Default: 5.
+ * `logging_steps`: Interval, in steps, between log prints during training. Default: 100.
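Put together, the flags documented above might be combined in a training invocation like the following sketch. The script name and data paths are placeholders, not taken from this PR; the flag names are the ones listed in the diff.

```shell
# Hypothetical invocation combining the documented hyperparameters;
# the script name is a placeholder, the flags follow the doc above.
python train.py \
    --per_device_eval_batch_size 32 \
    --learning_rate 3e-5 \
    --num_train_epochs 10 \
    --logging_steps 5
```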
Isn't this `logging_steps` value too large for most CPU users?
That is because the Trainer's built-in default is 100; during training I still pass `--logging_steps 5` on the command line.
-        'token_type_ids'], batch['labels']
-        logits = model(input_ids, token_type_ids)
+        label = batch.pop("labels")
+        logits = model(**batch)
What is the purpose of not expanding the arguments explicitly here?
It is to support the ernie-m model: neither the data produced by the ernie-m tokenizer nor the model's inputs include `token_type_ids`.
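The point can be illustrated with a minimal sketch in plain Python (no Paddle dependency; `toy_model` is a hypothetical stand-in for the classifier's forward pass): popping the labels and then unpacking whatever keys remain with `model(**batch)` works for both ernie-style batches, which carry `token_type_ids`, and ernie-m-style batches, which do not.

```python
def toy_model(input_ids, token_type_ids=None):
    # Stand-in for a model forward pass; `token_type_ids` is optional,
    # mirroring models such as ernie-m that do not use it at all.
    return {"used_token_type_ids": token_type_ids is not None,
            "n_tokens": len(input_ids)}

# An ernie-style batch (tokenizer emits token_type_ids) ...
batch_ernie = {"input_ids": [1, 2, 3], "token_type_ids": [0, 0, 0], "labels": [1]}
# ... and an ernie-m-style batch (no token_type_ids key).
batch_ernie_m = {"input_ids": [4, 5], "labels": [0]}

def forward(batch):
    batch = dict(batch)          # avoid mutating the caller's dict
    label = batch.pop("labels")  # as in the PR: pop the labels first
    logits = toy_model(**batch)  # then unpack whatever keys remain
    return logits, label

print(forward(batch_ernie))    # token_type_ids is passed through
print(forward(batch_ernie_m))  # works even though the key is absent
```

Indexing fixed keys like `batch['token_type_ids']` would raise a `KeyError` on the second batch, which is exactly the failure the unpacking avoids.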
-        'token_type_ids'], batch['labels']
-        logits = model(input_ids, token_type_ids)
+        label = batch.pop("labels")
+        logits = model(**batch)
Same as above.
Same as above.
@@ -264,16 +267,15 @@ checkpoint/

  * To resume model training, set `init_from_ckpt`, e.g. `init_from_ckpt=checkpoint/model_state.pdparams`.
  * To train an English text classification task, simply change the pretrained model parameter `model_name`. "ernie-2.0-base-en" is recommended for English tasks; see [Transformer pretrained models](https://paddlenlp.readthedocs.io/zh/latest/model_zoo/index.html#transformer) for more options.

+ * For text classification in languages other than Chinese and English, the multilingual pretrained models "ernie-m-base" and "ernie-m-large" are recommended. Multilingual models do not yet support text classification deployment; this feature is under active development.
The ernie-m version is currently being adapted to the compression API; what is the main blocker?
The data produced by the ernie-m tokenizer and the model's inputs both lack `token_type_ids`, but compression-API training assumes by default that model inputs include `token_type_ids`.
https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/trainer/trainer_compress.py#L386
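A sketch of one way such a mismatch could be avoided, in plain Python (this is a hypothetical helper, not the compression API's actual code): instead of hard-coding `token_type_ids`, filter the batch against the model's forward signature before calling it.

```python
import inspect

def ernie_forward(input_ids, token_type_ids):
    # Ernie-style forward: requires token_type_ids.
    return len(input_ids) + len(token_type_ids)

def ernie_m_forward(input_ids):
    # Ernie-m-style forward: has no token_type_ids parameter at all.
    return len(input_ids)

def filter_inputs(forward_fn, batch):
    # Keep only the batch keys the forward signature actually accepts.
    accepted = inspect.signature(forward_fn).parameters
    return {k: v for k, v in batch.items() if k in accepted}

batch = {"input_ids": [1, 2, 3], "token_type_ids": [0, 0, 0]}
print(ernie_forward(**filter_inputs(ernie_forward, batch)))      # both keys kept
print(ernie_m_forward(**filter_inputs(ernie_m_forward, batch)))  # token_type_ids dropped
```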
LGTM
PR types
Others
PR changes
Others
Description
Fixed bugs in text classification and added support for the ernie-m model (pruning and deployment are not yet supported).