Add unit tests for generation models #3018
Conversation
unk_token=unk_token,
pad_token=pad_token,
mask_token=mask_token,
**kwargs)
This block (170--813) doesn't need to be added; a hook handles it.
Done.
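For context, a minimal sketch of the kind of hook being referred to, assuming a base class that records each subclass's __init__ kwargs automatically; all names here are illustrative, not PaddleNLP's actual implementation:

import functools

def record_init_args(init):
    # Capture keyword arguments (e.g. special tokens) on the instance,
    # so subclasses need not forward them to the base class by hand.
    @functools.wraps(init)
    def wrapper(self, *args, **kwargs):
        self.init_kwargs = dict(kwargs)  # hypothetical attribute name
        return init(self, *args, **kwargs)
    return wrapper

class PretrainedTokenizerBase:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Wrap each subclass __init__ so its kwargs are recorded.
        cls.__init__ = record_init_args(cls.__init__)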
bos_token=bos_token,
eos_token=eos_token,
eol_token=eol_token,
**kwargs)
Same as above; this doesn't need to be added either.
Done.
import json
import jieba
import shutil
import sentencepiece as spm
Are jieba and sentencepiece actually needed here?
Done.
generated_summaries = tok.batch_decode(
    hypotheses_batch.tolist(),
    clean_up_tokenization_spaces=True,
    skip_special_tokens=True)
Should the generated results be checked here? The HF version has an assertion for this.
The assert was removed for now because bart-large generates garbled, meaningless sentences. Once all the unit tests are finished, the model weights themselves need to be verified.
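For reference, the HF-style check mentioned above pins the expected decoded strings and asserts equality; a minimal sketch, with placeholder expectations (real values would require verified model weights):

# Placeholder expectations, not real bart-large outputs.
EXPECTED_SUMMARIES = ["<expected summary 1>", "<expected summary 2>"]
self.assertListEqual(EXPECTED_SUMMARIES, generated_summaries)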
decoder_start_token_id = (
    decoder_start_token_id if decoder_start_token_id is not None else
    getattr(self, pretrained_model_name).config.get(
Is pretrained_model_name obtained from the model? Where is it set? It doesn't seem appropriate as an attribute of the model.
Done.
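One way to resolve the value without a pretrained_model_name attribute, as a sketch (the dict-style config access is assumed from the snippet above; names are illustrative):

def resolve_decoder_start_token_id(model, decoder_start_token_id=None):
    # Prefer the explicit argument, then fall back to the model's own
    # config rather than a pretrained_model_name attribute.
    if decoder_start_token_id is not None:
        return decoder_start_token_id
    return model.config.get("decoder_start_token_id", None)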
input_ids_clone = input_ids.repeat_interleave(beam_scorer.num_beams,
                                              axis=0)

kwargs["use_cache"] = True
Don't the _sample_generate tests above need this set as well?
This is set to align with HF: some unit tests run with use_cache=True and others with use_cache=False, so both configurations are covered.
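A sketch of how a test can exercise both settings in one place (the helper name and generate()'s (ids, scores) return value are assumptions):

def check_generate_with_and_without_cache(self, model, input_ids, **kwargs):
    # Run generation under both cache settings and check the outputs
    # agree, so neither configuration regresses silently.
    outputs = {}
    for use_cache in (True, False):
        kwargs["use_cache"] = use_cache
        output_ids, _ = model.generate(input_ids, **kwargs)
        outputs[use_cache] = output_ids.tolist()
    self.assertEqual(outputs[True], outputs[False])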
PR types
Others
PR changes
Others
Description
Add unit tests for generation models.
Done:
Unit tests:
Updated test_tokenizer_common.py to support more from_pretrained_filter
Functions:
TODO: