PERT: Pre-Training BERT with Permuted Language Model
Yiming Cui, Ziqing Yang, Ting Liu
Chinese LERT | Chinese/English PERT | Chinese MacBERT | Chinese ELECTRA | Chinese XLNet | Chinese BERT | TextBrewer | TextPruner
View more resources released by HFL: https://github.com/ymcui/HFL-Anthology
Mar 28, 2023 We open-sourced the Chinese LLaMA & Alpaca LLMs, which can be quickly deployed on a PC. Check: https://github.com/ymcui/Chinese-LLaMA-Alpaca
Oct 29, 2022 We released a new pre-trained model called LERT. Check: https://github.com/ymcui/LERT/
May 17, 2022 We released PERT models fine-tuned on machine reading comprehension data, along with interactive demos. Check: Download
Mar 15, 2022 Our preliminary technical report is available on arXiv: https://arxiv.org/abs/2203.06906
Feb 24, 2022 Chinese and English PERT-base and PERT-large have been released. They use the BERT structure, so they can be directly loaded and fine-tuned for downstream tasks. The technical report is expected in mid-March, once it is polished. Thank you for your patience.
Feb 17, 2022 Thank you for your interest in this project. The models are expected to be released next week, and the technical report will follow once it is polished.
Chapter | Description |
---|---|
Introduction | The basic principle of PERT |
Download | Download pre-trained PERT |
QuickLoad | How to use 🤗Transformers to quickly load models |
Baseline Performance | Baseline system performances on some NLU tasks |
FAQ | Frequently Asked Questions |
Citation | Technical report of this project |
Pre-trained language models for natural language understanding (NLU) can be broadly divided into two categories, depending on whether the input text contains the masking token [MASK] or not.
The main motivation of this work is based on a common observation: a moderately permuted text does not severely affect comprehension. This raises the question: can semantic knowledge be learned directly from permuted text?
General idea: PERT takes permuted text as the input (so no [MASK] tokens are introduced), and its learning objective is to predict the position of the original token. Please take a look at the following example.
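To make this concrete in code, here is a minimal, illustrative sketch of how a permuted-text training example could be constructed. This is not the official PERT pre-training implementation; the actual permutation strategy and label definition in the paper may differ.

```python
import random

def make_permuted_example(tokens, permute_ratio=0.15, seed=42):
    """Illustrative sketch of the permuted LM idea described above.

    A fraction of the token positions is shuffled; for every position i,
    labels[i] is the index where the token originally at position i now
    appears (unpermuted positions simply point to themselves).
    """
    rng = random.Random(seed)
    n = len(tokens)
    k = max(2, round(n * permute_ratio))
    chosen = rng.sample(range(n), k)   # positions whose tokens are shuffled
    targets = chosen[:]
    rng.shuffle(targets)

    permuted = list(tokens)
    labels = list(range(n))
    for src, dst in zip(chosen, targets):
        permuted[dst] = tokens[src]    # token from position `src` moves to `dst`
        labels[src] = dst              # position `src`'s original token is now at `dst`
    return permuted, labels

permuted, labels = make_permuted_example(
    ["the", "order", "of", "some", "words", "is", "permuted"], permute_ratio=0.4
)
print(permuted)  # input text with shuffled word order (no [MASK] tokens)
print(labels)    # position-prediction targets
```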
This section mainly provides model weights for TensorFlow 1.15. For PyTorch or TensorFlow 2 models, see the next section.
The open-source version only contains the weights of the Transformer part, which can be used directly for fine-tuning on downstream tasks. You can also further pre-train this model with any pre-training objective, as long as the model body follows the standard Transformer architecture. For more instructions, see FAQ.
- PERT-large: 24-layer, 1024-hidden, 16-heads, 330M parameters
- PERT-base: 12-layer, 768-hidden, 12-heads, 110M parameters
Model | Language | Corpus | Google Download | Baidu Disk Download |
---|---|---|---|---|
Chinese-PERT-large | Chinese | EXT data [1] | TensorFlow | TensorFlow (password: e9hs) |
Chinese-PERT-base | Chinese | EXT data [1] | TensorFlow | TensorFlow (password: rcsw) |
English-PERT-large (uncased) | English | WikiBooks[2] | TensorFlow | TensorFlow (password: wxwi) |
English-PERT-base (uncased) | English | WikiBooks[2] | TensorFlow | TensorFlow (password: 8jgq) |
[1] EXT data includes Chinese Wikipedia, other encyclopedias, news, question-answering web data, etc., with 5.4B words in total, taking about 20GB of disk space (the same data as MacBERT).
[2] Wikipedia + BookCorpus
Take the TensorFlow version of Chinese-PERT-base as an example. The zip archive contains the following files:
chinese_pert_base_L-12_H-768_A-12.zip
|- pert_model.ckpt # model weights
|- pert_model.meta # model meta information
|- pert_model.index # model index information
|- pert_config.json # model parameters
|- vocab.txt # Vocabulary (same as original vocabulary of Google's BERT-base-Chinese)
Among them, pert_config.json and vocab.txt are exactly the same as those of Google's original BERT-base, Chinese (for the English models, they are the same as those of the original BERT-base, uncased).
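If you prefer to convert the TF 1.x checkpoint yourself rather than downloading the converted weights described in the next section, transformers can usually load BERT-style TF checkpoints directly via from_tf=True. Below is a minimal sketch under these assumptions: the archive layout shown above, a checkpoint index file named pert_model.ckpt.index (the exact file names in your download may differ), and TensorFlow installed for the on-the-fly conversion.

```python
from transformers import BertConfig, BertModel, BertTokenizer

ARCHIVE_DIR = "chinese_pert_base_L-12_H-768_A-12"  # unpacked zip from above

config = BertConfig.from_json_file(f"{ARCHIVE_DIR}/pert_config.json")
tokenizer = BertTokenizer(f"{ARCHIVE_DIR}/vocab.txt")

# from_tf=True converts the TF 1.x checkpoint on the fly (requires TensorFlow);
# the index-file name is an assumption and may differ in your archive.
model = BertModel.from_pretrained(
    f"{ARCHIVE_DIR}/pert_model.ckpt.index", from_tf=True, config=config
)

# Save a regular PyTorch checkpoint for later use with from_pretrained().
model.save_pretrained("chinese-pert-base-pytorch")
tokenizer.save_pretrained("chinese-pert-base-pytorch")
```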
TensorFlow (v2) and PyTorch version models can be downloaded through the 🤗transformers model library.
Download method: Click on any model to be downloaded → select the "Files and versions" tab → download the corresponding model file.
Model | Model File Size | Transformers ModelHub URL |
---|---|---|
Chinese-PERT-large | 1.2G | https://huggingface.co/hfl/chinese-pert-large |
Chinese-PERT-base | 0.4G | https://huggingface.co/hfl/chinese-pert-base |
Chinese-PERT-large-MRC | 1.2G | https://huggingface.co/hfl/chinese-pert-large-mrc |
Chinese-PERT-base-MRC | 0.4G | https://huggingface.co/hfl/chinese-pert-base-mrc |
English-PERT-large | 1.2G | https://huggingface.co/hfl/english-pert-large |
English-PERT-base | 0.4G | https://huggingface.co/hfl/english-pert-base |
Since the main body of PERT is still the same as the BERT structure, users can easily call the PERT model using the transformers library.
**Note: All PERT models in this project should be loaded using BertTokenizer and BertModel (BertForQuestionAnswering for the MRC models).**
from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained("MODEL_NAME")
model = BertModel.from_pretrained("MODEL_NAME")
The list of MODEL_NAME
is as follows:
Model name | MODEL_NAME |
---|---|
Chinese-PERT-large | hfl/chinese-pert-large |
Chinese-PERT-base | hfl/chinese-pert-base |
Chinese-PERT-large-MRC | hfl/chinese-pert-large-mrc |
Chinese-PERT-base-MRC | hfl/chinese-pert-base-mrc |
English-PERT-large | hfl/english-pert-large |
English-PERT-base | hfl/english-pert-base |
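For the MRC models, a minimal usage sketch with BertForQuestionAnswering and the question-answering pipeline is shown below; the example question and context are only placeholders.

```python
from transformers import BertTokenizer, BertForQuestionAnswering, pipeline

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-pert-base-mrc")
model = BertForQuestionAnswering.from_pretrained("hfl/chinese-pert-base-mrc")

# Extractive QA: the model selects an answer span from the given context.
qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
result = qa(question="中国的首都是哪里？", context="中国的首都是北京，位于华北平原。")
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```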
For detailed performance figures, please see our paper: https://arxiv.org/abs/2203.06906
We report both the maximum score and the average score (in brackets).
We perform experiments on the following ten Chinese tasks.
- Machine Reading Comprehension (2): CMRC 2018 (Simplified Chinese), DRCD (Traditional Chinese)
- Text Classification (6):
- Named Entity Recognition (NER) (2): MSRA-NER, People's Daily (人民日报)
In addition, we carried out experiments on the word order recovery task, which is a sub-task of text correction.
We perform experiments on the following six English tasks.
- Machine Reading Comprehension (2): SQuAD 1.1, SQuAD 2.0
- GLUE Tasks (4): MNLI, SST-2, CoLA, MRPC
Q1: About the open-source version of PERT
A1: The open-source version only contains the weights of the Transformer part, which can be used directly for fine-tuning on downstream tasks or as the initialization for further pre-training. The original TF checkpoints may contain randomly initialized MLM weights; please do not use them. There are two reasons for releasing the model this way:
- to remove the unnecessary Adam-related weights (the model size shrinks to about 1/3 of the original);
- to stay consistent with the BERT model conversion in transformers (the conversion uses the original BERT structure, so the weights of the pre-training head are dropped and BERT's randomly initialized MLM weights are kept).
Q2: About the performance of PERT on downstream tasks
A2: Our preliminary conclusion is that PERT performs better on tasks such as machine reading comprehension and sequence labeling, but worse on text classification tasks. Please evaluate it on your own tasks to see how it performs. For more information, please read our paper: https://arxiv.org/abs/2203.06906
Please cite our paper if you find the resource or model useful. https://arxiv.org/abs/2203.06906
@article{cui2022pert,
title={PERT: Pre-training BERT with Permuted Language Model},
author={Cui, Yiming and Yang, Ziqing and Liu, Ting},
year={2022},
eprint={2203.06906},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Follow our official WeChat account to keep updated with our latest technologies!
If you have questions, please submit them in a GitHub Issue.
- You are advised to read FAQ first before you submit an issue.
- Repetitive and irrelevant issues will be ignored and closed by the stale bot. Thank you for your understanding and support.
- We cannot accommodate EVERY request, so please bear in mind that there is no guarantee that your request will be met.
- Always be polite when you submit an issue.