Commit

fix PP-OCRv3 det train (PaddlePaddle#8208)
LDOUBLEV authored Nov 4, 2022
1 parent 4604e76 commit 5f06a80
Showing 4 changed files with 270 additions and 14 deletions.
28 changes: 14 additions & 14 deletions doc/doc_ch/PP-OCRv3_det_train.md
@@ -1,14 +1,16 @@
[English](../doc_en/PP-OCRv3_det_train_en.md) | 简体中文


# PP-OCRv3 text detection model training

- [1. Introduction](#1)
- [2. PPOCRv3 detection training](#2)
- [3. Finetune training based on PPOCRv3 detection](#3)
- [2. PP-OCRv3 detection training](#2)
- [3. Finetune training based on PP-OCRv3 detection](#3)

<a name="1"></a>
## 1. Introduction

PP-OCRv3 is a further upgrade of PP-OCRv2. This section describes the training steps of the PP-OCRv3 detection model. For an introduction to the PPOCRv3 strategies, refer to the [documentation](./PP-OCRv3_introduction.md)
PP-OCRv3 is a further upgrade of PP-OCRv2. This section describes the training steps of the PP-OCRv3 detection model. For an introduction to the PP-OCRv3 strategies, refer to the [documentation](./PP-OCRv3_introduction.md)


<a name="2"></a>
@@ -55,10 +57,10 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/

The models saved during training are in the output directory and include the following files:
```
best_accuracy.states
best_accuracy.pdparams  # model parameters with the best accuracy, saved by default
best_accuracy.pdopt  # optimizer parameters for the best-accuracy model, saved by default
latest.states
latest.pdparams  # the latest model parameters, saved by default
latest.pdopt  # optimizer parameters of the latest model, saved by default
```
@@ -145,19 +147,19 @@ paddle.save(s_params, "./pretrain_models/cml_student.pdparams")


<a name="3"></a>
## 3. Finetune training based on PPOCRv3 detection
## 3. Finetune training based on PP-OCRv3 detection

This section describes how to finetune the PPOCRv3 detection model on other scenarios
This section describes how to finetune the PP-OCRv3 detection model on other scenarios

Finetune training applies to three scenarios:
- Finetune training based on the CML distillation method: suitable when a teacher model with higher accuracy than the PPOCRv3 detection model on the target scenario is available and a lightweight detection model is desired.
- Finetune training based on the PPOCRv3 lightweight detection model: no teacher model needs to be trained; the goal is to improve accuracy on the target scenario on top of the PPOCRv3 detection model
- Finetune training based on the CML distillation method: suitable when a teacher model with higher accuracy than the PP-OCRv3 detection model on the target scenario is available and a lightweight detection model is desired.
- Finetune training based on the PP-OCRv3 lightweight detection model: no teacher model needs to be trained; the goal is to improve accuracy on the target scenario on top of the PP-OCRv3 detection model
- Finetune training based on the DML distillation method: suitable for scenarios where the DML method is used to further improve accuracy.


**Finetune training based on the CML distillation method**

Download the PPOCRv3 training model
Download the PP-OCRv3 training model
```
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar
@@ -177,10 +179,10 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs
Global.save_model_dir=./output/
```

**Finetune training based on the PPOCRv3 lightweight detection model**
**Finetune training based on the PP-OCRv3 lightweight detection model**


Download the PPOCRv3 training model and extract the model parameters of the Student structure:
Download the PP-OCRv3 training model and extract the model parameters of the Student structure:
```
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar
@@ -248,5 +250,3 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/
Architecture.Models.Student2.pretrained=./teacher \
Global.save_model_dir=./output/
```


2 changes: 2 additions & 0 deletions doc/doc_ch/PP-OCRv3_introduction.md
@@ -63,6 +63,8 @@ The PP-OCRv3 detection model is an upgrade of the [CML](https://arxiv.org/pdf/2109.03144.p

Test environment: Intel Gold 6148 CPU, with MKLDNN acceleration enabled during inference.

For the training steps of the PP-OCRv3 detection model, refer to the [documentation](./PP-OCRv3_det_train.md)

**(1) LK-PAN: a PAN structure with a large receptive field**

LK-PAN (Large Kernel PAN) is a lightweight [PAN](https://arxiv.org/pdf/1803.01534.pdf) structure with a larger receptive field. Its core idea is to change the convolution kernels in the path augmentation of the PAN structure from `3*3` to `9*9`. Enlarging the kernel increases the receptive field covered by each position of the feature map, which makes it easier to detect text in large fonts and text with extreme aspect ratios. With the LK-PAN structure, the hmean of the teacher model improves from 83.2% to 85.0%.
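
To make the kernel-size change concrete, here is a minimal, hedged illustration (not the actual LK-PAN code; the channel count and feature-map size are made up): with matching padding, a `9*9` convolution preserves the spatial size of the feature map while each output position aggregates a much larger neighbourhood than a `3*3` convolution.

```
import paddle
import paddle.nn as nn

x = paddle.randn([1, 96, 160, 160])                  # a dummy PAN feature map
conv3x3 = nn.Conv2D(96, 96, kernel_size=3, padding=1)
conv9x9 = nn.Conv2D(96, 96, kernel_size=9, padding=4)
print(conv3x3(x).shape, conv9x9(x).shape)            # both [1, 96, 160, 160]
```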
253 changes: 253 additions & 0 deletions doc/doc_en/PP-OCRv3_det_train_en.md
@@ -0,0 +1,253 @@
English | [简体中文](../doc_ch/PP-OCRv3_det_train.md)


# Training steps of the PP-OCRv3 text detection model

- [1. Introduction](#1)
- [2. PP-OCRv3 detection training](#2)
- [3. Finetune training based on PP-OCRv3 detection](#3)

<a name="1"></a>
## 1. Introduction

PP-OCRv3 is a further upgrade of PP-OCRv2. This section describes the training steps of the PP-OCRv3 detection model. Refer to the [documentation](./ppocr_introduction_en.md) for an introduction to PP-OCRv3.


<a name="2"></a>
## 2. PP-OCRv3 detection training

The PP-OCRv3 detection model upgrades the [CML](https://arxiv.org/pdf/2109.03144.pdf) (Collaborative Mutual Learning) text detection distillation strategy used in PP-OCRv2, and further optimizes the teacher model and the student model separately. For the teacher model, it introduces LK-PAN, a PAN structure with a large receptive field, together with the DML (Deep Mutual Learning) distillation strategy; for the student model, it introduces RSE-FPN, an FPN structure with a residual attention mechanism.
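
As a rough illustration of the "residual attention" idea behind RSE-FPN, the sketch below adds a squeeze-and-excitation branch back onto a convolution output through a residual connection. This is only a toy example under stated assumptions (the layer layout, names, and channel sizes here are invented), not the actual RSEFPN implementation:

````
import paddle
import paddle.nn as nn

class SEBlock(nn.Layer):
    """Channel attention: squeeze (global average pool) then excite (two 1x1 convs)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2D(1)
        self.excite = nn.Sequential(
            nn.Conv2D(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2D(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.excite(self.squeeze(x))

class ResidualSEConv(nn.Layer):
    """Conv output plus its SE-reweighted version, so useful features are not
    suppressed when the attention weights are estimated poorly."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2D(in_channels, out_channels, 3, padding=1)
        self.se = SEBlock(out_channels)

    def forward(self, x):
        y = self.conv(x)
        return y + self.se(y)

feat = paddle.randn([1, 96, 160, 160])
print(ResidualSEConv(96, 96)(feat).shape)  # [1, 96, 160, 160]
````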

PP-OCRv3 detection training consists of two steps:
- Step 1: Train the detection teacher model with the DML distillation method
- Step 2: Use the teacher model obtained in Step 1 to train a lightweight student model with the CML method


### 2.1 Prepare data and environment

The training data uses the ICDAR 2015 dataset; refer to [ocr_dataset](./dataset/ocr_datasets.md) for the steps to prepare the training set.
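
For reference, a hedged sketch of what one line of a PaddleOCR detection label file looks like (image path, a tab, then a JSON list of polygons); the path and text below are made-up examples, and the dataset document linked above is the authoritative description:

````
import json

annotations = [{"transcription": "PaddleOCR",
                "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}]
label_line = "icdar2015/text_localization/img_1.jpg\t" + json.dumps(annotations, ensure_ascii=False)
print(label_line)

# Reading a label file back follows the same convention:
image_path, annos = label_line.split("\t", 1)
polygons = [a["points"] for a in json.loads(annos)]
````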

Refer to the [documentation](./installation_en.md) to prepare the runtime environment.

### 2.2 Train the teacher model

The configuration file for teacher model training is [ch_PP-OCRv3_det_dml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml). The teacher model uses ResNet50 as the Backbone, LKPAN as the Neck, and DBHead as the Head, and is trained with the DML distillation method. Refer to the [documentation](./knowledge_distillation) for a detailed introduction to the configuration file.
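
The training commands below pass `-o` overrides such as `Global.save_model_dir=./output/`. As a rough, standalone illustration of what such an override amounts to (an assumption about the nested-YAML layout, not PaddleOCR's actual config loader), it simply rewrites a dotted key inside the loaded configuration:

````
import yaml  # PyYAML

def apply_override(cfg, dotted_key, value):
    """Walk the nested dict along dotted_key and set the final key to value."""
    node = cfg
    keys = dotted_key.split(".")
    for key in keys[:-1]:
        node = node[key]
    node[keys[-1]] = value

with open("configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml") as f:
    cfg = yaml.safe_load(f)
apply_override(cfg, "Global.save_model_dir", "./output/")
print(cfg["Global"]["save_model_dir"])
````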


Download ImageNet pretrained models:
````
# Download the pretrained model of ResNet50_vd
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/ResNet50_vd_ssld_pretrained.pdparams
````

**Start training**
````
# Single GPU training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
-o Architecture.Models.Student.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Architecture.Models.Student2.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
-o Architecture.Models.Student.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Architecture.Models.Student2.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Global.save_model_dir=./output/
````

The model saved during training is in the output directory and contains the following files:
````
best_accuracy.states
best_accuracy.pdparams # The model parameters with the best accuracy are saved by default
best_accuracy.pdopt # optimizer-related parameters of the best-accuracy model, saved by default
latest.states
latest.pdparams # The latest model parameters saved by default
latest.pdopt # Optimizer related parameters of the latest model saved by default
````
Here, best_accuracy is the checkpoint with the highest accuracy, and it can be evaluated directly.

The model evaluation command is as follows:
````
python3 tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml -o Global.checkpoints=./output/best_accuracy
````

The trained teacher model has a larger structure and higher accuracy, and it is used to improve the accuracy of the student model.

**Extract teacher model parameters**

best_accuracy contains the parameters of two models, corresponding to Student and Student2 in the configuration file. The method for extracting the Student parameters is as follows:

````
import paddle
# load pretrained model
all_params = paddle.load("output/best_accuracy.pdparams")
# View the keys of the weight parameter
print(all_params.keys())
# model weight extraction
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# View the keys of the model weight parameters
print(s_params.keys())
# save
paddle.save(s_params, "./pretrain_models/dml_teacher.pdparams")
````

The extracted model parameters can be used for further finetune training or distillation training of the model.


### 2.3 Train the student model

The configuration file for training the student model is [ch_PP-OCRv3_det_cml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml). The teacher model trained in the previous section is used for supervision, and the lightweight student model is obtained by training with CML.

Download the ImageNet pretrained model for the student model:
````
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/MobileNetV3_large_x0_5_pretrained.pdparams
````

**Start training**

````
# Single card training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Architecture.Models.Student.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Architecture.Models.Student2.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Architecture.Models.Teacher.pretrained=./pretrain_models/dml_teacher \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Architecture.Models.Student.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Architecture.Models.Student2.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Architecture.Models.Teacher.pretrained=./pretrain_models/dml_teacher \
Global.save_model_dir=./output/
````

The models saved during training are in the output directory.
The model evaluation command is as follows:
````
python3 tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.checkpoints=./output/best_accuracy
````

best_accuracy contains the parameters of three models, corresponding to Student, Student2, and Teacher in the configuration file. The method to extract the Student parameters is as follows:

````
import paddle
# load pretrained model
all_params = paddle.load("output/best_accuracy.pdparams")
# View the keys of the weight parameter
print(all_params.keys())
# model weight extraction
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# View the keys of the model weight parameters
print(s_params.keys())
# save
paddle.save(s_params, "./pretrain_models/cml_student.pdparams")
````

The extracted parameters of Student can be used for model deployment or further finetune training.
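
The same key-prefix extraction recurs throughout this document (Student above, Teacher in Section 3). As a convenience, it can be wrapped in one small helper; this is only a sketch that mirrors the snippets above, not an official PaddleOCR utility:

````
import paddle

def extract_sub_model(ckpt_path, prefix, out_path):
    """Keep only the parameters whose keys start with `prefix`, strip that prefix,
    and save the result as a standalone .pdparams file."""
    all_params = paddle.load(ckpt_path)
    sub_params = {k[len(prefix):]: v for k, v in all_params.items() if k.startswith(prefix)}
    paddle.save(sub_params, out_path)
    return sub_params

# The extraction above, expressed with the helper:
# extract_sub_model("output/best_accuracy.pdparams", "Student.", "./pretrain_models/cml_student.pdparams")
````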



<a name="3"></a>
## 3. Finetune training based on PP-OCRv3 detection

This section describes how to finetune the PP-OCRv3 detection model on other scenarios.

Finetune training applies to three scenarios:
- Finetune training based on the CML distillation method: suitable when a teacher model with higher accuracy than the PP-OCRv3 detection model on the target scenario is available and a lightweight detection model is desired.
- Finetune training based on the PP-OCRv3 lightweight detection model: no teacher model needs to be trained; the goal is to improve accuracy on the target scenario on top of the PP-OCRv3 detection model.
- Finetune training based on the DML distillation method: suitable for scenarios where the DML method is used to further improve accuracy.


**Finetune training based on the CML distillation method**

Download the PP-OCRv3 training model:
````
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar
````
ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams contains the parameters of the Student, Student2, and Teacher models in the CML configuration file.
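
Before starting the finetune run, it can help to confirm which sub-models the downloaded checkpoint actually contains. A quick check, assuming the key-prefix convention used throughout this document:

````
import paddle

all_params = paddle.load("ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# Group parameter keys by their sub-model prefix, e.g. "Student.backbone..." -> "Student"
prefixes = sorted({key.split(".", 1)[0] for key in all_params})
print(prefixes)  # expected: ['Student', 'Student2', 'Teacher']
````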

Start training:

````
# Single card training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Global.pretrained_model=./ch_PP-OCRv3_det_distill_train/best_accuracy \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Global.pretrained_model=./ch_PP-OCRv3_det_distill_train/best_accuracy \
Global.save_model_dir=./output/
````

**Finetune training based on the PP-OCRv3 lightweight detection model**


Download the PP-OCRv3 training model and extract the model parameters of the Student structure:
````
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar
````

The method to extract the Student parameters is as follows:

````
import paddle
# load pretrained model
all_params = paddle.load("output/best_accuracy.pdparams")
# View the keys of the weight parameter
print(all_params.keys())
# model weight extraction
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# View the keys of the model weight parameters
print(s_params.keys())
# save
paddle.save(s_params, "./student.pdparams")
````

Train using the configuration file [ch_PP-OCRv3_det_student.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml).

**Start training**

````
# Single card training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
-o Global.pretrained_model=./student \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
-o Global.pretrained_model=./student \
Global.save_model_dir=./output/
````


**Finetune training based on the DML distillation method**

Taking the Teacher model in ch_PP-OCRv3_det_distill_train as an example, first extract the parameters of the Teacher structure as follows:
````
import paddle
# load pretrained model
all_params = paddle.load("ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# View the keys of the weight parameter
print(all_params.keys())
# model weight extraction
s_params = {key[len("Teacher."):]: all_params[key] for key in all_params if "Teacher." in key}
# View the keys of the model weight parameters
print(s_params.keys())
# save
paddle.save(s_params, "./teacher.pdparams")
````

**Start training**
````
# Single card training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
-o Architecture.Models.Student.pretrained=./teacher \
Architecture.Models.Student2.pretrained=./teacher \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
-o Architecture.Models.Student.pretrained=./teacher \
Architecture.Models.Student2.pretrained=./teacher \
Global.save_model_dir=./output/
````
1 change: 1 addition & 0 deletions doc/doc_en/PP-OCRv3_introduction_en.md
@@ -65,6 +65,7 @@ The ablation experiments are as follows:

Testing environment: Intel Gold 6148 CPU, with MKLDNN acceleration enabled during inference.

For the training steps of the PP-OCRv3 detection model, refer to the [tutorial](./PP-OCRv3_det_train_en.md)

**(1) LK-PAN: A PAN structure with large receptive field**

