Commit

fix PP-OCRv3 det train (PaddlePaddle#8208)
LDOUBLEV authored Nov 4, 2022
1 parent 4604e76 commit 5f06a80
Showing 4 changed files with 270 additions and 14 deletions.
28 changes: 14 additions & 14 deletions doc/doc_ch/PP-OCRv3_det_train.md
@@ -1,14 +1,16 @@
[English](../doc_en/PP-OCRv3_det_train_en.md) | 简体中文


# PP-OCRv3 text detection model training

- [1. Introduction](#1)
- [2. PPOCRv3 detection training](#2)
- [3. Finetune training based on PPOCRv3 detection](#3)
- [2. PP-OCRv3 detection training](#2)
- [3. Finetune training based on PP-OCRv3 detection](#3)

<a name="1"></a>
## 1. Introduction

PP-OCRv3 is a further upgrade of PP-OCRv2. This section describes the training steps of the PP-OCRv3 detection model. For an introduction to the PPOCRv3 strategies, refer to the [documentation](./PP-OCRv3_introduction.md)
PP-OCRv3 is a further upgrade of PP-OCRv2. This section describes the training steps of the PP-OCRv3 detection model. For an introduction to the PP-OCRv3 strategies, refer to the [documentation](./PP-OCRv3_introduction.md)


<a name="2"></a>
@@ -55,10 +57,10 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/

The models saved during training are in the output directory and include the following files:
```
best_accuracy.states
best_accuracy.pdparams  # model parameters with the best accuracy, saved by default
best_accuracy.pdopt  # optimizer parameters for the best-accuracy model, saved by default
latest.states
latest.pdparams  # the latest model parameters, saved by default
latest.pdopt  # optimizer parameters of the latest model, saved by default
```
@@ -145,19 +147,19 @@ paddle.save(s_params, "./pretrain_models/cml_student.pdparams")


<a name="3"></a>
## 3. Finetune training based on PPOCRv3 detection
## 3. Finetune training based on PP-OCRv3 detection

This section describes how to finetune the PPOCRv3 detection model on other scenarios
This section describes how to finetune the PP-OCRv3 detection model on other scenarios

Finetune training applies to three scenarios:
- Finetune training based on the CML distillation method: suitable when a teacher model with higher accuracy than the PPOCRv3 detection model on the target scenario is available and a lightweight detection model is desired.
- Finetune training based on the PPOCRv3 lightweight detection model: no teacher model needs to be trained; the goal is to improve accuracy on the target scenario on top of the PPOCRv3 detection model
- Finetune training based on the CML distillation method: suitable when a teacher model with higher accuracy than the PP-OCRv3 detection model on the target scenario is available and a lightweight detection model is desired.
- Finetune training based on the PP-OCRv3 lightweight detection model: no teacher model needs to be trained; the goal is to improve accuracy on the target scenario on top of the PP-OCRv3 detection model
- Finetune training based on the DML distillation method: suitable for scenarios where the DML method is used to further improve accuracy.


**Finetune training based on the CML distillation method**

Download the PPOCRv3 training model
Download the PP-OCRv3 training model
```
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar
@@ -177,10 +179,10 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs
Global.save_model_dir=./output/
```

**Finetune training based on the PPOCRv3 lightweight detection model**
**Finetune training based on the PP-OCRv3 lightweight detection model**


Download the PPOCRv3 training model and extract the model parameters of the Student structure:
Download the PP-OCRv3 training model and extract the model parameters of the Student structure:
```
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar
@@ -248,5 +250,3 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/
Architecture.Models.Student2.pretrained=./teacher \
Global.save_model_dir=./output/
```


2 changes: 2 additions & 0 deletions doc/doc_ch/PP-OCRv3_introduction.md
@@ -63,6 +63,8 @@ The PP-OCRv3 detection model is an upgrade of the [CML](https://arxiv.org/pdf/2109.03144.p

Test environment: Intel Gold 6148 CPU, with MKLDNN acceleration enabled during inference.

For the training steps of the PP-OCRv3 detection model, refer to the [documentation](./PP-OCRv3_det_train.md)

**(1) LK-PAN: a PAN structure with a large receptive field**

LK-PAN (Large Kernel PAN) is a lightweight [PAN](https://arxiv.org/pdf/1803.01534.pdf) structure with a larger receptive field. Its core idea is to change the convolution kernels in the path augmentation of the PAN structure from `3*3` to `9*9`. Enlarging the kernel increases the receptive field covered by each position of the feature map, which makes it easier to detect text in large fonts and text with extreme aspect ratios. With the LK-PAN structure, the hmean of the teacher model improves from 83.2% to 85.0%.
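
To make the kernel-size change concrete, here is a minimal, hedged illustration (not the actual LK-PAN code; the channel count and feature-map size are made up): with matching padding, a `9*9` convolution preserves the spatial size of the feature map while each output position aggregates a much larger neighbourhood than a `3*3` convolution.

```
import paddle
import paddle.nn as nn

x = paddle.randn([1, 96, 160, 160])                  # a dummy PAN feature map
conv3x3 = nn.Conv2D(96, 96, kernel_size=3, padding=1)
conv9x9 = nn.Conv2D(96, 96, kernel_size=9, padding=4)
print(conv3x3(x).shape, conv9x9(x).shape)            # both [1, 96, 160, 160]
```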
253 changes: 253 additions & 0 deletions doc/doc_en/PP-OCRv3_det_train_en.md
@@ -0,0 +1,253 @@
English | [简体中文](../doc_ch/PP-OCRv3_det_train.md)


# Training steps of the PP-OCRv3 text detection model

- [1. Introduction](#1)
- [2. PP-OCRv3 detection training](#2)
- [3. Finetune training based on PP-OCRv3 detection](#3)

<a name="1"></a>
## 1. Introduction

PP-OCRv3 is a further upgrade of PP-OCRv2. This section describes the training steps of the PP-OCRv3 detection model. Refer to the [documentation](./ppocr_introduction_en.md) for an introduction to PP-OCRv3.


<a name="2"></a>
## 2. PP-OCRv3 detection training

The PP-OCRv3 detection model upgrades the [CML](https://arxiv.org/pdf/2109.03144.pdf) (Collaborative Mutual Learning) text detection distillation strategy used in PP-OCRv2, and further optimizes the teacher model and the student model separately. For the teacher model, it introduces LK-PAN, a PAN structure with a large receptive field, together with the DML (Deep Mutual Learning) distillation strategy; for the student model, it introduces RSE-FPN, an FPN structure with a residual attention mechanism.
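
As a rough illustration of the "residual attention" idea behind RSE-FPN, the sketch below adds a squeeze-and-excitation branch back onto a convolution output through a residual connection. This is only a toy example under stated assumptions (the layer layout, names, and channel sizes here are invented), not the actual RSEFPN implementation:

````
import paddle
import paddle.nn as nn

class SEBlock(nn.Layer):
    """Channel attention: squeeze (global average pool) then excite (two 1x1 convs)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2D(1)
        self.excite = nn.Sequential(
            nn.Conv2D(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2D(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.excite(self.squeeze(x))

class ResidualSEConv(nn.Layer):
    """Conv output plus its SE-reweighted version, so useful features are not
    suppressed when the attention weights are estimated poorly."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2D(in_channels, out_channels, 3, padding=1)
        self.se = SEBlock(out_channels)

    def forward(self, x):
        y = self.conv(x)
        return y + self.se(y)

feat = paddle.randn([1, 96, 160, 160])
print(ResidualSEConv(96, 96)(feat).shape)  # [1, 96, 160, 160]
````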

PP-OCRv3 detection training consists of two steps:
- Step 1: Train the detection teacher model with the DML distillation method
- Step 2: Use the teacher model obtained in Step 1 to train a lightweight student model with the CML method


### 2.1 Prepare data and environment

The training data uses the ICDAR 2015 dataset; refer to [ocr_dataset](./dataset/ocr_datasets.md) for the steps to prepare the training set.
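
For reference, a hedged sketch of what one line of a PaddleOCR detection label file looks like (image path, a tab, then a JSON list of polygons); the path and text below are made-up examples, and the dataset document linked above is the authoritative description:

````
import json

annotations = [{"transcription": "PaddleOCR",
                "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}]
label_line = "icdar2015/text_localization/img_1.jpg\t" + json.dumps(annotations, ensure_ascii=False)
print(label_line)

# Reading a label file back follows the same convention:
image_path, annos = label_line.split("\t", 1)
polygons = [a["points"] for a in json.loads(annos)]
````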

Refer to the [documentation](./installation_en.md) to prepare the runtime environment.

### 2.2 Train the teacher model

The configuration file for teacher model training is [ch_PP-OCRv3_det_dml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml). The teacher model uses ResNet50 as the Backbone, LKPAN as the Neck, and DBHead as the Head, and is trained with the DML distillation method. Refer to the [documentation](./knowledge_distillation) for a detailed introduction to the configuration file.
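
The training commands below pass `-o` overrides such as `Global.save_model_dir=./output/`. As a rough, standalone illustration of what such an override amounts to (an assumption about the nested-YAML layout, not PaddleOCR's actual config loader), it simply rewrites a dotted key inside the loaded configuration:

````
import yaml  # PyYAML

def apply_override(cfg, dotted_key, value):
    """Walk the nested dict along dotted_key and set the final key to value."""
    node = cfg
    keys = dotted_key.split(".")
    for key in keys[:-1]:
        node = node[key]
    node[keys[-1]] = value

with open("configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml") as f:
    cfg = yaml.safe_load(f)
apply_override(cfg, "Global.save_model_dir", "./output/")
print(cfg["Global"]["save_model_dir"])
````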


Download ImageNet pretrained models:
````
# Download the pretrained model of ResNet50_vd
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/ResNet50_vd_ssld_pretrained.pdparams
````

**Start training**
````
# Single GPU training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
-o Architecture.Models.Student.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Architecture.Models.Student2.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
-o Architecture.Models.Student.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Architecture.Models.Student2.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Global.save_model_dir=./output/
````

The model saved during training is in the output directory and contains the following files:
````
best_accuracy.states
best_accuracy.pdparams # The model parameters with the best accuracy are saved by default
best_accuracy.pdopt # optimizer-related parameters of the best-accuracy model, saved by default
latest.states
latest.pdparams # The latest model parameters saved by default
latest.pdopt # Optimizer related parameters of the latest model saved by default
````
Here, best_accuracy is the checkpoint with the highest accuracy, and it can be evaluated directly.

The model evaluation command is as follows:
````
python3 tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml -o Global.checkpoints=./output/best_accuracy
````

The trained teacher model has a larger structure and higher accuracy, and it is used to improve the accuracy of the student model.

**Extract teacher model parameters**

best_accuracy contains the parameters of two models, corresponding to Student and Student2 in the configuration file. The method for extracting the Student parameters is as follows:

````
import paddle
# load pretrained model
all_params = paddle.load("output/best_accuracy.pdparams")
# View the keys of the weight parameter
print(all_params.keys())
# model weight extraction
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# View the keys of the model weight parameters
print(s_params.keys())
# save
paddle.save(s_params, "./pretrain_models/dml_teacher.pdparams")
````

The extracted model parameters can be used for further finetune training or distillation training of the model.


### 2.3 Train the student model

The configuration file for training the student model is [ch_PP-OCRv3_det_cml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml). The teacher model trained in the previous section is used for supervision, and the lightweight student model is obtained by training with CML.

Download the ImageNet pretrained model for the student model:
````
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/MobileNetV3_large_x0_5_pretrained.pdparams
````

**Start training**

````
# Single card training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Architecture.Models.Student.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Architecture.Models.Student2.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Architecture.Models.Teacher.pretrained=./pretrain_models/dml_teacher \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Architecture.Models.Student.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Architecture.Models.Student2.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Architecture.Models.Teacher.pretrained=./pretrain_models/dml_teacher \
Global.save_model_dir=./output/
````

The models saved during training are in the output directory.
The model evaluation command is as follows:
````
python3 tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.checkpoints=./output/best_accuracy
````

best_accuracy contains the parameters of three models, corresponding to Student, Student2, and Teacher in the configuration file. The method to extract the Student parameters is as follows:

````
import paddle
# load pretrained model
all_params = paddle.load("output/best_accuracy.pdparams")
# View the keys of the weight parameter
print(all_params.keys())
# model weight extraction
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# View the keys of the model weight parameters
print(s_params.keys())
# save
paddle.save(s_params, "./pretrain_models/cml_student.pdparams")
````

The extracted parameters of Student can be used for model deployment or further finetune training.
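
The same key-prefix extraction recurs throughout this document (Student above, Teacher in Section 3). As a convenience, it can be wrapped in one small helper; this is only a sketch that mirrors the snippets above, not an official PaddleOCR utility:

````
import paddle

def extract_sub_model(ckpt_path, prefix, out_path):
    """Keep only the parameters whose keys start with `prefix`, strip that prefix,
    and save the result as a standalone .pdparams file."""
    all_params = paddle.load(ckpt_path)
    sub_params = {k[len(prefix):]: v for k, v in all_params.items() if k.startswith(prefix)}
    paddle.save(sub_params, out_path)
    return sub_params

# The extraction above, expressed with the helper:
# extract_sub_model("output/best_accuracy.pdparams", "Student.", "./pretrain_models/cml_student.pdparams")
````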



<a name="3"></a>
## 3. Finetune training based on PP-OCRv3 detection

This section describes how to finetune the PP-OCRv3 detection model on other scenarios.

Finetune training applies to three scenarios:
- Finetune training based on the CML distillation method: suitable when a teacher model with higher accuracy than the PP-OCRv3 detection model on the target scenario is available and a lightweight detection model is desired.
- Finetune training based on the PP-OCRv3 lightweight detection model: no teacher model needs to be trained; the goal is to improve accuracy on the target scenario on top of the PP-OCRv3 detection model.
- Finetune training based on the DML distillation method: suitable for scenarios where the DML method is used to further improve accuracy.


**Finetune training based on the CML distillation method**

Download the PP-OCRv3 training model:
````
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar
````
ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams contains the parameters of the Student, Student2, and Teacher models in the CML configuration file.
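
Before starting the finetune run, it can help to confirm which sub-models the downloaded checkpoint actually contains. A quick check, assuming the key-prefix convention used throughout this document:

````
import paddle

all_params = paddle.load("ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# Group parameter keys by their sub-model prefix, e.g. "Student.backbone..." -> "Student"
prefixes = sorted({key.split(".", 1)[0] for key in all_params})
print(prefixes)  # expected: ['Student', 'Student2', 'Teacher']
````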

Start training:

````
# Single card training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Global.pretrained_model=./ch_PP-OCRv3_det_distill_train/best_accuracy \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Global.pretrained_model=./ch_PP-OCRv3_det_distill_train/best_accuracy \
Global.save_model_dir=./output/
````

**Finetune training based on the PP-OCRv3 lightweight detection model**


Download the PP-OCRv3 training model and extract the model parameters of the Student structure:
````
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar
````

The method to extract the Student parameters is as follows:

````
import paddle
# load pretrained model
all_params = paddle.load("output/best_accuracy.pdparams")
# View the keys of the weight parameter
print(all_params.keys())
# model weight extraction
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# View the keys of the model weight parameters
print(s_params.keys())
# save
paddle.save(s_params, "./student.pdparams")
````

Train using the configuration file [ch_PP-OCRv3_det_student.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml).

**Start training**

````
# Single card training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
-o Global.pretrained_model=./student \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
-o Global.pretrained_model=./student \
Global.save_model_dir=./output/
````


**Finetune training based on the DML distillation method**

Taking the Teacher model in ch_PP-OCRv3_det_distill_train as an example, first extract the parameters of the Teacher structure as follows:
````
import paddle
# load pretrained model
all_params = paddle.load("ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# View the keys of the weight parameter
print(all_params.keys())
# model weight extraction
s_params = {key[len("Teacher."):]: all_params[key] for key in all_params if "Teacher." in key}
# View the keys of the model weight parameters
print(s_params.keys())
# save
paddle.save(s_params, "./teacher.pdparams")
````

**Start training**
````
# Single card training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
-o Architecture.Models.Student.pretrained=./teacher \
Architecture.Models.Student2.pretrained=./teacher \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
-o Architecture.Models.Student.pretrained=./teacher \
Architecture.Models.Student2.pretrained=./teacher \
Global.save_model_dir=./output/
````
1 change: 1 addition & 0 deletions doc/doc_en/PP-OCRv3_introduction_en.md
@@ -65,6 +65,7 @@ The ablation experiments are as follows:

Testing environment: Intel Gold 6148 CPU, with MKLDNN acceleration enabled during inference.

For the training steps of the PP-OCRv3 detection model, refer to the [tutorial](./PP-OCRv3_det_train_en.md)

**(1) LK-PAN: A PAN structure with large receptive field**

