* kie doc
* fix xlm model export
* fix doc
* add wildreceipt dataset
* fix doc
* fix doc
1 parent 7054013, commit 78871cf. Showing 22 changed files with 1,035 additions and 94 deletions.
Binary file added (+181 KB): doc/datasets/wildreceipt_demo/1bbe854b8817dedb8585e0732089fd1f752d2cec.jpeg
@@ -0,0 +1,172 @@
# Key Information Extraction Algorithm - LayoutXLM

- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training, Evaluation, and Prediction](#3)
- [4. Inference and Deployment](#4)
    - [4.1 Python Inference](#4-1)
    - [4.2 C++ Inference](#4-2)
    - [4.3 Serving](#4-3)
    - [4.4 More](#4-4)
- [5. FAQ](#5)
- [Citation](#引用)

<a name="1"></a>

## 1. Introduction

Paper:

> [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836)
>
> Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei
>
> 2021

On the XFUND_zh dataset, the reproduced results are as follows:

|Model|Backbone|Task|Config|hmean|Download link|
| --- | --- | --- | --- | --- | --- |
|LayoutXLM|LayoutXLM-base|SER|[ser_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yml)|90.38%|[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar)/[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh_infer.tar)|
|LayoutXLM|LayoutXLM-base|RE|[re_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/re_layoutxlm_xfund_zh.yml)|74.83%|[trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar)/inference model (coming soon)|
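
As a reading aid for the `hmean` column: assuming it denotes the harmonic mean of precision and recall (the F1 score), the formula can be sketched as below. The precision and recall values in the usage line are made-up illustrative numbers, not results from this table.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (equivalent to the F1 score)."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative inputs only; they are not taken from the table above.
print(round(hmean(0.9, 0.8), 4))
```

The harmonic mean is pulled towards the smaller of the two inputs, so a model cannot reach a high score by maximizing only precision or only recall.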
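
<a name="2"></a>

## 2. Environment

Please refer to [Environment Preparation](./environment.md) to set up the PaddleOCR environment, and to [Project Clone](./clone.md) to clone the project code.

<a name="3"></a>

## 3. Model Training, Evaluation, and Prediction

Please refer to the [key information extraction tutorial](./kie.md). PaddleOCR modularizes the code, so training a different key information extraction model only requires **changing the configuration file**.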
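
To illustrate the "only change the configuration file" point, here is a minimal sketch that picks a config by task and builds the training command. The config paths come from the results table in section 1; `KIE_CONFIGS` and `build_train_command` are hypothetical names, and the sketch assumes the `tools/train.py -c <config>` entry point described in the tutorial.

```python
import shlex

# Config paths copied from the results table in section 1.
KIE_CONFIGS = {
    "SER": "configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yml",
    "RE": "configs/kie/layoutlm_series/re_layoutxlm_xfund_zh.yml",
}


def build_train_command(task):
    """Return the argv list for training `task` (hypothetical helper)."""
    try:
        config = KIE_CONFIGS[task.upper()]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None
    # Only the config path differs between tasks.
    return ["python3", "tools/train.py", "-c", config]


print(shlex.join(build_train_command("ser")))
```

Passing the resulting list to `subprocess.run` from the repository root would launch training; switching from SER to RE changes nothing but the config path.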

<a name="4"></a>
## 4. Inference and Deployment

<a name="4-1"></a>

### 4.1 Python Inference

**Note:** Inference for the RE task is still being adapted. The following uses the SER task as an example to show key information extraction with the LayoutXLM model.

First, convert the trained model to an inference model. Taking the LayoutXLM model trained on the XFUND_zh dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar)), the following commands perform the conversion.

```bash
wget https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar
tar -xf ser_LayoutXLM_xfun_zh.tar
python3 tools/export_model.py -c configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yml -o Architecture.Backbone.checkpoints=./ser_LayoutXLM_xfun_zh/best_accuracy Global.save_inference_dir=./inference/ser_layoutxlm_infer
```
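
Each argument after `-o` in the export command is a dotted key, an `=`, and a value that overrides one entry of the YAML config. The sketch below illustrates that idea on a plain dictionary. `apply_override` is a hypothetical helper written for this example, not PaddleOCR's actual implementation, and it keeps every value as a string.

```python
def apply_override(config, override):
    """Set a nested entry from an 'A.B.c=value' string (illustrative only)."""
    dotted_key, _, value = override.partition("=")
    keys = dotted_key.split(".")
    node = config
    # Walk (and create, if needed) the intermediate dictionaries.
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


config = {"Architecture": {"Backbone": {"checkpoints": None}}}
apply_override(
    config,
    "Architecture.Backbone.checkpoints=./ser_LayoutXLM_xfun_zh/best_accuracy",
)
print(config["Architecture"]["Backbone"]["checkpoints"])
```

Overrides of this form keep the YAML file unchanged, which is convenient when the same config is reused for training, evaluation, and export.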

To run SER inference with the LayoutXLM model, execute the following commands:

```bash
cd ppstructure
python3 vqa/predict_vqa_token_ser.py \
  --vqa_algorithm=LayoutXLM \
  --ser_model_dir=../inference/ser_layoutxlm_infer \
  --image_dir=./docs/vqa/input/zh_val_42.jpg \
  --ser_dict_path=../train_data/XFUND/class_list_xfun.txt \
  --vis_font_path=../doc/fonts/simfang.ttf
```
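
Because the command first changes into `ppstructure`, every relative path in it is resolved from that directory rather than from the repository root. The sketch below shows where two of the arguments point when expressed relative to the repository root; `resolve_from` is a hypothetical helper used only for illustration.

```python
import posixpath


def resolve_from(cwd, relative):
    """Resolve a relative POSIX path against a working directory."""
    return posixpath.normpath(posixpath.join(cwd, relative))


# Paths as seen from the repository root after `cd ppstructure`.
print(resolve_from("ppstructure", "../inference/ser_layoutxlm_infer"))
# inference/ser_layoutxlm_infer
print(resolve_from("ppstructure", "./docs/vqa/input/zh_val_42.jpg"))
# ppstructure/docs/vqa/input/zh_val_42.jpg
```

In other words, `--ser_model_dir` must name a directory under `inference/` at the repository root, so the export step has to write to that same location.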

The SER visualization results are saved to the `./output` folder by default. An example result is shown below:

<div align="center">
    <img src="../../ppstructure/docs/vqa/result_ser/zh_val_42_ser.jpg" width="800">
</div>

<a name="4-2"></a>
### 4.2 C++ Inference

Not supported yet.

<a name="4-3"></a>
### 4.3 Serving

Not supported yet.

<a name="4-4"></a>
### 4.4 More

Not supported yet.

<a name="5"></a>

## 5. FAQ

<a name="引用"></a>

## Citation

```bibtex
@article{DBLP:journals/corr/abs-2104-08836,
  author    = {Yiheng Xu and
               Tengchao Lv and
               Lei Cui and
               Guoxin Wang and
               Yijuan Lu and
               Dinei Flor{\^{e}}ncio and
               Cha Zhang and
               Furu Wei},
  title     = {LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich
               Document Understanding},
  journal   = {CoRR},
  volume    = {abs/2104.08836},
  year      = {2021},
  url       = {https://arxiv.org/abs/2104.08836},
  eprinttype = {arXiv},
  eprint    = {2104.08836},
  timestamp = {Thu, 14 Oct 2021 09:17:23 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2104-08836.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{DBLP:journals/corr/abs-1912-13318,
  author    = {Yiheng Xu and
               Minghao Li and
               Lei Cui and
               Shaohan Huang and
               Furu Wei and
               Ming Zhou},
  title     = {LayoutLM: Pre-training of Text and Layout for Document Image Understanding},
  journal   = {CoRR},
  volume    = {abs/1912.13318},
  year      = {2019},
  url       = {http://arxiv.org/abs/1912.13318},
  eprinttype = {arXiv},
  eprint    = {1912.13318},
  timestamp = {Mon, 01 Jun 2020 16:20:46 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1912-13318.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{DBLP:journals/corr/abs-2012-14740,
  author    = {Yang Xu and
               Yiheng Xu and
               Tengchao Lv and
               Lei Cui and
               Furu Wei and
               Guoxin Wang and
               Yijuan Lu and
               Dinei A. F. Flor{\^{e}}ncio and
               Cha Zhang and
               Wanxiang Che and
               Min Zhang and
               Lidong Zhou},
  title     = {LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding},
  journal   = {CoRR},
  volume    = {abs/2012.14740},
  year      = {2020},
  url       = {https://arxiv.org/abs/2012.14740},
  eprinttype = {arXiv},
  eprint    = {2012.14740},
  timestamp = {Tue, 27 Jul 2021 09:53:52 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2012-14740.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
@@ -0,0 +1,144 @@

# Key Information Extraction Algorithm - SDMGR

- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training, Evaluation, and Prediction](#3)
    - [3.1 Training](#31-模型训练)
    - [3.2 Evaluation](#32-模型评估)
    - [3.3 Prediction](#33-模型预测)
- [4. Inference and Deployment](#4)
    - [4.1 Python Inference](#4-1)
    - [4.2 C++ Inference](#4-2)
    - [4.3 Serving](#4-3)
    - [4.4 More](#4-4)
- [5. FAQ](#5)
- [Citation](#引用)

<a name="1"></a>

## 1. Introduction

Paper:

> [Spatial Dual-Modality Graph Reasoning for Key Information Extraction](https://arxiv.org/abs/2103.14470)
>
> Hongbin Sun, Zhanghui Kuang, Xiaoyu Yue, Chenhao Lin, Wayne Zhang
>
> 2021

On the public wildreceipt receipt dataset, the reproduced results are as follows:

|Model|Backbone|Config|hmean|Download link|
| --- | --- | --- | --- | --- |
|SDMGR|VGG6|[configs/kie/sdmgr/kie_unet_sdmgr.yml](../../configs/kie/sdmgr/kie_unet_sdmgr.yml)|86.7%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)/inference model (coming soon)|

<a name="2"></a>

## 2. Environment

Please refer to [Environment Preparation](./environment.md) to set up the PaddleOCR environment, and to [Project Clone](./clone.md) to clone the project code.

<a name="3"></a>

## 3. Model Training, Evaluation, and Prediction

SDMGR is a key information extraction algorithm that classifies each detected text region into a predefined category, such as order ID, invoice number, or amount.

Training and testing use the wildreceipt dataset, which can be downloaded with the following command:

```bash
wget https://paddleocr.bj.bcebos.com/ppstructure/dataset/wildreceipt.tar && tar xf wildreceipt.tar
```

Create a symbolic link to the dataset under the `PaddleOCR/train_data` directory:

```bash
cd PaddleOCR/ && mkdir -p train_data && cd train_data
ln -s ../../wildreceipt ./
```
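
The same linking step can be sketched in Python, which may be handy inside a setup script. `link_dataset` is a hypothetical helper, and the demo runs in a temporary directory with stand-in folders rather than the real dataset; it assumes a platform where creating symlinks is permitted.

```python
import os
import tempfile
from pathlib import Path


def link_dataset(dataset_dir, train_data_dir):
    """Create train_data_dir/<dataset name> as a symlink to dataset_dir."""
    train_data_dir.mkdir(parents=True, exist_ok=True)
    link = train_data_dir / dataset_dir.name
    if not (link.is_symlink() or link.exists()):
        # Use a relative target, mirroring `ln -s ../../wildreceipt ./`.
        target = os.path.relpath(dataset_dir, train_data_dir)
        link.symlink_to(target, target_is_directory=True)
    return link


# Demo with stand-in directories inside a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "wildreceipt").mkdir()
    link = link_dataset(root / "wildreceipt", root / "PaddleOCR" / "train_data")
    print(link.is_symlink(), os.readlink(link))
```

A relative target keeps the link valid if the parent folder holding both `PaddleOCR` and `wildreceipt` is moved as a whole.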

<a name="31-模型训练"></a>
### 3.1 Training

The training configuration file is `configs/kie/sdmgr/kie_unet_sdmgr.yml`, and its default training data path is `train_data/wildreceipt`. Once the data is ready, start training with the following command:

```bash
python3 tools/train.py -c configs/kie/sdmgr/kie_unet_sdmgr.yml -o Global.save_model_dir=./output/kie/
```

<a name="32-模型评估"></a>
### 3.2 Evaluation

Run the following command to evaluate the model:

```bash
python3 tools/eval.py -c configs/kie/sdmgr/kie_unet_sdmgr.yml -o Global.checkpoints=./output/kie/best_accuracy
```

Example output:

```
[2022/08/10 05:22:23] ppocr INFO: metric eval ***************
[2022/08/10 05:22:23] ppocr INFO: hmean:0.8670120239257812
[2022/08/10 05:22:23] ppocr INFO: fps:10.18816520530961
```
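
If the metrics need to be collected programmatically, the evaluation log above can be parsed with a small regular expression. This is a minimal sketch that assumes the `name:value` line layout shown in the sample output; `parse_metrics` is a hypothetical helper, not part of PaddleOCR.

```python
import re

# Sample log copied from the evaluation output above.
LOG = """\
[2022/08/10 05:22:23] ppocr INFO: metric eval ***************
[2022/08/10 05:22:23] ppocr INFO: hmean:0.8670120239257812
[2022/08/10 05:22:23] ppocr INFO: fps:10.18816520530961
"""

# Matches lines such as "... ppocr INFO: hmean:0.867".
METRIC_LINE = re.compile(r"ppocr INFO: (\w+):(\d+(?:\.\d+)?)\s*$")


def parse_metrics(log_text):
    """Return {metric name: float value} for every metric line in the log."""
    metrics = {}
    for line in log_text.splitlines():
        match = METRIC_LINE.search(line)
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


metrics = parse_metrics(LOG)
print(f"hmean = {metrics['hmean']:.1%}")  # hmean = 86.7%
```

The banner line (`metric eval ***`) has no `name:value` pair, so it is skipped rather than raising an error.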

<a name="33-模型预测"></a>
### 3.3 Prediction

Run the following command to predict. Prediction requires a text file that stores the image paths and their OCR information, specified with `Global.infer_img`.

```bash
python3 tools/infer_kie.py -c configs/kie/sdmgr/kie_unet_sdmgr.yml -o Global.checkpoints=kie_vgg16/best_accuracy Global.infer_img=./train_data/wildreceipt/1.txt
```

The prediction results are saved to `./output/sdmgr_kie/predicts_kie.txt`, and the visualizations are saved under `./output/sdmgr_kie/kie_results/`.

The visualization result is shown below:

<div align="center">
    <img src="../../ppstructure/docs/imgs/sdmgr_result.png" width="800">
</div>

<a name="4"></a>
## 4. Inference and Deployment

<a name="4-1"></a>
### 4.1 Python Inference

Not supported yet.

<a name="4-2"></a>
### 4.2 C++ Inference

Not supported yet.

<a name="4-3"></a>
### 4.3 Serving

Not supported yet.

<a name="4-4"></a>
### 4.4 More

Not supported yet.

<a name="5"></a>

## 5. FAQ

<a name="引用"></a>

## Citation

```bibtex
@misc{sun2021spatial,
      title={Spatial Dual-Modality Graph Reasoning for Key Information Extraction},
      author={Hongbin Sun and Zhanghui Kuang and Xiaoyu Yue and Chenhao Lin and Wayne Zhang},
      year={2021},
      eprint={2103.14470},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```