Merge branch 'PaddlePaddle:develop' into test

PaddlePaddle · Jan 7, 2025 · f548061
2 parents 6aedca9 + 2bd7bff
commit f548061

Showing 122 changed files with 14,067 additions and 234 deletions.
diff --git a/README.md b/README.md
@@ -34,23 +34,22 @@
 
 
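 ## 📰News
-**🔥2024.11.21 - 2024.12.22 PaddleMIX Development Project Challenge (Ended)**
+**🔥Live Course on 2025.01.07: New PaddlePaddle PP Series Models Released!**
 
-- ✨"Experience Officer Recruitment" PaddleMIX Development Project Challenge
-Click the link to register🔗: https://aistudio.baidu.com/activitydetail/1503019366
-🏆Submit to the PaddlePaddle Galaxy Community Project Hall; featured projects receive a PaddleMIX Experience Officer certificate and JD.com gift card rewards
-Everyone is welcome to submit～
+- ✨PP-DocBee: A New 'Bee'-ginning in Document Image Understanding!
+To help you quickly and thoroughly understand **PaddleMIX**'s **PP-DocBee document understanding model** and master the practical workflow, a Baidu senior R&D engineer will explain PP-DocBee's core technology in detail and give a hands-on demonstration of the full multimodal large model development process at **19:00 on January 7 (Tuesday)**. Scan the QR code on the poster below to register!
 <details>
 <summary>Click to expand the event poster</summary>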
 <p align="center">
-<img src='https://github.com/user-attachments/assets/27e0bbe3-0ff8-49ef-bd39-81a31a2b288b'  width="25%">
+<img src='https://github.com/user-attachments/assets/3b7adc9e-c68d-44d1-9674-05b933947deb'  width="80%">
 </p>
 </details>
 
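 ## 📣Latest Developments
 
 <!-- 📚 "PaddleMIX 2.1, the PaddlePaddle Multimodal Large Model Development Suite, Released": full coverage of image, text, audio, and video scenarios, with multimodal capabilities driving industrial innovation. It supports ultra-large-scale training, covers image-text pretraining, text-to-image generation, and cross-modal vision tasks, and targets industry scenarios such as finance, education, e-commerce, and healthcare. At 20:00 on August 8 (Thursday), join the livestream to learn about the latest multimodal large model architectures, get an in-depth look at the PaddleMIX high-performance model library, and follow a hands-on demonstration of the full LLaVA training and inference workflow. [Registration link](https://www.wjx.top/vm/wKqysjx.aspx?udsid=449688)   -->
 
+**🎉 2024.01.02 Added inference and training for the in-house document understanding model [PP-DocBee](./paddlemix/examples/ppdocbee), with support for [high-performance inference](./deploy/ppdocbee)**
 
 **🎉 2024.12.17 Support for [GOT-OCR2_0](./paddlemix/examples/GOT_OCR_2_0) inference and training**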
 
@@ -64,15 +63,16 @@
 
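 **🎉 2024.11.1 Support for [LLaVA-OneVision](./paddlemix/examples/llava_onevision/) and [LLaVA-Critic](./paddlemix/examples/llava_critic/) inference**
 
+
+<details>
+<summary>Click to expand more</summary>
+
 **🎉 2024.10.31 Welcome to the updated [tutorial page](paddlemix_applications.md) of creations by external developers**
 
 * 🌟 Since the launch of the large model suite premium project collection on September 6, we have received 30 high-quality developer projects, 25 of which have passed the platform evaluation and been featured.
 
 * 🙏 We sincerely thank all developers for their wonderful creations based on the suite! 🚀 You are warmly invited to share your ideas as well: publish your tutorials on public web pages or in the [PaddlePaddle AI Studio](https://aistudio.baidu.com/aistudio/community/multimodal?from=singlemessage) community!
 
-<details>
-<summary>Click to expand more</summary>
-
 **🔥2024.10.11 PaddleMIX v2.1 Released**
 * Supports [PaddleNLP 3.0 beta](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v3.0.0-beta0), giving early access to its latest features.
 * Added cutting-edge models such as [Qwen2-VL](./paddlemix/examples/qwen2_vl/), [InternVL2](./paddlemix/examples/internvl2/), and [Stable Diffusion 3 (SD3)](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/examples/dreambooth/README_sd3.md).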
@@ -290,6 +290,7 @@ python setup.py install
         <ul>
             <li><a href="paddlemix/examples/groundingdino">Grounding DINO</a></li>
             <li><a href="paddlemix/examples/sam">SAM</a></li>
+            <li><a href="paddlemix/examples/sam2">SAM2</a></li>
             <li><a href="paddlemix/examples/YOLO-World">YOLO-World</a></li>
       </ul>
       </ul>
@@ -358,6 +359,53 @@ python setup.py install
 For more model capabilities, see the [Model Capability Matrix](./paddlemix/examples/README.md)
 
 
+
+## 📊Multimodal Data Processing Toolbox DataCopilot
+<table align="center">
+  <tbody>
+    <tr align="center" valign="center">
+      <td>
+        <b>Basic Capabilities</b>
+      </td>
+      <td>
+        <b>Data Analysis</b>
+      </td>
+      <td>
+        <b>Data Generation</b>
+      </td>
+    </tr>
+    <tr valign="top">
+      <td>
+        <ul>
+        </ul>
+          <li><b>Usage Documentation</b></li>
+        <ul>
+            <li><a href="paddlemix/datacopilot">DataCopilot</a></li>
+      </ul>
+      </td>
+      <td>
+        <ul>
+        </ul>
+          <li><b>Capability Tagging Model</b></li>
+        <ul>
+           <li><a href="paddlemix/datacopilot/example">PP-InsCapTagger</a></li>
+        </ul>
+      </td>
+      <td>
+        <ul>
+        </ul>
+          <li><b>Document Data Generation Solution</b></li>
+        <ul>
+          <li><a href="paddlemix/datacopilot/example">PP-InfinityDocData</a></li>
+        </ul>
+      </td>
+    </tr>
+  </tbody>
+</table>
+
+For more data-related features, see the [DataCopilot](./paddlemix/datacopilot) home page
+
+
 ## 🏆Featured Models | Tools
 
 ### 💎Cross-modal Task Pipeline AppFlow

diff --git a/README_EN.md b/README_EN.md
@@ -37,21 +37,21 @@
 
 **🔥PaddleMIX Development Project Challenge (November 21 - December 22, 2024)**
 
-**🔥2024.11.21 - 2024.12.22 PaddleMIX Development Project Challenge (Ended)**
 
-- ✨「Experience Officer Recruitment」PaddleMIX Development Project Challenge
-Click the link to register 🔗: [https://aistudio.baidu.com/activitydetail/1503019366](https://aistudio.baidu.com/activitydetail/1503019366)
-🏆 Submit to the PaddlePaddle Galaxy Community Project Hall to be featured and receive a PaddleMIX Experience Officer certification certificate and JD.com card incentives.
-Everyone is welcome to submit～
+**🔥Live Course on January 7th, 2025: New PaddlePaddle PP Series Models Released!**
 
+- ✨PP-DocBee: A New 'Bee'-ginning in Document Image Understanding!
+To help you quickly and deeply understand **PaddleMIX**'s **PP-DocBee document understanding model** and master practical skills, Baidu's senior R&D engineers will provide a detailed explanation of PP-DocBee's core technology and demonstrate the complete development process of multimodal large models at **19:00 on January 7th (Tuesday)**. Scan the QR code below to register now!
 <details>
-<summary>Click to view the event poster</summary>
+<summary>Click to expand event poster</summary>
 <p align="center">
-<img src='https://github.com/user-attachments/assets/27e0bbe3-0ff8-49ef-bd39-81a31a2b288b' width="25%">
+<img src='https://github.com/user-attachments/assets/3b7adc9e-c68d-44d1-9674-05b933947deb'  width="80%">
 </p>
 </details>
 
+
 ## 📣 Latest Developments
+**🎉 2024.01.02 Added support for [PP-DocBee](./paddlemix/examples/ppdocbee) inference and training, supporting [high-performance inference](./deploy/ppdocbee)**
 
 **🎉 2024.12.17 Support for [InternVL2_5 (1B, 2B, 4B, 8B)](./paddlemix/examples/internvl2) inference**
 
@@ -63,14 +63,15 @@ Everyone is welcome to submit～
 
 **🎉 2024.11.1 Support for [LLaVA-OneVision](./paddlemix/examples/llava_onevision/) and [LLaVA-Critic](./paddlemix/examples/llava_critic/) inference**
 
+
+<details>
+<summary>Click to expand more</summary>
+
 **🎉 2024.10.31 Welcome to the Update of External Developer's Creative [Tutorial Page](paddlemix_applications.md)**
 * 🌟 Since the launch of our Large Model Suite Premium Project Collection activity on September 6th, we have received 30 high-quality developer projects. Among them, 25 premium projects have successfully passed the platform evaluation and been featured.
 
 * 🙏 We sincerely thank all developers for their wonderful creations based on our suite! 🚀 We cordially invite you to share your creativity as well - welcome to publish your tutorials on public web pages or in the [PaddlePaddle AI Studio](https://aistudio.baidu.com/aistudio/community/multimodal?from=singlemessage) community!
 
-<details>
-<summary>Click to expand more</summary>
-
 **🔥 PaddleMIX v2.1 Released on 2024.10.11**
 * Supports the [PaddleNLP 3.0 beta](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v3.0.0-beta0) version, allowing early access to its latest features.
 * Added cutting-edge models like [Qwen2-VL](./paddlemix/examples/qwen2_vl/), [InternVL2](./paddlemix/examples/internvl2/), and [Stable Diffusion 3 (SD3)](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/examples/dreambooth/README_sd3.md).
@@ -284,6 +285,7 @@ python setup.py install
         <ul>
             <li><a href="paddlemix/examples/groundingdino">Grounding DINO</a></li>
             <li><a href="paddlemix/examples/sam">SAM</a></li>
+            <li><a href="paddlemix/examples/sam2">SAM2</a></li>
             <li><a href="paddlemix/examples/YOLO-World">YOLO-World</a></li>
       </ul>
       </ul>

diff --git a/build_env.sh b/build_env.sh
@@ -21,16 +21,16 @@ echo "Starting installation of PaddleMIX and its dependencies..."
 
 # Install PaddleMIX
 echo "Installing PaddleMIX..."
-pip install -e .
+pip install -e . --user
 
 # Install ppdiffusers
 echo "Installing ppdiffusers..."
 cd ppdiffusers
-pip install -e .
+pip install -e . --user
 cd ..
 # Note: some ppdiffusers models require CUDA 11.2 or later. If your local machine does not meet this requirement, we recommend running model training and inference on [AI Studio](https://aistudio.baidu.com/index).
 # To train or run inference in **bf16**, use a GPU that supports **bf16**, such as the A100.
 
 # Install dependencies
 echo "Installing dependencies..."
-pip install -r requirements.txt
+pip install -r requirements.txt --user
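
Review note on the `--user` change above: pip rejects `--user` inside a virtualenv that cannot see the user site-packages, so an unconditional `--user` may break the script for contributors who work in a venv. The following is a minimal sketch, not part of this commit, of how the flag could be made conditional. The helper names `in_virtualenv` and `pip_install_args` are hypothetical, and the check does not detect conda environments.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv/virtualenv.

    In a virtual environment sys.prefix points at the environment while
    sys.base_prefix points at the base installation.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def pip_install_args(target_args, use_user=None):
    """Build a `python -m pip install ...` command line (hypothetical helper).

    --user is appended only when use_user is True; by default it is
    enabled outside a virtualenv and disabled inside one.
    """
    if use_user is None:
        use_user = not in_virtualenv()
    cmd = [sys.executable, "-m", "pip", "install", *target_args]
    if use_user:
        cmd.append("--user")
    return cmd


# Show the commands corresponding to the three installs in build_env.sh.
for args in (["-e", "."], ["-r", "requirements.txt"]):
    print(" ".join(pip_install_args(args)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. The same decision could be made directly in the shell script. Python is used here only to keep the sketch easy to test.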
diff --git a/deploy/ppdocbee/README.md b/deploy/ppdocbee/README.md
@@ -0,0 +1,49 @@
+# PP-DocBee
+
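+## 1. Model Introduction
+
+PP-DocBee is a multimodal large model for document understanding, developed in-house by the PaddleMIX team, that performs very well on Chinese document understanding tasks. The model is based on the Qwen2-VL-2B architecture and optimized for document understanding scenarios. It is fine-tuned on nearly 5 million multimodal document understanding samples drawn from general VQA, OCR, chart, text-rich document, math and complex reasoning, synthetic, and text-only datasets, with different training data mixing ratios. On several authoritative academic benchmarks for English document understanding, PP-DocBee reaches SOTA in most cases among models of the same parameter scale. On metrics for internal Chinese business scenarios, PP-DocBee also scores higher than current popular open-source and closed-source models.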
+
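+## 2. Environment Setup
+
+- **python >= 3.10**
+- **paddlepaddle-gpu >= 3.0.0b2 or the develop version**
+```bash
+# Example: installing the develop build of paddlepaddle-gpu
+python -m pip install paddlepaddle-gpu==0.0.0.post118 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
+```
+
+- **paddlenlp requires a specific version**
+
+Run the following commands from the PaddleMIX/ code directory to install the specific version of paddlenlp:
+```bash
+# Installation example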
+git submodule update --init --recursive
+cd PaddleNLP
+git reset --hard e91c2d3d634b12769c30aa419ddf931c20b7ca9f
+pip install -e .
+cd csrc
+python setup_cuda.py install
+```
+
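+> Note:
+* Make sure the dependencies above are installed; otherwise the model cannot run. You also need to install the custom ops under paddlemix/external_ops with `python setup.py install`. If the operators still cannot be found after installation, set PYTHONPATH as well.
+* flash_attn (enabled by default) requires an A100/A800 or H20 GPU.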
+
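+## 3. High-Performance Inference
+
+In the high-performance inference optimization of PP-DocBee, **the vision model part keeps using the model network from PaddleMIX, while the language model part calls the high-performance Qwen2 language model in PaddleNLP**, which yields a high-performance inference version of PP-DocBee.
+
+### 3.1. High-Performance Inference with Text and Single-Image Input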
+```bash
+python deploy/ppdocbee/single_image_infer.py \
+    --model_name_or_path PaddleMIX/PPDocBee-2B-1129 \
+    --dtype bfloat16 \
+    --benchmark True
+```
+
+- Average end-to-end latency on internal Chinese business scenarios, measured on an NVIDIA A100-SXM4-80GB:
+
+| model       | Paddle Inference | PyTorch  | Paddle dynamic graph |
+| ----------- | ---------------- | -------- | -------------------- |
+| PPDocBee-2B | 0.9267 s         | 1.7114 s | 1.5935 s             |
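
Review note: the latency table is easier to compare as speedup ratios. The short sketch below, which is not part of the commit, uses only the three numbers from the table; the dictionary keys and the `speedup_over` helper are illustrative names.

```python
# End-to-end latencies in seconds, copied from the benchmark table above.
latency_s = {
    "Paddle Inference": 0.9267,
    "PyTorch": 1.7114,
    "Paddle dynamic graph": 1.5935,
}


def speedup_over(baseline: str, target: str = "Paddle Inference") -> float:
    """Return how many times faster `target` is than `baseline`."""
    return latency_s[baseline] / latency_s[target]


for name in ("PyTorch", "Paddle dynamic graph"):
    print(f"Paddle Inference vs {name}: {speedup_over(name):.2f}x")
```

Under these numbers, Paddle Inference is roughly 1.85x faster than PyTorch and roughly 1.72x faster than the Paddle dynamic graph on this workload.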