PaddlePaddle · megemini · Dec 18, 2024 · Dec 22, 2024 · Dec 22, 2024 · Dec 22, 2024
diff --git a/paddlemix/datacopilot/example/iqa_filter/filter_example.ipynb b/paddlemix/datacopilot/example/iqa_filter/filter_example.ipynb
diff --git a/paddlemix/datacopilot/example/iqa_filter/llava_tmp_10.json b/paddlemix/datacopilot/example/iqa_filter/llava_tmp_10.json
diff --git a/paddlemix/datacopilot/example/iqa_filter/利用IQA方法过滤低质量图片.md b/paddlemix/datacopilot/example/iqa_filter/利用IQA方法过滤低质量图片.md
@@ -0,0 +1,111 @@
+# Filtering Low-Quality Images with IQA Methods
+
+|             |                                |
+| ----------- | ------------------------------ |
+| Author      | megemini (柳顺)                |
+| Submitted   | 2024-12-18                     |
+| RFC version | v1.0                           |
+| File name   | 利用 IQA 方法过滤低质量图片.md |
+
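+## 1. Overview
+
+### 1.1 Background
+
+PaddleMIX is a multimodal large-model development toolkit built on PaddlePaddle. It brings together images, text, video, and other modalities, and covers a wide range of multimodal tasks, including vision-language pre-training, fine-tuning, text-to-image, text-to-video, and multimodal understanding. It offers an out-of-the-box development experience while supporting flexible customization, so it can meet different needs and help explore general artificial intelligence. In real projects, however, users in every domain do not only run inference with trained models; they also fine-tune on proprietary data to improve model quality.
+
+This proposal explores using a family of IQA (Image Quality Assessment) methods to filter the data in use. By filtering out `low-quality` images, it aims to reduce the amount of training data and thereby shorten the time needed to train or validate a model.
+
+The proposal also aims to build tooling components for multimodal large-model data, improving the data analysis and processing capabilities of the PaddlePaddle multimodal suite and reducing development cost for users.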
+
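+### 1.2 Functional Goals
+
+Add an `ops/filter` directory to PaddleMIX and add IQA methods in that directory.
+
+### 1.3 Significance
+
+Many studies suggest that data quality matters more than data quantity. Filtering out `low-quality` images can raise the overall quality of a dataset and reduce the cost of training a model.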
+
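+## 2. Proposal Background
+
+Many preprocessing methods already exist for multimodal large-model data, for example:
+
+- Text analysis: keep only data whose text length is within a given range
+- Image analysis: keep only data whose image aspect ratio is within a given range
+- Audio analysis: keep only data whose audio duration is within a given range
+- Video analysis: keep only data whose video duration is within a given range
+
+Of these, `image analysis` is the part relevant to this proposal.
+
+Commonly used image-based methods include, but are not limited to:
+
+- Keep only data whose image size is within a given range
+- Keep only data whose image shape is within a given range
+- Keep only data whose image aspect ratio is within a given range
+- Keep only data whose number of faces in the image is within a given range
+- Keep only data whose image NSFW (Not Safe (or Suitable) For Work) score is within a given range
+- Keep only data whose image aesthetics score is within a given range
+
+This proposal explores using a family of IQA (Image Quality Assessment) methods (including, but not limited to, image sharpness and aesthetic scoring) to filter the data in use, with the aim of filtering out `low-quality` images.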
+
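+## 3. Survey
+
+IQA (Image Quality Assessment) methods include, but are not limited to:
+
+- FR (Full Reference)
+- NR (No Reference / Blind)
+
+Because multimodal data mostly has no reference image, NR methods are the main subject of analysis here.
+
+Current NR methods generally follow this workflow:
+
+- Select a subset of the data
+- Have trained, experienced human annotators score the images
+- Choose a model
+- Train the model on the annotated data
+- Use the trained model to score new images
+
+The models can in turn be divided into:
+
+- Neural-network architectures, such as the `Q-Align` and `ARNIQA` models
+- Traditional architectures that do not use neural networks, such as the SVM model used by `BRISQUE`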
+
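+## 4. Design and Implementation
+
+This proposal adds image `filter` `op`s to PaddleMIX, implemented under:
+
+- `paddlemix/datacopilot/ops/filter`
+
+Two IQA filters are implemented in this directory:
+
+- ARNIQA, a neural-network model
+- BRISQUE, an SVM model
+
+Once the `op` is implemented, users can use it to filter a dataset, keeping only data whose IQA score is within a given range (or above a threshold).
+
+Because most neural-network IQA models, including `ARNIQA`, are currently implemented in PyTorch, such models must first be converted to Paddle models before they can be used.
+
+`X2Paddle` is used for the conversion. The converted model files are large and need to be uploaded separately to a server for storage.
+
+### 4.1 Additional Notes [Optional]
+
+- [ARNIQA: Learning Distortion Manifold for Image Quality Assessment](https://github.com/miccunifi/ARNIQA)
+- [BRISQUE](https://learnopencv.com/image-quality-assessment-brisque/)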
+
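+## 5. Testing and Acceptance Considerations
+
+Set different IQA score ranges or thresholds, collect the resulting filtered datasets, and train models on them.
+
+Compare the accuracy of the models trained on filtered data with that of a model trained on unfiltered data.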
+
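+## 6. Feasibility Analysis and Schedule
+
+This work mainly implements two IQA models:
+
+- ARNIQA: one week
+- BRISQUE: one week
+
+Model training and accuracy comparison: two weeks
+
+## 7. Impact
+
+This proposal adds a `filter` directory under `paddlemix/datacopilot/ops/` and adds the IQA-related `op`s in it.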
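Section 4 of the RFC above describes keeping only the samples whose IQA score lies within a range or above a threshold. The following is a minimal sketch of that filtering step, not the PaddleMIX `op` itself. The names `filter_by_iqa`, `score_fn`, `min_score`, and `max_score` are hypothetical, `score_fn` stands in for a scorer such as ARNIQA or BRISQUE, and the scores are invented for illustration.

```python
from __future__ import annotations

from typing import Callable, Iterable


def filter_by_iqa(
    items: Iterable[dict],
    score_fn: Callable[[dict], float],
    min_score: float = 0.0,
    max_score: float = 1.0,
) -> list[dict]:
    """Keep items whose IQA score lies in [min_score, max_score].

    `score_fn` is a hypothetical stand-in for an IQA model such as
    ARNIQA or BRISQUE; it is not the PaddleMIX op itself.
    """
    if min_score > max_score:
        raise ValueError("min_score must not exceed max_score")
    return [item for item in items if min_score <= score_fn(item) <= max_score]


# Illustrative data only: the scores below are invented for the example.
fake_scores = {"a.jpg": 0.82, "b.jpg": 0.31, "c.jpg": 0.55}
dataset = [{"image": name} for name in fake_scores]

# Threshold-style filtering: keep everything scoring at least 0.5.
kept = filter_by_iqa(dataset, lambda item: fake_scores[item["image"]], min_score=0.5)
kept_names = [item["image"] for item in kept]
print(kept_names)
```

Passing the scorer as a callable keeps the range check independent of the model, so the same sketch would cover both a neural scorer and an SVM-based one.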
diff --git a/paddlemix/datacopilot/nn/__init__.py b/paddlemix/datacopilot/nn/__init__.py
@@ -14,5 +14,5 @@
 
 
 from ._lid import FastTextLIDModel
+from .arniqa import ARNIQA
 from .inscaptagger import PPInsCapTagger
-
diff --git a/paddlemix/datacopilot/nn/arniqa/__init__.py b/paddlemix/datacopilot/nn/arniqa/__init__.py
@@ -0,0 +1,15 @@
+# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .arniqa import ARNIQA
diff --git a/paddlemix/datacopilot/nn/arniqa/arniqa.py b/paddlemix/datacopilot/nn/arniqa/arniqa.py
@@ -0,0 +1,99 @@
+# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import annotations
+
+from pathlib import Path
+
+import paddle
+import paddle.nn.functional as F
+
+from .pd_model_encoder.x2paddle_code import Sequential as encoder_paddle_model
+from .pd_model_regressor.x2paddle_code import (
+    TorchLinearRegression as regressor_paddle_model,
+)
+
+
+class ARNIQA(paddle.nn.Layer):
+    """
+    ARNIQA: Learning Distortion Manifold for Image Quality Assessment
+
+    @inproceedings{agnolucci2024arniqa,
+      title={ARNIQA: Learning Distortion Manifold for Image Quality Assessment},
+      author={Agnolucci, Lorenzo and Galteri, Leonardo and Bertini, Marco and Del Bimbo, Alberto},
+      booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
+      pages={189--198},
+      year={2024}
+    }
+
+    Reference:
+        - Arxiv link: https://www.arxiv.org/abs/2310.14918
+        - Official Github: https://github.com/miccunifi/ARNIQA
+    """
+
+    def __init__(
+        self,
+        default_mean: tuple[float, float, float] = (0.485, 0.456, 0.406),
+        default_std: tuple[float, float, float] = (0.229, 0.224, 0.225),
+        feat_dim: int = 2048,
+    ):
+        super(ARNIQA, self).__init__()
+        self.default_mean = paddle.to_tensor(default_mean).view([1, 3, 1, 1])
+        self.default_std = paddle.to_tensor(default_std).view([1, 3, 1, 1])
+        self.feat_dim = feat_dim
+        self.encoder = encoder_paddle_model()
+        self.regressor = regressor_paddle_model()
+
+        encoder_paddle_params = paddle.load(str(Path(__file__).parent / "pd_model_encoder" / "model.pdparams"))
+        regressor_paddle_params = paddle.load(str(Path(__file__).parent / "pd_model_regressor" / "model.pdparams"))
+
+        self.encoder.set_dict(encoder_paddle_params, use_structured_name=True)
+        self.regressor.set_dict(regressor_paddle_params, use_structured_name=True)
+
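+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        # Normalized full-resolution input plus a half-resolution copy.
+        x, x_ds = self._preprocess(x)
+
+        f = F.normalize(self.encoder(x), axis=1)
+        f_ds = F.normalize(self.encoder(x_ds), axis=1)
+        # Concatenate both feature vectors into shape [N, feat_dim * 2].
+        f_combined = paddle.hstack((f, f_ds)).reshape([-1, self.feat_dim * 2])
+
+        # Regress a quality score, then map it linearly from (1, 100) to (0, 1).
+        score = self.regressor(f_combined)
+        score = self._scale_score(score)
+
+        return score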
+
+    def _preprocess(self, x: paddle.Tensor) -> tuple[paddle.Tensor, paddle.Tensor]:
+        # Downsample first, then normalize both inputs with the same statistics.
+        x_ds = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False)
+        x = (x - self.default_mean) / self.default_std
+        x_ds = (x_ds - self.default_mean) / self.default_std
+        return x, x_ds
+
+    def _scale_score(self, score: paddle.Tensor) -> paddle.Tensor:
+        new_range = (0.0, 1.0)
+
+        # Compute scaling factors
+        original_range = (1, 100)
+        original_width = original_range[1] - original_range[0]
+        new_width = new_range[1] - new_range[0]
+        scaling_factor = new_width / original_width
+
+        # Scale score
+        scaled_score = new_range[0] + (score - original_range[0]) * scaling_factor
+
+        return scaled_score
+
+    def __call__(self, item: paddle.Tensor) -> paddle.Tensor:
+        return self.forward(item)
+
+    def inference(self, item: paddle.Tensor) -> paddle.Tensor:
+        return self.forward(item)
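The `_scale_score` method above applies a linear map from the range (1, 100) to (0, 1). The sketch below restates that arithmetic in plain Python so it can be checked without Paddle. The function name `scale_score` and its keyword parameters are illustrative and not part of the PR. As in the method above, the result is not clamped, so a raw score outside (1, 100) maps outside (0, 1).

```python
def scale_score(
    score: float,
    original_range: tuple = (1.0, 100.0),
    new_range: tuple = (0.0, 1.0),
) -> float:
    """Linearly map `score` from `original_range` to `new_range` (no clamping)."""
    original_width = original_range[1] - original_range[0]
    new_width = new_range[1] - new_range[0]
    return new_range[0] + (score - original_range[0]) * new_width / original_width


# Endpoints and midpoint of the original range.
low = scale_score(1.0)
mid = scale_score(50.5)
high = scale_score(100.0)

# A raw score above the original range lands above the new range.
out_of_range = scale_score(120.0)
print(low, mid, high, out_of_range)
```

If a filter needs scores strictly inside [0, 1], one option would be to clamp the result after scaling; the PR code does not do this.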
diff --git a/paddlemix/datacopilot/nn/arniqa/pd_model_encoder/__init__.py b/paddlemix/datacopilot/nn/arniqa/pd_model_encoder/__init__.py
@@ -0,0 +1,13 @@
+# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.