[PaddlePaddle Multimodal Large Model Suite PaddleMIX Development Competition] rfc & code #890

Open
wants to merge 17 commits into
base: develop
1,309 changes: 1,309 additions & 0 deletions paddlemix/datacopilot/example/iqa_filter/filter_example.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions paddlemix/datacopilot/example/iqa_filter/llava_tmp_10.json

Large diffs are not rendered by default.

@@ -0,0 +1,111 @@
# Filtering Low-Quality Images with IQA Methods

| | |
| --------------- | ------------------------------ |
| Author | megemini (柳顺) |
| Submission date | 2024-12-18 |
| RFC version | v1.0 |
| File name | 利用 IQA 方法过滤低质量图片.md |

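## 1. Overview

### 1.1 Background

PaddleMIX is a multimodal large-model development toolkit built on PaddlePaddle. It brings together images, text, video, and other modalities, and covers a wide range of multimodal tasks, including vision-language pre-training, fine-tuning, text-to-image, text-to-video, and multimodal understanding. It offers an out-of-the-box development experience while supporting flexible customization for different needs, helping users explore artificial general intelligence. In real projects, however, users in various domains do not only run inference with pre-trained models; they also fine-tune on proprietary data to improve model quality.

This proposal explores using a family of IQA (Image Quality Assessment) methods to filter the data being used. The goal is to remove `low-quality` images, which reduces the amount of training data and in turn shortens the time needed to train or validate a model.

The proposal aims to build tooling components for multimodal large-model data, strengthening the data analysis and processing capabilities of PaddleMIX and lowering development cost for users.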

### 1.2 Functional Goals

Add an `ops/filter` directory to PaddleMIX and add IQA methods under `ops/filter`.

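### 1.3 Significance

Many studies suggest that data quality matters more than data quantity. Filtering out `low-quality` images can raise the overall quality of a dataset and reduce the cost of training a model.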

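## 2. Proposal Background

Many data preprocessing methods already exist for multimodal large models, for example:

- Analyze the text in the data and keep only samples whose text length falls within a given range
- Analyze the images in the data and keep only samples whose image aspect ratio falls within a given range
- Analyze the audio in the data and keep only samples whose audio duration falls within a given range
- Analyze the video in the data and keep only samples whose video duration falls within a given range

Of these, the part relevant to this proposal is `image analysis`.

Commonly used methods include, but are not limited to:

- Keep only data whose image size falls within a given range
- Keep only data whose image shape falls within a given range
- Keep only data whose image aspect ratio falls within a given range
- Keep only data whose number of faces in the image falls within a given range
- Keep only data whose image NSFW (Not Safe (or Suitable) For Work) score falls within a given range
- Keep only data whose image aesthetics score falls within a given range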
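
The range-based filters listed above share one pattern: compute a per-image value and keep the sample only if the value lies inside a configured range. A minimal sketch of that pattern follows. The `ImageMeta` record, the helper names, and the default ranges are assumptions made for illustration and are not PaddleMIX APIs.

```python
from dataclasses import dataclass


@dataclass
class ImageMeta:
    """Hypothetical per-image metadata record (not a PaddleMIX type)."""

    path: str
    width: int
    height: int


def keep_by_aspect_ratio(meta: ImageMeta, min_ratio: float = 0.5, max_ratio: float = 2.0) -> bool:
    """Return True when width / height lies inside [min_ratio, max_ratio]."""
    if meta.height <= 0:
        return False
    ratio = meta.width / meta.height
    return min_ratio <= ratio <= max_ratio


def keep_by_size(meta: ImageMeta, min_side: int = 64, max_side: int = 4096) -> bool:
    """Return True when both sides lie inside [min_side, max_side]."""
    return min_side <= min(meta.width, meta.height) and max(meta.width, meta.height) <= max_side


samples = [
    ImageMeta("a.jpg", 640, 480),   # normal photo
    ImageMeta("b.jpg", 3000, 100),  # extreme aspect ratio
    ImageMeta("c.jpg", 32, 32),     # too small
]
kept = [m.path for m in samples if keep_by_aspect_ratio(m) and keep_by_size(m)]
```

An IQA filter fits the same shape: the per-image value is simply a quality score produced by a model instead of a geometric property.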

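This proposal explores using a family of IQA (Image Quality Assessment) methods, including but not limited to image sharpness and aesthetic scoring, to filter the data being used, with the aim of removing `low-quality` images.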

## 3. Survey of Target Methods

IQA (Image Quality Assessment) methods include, but are not limited to:

- FR (Full Reference)
- NR (No Reference / Blind)

Because most multimodal data has no reference image, the analysis here focuses on NR methods.
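
To make the FR/NR distinction concrete, the sketch below computes PSNR, a classic full-reference metric. It needs a pristine reference alongside the image being scored, which is exactly what most multimodal datasets lack. The `psnr` helper and the flat pixel lists are illustrative assumptions, not part of this PR.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio between two equally sized pixel sequences."""
    if len(reference) != len(distorted):
        raise ValueError("reference and distorted must have the same length")
    # Mean squared error over all pixels
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [0, 255, 128, 64]   # pixels of the pristine image
distorted = [0, 250, 130, 60]   # pixels of the image being assessed
quality = psnr(reference, distorted)
```

An NR method has to produce a comparable score from `distorted` alone, which is why it relies on a model trained on human ratings.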

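Current NR methods generally follow this workflow:

- Select a subset of the data
- Have trained, experienced human annotators score the images
- Choose a model
- Train the model on the annotated data
- Use the trained model to score new images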
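
The workflow above can be sketched with a deliberately tiny stand-in: one scalar feature per image and a least-squares line fitted to human scores. Real NR models use far richer features and models; the feature values, the scores, and the `fit_linear` helper here are made up for illustration.

```python
def fit_linear(features, scores):
    """Ordinary least squares for score = w * feature + b (single feature)."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(scores) / n
    var_x = sum((x - mean_x) ** 2 for x in features)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, scores))
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


# Steps 1-2: a selected subset with human-assigned scores (hypothetical values).
train_features = [0.1, 0.4, 0.6, 0.9]   # e.g. a sharpness-like feature per image
train_scores = [1.5, 3.0, 4.0, 5.5]     # human opinion scores for the same images

# Steps 3-4: choose a model (a line) and fit it to the annotated data.
w, b = fit_linear(train_features, train_scores)


# Step 5: score a new, unlabeled image from its feature alone.
def predict(feature):
    return w * feature + b


new_score = predict(0.5)
```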

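The models can be further divided into:

- Neural-network architectures, such as the `Q-Align` and `ARNIQA` models
- Traditional architectures without neural networks, such as the SVM model used by `BRISQUE`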

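## 4. Design and Implementation

This proposal adds image `filter` `op`s to PaddleMIX. The implementation path is:

- `paddlemix/datacopilot/ops/filter`

Two IQA filters are implemented in this directory:

- ARNIQA, a neural-network model
- BRISQUE, an SVM model

Once the `op` is implemented, users can apply it to filter a dataset and keep only the data whose IQA score falls within a given range (or exceeds a threshold).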
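
A minimal sketch of the intended usage pattern is shown below. The function name `filter_by_iqa`, its parameters, and the lookup table standing in for a real IQA model are all assumptions for illustration; the actual `op` signature is defined by the code in this PR.

```python
from typing import Callable, Iterable, List, Optional


def filter_by_iqa(
    items: Iterable[dict],
    score_fn: Callable[[dict], float],
    min_score: float = 0.5,
    max_score: Optional[float] = None,
) -> List[dict]:
    """Keep items whose IQA score is >= min_score and, if given, <= max_score."""
    kept = []
    for item in items:
        score = score_fn(item)
        if score < min_score:
            continue
        if max_score is not None and score > max_score:
            continue
        kept.append(item)
    return kept


# Stand-in scores; in practice these would come from ARNIQA or BRISQUE.
fake_scores = {"sharp.jpg": 0.82, "blurry.jpg": 0.21, "ok.jpg": 0.55}
dataset = [{"image": name} for name in fake_scores]

kept = filter_by_iqa(dataset, lambda item: fake_scores[item["image"]], min_score=0.5)
kept_names = [item["image"] for item in kept]
```

Passing the scorer as a callable keeps the filtering logic independent of which IQA model produces the score.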

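Most neural-network IQA models today, including `ARNIQA`, are implemented in PyTorch, so they must first be converted to Paddle models before use.

`X2Paddle` is used here for the conversion. The converted model files are large and need to be uploaded separately and stored on a server.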

### 4.1 Supplementary Notes [optional]

- [ARNIQA: Learning Distortion Manifold for Image Quality Assessment](https://github.com/miccunifi/ARNIQA)
- [BRISQUE](https://learnopencv.com/image-quality-assessment-brisque/)

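## 5. Testing and Acceptance Considerations

Set different IQA score ranges or thresholds, collect the filtered datasets, and train models on them.

Then compare the accuracy of models trained on the filtered datasets with that of a model trained on the unfiltered data.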
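
Before training, it helps to know how much data each candidate threshold retains. The sketch below computes the retained fraction for several thresholds; the scores are invented values and `retention_by_threshold` is a hypothetical helper, not part of this PR.

```python
def retention_by_threshold(scores, thresholds):
    """Map each threshold to the fraction of scores that are >= that threshold."""
    total = len(scores)
    return {t: sum(s >= t for s in scores) / total for t in thresholds}


# Hypothetical IQA scores for eight images, already scaled to [0, 1].
scores = [0.12, 0.35, 0.48, 0.51, 0.67, 0.73, 0.88, 0.91]
report = retention_by_threshold(scores, [0.3, 0.5, 0.7])
```

Each retained subset can then be used for one training run, so that accuracy can be compared against dataset size.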

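## 6. Feasibility Analysis and Schedule

The main work is implementing two IQA models:

- ARNIQA, one week
- BRISQUE, one week

Model training and accuracy comparison: two weeks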

## 7. Impact

This proposal adds a `filter` directory under `paddlemix/datacopilot/ops/` and adds the IQA-related `op`s in it.
2 changes: 1 addition & 1 deletion paddlemix/datacopilot/nn/__init__.py
@@ -14,5 +14,5 @@


from ._lid import FastTextLIDModel
from .arniqa import ARNIQA
from .inscaptagger import PPInsCapTagger

15 changes: 15 additions & 0 deletions paddlemix/datacopilot/nn/arniqa/__init__.py
@@ -0,0 +1,15 @@
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .arniqa import ARNIQA
99 changes: 99 additions & 0 deletions paddlemix/datacopilot/nn/arniqa/arniqa.py
@@ -0,0 +1,99 @@
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations

from pathlib import Path

import paddle
import paddle.nn.functional as F

from .pd_model_encoder.x2paddle_code import Sequential as encoder_paddle_model
from .pd_model_regressor.x2paddle_code import (
    TorchLinearRegression as regressor_paddle_model,
)


class ARNIQA(paddle.nn.Layer):
    """
    ARNIQA: Learning Distortion Manifold for Image Quality Assessment

    @inproceedings{agnolucci2024arniqa,
        title={ARNIQA: Learning Distortion Manifold for Image Quality Assessment},
        author={Agnolucci, Lorenzo and Galteri, Leonardo and Bertini, Marco and Del Bimbo, Alberto},
        booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
        pages={189--198},
        year={2024}
    }

    Reference:
        - Arxiv link: https://www.arxiv.org/abs/2310.14918
        - Official Github: https://github.com/miccunifi/ARNIQA
    """

    def __init__(
        self,
        default_mean: tuple[float, float, float] = (0.485, 0.456, 0.406),
        default_std: tuple[float, float, float] = (0.229, 0.224, 0.225),
        feat_dim: int = 2048,
    ):
        super(ARNIQA, self).__init__()
        # Per-channel normalization statistics, shaped for NCHW broadcasting
        self.default_mean = paddle.to_tensor(default_mean).view([1, 3, 1, 1])
        self.default_std = paddle.to_tensor(default_std).view([1, 3, 1, 1])
        self.feat_dim = feat_dim
        self.encoder = encoder_paddle_model()
        self.regressor = regressor_paddle_model()

        # Load the converted weights stored next to this file
        encoder_paddle_params = paddle.load(str(Path(__file__).parent / "pd_model_encoder" / "model.pdparams"))
        regressor_paddle_params = paddle.load(str(Path(__file__).parent / "pd_model_regressor" / "model.pdparams"))

        self.encoder.set_dict(encoder_paddle_params, use_structured_name=True)
        self.regressor.set_dict(regressor_paddle_params, use_structured_name=True)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x, x_ds = self._preprocess(x)

        # Encode the full-resolution and half-resolution inputs, then concatenate the features
        f = F.normalize(self.encoder(x), axis=1)
        f_ds = F.normalize(self.encoder(x_ds), axis=1)
        f_combined = paddle.hstack((f, f_ds)).reshape([-1, self.feat_dim * 2])

        score = self.regressor(f_combined)
        score = self._scale_score(score)

        return score

    def _preprocess(self, x: paddle.Tensor) -> tuple[paddle.Tensor, paddle.Tensor]:
        # Downsample first, then normalize both versions with the same statistics
        x_ds = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False)
        x = (x - self.default_mean) / self.default_std
        x_ds = (x_ds - self.default_mean) / self.default_std
        return x, x_ds

    def _scale_score(self, score: paddle.Tensor) -> paddle.Tensor:
        new_range = (0.0, 1.0)

        # Compute scaling factors
        original_range = (1, 100)
        original_width = original_range[1] - original_range[0]
        new_width = new_range[1] - new_range[0]
        scaling_factor = new_width / original_width

        # Scale score
        scaled_score = new_range[0] + (score - original_range[0]) * scaling_factor

        return scaled_score

    def __call__(self, item: paddle.Tensor) -> paddle.Tensor:
        return self.forward(item)

    def inference(self, item: paddle.Tensor) -> paddle.Tensor:
        return self.forward(item)
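
For reference, the linear rescaling in `_scale_score` maps the regressor's original range of 1 to 100 onto 0 to 1. The standalone sketch below reproduces that arithmetic on plain floats; `scale_score` is an illustrative copy, not a function exported by this module.

```python
def scale_score(score, original_range=(1.0, 100.0), new_range=(0.0, 1.0)):
    """Linearly map score from original_range to new_range."""
    original_width = original_range[1] - original_range[0]
    new_width = new_range[1] - new_range[0]
    scaling_factor = new_width / original_width
    return new_range[0] + (score - original_range[0]) * scaling_factor


lowest = scale_score(1.0)     # bottom of the original range
middle = scale_score(50.5)    # midpoint of the original range
highest = scale_score(100.0)  # top of the original range
```

Regressor outputs that fall outside 1 to 100 map outside 0 to 1, since no clipping is applied.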
13 changes: 13 additions & 0 deletions paddlemix/datacopilot/nn/arniqa/pd_model_encoder/__init__.py
@@ -0,0 +1,13 @@
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.