[NEW MODEL CLIP] Add disco diffusion clip vitb32 #3072

Merged

Conversation

JunnYu
Member

@JunnYu JunnYu commented Aug 17, 2022

PR types

New features

PR changes

Models

Description

  1. Add CLIPModel, CLIPTextModel, and CLIPVisionModel.
  2. Image generation with CLIPModel + Disco Diffusion (DD): about 10 minutes per image on a V100 32G.
  3. Three text-to-image generation modes with CLIPModel + Stable Diffusion: about 7 seconds per image on a V100 32G.
  • Text-to-Image Generation
  • Image-to-Image Text-Guided Generation
  • Text-Guided Image Inpainting
import paddle
paddle.set_device("gpu:5")
from paddlenlp.transformers import CLIPForImageGeneration, CLIPTokenizer

# Initialize the model and tokenizer
model_name_or_path = 'openai/disco-diffusion-clip-vit-base-patch32'
model = CLIPForImageGeneration.from_pretrained(model_name_or_path)
tokenizer = CLIPTokenizer.from_pretrained(model_name_or_path)
model.eval()

# Prepare the model inputs.
prompts = "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."
tokenizer.pad_token_id = 0
tokenized_inputs = tokenizer(prompts, return_tensors="pd", padding="max_length", max_length=77)

images = model.generate(**tokenized_inputs)
# return List[PIL.Image]
images[0].save("figure.png")
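CLIP uses a fixed text context length of 77 tokens, which is why the example above sets pad_token_id to 0 and pads to max_length=77. The padding itself can be sketched in plain Python (the token ids below are made up, not real tokenizer output):

```python
# Toy illustration of fixed-length padding as CLIP-style tokenizers do it.
# A real tokenizer also adds BOS/EOS ids; this only shows the pad/truncate step.
MAX_LENGTH = 77   # CLIP's fixed text context length
PAD_TOKEN_ID = 0  # pad id set in the example above

def pad_to_max_length(token_ids, max_length=MAX_LENGTH, pad_id=PAD_TOKEN_ID):
    """Truncate to max_length, then right-pad with pad_id."""
    ids = token_ids[:max_length]
    return ids + [pad_id] * (max_length - len(ids))

ids = pad_to_max_length([320, 1125, 4521])  # 3 "real" tokens
assert len(ids) == 77
assert ids[3:] == [0] * 74  # everything after the real tokens is padding
```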

[generated image]

rainyfly
rainyfly previously approved these changes Aug 24, 2022

@rainyfly rainyfly left a comment


LGTM for diffusion_generate process.

@JunnYu
Member Author

JunnYu commented Sep 1, 2022

Added Stable Diffusion usage:

1. Text-to-Image Generation

import paddle
paddle.set_device("gpu:5")
from PIL import Image
from IPython.display import display
from paddlenlp.transformers import CLIPModel, CLIPTokenizer, CLIPForImageGeneration

model = CLIPForImageGeneration.from_pretrained("CompVis/stable-diffusion-v1-4")
tokenizer = CLIPTokenizer.from_pretrained("CompVis/stable-diffusion-v1-4")

# text2image
prompts = [
    "In the morning light,Chinese ancient buildings in the mountains,Magnificent and fantastic John Howe landscape,lake,clouds,farm,Fairy tale,light effect,Dream,Greg Rutkowski,James Gurney,artstation",
    "clouds surround the mountains and Chinese palaces,sunshine,lake,overlook,overlook,unreal engine,light effect,Dream,Greg Rutkowski,James Gurney,artstation",
    "close-up maximalist illustration of panther, by makoto shinkai, akihiko yoshida, yoshitaka amano, super detailed, hd wallpaper, digital art",
    "in the morning light,Overlooking TOKYO city by greg rutkowski and thomas kinkade,Trending on artstationmakoto shinkai style",
]
all_images = []
for prompt in prompts:
    tokenized_inputs = tokenizer(prompt, padding="max_length", truncation=True, max_length=tokenizer.model_max_length, return_tensors="pd")
    seed = 42
    image = model.generate(tokenized_inputs["input_ids"], mode="text2image", seed=seed)[0]
    display(prompt)
    display(image)
    all_images.append(image)
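The four results collected in all_images can be tiled into a single contact sheet with plain PIL. The sketch below uses stand-in solid-color images so it runs without the model (real Stable Diffusion outputs would be 512×512):

```python
from PIL import Image

def make_grid(images, cols=2):
    """Tile equally sized PIL images into a cols-wide grid."""
    w, h = images[0].size
    rows = (len(images) + cols - 1) // cols
    grid = Image.new("RGB", (cols * w, rows * h))
    for i, img in enumerate(images):
        grid.paste(img, ((i % cols) * w, (i // cols) * h))
    return grid

# Stand-in images; in the loop above these would be the generated samples.
imgs = [Image.new("RGB", (64, 64), c) for c in ["red", "green", "blue", "yellow"]]
grid = make_grid(imgs)       # 2x2 sheet
assert grid.size == (128, 128)
```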

[four generated images]

2. Image-to-Image Text-Guided Generation

import paddle
paddle.set_device("gpu:5")
from PIL import Image
from IPython.display import display
from paddlenlp.transformers import CLIPModel, CLIPTokenizer, CLIPForImageGeneration

model = CLIPForImageGeneration.from_pretrained("CompVis/stable-diffusion-v1-4")
tokenizer = CLIPTokenizer.from_pretrained("CompVis/stable-diffusion-v1-4")

def download_image(url):
    import requests
    from io import BytesIO
    response = requests.get(url)
    init_img = Image.open(BytesIO(response.content)).convert("RGB")
    return init_img

prompt = "A fantasy landscape, trending on artstation"
tokenized_inputs = tokenizer(prompt, padding="max_length", truncation=True, max_length=tokenizer.model_max_length, return_tensors="pd")
init_image = "http://bj.bcebos.com/paddlenlp/models/community/CompVis/stable-diffusion-v1-4/sketch-mountains-input.png"
init_image = download_image(init_image)
display(init_image)
seed = 42
image = model.generate(tokenized_inputs["input_ids"], mode="image2image", init_image=init_image, seed=seed, strength=0.75, guidance_scale=7.5)[0]
display(image) 
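Here strength=0.75 controls how far along the noise schedule the init image is pushed before denoising begins: 0.0 returns the init image essentially unchanged, 1.0 ignores it entirely. Assuming this pipeline follows the usual convention of comparable img2img implementations (an assumption, not confirmed by this PR), strength maps to a number of actually executed denoising steps roughly like this:

```python
def img2img_steps(num_inference_steps, strength):
    """How many scheduler steps actually run in image-to-image mode.

    The init image is noised up to step int(num_inference_steps * strength),
    and only the remaining portion of the schedule is denoised.
    """
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start  # steps that will actually execute

assert img2img_steps(50, 0.75) == 37  # 50 * 0.75 = 37.5 -> 37 steps run
assert img2img_steps(50, 1.0) == 50   # full schedule, init image ignored
```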

[init image and generated image]

3. Text-Guided Image Inpainting

import paddle
paddle.set_device("gpu:5")
from PIL import Image
from IPython.display import display
from paddlenlp.transformers import CLIPModel, CLIPTokenizer, CLIPForImageGeneration

model = CLIPForImageGeneration.from_pretrained("CompVis/stable-diffusion-v1-4")
tokenizer = CLIPTokenizer.from_pretrained("CompVis/stable-diffusion-v1-4")

seed = 42
prompt = "a cat sitting on a bench"
init_image = "http://bj.bcebos.com/paddlenlp/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"
mask_image = "http://bj.bcebos.com/paddlenlp/models/community/CompVis/stable-diffusion-v1-4/overture-creations-mask.png"

def download_image(url):
    import requests
    from io import BytesIO
    response = requests.get(url)
    init_img = Image.open(BytesIO(response.content)).convert("RGB")
    return init_img

init_image = download_image(init_image)
mask_image = download_image(mask_image).convert("L")
tokenized_inputs = tokenizer(prompt, padding="max_length", truncation=True, max_length=tokenizer.model_max_length, return_tensors="pd")
image = model.generate(tokenized_inputs["input_ids"], mode="inpaint", init_image=init_image, mask_image=mask_image, seed=seed, strength=0.75)[0]
display("init_image ----------------------->")
display(init_image)
display("mask_image ----------------------->")
display(mask_image)
display("new_image ----------------------->")
display(image)
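The mask is a single-channel ("L") image where white marks the region to regenerate and black marks pixels to keep from the init image. The blend it drives can be sketched with PIL alone, using dummy solid-color stand-ins for the init and generated images:

```python
from PIL import Image

# Dummy stand-ins: a blue "init" image and a red "generated" image.
init = Image.new("RGB", (64, 64), (0, 0, 255))
generated = Image.new("RGB", (64, 64), (255, 0, 0))

# Mask: black (keep init) everywhere, white (take generated) in a square.
mask = Image.new("L", (64, 64), 0)
mask.paste(255, (16, 16, 48, 48))

# Image.composite takes pixels from its first argument where mask is white.
out = Image.composite(generated, init, mask)
assert out.getpixel((0, 0)) == (0, 0, 255)    # outside mask: init kept
assert out.getpixel((32, 32)) == (255, 0, 0)  # inside mask: regenerated
```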

[init image, mask image, and generated image]

@JunnYu JunnYu requested a review from guoshengCS September 1, 2022 04:57
guoshengCS
guoshengCS previously approved these changes Sep 2, 2022
attentions=encoder_outputs.attentions)


class TextTransformer(CLIPPreTrainedModel):
Contributor


Could this be named CLIPTextTransformer, as in HF? Same for the vision one.

INF = float("-inf") # -1e4 -1e9


class VisionTransformer(CLIPPreTrainedModel):
Contributor


Should this also use CLIPVisionTransformer, as in HF?

class VisionTransformer(CLIPPreTrainedModel):

def __init__(self,
input_resolution: int,
Contributor


Should input_resolution be image_size instead? HF uses image_size; which name is more common and better?

Default to `10`.
n_batches (int, optional):
This variable sets the number of still images you want DD to create. If you are using an animation mode (see below for details)
DD will ignore n_batches and create a single set of animated frames based on the animation settings.
Contributor


Is animation mode also a supported parameter?

# eval mode and stop all param's gradient
self.eval()
for param in self.parameters():
param.stop_gradient = True
Contributor


Putting all of the stop_gradient logic here feels a bit odd, and inconsistent with the other models.

images[0].save("figure.png")

"""
self.diffusion = create_gaussian_diffusion(
Contributor


Should create_gaussian_diffusion also be a member method of DiffusionMixin?

del init2

if init is not None and init_scale:
lpips = try_import("paddle_lpips")
Contributor


What is paddle_lpips?
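For context (an assumption, not stated in the PR): paddle_lpips is presumably a Paddle port of LPIPS (Learned Perceptual Image Patch Similarity, Zhang et al., 2018), used here with init_scale as a perceptual loss pulling the sample toward the init image. LPIPS averages channel-weighted squared distances between deep network features of the two images:

```latex
d(x, x_0) = \sum_l \frac{1}{H_l W_l} \sum_{h,w}
    \left\| w_l \odot \left( \hat{y}^{l}_{hw} - \hat{y}^{l}_{0,hw} \right) \right\|_2^2
```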

@guoshengCS guoshengCS merged commit d789b4f into PaddlePaddle:develop Sep 2, 2022