[NEW MODEL CLIP] Add disco diffusion clip vitb32 #3072

Merged

Conversation

JunnYu
Member

@JunnYu JunnYu commented Aug 17, 2022

PR types

New features

PR changes

Models

Description

  1. Add CLIPModel, CLIPTextModel, and CLIPVisionModel.
  2. Image generation with CLIPModel + Disco Diffusion (DD): about 10 minutes per image on a V100 32G.
  3. Three text-to-image generation modes with CLIPModel + Stable Diffusion: about 7 seconds per image on a V100 32G.
  • Text-to-Image Generation
  • Image-to-Image Text-Guided Generation
  • Text-Guided Image Inpainting
import paddle
paddle.set_device("gpu:5")
from paddlenlp.transformers import CLIPForImageGeneration, CLIPTokenizer

# Initialize the model and tokenizer
model_name_or_path = 'openai/disco-diffusion-clip-vit-base-patch32'
model = CLIPForImageGeneration.from_pretrained(model_name_or_path)
tokenizer = CLIPTokenizer.from_pretrained(model_name_or_path)
model.eval()

# Prepare the model inputs.
prompts = "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."
tokenizer.pad_token_id = 0
tokenized_inputs = tokenizer(prompts, return_tensors="pd", padding="max_length", max_length=77)

images = model.generate(**tokenized_inputs)
# return List[PIL.Image]
images[0].save("figure.png")
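CLIP uses a fixed text context length of 77 tokens, which is why the example above sets pad_token_id to 0 and pads to max_length=77. The padding itself can be sketched in plain Python (the token ids below are made up, not real tokenizer output):

```python
# Toy illustration of fixed-length padding as CLIP-style tokenizers do it.
# A real tokenizer also adds BOS/EOS ids; this only shows the pad/truncate step.
MAX_LENGTH = 77   # CLIP's fixed text context length
PAD_TOKEN_ID = 0  # pad id set in the example above

def pad_to_max_length(token_ids, max_length=MAX_LENGTH, pad_id=PAD_TOKEN_ID):
    """Truncate to max_length, then right-pad with pad_id."""
    ids = token_ids[:max_length]
    return ids + [pad_id] * (max_length - len(ids))

ids = pad_to_max_length([320, 1125, 4521])  # 3 "real" tokens
assert len(ids) == 77
assert ids[3:] == [0] * 74  # everything after the real tokens is padding
```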

[generated image]

rainyfly
rainyfly previously approved these changes Aug 24, 2022

@rainyfly rainyfly left a comment


LGTM for diffusion_generate process.

@JunnYu
Member Author

JunnYu commented Sep 1, 2022

Added Stable Diffusion usage:

1. Text-to-Image Generation

import paddle
paddle.set_device("gpu:5")
from PIL import Image
from IPython.display import display
from paddlenlp.transformers import CLIPModel, CLIPTokenizer, CLIPForImageGeneration

model = CLIPForImageGeneration.from_pretrained("CompVis/stable-diffusion-v1-4")
tokenizer = CLIPTokenizer.from_pretrained("CompVis/stable-diffusion-v1-4")

# text2image
prompts = [
    "In the morning light,Chinese ancient buildings in the mountains,Magnificent and fantastic John Howe landscape,lake,clouds,farm,Fairy tale,light effect,Dream,Greg Rutkowski,James Gurney,artstation",
    "clouds surround the mountains and Chinese palaces,sunshine,lake,overlook,overlook,unreal engine,light effect,Dream,Greg Rutkowski,James Gurney,artstation",
    "close-up maximalist illustration of panther, by makoto shinkai, akihiko yoshida, yoshitaka amano, super detailed, hd wallpaper, digital art",
    "in the morning light,Overlooking TOKYO city by greg rutkowski and thomas kinkade,Trending on artstationmakoto shinkai style",
]
all_images = []
for prompt in prompts:
    tokenized_inputs = tokenizer(prompt, padding="max_length", truncation=True, max_length=tokenizer.model_max_length, return_tensors="pd")
    seed = 42
    image = model.generate(tokenized_inputs["input_ids"], mode="text2image", seed=seed)[0]
    display(prompt)
    display(image)
    all_images.append(image)
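The four results collected in all_images can be tiled into a single contact sheet with plain PIL. The sketch below uses stand-in solid-color images so it runs without the model (real Stable Diffusion outputs would be 512×512):

```python
from PIL import Image

def make_grid(images, cols=2):
    """Tile equally sized PIL images into a cols-wide grid."""
    w, h = images[0].size
    rows = (len(images) + cols - 1) // cols
    grid = Image.new("RGB", (cols * w, rows * h))
    for i, img in enumerate(images):
        grid.paste(img, ((i % cols) * w, (i // cols) * h))
    return grid

# Stand-in images; in the loop above these would be the generated samples.
imgs = [Image.new("RGB", (64, 64), c) for c in ["red", "green", "blue", "yellow"]]
grid = make_grid(imgs)       # 2x2 sheet
assert grid.size == (128, 128)
```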

[four generated images]

2. Image-to-Image Text-Guided Generation

import paddle
paddle.set_device("gpu:5")
from PIL import Image
from IPython.display import display
from paddlenlp.transformers import CLIPModel, CLIPTokenizer, CLIPForImageGeneration

model = CLIPForImageGeneration.from_pretrained("CompVis/stable-diffusion-v1-4")
tokenizer = CLIPTokenizer.from_pretrained("CompVis/stable-diffusion-v1-4")

def download_image(url):
    import requests
    from io import BytesIO
    response = requests.get(url)
    init_img = Image.open(BytesIO(response.content)).convert("RGB")
    return init_img

prompt = "A fantasy landscape, trending on artstation"
tokenized_inputs = tokenizer(prompt, padding="max_length", truncation=True, max_length=tokenizer.model_max_length, return_tensors="pd")
init_image = "http://bj.bcebos.com/paddlenlp/models/community/CompVis/stable-diffusion-v1-4/sketch-mountains-input.png"
init_image = download_image(init_image)
display(init_image)
seed = 42
image = model.generate(tokenized_inputs["input_ids"], mode="image2image", init_image=init_image, seed=seed, strength=0.75, guidance_scale=7.5)[0]
display(image) 
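Here strength=0.75 controls how far along the noise schedule the init image is pushed before denoising begins: 0.0 returns the init image essentially unchanged, 1.0 ignores it entirely. Assuming this pipeline follows the usual convention of comparable img2img implementations (an assumption, not confirmed by this PR), strength maps to a number of actually executed denoising steps roughly like this:

```python
def img2img_steps(num_inference_steps, strength):
    """How many scheduler steps actually run in image-to-image mode.

    The init image is noised up to step int(num_inference_steps * strength),
    and only the remaining portion of the schedule is denoised.
    """
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start  # steps that will actually execute

assert img2img_steps(50, 0.75) == 37  # 50 * 0.75 = 37.5 -> 37 steps run
assert img2img_steps(50, 1.0) == 50   # full schedule, init image ignored
```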

[init image and generated image]

3. Text-Guided Image Inpainting

import paddle
paddle.set_device("gpu:5")
from PIL import Image
from IPython.display import display
from paddlenlp.transformers import CLIPModel, CLIPTokenizer, CLIPForImageGeneration

model = CLIPForImageGeneration.from_pretrained("CompVis/stable-diffusion-v1-4")
tokenizer = CLIPTokenizer.from_pretrained("CompVis/stable-diffusion-v1-4")

seed = 42
prompt = "a cat sitting on a bench"
init_image = "http://bj.bcebos.com/paddlenlp/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"
mask_image = "http://bj.bcebos.com/paddlenlp/models/community/CompVis/stable-diffusion-v1-4/overture-creations-mask.png"

def download_image(url):
    import requests
    from io import BytesIO
    response = requests.get(url)
    init_img = Image.open(BytesIO(response.content)).convert("RGB")
    return init_img

init_image = download_image(init_image)
mask_image = download_image(mask_image).convert("L")
tokenized_inputs = tokenizer(prompt, padding="max_length", truncation=True, max_length=tokenizer.model_max_length, return_tensors="pd")
image = model.generate(tokenized_inputs["input_ids"], mode="inpaint", init_image=init_image, mask_image=mask_image, seed=seed, strength=0.75)[0]
display("init_image ----------------------->")
display(init_image)
display("mask_image ----------------------->")
display(mask_image)
display("new_image ----------------------->")
display(image)
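The mask is a single-channel ("L") image where white marks the region to regenerate and black marks pixels to keep from the init image. The blend it drives can be sketched with PIL alone, using dummy solid-color stand-ins for the init and generated images:

```python
from PIL import Image

# Dummy stand-ins: a blue "init" image and a red "generated" image.
init = Image.new("RGB", (64, 64), (0, 0, 255))
generated = Image.new("RGB", (64, 64), (255, 0, 0))

# Mask: black (keep init) everywhere, white (take generated) in a square.
mask = Image.new("L", (64, 64), 0)
mask.paste(255, (16, 16, 48, 48))

# Image.composite takes pixels from its first argument where mask is white.
out = Image.composite(generated, init, mask)
assert out.getpixel((0, 0)) == (0, 0, 255)    # outside mask: init kept
assert out.getpixel((32, 32)) == (255, 0, 0)  # inside mask: regenerated
```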

[init image, mask image, and generated image]

@JunnYu JunnYu requested a review from guoshengCS September 1, 2022 04:57
guoshengCS
guoshengCS previously approved these changes Sep 2, 2022
attentions=encoder_outputs.attentions)


class TextTransformer(CLIPPreTrainedModel):
Contributor


Could this be named CLIPTextTransformer, as in HF? Same for the vision one.

INF = float("-inf") # -1e4 -1e9


class VisionTransformer(CLIPPreTrainedModel):
Contributor


Should this also use CLIPVisionTransformer, as in HF?

class VisionTransformer(CLIPPreTrainedModel):

def __init__(self,
input_resolution: int,
Contributor


Should input_resolution be image_size instead? HF uses image_size; which name is more common and better?

Default to `10`.
n_batches (int, optional):
This variable sets the number of still images you want DD to create. If you are using an animation mode (see below for details)
DD will ignore n_batches and create a single set of animated frames based on the animation settings.
Contributor


Is animation mode also a supported parameter?

# eval mode and stop all param's gradient
self.eval()
for param in self.parameters():
param.stop_gradient = True
Contributor


Putting all of the stop_gradient logic here feels a bit odd, and inconsistent with the other models.

images[0].save("figure.png")

"""
self.diffusion = create_gaussian_diffusion(
Contributor


Should create_gaussian_diffusion also be a member method of DiffusionMixin?

del init2

if init is not None and init_scale:
lpips = try_import("paddle_lpips")
Contributor


What is paddle_lpips?
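For context (an assumption, not stated in the PR): paddle_lpips is presumably a Paddle port of LPIPS (Learned Perceptual Image Patch Similarity, Zhang et al., 2018), used here with init_scale as a perceptual loss pulling the sample toward the init image. LPIPS averages channel-weighted squared distances between deep network features of the two images:

```latex
d(x, x_0) = \sum_l \frac{1}{H_l W_l} \sum_{h,w}
    \left\| w_l \odot \left( \hat{y}^{l}_{hw} - \hat{y}^{l}_{0,hw} \right) \right\|_2^2
```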

@guoshengCS guoshengCS merged commit d789b4f into PaddlePaddle:develop Sep 2, 2022