[NEW MODEL CLIP] Add disco diffusion clip vitb32 #3072
Conversation
LGTM for diffusion_generate process.
Added stable diffusion usage:

1. Text-to-Image Generation

```python
import paddle
from PIL import Image
from IPython.display import display
from paddlenlp.transformers import CLIPModel, CLIPTokenizer, CLIPForImageGeneration

paddle.set_device("gpu:5")

model = CLIPForImageGeneration.from_pretrained("CompVis/stable-diffusion-v1-4")
tokenizer = CLIPTokenizer.from_pretrained("CompVis/stable-diffusion-v1-4")

# text2image
prompts = [
    "In the morning light,Chinese ancient buildings in the mountains,Magnificent and fantastic John Howe landscape,lake,clouds,farm,Fairy tale,light effect,Dream,Greg Rutkowski,James Gurney,artstation",
    "clouds surround the mountains and Chinese palaces,sunshine,lake,overlook,overlook,unreal engine,light effect,Dream,Greg Rutkowski,James Gurney,artstation",
    "close-up maximalist illustration of panther, by makoto shinkai, akihiko yoshida, yoshitaka amano, super detailed, hd wallpaper, digital art",
    "in the morning light,Overlooking TOKYO city by greg rutkowski and thomas kinkade,Trending on artstationmakoto shinkai style",
]
all_images = []
for prompt in prompts:
    tokenized_inputs = tokenizer(prompt, padding="max_length", truncation=True, max_length=tokenizer.model_max_length, return_tensors="pd")
    seed = 42
    image = model.generate(tokenized_inputs["input_ids"], mode="text2image", seed=seed)[0]
    display(prompt)
    display(image)
    all_images.append(image)
```

2. Image-to-Image Text-Guided Generation

```python
import paddle
from PIL import Image
from IPython.display import display
from paddlenlp.transformers import CLIPModel, CLIPTokenizer, CLIPForImageGeneration

paddle.set_device("gpu:5")

model = CLIPForImageGeneration.from_pretrained("CompVis/stable-diffusion-v1-4")
tokenizer = CLIPTokenizer.from_pretrained("CompVis/stable-diffusion-v1-4")

def download_image(url):
    import requests
    from io import BytesIO
    response = requests.get(url)
    init_img = Image.open(BytesIO(response.content)).convert("RGB")
    return init_img

prompt = "A fantasy landscape, trending on artstation"
tokenized_inputs = tokenizer(prompt, padding="max_length", truncation=True, max_length=tokenizer.model_max_length, return_tensors="pd")
init_image = "http://bj.bcebos.com/paddlenlp/models/community/CompVis/stable-diffusion-v1-4/sketch-mountains-input.png"
init_image = download_image(init_image)
display(init_image)
seed = 42
image = model.generate(tokenized_inputs["input_ids"], mode="image2image", init_image=init_image, seed=seed, strength=0.75, guidance_scale=7.5)[0]
display(image)
```

3. Text-Guided Image Inpainting

```python
import paddle
from PIL import Image
from IPython.display import display
from paddlenlp.transformers import CLIPModel, CLIPTokenizer, CLIPForImageGeneration

paddle.set_device("gpu:5")

model = CLIPForImageGeneration.from_pretrained("CompVis/stable-diffusion-v1-4")
tokenizer = CLIPTokenizer.from_pretrained("CompVis/stable-diffusion-v1-4")

seed = 42
prompt = "a cat sitting on a bench"
init_image = "http://bj.bcebos.com/paddlenlp/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"
mask_image = "http://bj.bcebos.com/paddlenlp/models/community/CompVis/stable-diffusion-v1-4/overture-creations-mask.png"

def download_image(url):
    import requests
    from io import BytesIO
    response = requests.get(url)
    init_img = Image.open(BytesIO(response.content)).convert("RGB")
    return init_img

init_image = download_image(init_image)
mask_image = download_image(mask_image).convert("L")
tokenized_inputs = tokenizer(prompt, padding="max_length", truncation=True, max_length=tokenizer.model_max_length, return_tensors="pd")
image = model.generate(tokenized_inputs["input_ids"], mode="inpaint", init_image=init_image, mask_image=mask_image, seed=seed, strength=0.75)[0]
display("init_image ----------------------->")
display(init_image)
display("mask_image ----------------------->")
display(mask_image)
display("new_image ----------------------->")
display(image)
```
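A small follow-up sketch, not part of the PR: the examples above only display the results, but they can also be written to disk. This assumes, as in the first example, that `model.generate(...)` returns PIL `Image` objects collected in `all_images`; the helper name and output paths are illustrative.

```python
import os

def save_images(images, out_dir="outputs", prefix="text2image"):
    # Persist the generated PIL images from the text2image loop above.
    os.makedirs(out_dir, exist_ok=True)
    for i, img in enumerate(images):
        # Each entry is assumed to be a PIL.Image.Image returned by model.generate(...)
        img.save(os.path.join(out_dir, f"{prefix}_{i}.png"))

save_images(all_images)
```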
```python
        attentions=encoder_outputs.attentions)


class TextTransformer(CLIPPreTrainedModel):
```
Could this be called CLIPTextTransformer, the same as in HF? Likewise for the vision one.
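A minimal sketch of how the rename could look while keeping the current names importable. The alias lines are an assumption for illustration, not something the PR contains, and the class bodies are elided; this would live in the same modeling module that defines CLIPPreTrainedModel.

```python
# Hypothetical rename: adopt the Hugging Face class names.
class CLIPTextTransformer(CLIPPreTrainedModel):
    ...  # existing TextTransformer body, unchanged


class CLIPVisionTransformer(CLIPPreTrainedModel):
    ...  # existing VisionTransformer body, unchanged


# Backward-compatible aliases (assumption: the old names should stay importable).
TextTransformer = CLIPTextTransformer
VisionTransformer = CLIPVisionTransformer
```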
```python
INF = float("-inf")  # -1e4 -1e9


class VisionTransformer(CLIPPreTrainedModel):
```
Should this also be called CLIPVisionTransformer, as in HF?
```python
class VisionTransformer(CLIPPreTrainedModel):

    def __init__(self,
                 input_resolution: int,
```
Should input_resolution be image_size instead? HF uses image_size; which name is more common and preferable?
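A self-contained sketch of the HF-style naming the reviewer is suggesting. Everything here is illustrative (the base class and real constructor arguments are omitted); the optional handling of the old keyword is an assumption, shown only to indicate how existing callers could keep working.

```python
# Hypothetical sketch: adopt the HF-style parameter name image_size.
class VisionTransformer:
    def __init__(self, image_size: int = 224, patch_size: int = 32, **kwargs):
        # Accept the old name too, so callers passing input_resolution keep working.
        if "input_resolution" in kwargs:
            image_size = kwargs.pop("input_resolution")
        self.image_size = image_size
        self.patch_size = patch_size
        # Number of patch tokens per image (a class token is added elsewhere).
        self.num_patches = (image_size // patch_size) ** 2

vit = VisionTransformer(image_size=224, patch_size=32)
print(vit.num_patches)  # 49 for a ViT-B/32 at 224x224
```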
```
            Default to `10`.
        n_batches (int, optional):
            This variable sets the number of still images you want DD to create. If you are using an animation mode (see below for details)
            DD will ignore n_batches and create a single set of animated frames based on the animation settings.
```
Is animation mode also a supported parameter?
```python
        # eval mode and stop all param's gradient
        self.eval()
        for param in self.parameters():
            param.stop_gradient = True
```
Adding the whole stop_gradient handling here feels a bit odd, and it is also somewhat inconsistent with the other models.
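One possible shape for this, sketched under the assumption that the freezing should happen explicitly on the generation path rather than in the constructor; the helper name `freeze_` and the call site are hypothetical, not the PR's code.

```python
import paddle

def freeze_(layer: paddle.nn.Layer) -> paddle.nn.Layer:
    """Hypothetical helper: switch to eval mode and stop gradients for all parameters."""
    layer.eval()
    for param in layer.parameters():
        param.stop_gradient = True
    return layer

# Usage sketch: freeze only when entering the diffusion-generation path.
# model = freeze_(model)
```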
```python
            images[0].save("figure.png")

        """
        self.diffusion = create_gaussian_diffusion(
```
Should create_gaussian_diffusion also be made a member method of DiffusionMixin?
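A sketch of what that refactor could look like. The method signatures and the placeholder body are assumptions for illustration only; the real implementation would build and return the GaussianDiffusion object currently produced by the module-level function.

```python
class DiffusionMixin:
    """Hypothetical sketch: expose the schedule builder as a member method."""

    def create_gaussian_diffusion(self, steps: int = 250, noise_schedule: str = "linear"):
        # Placeholder body standing in for the real GaussianDiffusion construction.
        return {"steps": steps, "noise_schedule": noise_schedule}

    def diffusion_generate(self, **kwargs):
        # Subclasses then go through self.create_gaussian_diffusion(...)
        # instead of importing the free function.
        self.diffusion = self.create_gaussian_diffusion(**kwargs)
        return self.diffusion
```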
```python
        del init2

        if init is not None and init_scale:
            lpips = try_import("paddle_lpips")
```
What is paddle_lpips?
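For context, not from the PR itself: lpips here presumably refers to LPIPS (Learned Perceptual Image Patch Similarity), and paddle_lpips would be a PaddlePaddle port of it, used as a perceptual loss that keeps guided samples close to the init image. A sketch of the deferred-import pattern, assuming try_import is paddle.utils.try_import and that the package mirrors the original lpips API; the commented usage is not confirmed.

```python
from paddle.utils import try_import

# Deferred import: paddle_lpips is only required when init-image guidance is used,
# and try_import raises a readable error telling the user to install it otherwise.
lpips = try_import("paddle_lpips")

# Assumed usage (API not confirmed), mirroring the torch `lpips` package:
# lpips_model = lpips.LPIPS(net="vgg")
# perceptual_loss = lpips_model(sample_batch, init_image_batch).sum()
```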
PR types
New features
PR changes
Models
Description