The Modular Diffusers #9672
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Very cool!
hi this is very interesting! I'm making a Python pipeline flow visual scripting tool that can auto-convert functions to visual nodes for fast and modular UI block demos. It's a pip package: https://pypi.org/project/nozyio/ I wanted to integrate diffusers with my flow nodes UI project but found it's not very modular. But this PR may change that! Looking forward to seeing how this evolves. github: https://github.com/oozzy77/nozyio happy to connect!
@oozzy77 thanks! Do you want to join a slack channel with me? If you want to experiment building something with this PR, I'm eager to hear your feedback and iterate based on that.
Hi, super willing to join a slack channel with you! What's the workspace/channel I should join? Or you can invite me: ***@***.***
@oozzy77 I sent an invite!
testing script to use from the latest commit (will keep this one up to date from now on) cc @hlky @asomoza
testing script for modular diffusers (most updated)
import os
import torch
import numpy as np
import cv2
from PIL import Image
from diffusers import (
ControlNetModel,
ModularPipeline,
UNet2DConditionModel,
AutoencoderKL,
ControlNetUnionModel,
)
from diffusers.utils import load_image
from diffusers.guider import PAGGuider, CFGGuider, APGGuider
from diffusers.pipelines.modular_pipeline import SequentialPipelineBlocks
from diffusers.pipelines.components_manager import ComponentsManager
from diffusers.pipelines.stable_diffusion_xl.pipeline_stable_diffusion_xl_modular import AUTO_BLOCKS, IMAGE2IMAGE_BLOCKS
from controlnet_aux import LineartAnimeDetector
import logging
logging.getLogger().setLevel(logging.INFO)
logging.getLogger("diffusers").setLevel(logging.INFO)
# define device and dtype
device = "cuda:0"
dtype = torch.float16
# define output folder
out_folder = "modular_test_outputs_0110"
os.makedirs(out_folder, exist_ok=True)
# functions for memory info
def reset_memory():
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

def clear_memory():
    torch.cuda.empty_cache()

def print_mem(mem_size, name):
    mem_gb = mem_size / 1024**3
    mem_mb = mem_size / 1024**2
    print(f"- {name}: {mem_gb:.2f} GB ({mem_mb:.2f} MB)")

def print_memory(message=None):
    """Print detailed GPU memory statistics for the selected device."""
    allocated_mem = torch.cuda.memory_allocated(device)
    reserved_mem = torch.cuda.memory_reserved(device)
    mem_on_device = torch.cuda.mem_get_info(device)[0]
    peak_mem = torch.cuda.max_memory_allocated(device)
    print(f"\nGPU:{device} Memory Status {message}:")
    print_mem(allocated_mem, "allocated memory")
    print_mem(reserved_mem, "reserved memory")
    print_mem(peak_mem, "peak memory")
    print_mem(mem_on_device, "mem on device")
# function to make canny image (for controlnet)
def make_canny(image):
    image = np.array(image)
    image = cv2.Canny(image, 100, 200)
    image = image[:, :, None]
    image = np.concatenate([image, image, image], axis=2)
    return Image.fromarray(image)
# (1) define inputs
# for text2img/img2img
prompt = "a photo of an astronaut riding a horse on mars"
# for img2img
url = "https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/aa_xl/000000009.png"
init_image = load_image(url).convert("RGB")
strength = 0.9
# for controlnet
control_image = make_canny(init_image)
controlnet_conditioning_scale = 0.5 # recommended for good generalization
# for controlnet_union
processor = LineartAnimeDetector.from_pretrained("lllyasviel/Annotators")
controlnet_union_image = processor(init_image, output_type="pil")
# for inpainting
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
inpaint_image = load_image(img_url).resize((1024, 1024))
inpaint_mask = load_image(mask_url).resize((1024, 1024))
inpaint_control_image = make_canny(inpaint_image)
inpaint_strength = 0.99
# (2) define blocks and nodes (builders)
all_blocks_map = AUTO_BLOCKS.copy()
# text block
text_block = all_blocks_map.pop("text_encoder")()
# image encoder block
image_encoder_block = all_blocks_map.pop("image_encoder")()
# decoder block
decoder_block = all_blocks_map.pop("decode")()
class SDXLAutoBlocks(SequentialPipelineBlocks):
    block_classes = list(all_blocks_map.values())
    block_names = list(all_blocks_map.keys())
# sdxl main block
sdxl_auto_blocks = SDXLAutoBlocks()
image2image_blocks_map = IMAGE2IMAGE_BLOCKS.copy()
# we do not need image_encoder for the refiner because it takes image_latents (from another pipeline) as input
_ = image2image_blocks_map.pop("image_encoder")()
# refiner block
class RefinerSteps(SequentialPipelineBlocks):
    block_classes = list(image2image_blocks_map.values())
    block_names = list(image2image_blocks_map.keys())
refiner_block = RefinerSteps()
text_node = ModularPipeline.from_block(text_block)
image_node = ModularPipeline.from_block(image_encoder_block)
sdxl_node = ModularPipeline.from_block(sdxl_auto_blocks)
decoder_node = ModularPipeline.from_block(decoder_block)
refiner_node = ModularPipeline.from_block(refiner_block)
# (3) add states to nodes
repo = "stabilityai/stable-diffusion-xl-base-1.0"
refiner_repo = "stabilityai/stable-diffusion-xl-refiner-1.0"
controlnet_repo = "diffusers/controlnet-canny-sdxl-1.0"
inpaint_repo = "diffusers/stable-diffusion-xl-1.0-inpainting-0.1"
vae_fix_repo = "madebyollin/sdxl-vae-fp16-fix"
controlnet_union_repo = "brad-twinkl/controlnet-union-sdxl-1.0-promax"
components = ComponentsManager()
components.add_from_pretrained(repo, torch_dtype=dtype)
controlnet = ControlNetModel.from_pretrained(controlnet_repo, torch_dtype=dtype)
refiner_unet = UNet2DConditionModel.from_pretrained(refiner_repo, subfolder="unet", torch_dtype=dtype)
inpaint_unet = UNet2DConditionModel.from_pretrained(inpaint_repo, subfolder="unet", torch_dtype=dtype)
vae_fix = AutoencoderKL.from_pretrained(vae_fix_repo, torch_dtype=dtype)
controlnet_union = ControlNetUnionModel.from_pretrained(controlnet_union_repo, torch_dtype=dtype)
components.add("controlnet", controlnet)
components.add("refiner_unet", refiner_unet)
components.add("inpaint_unet", inpaint_unet)
components.add("controlnet_union", controlnet_union)
components.add("vae_fix", vae_fix)
# you can add guiders to the manager too, but there is no need because they are not serialized
pag_guider = PAGGuider(pag_applied_layers="mid")
controlnet_pag_guider = PAGGuider(pag_applied_layers="mid")
# load components/config into nodes
text_node.update_states(**components.get(["text_encoder", "text_encoder_2", "tokenizer", "tokenizer_2"]))
image_node.update_states(**components.get(["vae"]))
decoder_node.update_states(vae=components.get("vae"))
sdxl_node.update_states(**components.get(["unet", "scheduler", "vae", "controlnet"]))
refiner_node.update_states(**components.get(["text_encoder_2","tokenizer_2", "vae", "scheduler"]), unet=components.get("refiner_unet"), force_zeros_for_empty_prompt=True, requires_aesthetics_score=True)
# (4) enable auto cpu offload: automatically offload models when available gpu memory goes below a certain threshold
components.enable_auto_cpu_offload(device=device)
print(components)
reset_memory()
# (5) run the workflows
print(f" ")
print(f" text_node:")
print(text_node)
print(f" ")
print(f" generating text embeddings with text_node")
# using text_node to generate text embeddings
text_state = text_node(prompt=prompt)
print(" ")
print(f" components info after run text_node: text_encoder and text_encoder_2 are on device")
print(components)
print(f" ")
print(f" text_state info")
print(text_state)
print(" ")
# using sdxl_node to generate images
# to get info about sdxl_node and how to use it: inputs/outputs/components
# this is an "auto" workflow that works for all use cases: text2img, img2img, inpainting, controlnet, etc.
# so the information might not be super useful for your specific use case; you will find a "trigger inputs" section that says this:
# Trigger Inputs:
# --------------
# This pipeline contains dynamic blocks that are selected at runtime based on your inputs.
# • Trigger inputs: {'control_image', 'image_latents', 'mask'}
# • Use .pipeline_block.get_triggered_blocks(*inputs) to see which blocks will be used for specific inputs
# • Use .pipeline_block.get_triggered_blocks() to see blocks will be used for default inputs (when no trigger inputs are provided)
print(f" ")
print(f" sdxl_node:")
print(sdxl_node)
print(" ")
# since we want to use the text2img use case, we can run the following to see components/blocks/inputs for this use case
print(f" ")
print(f" sdxl_node info (default use case: text2img)")
print(sdxl_node.pipeline_block.get_triggered_blocks())
print(" ")
# test1: text2img use case
# when you run the auto workflow, you will get these logs telling you which blocks are actually running
# (should match what the sdxl_node told you)
# Running block: StableDiffusionXLBeforeDenoiseStep, trigger: None
# Running block: StableDiffusionXLDenoiseStep, trigger: None
# Running block: StableDiffusionXLDecodeStep, trigger: None
generator = torch.Generator(device="cuda").manual_seed(0)
latents = sdxl_node(
**text_state.intermediates,
generator=generator,
output="latents"
)
images_output = decoder_node(latents=latents, output="images")
images_output.images[0].save(f"{out_folder}/test1_out_text2img.png")
print(f" save modular output to {out_folder}/test1_out_text2img.png")
clear_memory()
# test2: text2img with lora use case
print(f" ")
print(f" running test2: text2img with lora use case")
generator = torch.Generator(device="cuda").manual_seed(0)
sdxl_node.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors")
latents = sdxl_node(
**text_state.intermediates,
generator=generator,
output="latents"
)
images_output = decoder_node(latents=latents, output="images")
images_output.images[0].save(f"{out_folder}/test2_out_text2img_lora.png")
print(f" save modular output to {out_folder}/test2_out_text2img_lora.png")
# test3:text2image without lora again, with pag
print(f" ")
print(f" running test3:text2image without lora again, with pag")
sdxl_node.unload_lora_weights()
sdxl_node.update_states(guider=pag_guider, controlnet_guider=controlnet_pag_guider)
generator = torch.Generator(device="cuda").manual_seed(0)
latents = sdxl_node(
**text_state.intermediates,
generator=generator,
guider_kwargs={"pag_scale": 3.0},
output="latents"
)
images_output = decoder_node(latents=latents, output="images")
images_output.images[0].save(f"{out_folder}/test3_out_text2img_pag.png")
print(f" save modular output to {out_folder}/test3_out_text2img_pag.png")
clear_memory()
# check out the components if you want: the models used are moved to device, some might get offloaded to cpu
# print(components)
# test4: SDXL(text2img) with controlnet
# we are going to pass a new input now `control_image` so the workflow will be automatically converted to controlnet use case
# let's checkout the info for controlnet use case
print(f" sdxl_node info (controlnet use case)")
print(sdxl_node.pipeline_block.get_triggered_blocks("control_image"))
print(" ")
print(f" ")
print(f" running test4: SDXL(text2img) with controlnet")
generator = torch.Generator(device="cuda").manual_seed(0)
latents = sdxl_node(
**text_state.intermediates,
control_image=control_image,
controlnet_conditioning_scale=controlnet_conditioning_scale,
guider_kwargs={"pag_scale": 3.0},
generator=generator,
output="latents"
)
images_output = decoder_node(latents=latents, output="images")
images_output.images[0].save(f"{out_folder}/test4_out_text2img_control.png")
print(f" save modular output to {out_folder}/test4_out_text2img_control.png")
clear_memory()
# test5: SDXL(img2img)
# for img2img use case, we encode the image with image_node first, this way we can use the same image_latents for different workflows
# let's checkout the image_node
print(f" image_node info")
print(image_node)
print(" ")
print(f" ")
print(f" running test5: SDXL(img2img)")
generator = torch.Generator(device="cuda").manual_seed(0)
image_state = image_node(image=init_image, generator=generator)
# let's checkout what's in image_state
print(f" image_state info")
print(image_state)
print(" ")
# let's checkout the sdxl_node info for img2img use case
print(f" sdxl_node info (img2img use case)")
print(sdxl_node.pipeline_block.get_triggered_blocks("image_latents"))
print(" ")
latents = sdxl_node(
**text_state.intermediates,
**image_state.intermediates,
strength=strength,
guider_kwargs={"pag_scale": 3.0},
generator=generator,
output="latents"
)
images_output = decoder_node(latents=latents, output="images")
images_output.images[0].save(f"{out_folder}/test5_out_img2img.png")
print(f" save modular output to {out_folder}/test5_out_img2img.png")
clear_memory()
# test6: SDXL(img2img) with controlnet
# let's checkout the sdxl_node info for img2img controlnet use case
print(f" sdxl_node info (img2img controlnet use case)")
print(sdxl_node.pipeline_block.get_triggered_blocks("image_latents","control_image"))
print(" ")
print(f" ")
print(f" running test6: SDXL(img2img) with controlnet")
generator = torch.Generator(device="cuda").manual_seed(0)
latents = sdxl_node(
**text_state.intermediates,
**image_state.intermediates,
control_image=control_image,
guider_kwargs={"pag_scale": 3.0},
controlnet_conditioning_scale=controlnet_conditioning_scale,
strength=strength,
generator=generator,
output="latents"
)
images_output = decoder_node(latents=latents, output="images")
images_output.images[0].save(f"{out_folder}/test6_out_img2img_control.png")
print(f" save modular output to {out_folder}/test6_out_img2img_control.png")
clear_memory()
# test7: img2img with refiner
# let's checkout the refiner_node
print(f" refiner_node info")
print(refiner_node)
print(" ")
print(f" ")
print(f" running test7: img2img with refiner")
images_output = refiner_node(
image_latents=latents,
prompt=prompt,
denoising_start=0.8,
generator=generator,
output="images"
)
images_output.images[0].save(f"{out_folder}/test7_out_img2img_refiner.png")
print(f" save modular output to {out_folder}/test7_out_img2img_refiner.png")
clear_memory()
# test8: SDXL(inpainting)
# let's checkout the sdxl_node info for inpainting use case
print(f" sdxl_node info (inpainting use case)")
print(sdxl_node.pipeline_block.get_triggered_blocks("mask", "image_latents"))
print(" ")
print(f" ")
print(f" running test8: SDXL(inpainting)")
generator = torch.Generator(device="cuda").manual_seed(0)
image_state = image_node(image=inpaint_image, mask_image=inpaint_mask, height=1024, width=1024, generator=generator)
print(f" image_state info")
print(image_state)
print(" ")
latents = sdxl_node(
**text_state.intermediates,
**image_state.intermediates,
guider_kwargs={"pag_scale": 3.0},
strength=inpaint_strength, # make sure to use `strength` below 1.0
output="latents"
)
images_output = decoder_node(latents=latents, output="images")
images_output.images[0].save(f"{out_folder}/test8_out_inpainting.png")
print(f" save modular output to {out_folder}/test8_out_inpainting.png")
clear_memory()
# test9: SDXL(inpainting) with controlnet
# let's checkout the sdxl_node info for inpainting + controlnet use case
print(f" sdxl_node info (inpainting + controlnet use case)")
print(sdxl_node.pipeline_block.get_triggered_blocks("mask", "control_image"))
print(" ")
print(f" ")
print(f" running test9: SDXL(inpainting) with controlnet")
generator = torch.Generator(device="cuda").manual_seed(0)
latents = sdxl_node(
**text_state.intermediates,
**image_state.intermediates,
control_image=control_image,
guider_kwargs={"pag_scale": 3.0},
controlnet_conditioning_scale=controlnet_conditioning_scale,
strength=inpaint_strength, # make sure to use `strength` below 1.0
output="latents"
)
images_output = decoder_node(latents=latents, output="images")
images_output.images[0].save(f"{out_folder}/test9_out_inpainting_control.png")
print(f" save modular output to {out_folder}/test9_out_inpainting_control.png")
clear_memory()
# test10: SDXL(inpainting) with inpaint_unet
print(f" ")
print(f" running test10: SDXL(inpainting) with inpaint_unet")
sdxl_node.update_states(unet=components.get("inpaint_unet"))
generator = torch.Generator(device="cuda").manual_seed(0)
latents = sdxl_node(
**text_state.intermediates,
**image_state.intermediates,
guider_kwargs={"pag_scale": 3.0},
strength=inpaint_strength, # make sure to use `strength` below 1.0
output="latents"
)
images_output = decoder_node(latents=latents, output="images")
images_output.images[0].save(f"{out_folder}/test10_out_inpainting_inpaint_unet.png")
print(f" save modular output to {out_folder}/test10_out_inpainting_inpaint_unet.png")
clear_memory()
# test11: SDXL(inpainting) with inpaint_unet + padding_mask_crop
print(f" ")
print(f" running test11: SDXL(inpainting) with inpaint_unet (padding_mask_crop=33)")
generator = torch.Generator(device="cuda").manual_seed(0)
image_state = image_node(image=inpaint_image, mask_image=inpaint_mask, height=1024, width=1024, generator=generator, padding_mask_crop=33)
print(f" image_state info")
print(image_state)
print(" ")
latents = sdxl_node(
**text_state.intermediates,
**image_state.intermediates,
guider_kwargs={"pag_scale": 3.0},
strength=inpaint_strength, # make sure to use `strength` below 1.0
output="latents"
)
# we need a different decoder when using padding_mask_crop
print(f" decoder_node info")
print(decoder_node)
print(" ")
print(f" decoder_node info (inpaint/padding_mask_crop)")
print(decoder_node.pipeline_block.blocks["inpaint"])
print(" ")
images_output = decoder_node(latents=latents, crops_coords=image_state.get_intermediate("crops_coords"), **image_state.inputs, output="images")
images_output.images[0].save(f"{out_folder}/test11_out_inpainting_inpaint_unet_padding_mask_crop.png")
print(f" save modular output to {out_folder}/test11_out_inpainting_inpaint_unet_padding_mask_crop.png")
clear_memory()
# test12: apg
print(f" ")
print(f" running test12: apg")
apg_guider = APGGuider()
sdxl_node.update_states(guider=apg_guider, unet=components.get("unet"))
generator = torch.Generator().manual_seed(0)
latents = sdxl_node(
**text_state.intermediates,
generator=generator,
num_inference_steps=20,
guidance_scale=15,
height=896,
width=768,
guider_kwargs={
"adaptive_projected_guidance_momentum": -0.5,
"adaptive_projected_guidance_rescale_factor": 15.0,
},
output="latents"
)
images_output = decoder_node(latents=latents, output="images")
images_output.images[0].save(f"{out_folder}/test12_out_apg.png")
print(f" save modular output to {out_folder}/test12_out_apg.png")
clear_memory()
# test13: SDXL(text2img) with controlnet_union
sdxl_node.update_states(controlnet=components.get("controlnet_union"), unet=components.get("unet"), guider=pag_guider, controlnet_guider=controlnet_pag_guider)
image_node.update_states(vae=components.get("vae_fix"))
decoder_node.update_states(vae=components.get("vae_fix"))
# we are going to pass a new input now `control_mode` so the workflow will be automatically converted to controlnet use case
# let's checkout the info for controlnet use case
print(f" sdxl_node info (controlnet union use case)")
print(sdxl_node.pipeline_block.get_triggered_blocks("control_mode"))
print(" ")
print(f" ")
print(f" running test13: SDXL(text2img) with controlnet_union")
generator = torch.Generator(device="cuda").manual_seed(0)
latents = sdxl_node(
**text_state.intermediates,
control_mode=[3],
control_image=[controlnet_union_image],
height=1024,
width=1024,
generator=generator,
output="latents"
)
images_output = decoder_node(latents=latents, output="images")
images_output.images[0].save(f"{out_folder}/test13_out_text2img_control_union.png")
print(f" save modular output to {out_folder}/test13_out_text2img_control_union.png")
clear_memory()
# test14: SDXL(img2img) with controlnet_union
print(f" image_node info(with vae_fix for controlnet union)")
print(image_node)
print(" ")
print(f" ")
print(f" sdxl_node info (img2img controlnet union use case)")
print(sdxl_node.pipeline_block.get_triggered_blocks("image_latents", "control_mode"))
print(" ")
print(f" ")
print(f" running test14: SDXL(img2img) with controlnet_union")
generator = torch.Generator(device="cuda").manual_seed(0)
image_state = image_node(image=init_image, generator=generator)
print(f" image_state info")
print(image_state)
print(" ")
latents = sdxl_node(
**text_state.intermediates,
**image_state.intermediates,
control_mode=[3],
control_image=[controlnet_union_image],
height=1024,
width=1024,
generator=generator,
output="latents"
)
images_output = decoder_node(latents=latents, output="images")
images_output.images[0].save(f"{out_folder}/test14_out_img2img_control_union.png")
print(f" save modular output to {out_folder}/test14_out_img2img_control_union.png")
clear_memory()
# test15: SDXL(inpainting) with controlnet_union
print(f" ")
print(f" sdxl_node info (inpainting controlnet union use case)")
print(sdxl_node.pipeline_block.get_triggered_blocks("mask", "control_mode"))
print(" ")
print(f" ")
print(f" running test15: SDXL(inpainting) with controlnet_union")
generator = torch.Generator(device="cuda").manual_seed(0)
image_state = image_node(image=inpaint_image, mask_image=inpaint_mask, height=1024, width=1024, generator=generator)
print(f" image_state info")
print(image_state)
print(" ")
latents = sdxl_node(
**text_state.intermediates,
**image_state.intermediates,
control_mode=[3],
control_image=[controlnet_union_image],
height=1024,
width=1024,
generator=generator,
output="latents"
)
images_output = decoder_node(latents=latents, output="images")
images_output.images[0].save(f"{out_folder}/test15_out_inpainting_control_union.png")
print(f" save modular output to {out_folder}/test15_out_inpainting_control_union.png")
clear_memory()
print_memory("the end")
print(f" components info after the end")
print(components)
Getting Started with Modular Diffusers
With Modular Diffusers, we introduce a unified pipeline system that simplifies how you work with diffusion models. Instead of creating separate pipelines for each task, Modular Diffusers lets you:
Write Only What's New: You won't need to rewrite the entire pipeline from scratch. You can create pipeline blocks just for your new workflow's unique aspects and reuse existing blocks for existing functionalities.
Assemble Like LEGO®: You can mix and match blocks in flexible ways. This allows you to write dedicated blocks for specific workflows and then assemble different blocks into a pipeline that can be used more conveniently for multiple workflows. Here we will walk you through how to use a pipeline like this that we built with Modular Diffusers! In later sections, we will also go over how to assemble and build new pipelines!
Quick Start with StableDiffusionXLAutoPipeline
Auto Workflow Selection
The pipeline automatically adapts to your inputs:
- prompt for text-to-image
- image input for image-to-image
- image and mask_image for inpainting
- control_image for controlnet
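For illustration, here is a minimal sketch of that behavior, reusing the sdxl_node, text_state, image_state, decoder_node, and control_image objects defined in the testing script above:

```python
# The same auto node picks its workflow from the inputs it receives.

# text-to-image: only prompt embeddings are passed
latents = sdxl_node(**text_state.intermediates, output="latents")

# text-to-image + controlnet: adding `control_image` triggers the controlnet blocks
latents = sdxl_node(**text_state.intermediates, control_image=control_image, output="latents")

# image-to-image: adding image latents (via `image_state`) triggers the img2img blocks
latents = sdxl_node(**text_state.intermediates, **image_state.intermediates, strength=0.9, output="latents")

images = decoder_node(latents=latents, output="images").images
```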
Auto Documentation
We care a great deal about documentation here at Diffusers, and Modular Diffusers carries this mission forward. All our pipeline blocks come with complete docstrings that automatically compose as you build your pipelines. This means you can:
- inspect your pipeline and see an example of its output
- use get_execution_blocks to see which blocks will run for your inputs/workflow; for example, if you want to run a text-to-image controlnet workflow, you can do this (see the sketch below)
- see the docstring relevant to your inputs/workflow
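For example, with the nodes from the testing script above (note: the script exposes this helper as pipeline_block.get_triggered_blocks; the exact name and location may still change in this PR):

```python
# print the auto-composed docstring of the whole node
print(sdxl_node)

# which blocks will actually run for a text-to-image + controlnet workflow?
print(sdxl_node.pipeline_block.get_triggered_blocks("control_image"))

# with no trigger inputs, this shows the default (text-to-image) path
print(sdxl_node.pipeline_block.get_triggered_blocks())
```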
Advanced Workflows
Once you've created the auto pipeline, you can use it for different features as long as you add the required components and pass the required inputs.
Here is an example you can run for a more complex workflow using controlnet/IP-Adapter/LoRA/PAG (a condensed sketch follows below).
check out more usage examples here
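IP-Adapter is not covered in the testing script above, so here is a condensed sketch assembled from tests 2-4 of that script instead (LoRA + PAG guider + controlnet in a single call):

```python
# load a LoRA into the denoise node
sdxl_node.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors")

# swap in the PAG guiders created earlier in the script
sdxl_node.update_states(guider=pag_guider, controlnet_guider=controlnet_pag_guider)

# text2img + controlnet + LoRA + PAG in one call
latents = sdxl_node(
    **text_state.intermediates,
    control_image=control_image,
    controlnet_conditioning_scale=0.5,
    guider_kwargs={"pag_scale": 3.0},
    output="latents",
)
images = decoder_node(latents=latents, output="images").images
```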
test1: complete testing script for `StableDiffusionXLAutoPipeline`
Modular Setup
StableDiffusionXLAutoPipeline is a very convenient preset; just like a LEGO set, you can break it down, reassemble, and rearrange the pipeline blocks however you want. A more modular setup would look like the sketch below: with this setup, you precompute embeddings and reuse them across different denoise backends or with different inference parameters such as guidance_scale, num_inference_steps, or different schedulers. You can modify your workflow by simply adding/removing/swapping blocks without recomputing the entire pipeline over and over again.
check out the full example script here
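A sketch of that reuse pattern, based on the nodes defined in the testing script above: encode once, then call the denoise node repeatedly with different inference parameters.

```python
# encode once ...
text_state = text_node(prompt="a photo of an astronaut riding a horse on mars")
image_state = image_node(image=init_image)

# ... then reuse the cached embeddings/latents across runs with different settings
for steps, cfg in [(20, 5.0), (30, 7.5)]:
    latents = sdxl_node(
        **text_state.intermediates,
        **image_state.intermediates,
        num_inference_steps=steps,
        guidance_scale=cfg,
        strength=0.9,
        output="latents",
    )
    decoder_node(latents=latents, output="images").images[0].save(f"out_{steps}steps_cfg{cfg}.png")
```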
test2: modular setup
This is the full testing script I used for more configuration, including inpainting/refiner/union controlnet/APG
test3: modular setup with IPAdapter
Developer Guide: Building with Modular Diffusers
Core Components Overview
The Modular Diffusers architecture consists of four main components:
ModularPipeline
The main interface for creating and running modular pipelines. Unlike traditional pipelines, you don't write it from scratch - it builds itself from pipeline blocks! Example usage:
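The original example is collapsed in this comment; here is a sketch reconstructed from the testing script above (text_block is the text-encoder block popped from AUTO_BLOCKS there):

```python
import torch
from diffusers import ModularPipeline
from diffusers.pipelines.components_manager import ComponentsManager

# build a runnable pipeline ("node") directly from a block or block preset
text_node = ModularPipeline.from_block(text_block)

# register the models it needs through a ComponentsManager
components = ComponentsManager()
components.add_from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
text_node.update_states(**components.get(["text_encoder", "text_encoder_2", "tokenizer", "tokenizer_2"]))

# run it -- the outputs come back as a pipeline state
text_state = text_node(prompt="a photo of an astronaut riding a horse on mars")
```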
PipelineBlock
The fundamental building block, similar to a Mellon/ComfyUI node. Each block has its own inputs, outputs, and components, and implements:
__call__(pipeline, state) -> (pipeline, state)
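A rough sketch of a custom block, just to show the shape of that contract. Only the __call__(pipeline, state) -> (pipeline, state) signature is taken from this PR; the import path, the state-writing method name, and the (omitted) input/output declarations are assumptions, so check the PR source for the exact API.

```python
from diffusers.pipelines.modular_pipeline import PipelineBlock  # assumed location

class RescaleLatentsStep(PipelineBlock):  # hypothetical block, for illustration only
    def __call__(self, pipeline, state):
        # read an intermediate from the shared state, transform it, and write it back
        latents = state.get_intermediate("latents")
        latents = latents * 0.5  # placeholder transform
        state.add_intermediate("latents", latents)  # `add_intermediate` is an assumed name
        return pipeline, state
```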
MultiPipelineBlocks
Combines multiple blocks into a bigger one! These combined blocks behave just like single blocks - with their own inputs, outputs, and components, but they are able to handle more complex workflows!
We have two types of MultiPipelineBlocks available; you can use them to combine individual blocks into ready-to-use sets (like LEGO® presets!):
- SequentialPipelineBlocks (see the sketch after this list)
- AutoPipelineBlocks
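SequentialPipelineBlocks chains blocks so that each one runs after the previous one on the shared state; a minimal example mirroring the SDXLAutoBlocks / RefinerSteps classes from the testing script above:

```python
from diffusers import ModularPipeline
from diffusers.pipelines.modular_pipeline import SequentialPipelineBlocks
from diffusers.pipelines.stable_diffusion_xl.pipeline_stable_diffusion_xl_modular import IMAGE2IMAGE_BLOCKS

# chain the img2img preset blocks into one combined block
blocks_map = IMAGE2IMAGE_BLOCKS.copy()

class MyImg2ImgBlocks(SequentialPipelineBlocks):
    block_classes = list(blocks_map.values())
    block_names = list(blocks_map.keys())

img2img_node = ModularPipeline.from_block(MyImg2ImgBlocks())
```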
AutoPipelineBlocks
AutoPipelineBlocks makes the complex if.. else.. logic in your code disappear! With this, you can write blocks for specific use cases to keep your code paths clean, and use AutoPipelineBlocks to combine blocks into convenient presets that provide a better user experience :) For example, the ControlNetDenoiseStep will be dispatched when "control_image" is passed by the user; otherwise, the default DenoiseStep will run.
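A sketch of what such a preset could look like. The AutoPipelineBlocks import location, the step class names, and especially the block_trigger_inputs attribute are assumptions about this PR's API; the point is only to illustrate the dispatch-by-trigger-input idea described above.

```python
from diffusers.pipelines.modular_pipeline import AutoPipelineBlocks  # assumed location

class AutoDenoiseStep(AutoPipelineBlocks):  # hypothetical preset, for illustration only
    block_classes = [ControlNetDenoiseStep, DenoiseStep]  # hypothetical step classes
    block_names = ["controlnet_denoise", "denoise"]
    # assumed attribute: which user input selects which block; None marks the default branch
    block_trigger_inputs = ["control_image", None]
```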
PipelineState and BlockStates
PipelineState and BlockStates manage dataflow between and inside blocks; they make debugging really easy! Feel free to print them out at any point to get an overview of all the shapes/types/values in your pipeline/block states.
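For example, with the nodes and inputs from the testing script above:

```python
# every node call returns a state object that you can print or inspect at any point
text_state = text_node(prompt=prompt)
print(text_state)                      # overview of all inputs/intermediates (shapes, dtypes, values)
print(list(text_state.intermediates))  # e.g. the prompt embeddings produced by the text encoder

image_state = image_node(image=init_image)
print(image_state.inputs)                                   # the raw inputs the block received
print(image_state.get_intermediate("image_latents").shape)  # pull out a single intermediate
```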
Differential Diffusion Example
Here we'll show you a new way to build with Modular Diffusers. Let's look at implementing a Differential Diffusion pipeline (https://differential-diffusion.github.io/) as an example. It is, in a sense, an image-to-image workflow, so we can start with the preset of pipeline blocks we used to build our current img2img pipeline (IMAGE2IMAGE_BLOCKS) and see how we can build this new pipeline with them!
It seems like we can reuse the "text_encoder", "ip_adapter", "image_encoder", "input", "prepare_add_cond" and "decode" steps from the img2img workflow out of the box. The "set_timesteps" step in Differential Diffusion is the same as the one we use for text-to-image (i.e. it does not take a strength parameter), so we just use StableDiffusionXLSetTimestepsStep. It uses a different denoising method, so we will need to write a new "denoise" step, and the "prepare_latents" step is also a little bit different, so we will write a new one too.
Here are the changes needed to create the Differential Diffusion version of these blocks:
- StableDiffusionXLImg2ImgPrepareLatentsStep: the diff-diff version prepares the latents a little differently
- StableDiffusionXLDenoiseStep: we remove the inpaint-related logic and add the diff-diff specific logic
That's all there is to it! Once you've made these two diff-diff blocks, you can create a preset (a pre-assembled set of blocks) and then build your pipeline from it, as sketched below.
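A sketch of assembling that preset: reuse the img2img blocks and swap in the two new blocks. SDXLDiffDiffPrepareLatentsStep and SDXLDiffDiffDenoiseStep are hypothetical names for the two custom blocks described above, and the import of StableDiffusionXLSetTimestepsStep assumes it lives next to the other SDXL modular blocks.

```python
from diffusers import ModularPipeline
from diffusers.pipelines.modular_pipeline import SequentialPipelineBlocks
from diffusers.pipelines.stable_diffusion_xl.pipeline_stable_diffusion_xl_modular import (
    IMAGE2IMAGE_BLOCKS,
    StableDiffusionXLSetTimestepsStep,  # assumed import location
)

# start from the img2img preset and swap in the diff-diff specific blocks
DIFFDIFF_BLOCKS = IMAGE2IMAGE_BLOCKS.copy()
DIFFDIFF_BLOCKS["set_timesteps"] = StableDiffusionXLSetTimestepsStep  # same as text-to-image
DIFFDIFF_BLOCKS["prepare_latents"] = SDXLDiffDiffPrepareLatentsStep   # new block (hypothetical name)
DIFFDIFF_BLOCKS["denoise"] = SDXLDiffDiffDenoiseStep                  # new block (hypothetical name)

class DiffDiffBlocks(SequentialPipelineBlocks):
    block_classes = list(DIFFDIFF_BLOCKS.values())
    block_names = list(DIFFDIFF_BLOCKS.keys())

dd_node = ModularPipeline.from_block(DiffDiffBlocks())
```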
to use it
Complete Example: Implementing Differential Diffusion Pipeline
Diffusers as seen in nodes
coming up soon....
Next Steps