
Flux - soft inpainting via differential diffusion #9268

Merged
merged 8 commits into huggingface:main on Oct 14, 2024

Conversation

ryanlyn (Contributor) commented Aug 25, 2024

What does this PR do?

Adds a new community pipeline that brings Differential Diffusion to the Flux.1 family of models (currently Flux.1-schnell and Flux.1-dev).

Builds right on top of the fantastic work of #9135. My additions pertain only to the various diff-diff annotations.
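
For context, the core of differential diffusion is a per-step threshold over a grayscale change map: every pixel gets its own point in the schedule after which it is no longer reset to the (re-noised) original image. A rough sketch of the idea in pseudocode (illustrative names, not the PR's exact code):

# Illustrative pseudocode of the diff-diff update; `change_map` is a
# per-pixel value in [0, 1] and `thresholds` sweeps across the schedule,
# so each pixel "unlocks" for editing at a different denoising step.
for i, t in enumerate(timesteps):
    latents = denoise_step(latents, t)  # ordinary denoising update
    # pixels above this step's threshold keep the denoised prediction ...
    keep = (change_map > thresholds[i]).to(latents.dtype)
    # ... the rest are reset to the original image, noised to level t
    noised_original = scheduler.scale_noise(original_latents, t, noise)
    latents = keep * latents + (1 - keep) * noised_original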

Things to do:

  • implementation
  • documentation

Testing

Flux.1-schnell

The schnell model can be used following this example:

import torch
from diffusers.utils import load_image

# `FluxDifferentialImg2ImgPipeline` is the community pipeline added in this PR;
# `preprocess_image` and `preprocess_map` are sketched below.
image = preprocess_image(load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/20240329211129_4024911930.png?download=true"
))

mask = preprocess_map(load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/gradient_mask.png?download=true"
))

pipe = FluxDifferentialImg2ImgPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
    device_map="balanced",
)

out = pipe(
    prompt="a red strawberry, black background",
    guidance_scale=0.0,  # schnell is guidance-distilled; CFG stays off
    num_inference_steps=12,
    image=image,
    mask_image=mask,
    strength=0.88,
).images[0]

out.show()
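
Note that `preprocess_image` and `preprocess_map` are not defined in the snippet above. A minimal sketch of what they might look like, adapted from the earlier SDXL differential img2img example (hypothetical helpers; the exact ones may differ):

from PIL import Image

def preprocess_image(image: Image.Image) -> Image.Image:
    # Crop to dimensions divisible by 16 so the latent packing math works out.
    w, h = image.size
    return image.convert("RGB").crop((0, 0, w // 16 * 16, h // 16 * 16))

def preprocess_map(map_image: Image.Image) -> Image.Image:
    # The change map is single-channel; gradients control how early each
    # region is allowed to change.
    w, h = map_image.size
    return map_image.convert("L").crop((0, 0, w // 16 * 16, h // 16 * 16))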

A red strawberry, black background (12 steps at 0.9 strength):
[image]
Blending with the schnell model is hard to get right.

Flux.1-dev

I expect the dev model to be used like this if there is sufficient VRAM:

import torch
from diffusers.utils import load_image

image = load_image(
    "https://github.com/exx8/differential-diffusion/blob/main/assets/input.jpg?raw=true"
)

mask = load_image(
    "https://github.com/exx8/differential-diffusion/blob/main/assets/map.jpg?raw=true"
)

pipe = FluxDifferentialImg2ImgPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # offload submodules to CPU to lower peak VRAM

out = pipe(
    prompt="...",  # your prompt here
    num_inference_steps=20,
    guidance_scale=7.5,
    image=image,
    mask_image=mask,
    strength=1.0,
).images[0]

out.show()

My tests, however, were all done on the FP8-quantized version (https://huggingface.co/Kijai/flux-fp8/blob/main/flux1-dev-fp8.safetensors):
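
For reference, one plausible way to wire such an FP8 checkpoint into this pipeline (a hedged sketch, not from the PR; it assumes diffusers' single-file loading can read this checkpoint, with weights upcast to bfloat16):

import torch
from diffusers import FluxTransformer2DModel

# Load only the transformer from the single-file FP8 checkpoint.
transformer = FluxTransformer2DModel.from_single_file(
    "https://huggingface.co/Kijai/flux-fp8/blob/main/flux1-dev-fp8.safetensors",
    torch_dtype=torch.bfloat16,
)
pipe = FluxDifferentialImg2ImgPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()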

A green pear, black background (50 steps at 1.0 strength):
[image]

painting of a mountain landscape with a meadow and a forest, meadow background, anime countryside landscape, anime nature wallpap, anime landscape wallpaper, studio ghibli landscape, anime landscape, mountain behind meadow, anime background art, studio ghibli environment, background of flowery hill, anime beautiful peace scene, forrest background, anime scenery, landscape background, background art, anime scenery concept art (20 steps at 1.0 strength):
[image]

Before submitting

  • Did you read the contributor guideline?
  • Did you read our philosophy doc (important for complex PRs)?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.

Who can review?

Skquark commented Sep 11, 2024

Question: would this work as an alternative to FluxInpaint to do outpainting with the inner mask blurred, or should I wait for a FluxDifferentialInpaintPipeline? I've been trying to add Flux to my Infinite Zoom implementation, and while I got it working, it just doesn't blend the masked area well and each outstep is framed. It works nicely with SD 1.5 Inpainting. I've tried blurring the black/white mask_image, which didn't help. I also notice that with this differential mask_image the masked area is black instead of white, as it is in normal inpainting, correct? It can be tested in my app at DiffusionDeluxe.com if you're curious.

yiyixuxu requested a review from asomoza on September 12, 2024 at 02:29
yiyixuxu (Collaborator) commented:

@asomoza can you take a look and help merge this?

HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

asomoza (Member) commented Sep 16, 2024

Hi @ryanlyn, sorry for the late review. Did you update the pipeline with the latest changes from the img2img pipeline? This works OK and we don't really enforce the guidelines too strictly here, but it would be nice if the copied lines were the same as in the img2img pipeline.

Let's merge this soon!

jffu commented Oct 11, 2024

Thanks for the great work! It looks like it fails when batch_size > 1?

Traceback (most recent call last):
  File "test_diff_diff_flux.py", line 41, in <module>
    image = pipeline(
  File "/home/admin/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/admin/workspace/aop_lab/app_source/pipeline_flux_differential_img2img.py", line 855, in __call__
    latents, noise, original_image_latents, latent_image_ids = self.prepare_latents(
  File "/home/admin/workspace/aop_lab/app_source/pipeline_flux_differential_img2img.py", line 566, in prepare_latents
    image_latents = self._pack_latents(image_latents, batch_size, num_channels_latents, height, width)
  File "/home/admin/workspace/aop_lab/app_source/pipeline_flux_differential_img2img.py", line 504, in _pack_latents
    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
RuntimeError: shape '[2, 16, 64, 2, 64, 2]' is invalid for input of size 262144

asomoza mentioned this pull request on Oct 11, 2024
asomoza (Member) commented Oct 11, 2024

Giving a gentle ping to @ryanlyn so we can merge this pipeline. @jffu, that error is probably because this pipeline doesn't have the latest updates from the base pipeline, unless the same error also happens with the base img2img pipeline.

jffu commented Oct 12, 2024

> Giving a gentle ping to @ryanlyn so we can merge this pipeline. @jffu, that error is probably because this pipeline doesn't have the latest updates from the base pipeline, unless the same error also happens with the base img2img pipeline.

@asomoza yes, merging the latest updates can fix this.
This is the modified section, just in case someone else needs it.

@@ -554,6 +554,17 @@ class FluxDifferentialImg2ImgPipeline(DiffusionPipeline, FluxLoraLoaderMixin):
         else:
             image_latents = latents
 
+        if batch_size > image_latents.shape[0] and batch_size % image_latents.shape[0] == 0:
+            # expand init_latents for batch_size
+            additional_image_per_prompt = batch_size // image_latents.shape[0]
+            image_latents = torch.cat([image_latents] * additional_image_per_prompt, dim=0)
+        elif batch_size > image_latents.shape[0] and batch_size % image_latents.shape[0] != 0:
+            raise ValueError(
+                f"Cannot duplicate `image` of batch size {image_latents.shape[0]} to {batch_size} text prompts."
+            )
+        else:
+            image_latents = torch.cat([image_latents], dim=0)
+
         noise = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
         latents = noise if is_strength_max else self.scheduler.scale_noise(image_latents, timestep, noise)
         noise = self._pack_latents(noise, batch_size, num_channels_latents, height, width)
@@ -882,7 +893,7 @@ class FluxDifferentialImg2ImgPipeline(DiffusionPipeline, FluxLoraLoaderMixin):
         mask_thresholds = mask_thresholds.unsqueeze(1).unsqueeze(1).to(device)
         masks = (original_mask > mask_thresholds)
         masks = self._pack_latents(
-            masks.repeat(num_channels_latents, 1, 1, 1).permute(1, 0, 2, 3),
+            masks.repeat(num_channels_latents // num_images_per_prompt, 1, 1, 1).permute(1, 0, 2, 3),
             len(mask_thresholds),
             num_channels_latents,
             2 * (int(height) // self.vae_scale_factor),
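
With that patch applied, a batched call along these lines should work (an illustrative sketch; the prompt and settings are placeholders):

# Exercises the batch_size > 1 path fixed above: two prompts, one shared image.
out = pipe(
    prompt=["a red strawberry, black background"] * 2,
    image=image,       # a single image is duplicated to match the prompt batch
    mask_image=mask,
    strength=1.0,
    num_inference_steps=20,
    guidance_scale=7.5,
).images  # list of two images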

ryanlyn (Contributor, Author) commented Oct 13, 2024

Sorry about the wait @asomoza 🙏 and thank you for finding that issue @jffu. I've added the latest changes from the base pipeline, which addressed batching and simplified the mask arrangement.

asomoza (Member) commented Oct 14, 2024

Thanks a lot! Also, as a reference, this seems to be a really good alternative for Flux inpainting. I haven't tested the controlnet yet, but the quality of diff-diff seems decent.

[images: original | result | result]

asomoza merged commit 68d16f7 into huggingface:main on Oct 14, 2024
8 checks passed
Clement-Lelievre (Contributor) commented Oct 16, 2024

Thanks for this work!

What are the use cases for using this pipe with a strength param other than 1? Correct me if I'm wrong, but I feel doing so would somewhat go against the inference mechanism of this pipeline.

For example, a fully dark pixel in the change map would no longer be totally overridden in the output image's corresponding pixel.

asomoza (Member) commented Oct 16, 2024

Probably @exx8 can give you a more detailed answer, but for me it's the same as when you use img2img or inpainting: with a strength of 1.0 you ignore whatever was there before, with the difference that a gradient in the soft mask attenuates the transition between the masked and unmasked parts.

If you use a lower strength, the generation tries to adapt more to what was there before, and diff-diff also makes it merge better with the old part. This is IMO what makes diff-diff great: you can use it as inpainting or just to gradually change some parts of the image, like in the demo or like I did here.

In the same post you can also see what I did with the crow example; using 0.8 works great in that image because I didn't want to completely override what was there before, so it took part of the shape of the previous bird.

Apart from the strength, you can also play with the brightness of the mask. IMO people don't really understand the versatility of what you can do with diff-diff, partially because UIs don't have the kinds of tools needed to work with masks and gradients.
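
As a concrete illustration of that trade-off, a quick strength sweep (illustrative values and prompt, not from the thread):

# Lower strength keeps more of the original content under the change map.
for strength in (0.6, 0.8, 1.0):
    result = pipe(
        prompt="a crow perched on a branch",  # hypothetical prompt
        image=image,
        mask_image=mask,
        strength=strength,
        num_inference_steps=20,
        guidance_scale=7.5,
    ).images[0]
    result.save(f"diff_diff_strength_{strength}.png")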

sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
* Flux - soft inpainting via differential diffusion

* .

* track changes to FluxInpaintPipeline

* make mask arrangement simplier

* make style

---------

Co-authored-by: YiYi Xu <[email protected]>
Co-authored-by: Álvaro Somoza <[email protected]>
Co-authored-by: asomoza <[email protected]>