
Add QKV fusion to the Hunyuan Video transformer #10407

Open · wants to merge 2 commits into main

Conversation

Ednaordinary

What does this PR do?

This adds QKV fusion to Hunyuan Video. At the moment, this gives minimal/no improvement:

|            | QKV    | No QKV |
|------------|--------|--------|
| Time (sec) | 522.18 | 547.21 |
| VRAM (GiB) | 4.17   | 3.88   |

The biggest improvement is expected in combination with torchao, though that currently errors out because torchao tensors do not support concatenation:

```
NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: func=<OpOverload(op='aten.cat', overload='default')>, types=(<class 'torchao.dtypes.affine_quantized_tensor.AffineQuantizedTensor'>,), arg_types=(<class 'list'>,), kwarg_types={}
```
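Fusing QKV amounts to concatenating the three projection weight matrices into a single larger projection, which is exactly the `aten.cat` call that torchao's `AffineQuantizedTensor` does not implement. A minimal sketch of the idea with plain tensors (illustrative only, not the diffusers implementation):

```python
import torch
import torch.nn as nn

hidden = 64
x = torch.randn(2, hidden)

# Three separate projections, as in unfused attention.
to_q = nn.Linear(hidden, hidden, bias=False)
to_k = nn.Linear(hidden, hidden, bias=False)
to_v = nn.Linear(hidden, hidden, bias=False)

# Fusion: concatenate the weight matrices along the output dimension.
# This torch.cat is the operation that fails on quantized tensors.
to_qkv = nn.Linear(hidden, 3 * hidden, bias=False)
to_qkv.weight.data = torch.cat(
    [to_q.weight.data, to_k.weight.data, to_v.weight.data], dim=0
)

# One matmul now produces q, k, and v; results match the unfused path.
q, k, v = to_qkv(x).chunk(3, dim=-1)
assert torch.allclose(q, to_q(x), atol=1e-6)
assert torch.allclose(k, to_k(x), atol=1e-6)
assert torch.allclose(v, to_v(x), atol=1e-6)
```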

BitsAndBytes also errors out (relevant but somewhat dated discussion):

```
RuntimeError: Only Tensors of floating point and complex dtype can require gradients
```
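That failure is a generic PyTorch restriction rather than anything specific to this PR: only floating-point and complex tensors may require gradients, and `nn.Parameter` defaults to `requires_grad=True`, so wrapping an integer (quantized) weight reproduces it in isolation:

```python
import torch

raised = False
try:
    # nn.Parameter defaults to requires_grad=True, which PyTorch only
    # permits for floating point and complex dtypes.
    torch.nn.Parameter(torch.zeros(4, dtype=torch.int8))
except RuntimeError as e:
    raised = True
    print(e)

# With requires_grad=False, an integer parameter is accepted.
p = torch.nn.Parameter(torch.zeros(4, dtype=torch.int8), requires_grad=False)
assert raised and p.dtype == torch.int8
```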

There's a slight hack in `HunyuanVideoIndividualTokenRefinerBlock`, since with QKV fusion the attention output appears to become a `(tensor, None)` tuple instead of a plain tensor.

Reproducible script

```python
import io
import time

import imageio as iio
import numpy as np
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel

torch.manual_seed(42)

def export_to_video_bytes(fps, frames):
    # Encode frames to an in-memory MP4 via imageio's PyAV plugin.
    request = iio.core.Request("<bytes>", mode="w", extension=".mp4")
    pyavobject = iio.plugins.pyav.PyAVPlugin(request)
    if isinstance(frames, np.ndarray):
        frames = (np.array(frames) * 255).astype("uint8")
    else:
        frames = np.array(frames)
    new_bytes = pyavobject.write(frames, codec="libx264", fps=fps)
    return io.BytesIO(new_bytes)

def export_to_video(frames, path, fps):
    video_bytes = export_to_video_bytes(fps, frames)
    video_bytes.seek(0)
    with open(path, "wb") as f:
        f.write(video_bytes.getbuffer())

model_id = "tencent/HunyuanVideo"

print("Loading transformer")
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16, revision="refs/pr/18"
)
transformer.fuse_qkv_projections()

pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16, revision="refs/pr/18"
)
pipe.scheduler._shift = 7.0
pipe.vae.enable_tiling()
# pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()

start_time = time.perf_counter()
output = pipe(
    prompt=(
        "a cat walks along the sidewalk of a city. The camera follows the cat "
        "at knee level. The city has many people and cars moving around, with "
        "advertisement billboards in the background"
    ),
    height=544,
    width=960,
    num_frames=45,
    num_inference_steps=20,
).frames[0]
export_to_video(output, "output.mp4", fps=15)
print("Time:", round(time.perf_counter() - start_time, 2), "seconds")
print("Max vram:", round(torch.cuda.max_memory_allocated(device="cuda") / 1024 ** 3, 3), "GiB")
```

Comparison

QKV fusion:

output_qkv.mp4

No fusion:

output.mp4

Results are different but comparable.


Who can review?

@a-r-r-o-w @DN6

@a-r-r-o-w a-r-r-o-w self-requested a review December 30, 2024 11:38
Member

@a-r-r-o-w a-r-r-o-w left a comment


In my experience, QKV fusion does not really help much with either time or memory requirements, even with quantization. In fact, there are even slowdowns at times, depending on the quantization technique applied.

Not sure if it would be beneficial to add, but since we do support it for some other models, it makes sense to do so in the interest of consistency. Will ask @yiyixuxu to make the final call.
