[pipeline] CogVideoX-Fun Control (#9671)
* cogvideox-fun control

* make style

* make fix-copies

* karras schedulers

* Update src/diffusers/pipelines/cogvideo/pipeline_cogvideox_fun_control.py

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/api/pipelines/cogvideox.md

Co-authored-by: Steven Liu <[email protected]>

* apply suggestions from review

---------

Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
3 people committed Dec 23, 2024
1 parent 6e02cd7 commit 5f88292
Showing 7 changed files with 1,154 additions and 1 deletion.
10 changes: 10 additions & 0 deletions docs/source/en/api/pipelines/cogvideox.md
@@ -36,6 +36,10 @@ There are two models available that can be used with the text-to-video and video
There is one model available that can be used with the image-to-video CogVideoX pipeline:
- [`THUDM/CogVideoX-5b-I2V`](https://huggingface.co/THUDM/CogVideoX-5b-I2V): The recommended dtype for running this model is `bf16`.

There are two models that support pose controllable generation (by the [Alibaba-PAI](https://huggingface.co/alibaba-pai) team):
- [`alibaba-pai/CogVideoX-Fun-V1.1-2b-Pose`](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-Pose): The recommended dtype for running this model is `bf16`.
- [`alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose`](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose): The recommended dtype for running this model is `bf16`.

## Inference

Use [`torch.compile`](https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion#torchcompile) to reduce the inference latency.
@@ -118,6 +122,12 @@ It is also worth noting that torchao quantization is fully compatible with [torc
- all
- __call__

## CogVideoXFunControlPipeline

[[autodoc]] CogVideoXFunControlPipeline
- all
- __call__

## CogVideoXPipelineOutput

[[autodoc]] pipelines.cogvideo.pipeline_output.CogVideoXPipelineOutput
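
The documentation change above names the pose-control checkpoints and the new `CogVideoXFunControlPipeline`, but this view does not show a call. A minimal usage sketch follows, assuming the pipeline accepts a `control_video` argument and returns frames the way the other CogVideoX pipelines do. The helper name `generate_pose_controlled_video`, the input path, and the `fps=8` export setting are illustrative assumptions, not part of the commit.

```python
# Sketch only: wraps the pipeline call in a function so nothing heavy runs on import.
MODEL_ID = "alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose"  # checkpoint listed in the docs change


def generate_pose_controlled_video(prompt, control_video_path, output_path="output.mp4"):
    """Generate a video that follows the poses in `control_video_path` (hypothetical helper)."""
    # Heavy dependencies are imported lazily, inside the function.
    import torch
    from diffusers import CogVideoXFunControlPipeline
    from diffusers.utils import export_to_video, load_video

    # bf16 is the dtype the documentation recommends for both Pose checkpoints.
    pipe = CogVideoXFunControlPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    pipe.to("cuda")  # assumes a CUDA device is available

    # The control video is a pose rendering loaded as a list of frames.
    control_video = load_video(control_video_path)

    # `control_video` is assumed to be the conditioning argument of __call__.
    frames = pipe(prompt=prompt, control_video=control_video).frames[0]

    export_to_video(frames, output_path, fps=8)  # fps value is an assumption
    return output_path
```

Calling `generate_pose_controlled_video("a dancer on a beach", "pose.mp4")` would download the checkpoint on first use, so it needs a GPU with enough memory for the 5b model; swapping `MODEL_ID` for the 2b Pose checkpoint is the lighter option.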
2 changes: 2 additions & 0 deletions src/diffusers/__init__.py
@@ -256,6 +256,7 @@
"BlipDiffusionControlNetPipeline",
"BlipDiffusionPipeline",
"CLIPImageProjection",
"CogVideoXFunControlPipeline",
"CogVideoXImageToVideoPipeline",
"CogVideoXPipeline",
"CogVideoXVideoToVideoPipeline",
@@ -711,6 +712,7 @@
AudioLDMPipeline,
AuraFlowPipeline,
CLIPImageProjection,
CogVideoXFunControlPipeline,
CogVideoXImageToVideoPipeline,
CogVideoXPipeline,
CogVideoXVideoToVideoPipeline,
8 changes: 7 additions & 1 deletion src/diffusers/pipelines/__init__.py
@@ -144,6 +144,7 @@
"CogVideoXPipeline",
"CogVideoXImageToVideoPipeline",
"CogVideoXVideoToVideoPipeline",
"CogVideoXFunControlPipeline",
]
_import_structure["cogview3"] = ["CogView3PlusPipeline"]
_import_structure["controlnet"].extend(
@@ -470,7 +471,12 @@
)
from .aura_flow import AuraFlowPipeline
from .blip_diffusion import BlipDiffusionPipeline
- from .cogvideo import CogVideoXImageToVideoPipeline, CogVideoXPipeline, CogVideoXVideoToVideoPipeline
+ from .cogvideo import (
+     CogVideoXFunControlPipeline,
+     CogVideoXImageToVideoPipeline,
+     CogVideoXPipeline,
+     CogVideoXVideoToVideoPipeline,
+ )
from .cogview3 import CogView3PlusPipeline
from .controlnet import (
BlipDiffusionControlNetPipeline,
2 changes: 2 additions & 0 deletions src/diffusers/pipelines/cogvideo/__init__.py
@@ -23,6 +23,7 @@
_dummy_objects.update(get_objects_from_module(dummy_torch_and_transformers_objects))
else:
_import_structure["pipeline_cogvideox"] = ["CogVideoXPipeline"]
_import_structure["pipeline_cogvideox_fun_control"] = ["CogVideoXFunControlPipeline"]
_import_structure["pipeline_cogvideox_image2video"] = ["CogVideoXImageToVideoPipeline"]
_import_structure["pipeline_cogvideox_video2video"] = ["CogVideoXVideoToVideoPipeline"]

@@ -35,6 +36,7 @@
from ...utils.dummy_torch_and_transformers_objects import *
else:
from .pipeline_cogvideox import CogVideoXPipeline
from .pipeline_cogvideox_fun_control import CogVideoXFunControlPipeline
from .pipeline_cogvideox_image2video import CogVideoXImageToVideoPipeline
from .pipeline_cogvideox_video2video import CogVideoXVideoToVideoPipeline
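
The `_import_structure` entries in this hunk map a submodule name to the public names it exports, which suggests they feed a lazy-import mechanism that only loads a submodule when one of its names is first accessed. The toy sketch below illustrates that idea in isolation. `LazyExports` and `fake_loader` are hypothetical stand-ins written for this example; they are not the helper diffusers actually uses, and the loader returns placeholder strings rather than real classes.

```python
from types import SimpleNamespace


class LazyExports:
    """Toy lazy namespace: resolves a public name to its submodule on first access."""

    def __init__(self, import_structure, loader):
        # Invert {submodule: [names]} into {name: submodule}.
        self._name_to_submodule = {
            name: submodule
            for submodule, names in import_structure.items()
            for name in names
        }
        self._loader = loader  # hypothetical callable: submodule name -> module-like object
        self._loaded = {}      # cache so each submodule is loaded at most once

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        mapping = self.__dict__.get("_name_to_submodule", {})
        if name not in mapping:
            raise AttributeError(name)
        submodule = mapping[name]
        if submodule not in self._loaded:
            self._loaded[submodule] = self._loader(submodule)
        return getattr(self._loaded[submodule], name)


# Same shape as the mapping in the diff, including the newly registered pipeline.
import_structure = {
    "pipeline_cogvideox": ["CogVideoXPipeline"],
    "pipeline_cogvideox_fun_control": ["CogVideoXFunControlPipeline"],
    "pipeline_cogvideox_image2video": ["CogVideoXImageToVideoPipeline"],
    "pipeline_cogvideox_video2video": ["CogVideoXVideoToVideoPipeline"],
}

load_log = []  # records which submodules were "imported"


def fake_loader(submodule):
    """Stand-in for a real import: returns placeholder strings instead of classes."""
    load_log.append(submodule)
    exported = {name: f"<{name} from {submodule}>" for name in import_structure[submodule]}
    return SimpleNamespace(**exported)


exports = LazyExports(import_structure, fake_loader)
```

Accessing `exports.CogVideoXFunControlPipeline` triggers exactly one load of `pipeline_cogvideox_fun_control`, and the other submodules stay untouched. That is why a new pipeline has to be registered in both places the diff touches: the name-to-submodule mapping and the eager imports used for type checking.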
