Implement framewise encoding/decoding in LTX Video VAE #10333

Open
a-r-r-o-w opened this issue Dec 21, 2024 · 2 comments · May be fixed by #10488

@a-r-r-o-w
Member

Currently, we do not implement framewise encoding/decoding in the LTX Video VAE. Implementing it is an opportunity to reduce memory usage, which would benefit both inference and training.
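To make the idea concrete, here is a minimal sketch of framewise decoding in plain Python: the latent video is decoded a few latent frames at a time and the results are concatenated, so only one chunk's intermediate activations need to be alive at once. `framewise_decode`, `decode_chunk`, `chunk_size` and `toy_decode` are hypothetical names for illustration, not the diffusers API; `toy_decode` stands in for the actual VAE decoder.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def framewise_decode(
    latents: Sequence[T],
    decode_chunk: Callable[[Sequence[T]], List[U]],
    chunk_size: int,
) -> List[U]:
    """Decode `latents` (one entry per latent frame) `chunk_size` frames at a time.

    Hypothetical helper: `decode_chunk` stands in for the VAE decoder and must
    return the decoded frames for the latent frames it receives.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    frames: List[U] = []
    for start in range(0, len(latents), chunk_size):
        # Only this slice is decoded at once, so peak memory scales with
        # chunk_size instead of with the total number of frames.
        frames.extend(decode_chunk(latents[start:start + chunk_size]))
    return frames


# Toy stand-in decoder: each latent frame becomes 2 "pixel frames".
def toy_decode(chunk: Sequence[int]) -> List[int]:
    return [value * 10 + i for value in chunk for i in range(2)]


print(framewise_decode([1, 2, 3], toy_decode, chunk_size=2))
# [10, 11, 20, 21, 30, 31]
```

Because the toy decoder treats every latent frame independently, chunked and full decoding give identical output here. A real video VAE convolves across neighbouring frames, so chunk boundaries lose temporal context and can show visible seams unless the chunks overlap.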

LoRA finetuning LTX Video on 49x512x768 videos can be done in under 6 GB of memory if prompts and latents are pre-computed, but the pre-computation itself requires about 12 GB because of the VAE encode/decode. Framewise encoding/decoding could reduce this considerably and lower the barrier to entry for video model finetuning. Our friends with potatoes need you!
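For a sense of scale, the sketch below computes the latent shape for the 49x512x768 case, reading it as frames x height x width. It assumes the LTX Video VAE compression factors of 8x in time (frame counts of the form 8k + 1, giving (F - 1) / 8 + 1 latent frames), 32x in each spatial dimension, and 128 latent channels; please double-check these against the model config. `ltx_latent_shape` is a hypothetical helper.

```python
from typing import Tuple


def ltx_latent_shape(
    num_frames: int,
    height: int,
    width: int,
    temporal_ratio: int = 8,    # assumed temporal compression factor
    spatial_ratio: int = 32,    # assumed spatial compression factor
    latent_channels: int = 128, # assumed number of latent channels
) -> Tuple[int, int, int, int]:
    """Return (channels, latent_frames, latent_height, latent_width)."""
    if (num_frames - 1) % temporal_ratio != 0:
        raise ValueError("num_frames should be of the form temporal_ratio * k + 1")
    if height % spatial_ratio or width % spatial_ratio:
        raise ValueError("height and width should be multiples of spatial_ratio")
    return (
        latent_channels,
        (num_frames - 1) // temporal_ratio + 1,
        height // spatial_ratio,
        width // spatial_ratio,
    )


c, f, h, w = ltx_latent_shape(49, 512, 768)
pixel_values = 3 * 49 * 512 * 768  # RGB input elements
latent_values = c * f * h * w      # latent elements
print((c, f, h, w))                  # (128, 7, 16, 24)
print(pixel_values // latent_values) # 168
```

Under these assumptions the latent is about 168x smaller than the RGB input, which is why training on pre-computed latents is cheap. The ~12 GB peak during pre-computation is presumably dominated by the VAE's full-resolution intermediate activations, which is what processing a few frames at a time would bound.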

As always, contributions are welcome 🤗 Happy new year!

@rootonchair
Contributor

Hi @a-r-r-o-w, this looks interesting and I would like to take it on.

@rootonchair
Contributor

Here is the result of my first attempt. As you may notice, there are still some inconsistencies; I will try to improve it if possible.

No framewise decoding:

ltx_org_output.mp4

Framewise decoding:

ltx_output.mp4
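A common way to reduce seams like these in tiled or chunked VAE decoding is to decode overlapping chunks and cross-fade the frames they share. The sketch below shows only that blending step, on plain Python lists where each frame is reduced to a single float; `blend_chunks` and `overlap` are hypothetical names, and this is not necessarily what the linked pull request does.

```python
from typing import List, Sequence


def blend_chunks(chunks: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Join decoded chunks whose last/first `overlap` frames cover the same time steps.

    Each frame is a single float here for clarity; with real tensors the same
    per-frame weights would be applied to whole frames.
    """
    if not chunks:
        return []
    if overlap < 0:
        raise ValueError("overlap must be >= 0")
    result: List[float] = list(chunks[0])
    for chunk in chunks[1:]:
        if len(chunk) < overlap or len(result) < overlap:
            raise ValueError("each chunk must be at least `overlap` frames long")
        for i in range(overlap):
            # Weight of the incoming chunk ramps up linearly across the overlap.
            w = (i + 1) / (overlap + 1)
            idx = len(result) - overlap + i
            result[idx] = (1 - w) * result[idx] + w * chunk[i]
        # Frames after the overlap come from the new chunk unchanged.
        result.extend(chunk[overlap:])
    return result


print(blend_chunks([[0.0] * 4, [1.0] * 4], overlap=2))
# ≈ [0.0, 0.0, 0.333, 0.667, 1.0, 1.0]
```

With real tensors, the decoder would also have to be run on overlapping latent ranges so that neighbouring decoded chunks actually share `overlap` frames.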

@rootonchair linked a pull request on Jan 7, 2025 that will close this issue.