
Request for more description on the First Frame Editing (I2V) use case #13

Open
NitishMamadgi opened this issue Jan 27, 2025 · 2 comments

Comments

@NitishMamadgi

Very curious to know how exactly you would create the warped noise for the First Frame Editing (I2V) case. What input video should be given to the noise-warping algorithm? I'm guessing an input video with just the first frame edited wouldn't do the job, since you'd want the inserted object in all the frames - but simply pasting the object into every frame at the same location wouldn't match the original video's camera motion.

So, what is the modified input video that should be used to create the warped noise, and how do you get it?

Thanks.
@RyannDaGreat

@Xavitek

Xavitek commented Jan 28, 2025

I'm also very interested in this feature. Will it be released on GitHub?

I assume you provide it with the input video and the edited image of the first frame. That first frame is then used to generate the rest of the video, following the motion of the original video.

@RyannDaGreat
Collaborator

Hi! This is pretty simple, actually - it's just a matter of creating another tutorial; all the tools for that are already here. I've put a TODO for a dedicated code release for first-frame editing alone, but please realize it's almost identical to the current tutorial, as that also has I2V motion transfer going on under the hood! Until I do that, here's a description of what has to be done.

This is modified from "2. Running Video Diffusion (GPU)" in the README. Please follow the installation instructions and run this on a Linux machine with NVIDIA GPUs with at least 24 GB of VRAM (you might get away with less - maybe 12 GB will work - but 24 GB should be safe).

Here's what you have to do. First, generate the warped noise you need:

python make_warped_noise.py <PATH TO VIDEO OR URL> --output_folder noise_warp_output_folder

That will write its outputs into noise_warp_output_folder. Then run diffusion inference using that warped noise plus the modified first-frame image:

python cut_and_drag_inference.py noise_warp_output_folder \
    --prompt "A duck splashing" \
    --image "/path/to/modified/first/frame/image.png" \
    --output_mp4_path "output.mp4" \
    --device "cuda" \
    --num_inference_steps 30
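As for producing the edited first-frame image itself (the file passed to --image): this is not part of the repo's tooling, but here is a minimal, hypothetical sketch of one way to do it with Pillow - compositing an object cut-out onto the first frame via its alpha channel. The two stand-in images below are fabricated for illustration; in practice you would load the real first frame (e.g. extracted with ffmpeg) and your object cut-out as a PNG with transparency.

```python
from PIL import Image

# Stand-ins for illustration only: a solid-color "first frame" and a
# solid-color "object". Replace these with Image.open(...) calls on your
# real first frame and your object cut-out (RGBA, with an alpha channel).
frame = Image.new("RGB", (640, 480), color=(30, 120, 200))
obj = Image.new("RGBA", (100, 100), color=(255, 0, 0, 255))

# Paste the object at the desired location; passing the RGBA object as the
# third argument makes its alpha channel act as the paste mask.
frame.paste(obj, (270, 190), obj)

# Save the composited frame; pass this path to --image above.
frame.save("image.png")
```

The motion then comes entirely from the warped noise, so the object only needs to be placed in this single frame - no per-frame pasting required.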

Please try this out! It should work, but if the instructions need revising, please tell me here and I'll help debug.
