
Request for more description on the First Frame Editing (I2V) use case #13

Open
NitishMamadgi opened this issue Jan 27, 2025 · 2 comments

Comments

@NitishMamadgi

Very curious to know how exactly you would create the warped noise for the First Frame Editing (I2V) case. What input video should be given to the noise-warping algorithm? I'm guessing an input video with just the first frame edited wouldn't do the job, since you'd want the inserted object in all the frames - but simply pasting the object into every frame at the same location wouldn't match the original video's camera motion.

So, what is the modified input video that should be used to create the warped noise, and how do you get it?

Thanks.
@RyannDaGreat

@Xavitek

Xavitek commented Jan 28, 2025

I'm also very interested in this feature. Will it be released on GitHub?

I assume you provide it with the input video and the edited image of the first frame. That first frame is then used to generate the rest of the video, following the motion of the original video.

@RyannDaGreat
Collaborator

Hi! This is pretty simple, actually - it's just a matter of creating another tutorial; all the tools for that are already here. I've put a TODO for a dedicated code release for first-frame editing alone, but please realize it's almost identical to the current tutorial, as that also has I2V motion transfer going on under the hood! Until I do that, here's a description of what has to be done.

This is modified from "2. Running Video Diffusion (GPU)" in the README. Please follow the installation instructions and run this on a Linux machine with NVIDIA GPUs with at least 24 GB of VRAM (you might get away with less - maybe 12 GB will work - but 24 GB should be safe).

Here's what you have to do. First, generate the warped noise you need:

python make_warped_noise.py <PATH TO VIDEO OR URL> --output_folder noise_warp_output_folder

That will write its outputs into noise_warp_output_folder. Then run diffusion inference using that warped noise plus the modified first-frame image:

python cut_and_drag_inference.py noise_warp_output_folder \
    --prompt "A duck splashing" \
    --image "/path/to/modified/first/frame/image.png" \
    --output_mp4_path "output.mp4" \
    --device "cuda" \
    --num_inference_steps 30
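As for producing the edited first-frame image itself (the file passed to --image): this is not part of the repo's tooling, but here is a minimal, hypothetical sketch of one way to do it with Pillow - compositing an object cut-out onto the first frame via its alpha channel. The two stand-in images below are fabricated for illustration; in practice you would load the real first frame (e.g. extracted with ffmpeg) and your object cut-out as a PNG with transparency.

```python
from PIL import Image

# Stand-ins for illustration only: a solid-color "first frame" and a
# solid-color "object". Replace these with Image.open(...) calls on your
# real first frame and your object cut-out (RGBA, with an alpha channel).
frame = Image.new("RGB", (640, 480), color=(30, 120, 200))
obj = Image.new("RGBA", (100, 100), color=(255, 0, 0, 255))

# Paste the object at the desired location; passing the RGBA object as the
# third argument makes its alpha channel act as the paste mask.
frame.paste(obj, (270, 190), obj)

# Save the composited frame; pass this path to --image above.
frame.save("image.png")
```

The motion then comes entirely from the warped noise, so the object only needs to be placed in this single frame - no per-frame pasting required.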

Please try this out! It should work, but if the instructions need revising, please tell me here and I'll help debug.
