ViViD: Video Virtual Try-on using Diffusion Models
Dataset released: ViViD
git clone https://github.com/alibaba-yuanjing-aigclab/ViViD
cd ViViD
conda create -n vivid python=3.10
conda activate vivid
pip install -r requirements.txt
You can place the weights anywhere you like, for example ./ckpts. If you put them somewhere else, update the paths in ./configs/prompts/*.yaml accordingly.
cd ckpts
git lfs install
git clone https://huggingface.co/lambdalabs/sd-image-variations-diffusers
git lfs install
git clone https://huggingface.co/stabilityai/sd-vae-ft-mse
Download the mm_sd_v15_v2 motion module.
git lfs install
git clone https://huggingface.co/alibaba-yuanjing-aigclab/ViViD
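After the downloads finish, a quick check that the cloned folders are in place can save a failed run later. The following is a minimal sketch, not part of the repository: the helper name `missing_checkpoints` is hypothetical, and it assumes each `git clone` above was run inside the checkpoint directory, so each repository lands in a folder named after it. The motion module is left out because its file name is not stated here.

```python
from pathlib import Path

# Folder names produced by the three `git clone` commands above
# (git names the target folder after the repository by default).
EXPECTED_DIRS = ("sd-image-variations-diffusers", "sd-vae-ft-mse", "ViViD")


def missing_checkpoints(ckpt_dir="./ckpts"):
    """Return the expected checkpoint folders that are absent from ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling `missing_checkpoints("./ckpts")` returns an empty list when all three folders exist. It only checks for the folders, not that the LFS files inside them were fully downloaded.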
We provide two demos in ./configs/prompts/. Run the following commands to try them 😼.
python vivid.py --config ./configs/prompts/upper1.yaml
python vivid.py --config ./configs/prompts/lower1.yaml
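To run every config in a folder rather than one at a time, the two commands above can be generalized. This is a rough sketch under the assumption that each `*.yaml` file in the folder is a standalone config for `vivid.py`; the helper names `build_commands` and `run_all` are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(config_dir="./configs/prompts", script="vivid.py"):
    """Build one `python vivid.py --config <file>` command per YAML config."""
    configs = sorted(Path(config_dir).glob("*.yaml"))
    return [[sys.executable, script, "--config", str(cfg)] for cfg in configs]


def run_all(config_dir="./configs/prompts"):
    """Run the configs one after another, stopping at the first failure."""
    for cmd in build_commands(config_dir):
        subprocess.run(cmd, check=True)
```

Run `run_all()` from the repository root so that the relative path to `vivid.py` resolves. Building the commands separately from running them makes it easy to print and inspect them first.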
As illustrated in ./data, the following data should be provided.
./data/
|-- agnostic
| |-- video1.mp4
| |-- video2.mp4
| ...
|-- agnostic_mask
| |-- video1.mp4
| |-- video2.mp4
| ...
|-- cloth
| |-- cloth1.jpg
| |-- cloth2.jpg
| ...
|-- cloth_mask
| |-- cloth1.jpg
| |-- cloth2.jpg
| ...
|-- densepose
| |-- video1.mp4
| |-- video2.mp4
| ...
|-- videos
| |-- video1.mp4
| |-- video2.mp4
| ...
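Before running on your own data, it may help to verify that the folders line up. The sketch below is not part of the repository. It assumes, as the tree above suggests, that a video uses the same file name in agnostic, agnostic_mask, densepose, and videos, and that a garment uses the same file name in cloth and cloth_mask. The helper name `check_layout` is hypothetical.

```python
from pathlib import Path

# Folder groups taken from the tree above; files within a group are
# assumed to share names (e.g. video1.mp4 appears in every video folder).
VIDEO_DIRS = ("agnostic", "agnostic_mask", "densepose", "videos")
CLOTH_DIRS = ("cloth", "cloth_mask")


def check_layout(root="./data"):
    """Return a list of human-readable problems; empty means the layout is consistent."""
    root = Path(root)
    problems = []
    for group in (VIDEO_DIRS, CLOTH_DIRS):
        listings = {}
        for sub in group:
            folder = root / sub
            if not folder.is_dir():
                problems.append(f"missing folder: {sub}")
                continue
            listings[sub] = {p.name for p in folder.iterdir() if p.is_file()}
        # Every name seen anywhere in the group should exist in each folder.
        all_names = set().union(*listings.values())
        for sub in group:
            if sub in listings:
                for name in sorted(all_names - listings[sub]):
                    problems.append(f"{sub} lacks {name}")
    return problems
```

For example, `check_layout("./data")` returns `[]` when every folder exists and the names match, and otherwise lists each missing folder or file.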
This part is a bit complex. You can obtain the agnostic and agnostic_mask videos in any of the following three ways:
- Follow OOTDiffusion to extract them frame by frame (recommended).
- Use SAM + Gaussian blur (see ./tools/sam_agnostic.py for an example).
- Use a mask editor tool.
Note that the shape and size of the agnostic area may affect the try-on results.
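To give a feel for how blurring changes the shape and size of a mask, here is a small illustrative sketch. It is not the code in ./tools/sam_agnostic.py, and it assumes one possible use of the blur: smoothing a binary mask with a Gaussian kernel and thresholding the result, which grows the masked area and rounds its outline. It works on plain nested lists so that it runs without dependencies. For real frames an image library such as OpenCV would be the practical choice. The function names and default parameters are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(grid, kernel):
    """Convolve each row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    out = []
    for row in grid:
        last = len(row) - 1
        out.append([
            sum(w * row[min(max(x + k - radius, 0), last)] for k, w in enumerate(kernel))
            for x in range(len(row))
        ])
    return out


def transpose(grid):
    return [list(col) for col in zip(*grid)]


def soften_mask(mask, sigma=1.0, radius=2, threshold=0.05):
    """Blur a 0/1 mask (rows, then columns) and re-binarize it.

    A lower threshold or a larger sigma grows the masked area more.
    """
    kernel = gaussian_kernel(sigma, radius)
    blurred = transpose(blur_rows(transpose(blur_rows(mask, kernel)), kernel))
    return [[1 if value >= threshold else 0 for value in row] for row in blurred]
```

Varying `sigma` and `threshold` on a single frame's mask is a cheap way to see how much the agnostic area expands before committing to a full video.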
For the densepose videos, see vid2densepose. (Thanks)
Any detection tool, such as SAM, can be used to obtain the cloth mask.
@misc{fang2024vivid,
title={ViViD: Video Virtual Try-on using Diffusion Models},
author={Zixun Fang and Wei Zhai and Aimin Su and Hongliang Song and Kai Zhu and Mao Wang and Yu Chen and Zhiheng Liu and Yang Cao and Zheng-Jun Zha},
year={2024},
eprint={2405.11794},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Zixun Fang: [email protected]
Yu Chen: [email protected]