This is the official repository for the following paper:
Towards Squeezing-Averse Virtual Try-On via Sequential Deformation [arxiv]
Sang-Heon Shim, Jiwoo Chung, Jae-Pil Heo
Accepted by AAAI 2024.
This repository is currently built only for sharing the source code of an academic research paper.
It has several limitations. Please check them out below.
- 2024-01-31: We have released the source code and checkpoints.
Clone this repository:
git clone https://github.com/SHShim0513/SD-VITON.git
cd ./SD-VITON/
Install PyTorch and other dependencies:
conda create -n {env_name} python=3.8
conda activate {env_name}
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch-lts -c nvidia
pip install opencv-python torchgeometry Pillow tqdm tensorboardX scikit-image scipy timm==0.4.12
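After installation, you can optionally verify that PyTorch sees your GPU:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"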
We train and evaluate our model using the dataset from the following link.
We assume that you have downloaded it into ./data.
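For reference, the layout we assume under ./data follows the VITON-HD / HR-VITON convention (the sub-folder names below are an assumption and may differ depending on how the dataset is packaged):
./data/train_pairs.txt
./data/test_pairs.txt
./data/train/{image, cloth, cloth-mask, image-parse, openpose-img, openpose-json, ...}
./data/test/  (same sub-folders as train)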
Here are the download links for each model checkpoint:
Dataset | Network Type | Output Resolution | Google Cloud |
---|---|---|---|
VITON-HD | Try-on condition generator | Appearance flows with 128 x 96 | Download |
VITON-HD | Try-on image generator | Images with 1024 x 768 | Download |
- AlexNet (LPIPS): link. We assume that you have downloaded it into ./eval_models/weights/v0.1.
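For example, assuming the downloaded weight file is named alex.pth (the filename used by the LPIPS repository; adjust if yours differs), you would place it as follows:
mkdir -p ./eval_models/weights/v0.1
mv alex.pth ./eval_models/weights/v0.1/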
To run inference with the released checkpoints:
python3 test_generator.py --occlusion --test_name {test_name} --tocg_checkpoint {condition generator ckpt} --gpu_ids {gpu_ids} --gen_checkpoint {image generator ckpt} --datasetting unpaired --dataroot {dataset_path} --data_list {pair_list_textfile} --composition_mask
To train the try-on condition generator:
python3 train_condition.py --gpu_ids {gpu_ids} --Ddownx2 --Ddropout --interflowloss --occlusion --tvlambda_tvob 2.0 --tvlambda_taco 2.0
To train the try-on image generator:
python3 train_generator.py --name test -b 4 -j 8 --gpu_ids {gpu_ids} --fp16 --tocg_checkpoint {condition generator ckpt path} --occlusion --composition_mask
This stage takes approximately 4 days on two A6000 GPUs.
To use the --fp16 option, you need to install the apex library.
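A commonly used way to build apex from source (assuming a CUDA toolkit and compiler that match your PyTorch build) is:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Alternatively, omit the --fp16 flag to train in full precision.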
Our work still has several limitations which, to the best of our knowledge, are not unique to our method.
Several samples suffer from a crack artifact.
To the best of our knowledge, the crack is amplified by the up-scaling of the last appearance flows (AFs).
For example, our network infers the last AFs at 128 x 96 resolution and then up-scales them to 1024 x 768.
As a result, the crack regions are enlarged.
One way to slightly reduce this artifact is to infer the last AFs at a resolution closer to the image resolution (see "After").
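To make the effect concrete, below is a minimal PyTorch sketch of the warping step (this is not the repository's exact code; it assumes the flow stores normalized sampling offsets). A one-pixel discontinuity in the 128 x 96 flow is stretched to roughly 8 pixels once the flow is bilinearly up-scaled to 1024 x 768 and used to warp the clothing image:

```python
import torch
import torch.nn.functional as F

def warp_with_upscaled_flow(cloth, flow_lr):
    """cloth: (B, 3, 1024, 768) clothing image, flow_lr: (B, 2, 128, 96) appearance flow."""
    B, _, H, W = cloth.shape
    # Up-scale the low-resolution flow to the image resolution.
    # Any crack (discontinuity) in flow_lr is stretched by the same x8 factor.
    flow = F.interpolate(flow_lr, size=(H, W), mode='bilinear', align_corners=True)
    # Identity sampling grid in [-1, 1], shaped (B, H, W, 2).
    theta = torch.tensor([[[1., 0., 0.], [0., 1., 0.]]], device=cloth.device).expand(B, -1, -1)
    base_grid = F.affine_grid(theta, (B, 3, H, W), align_corners=True)
    # Add the (assumed normalized) flow offsets and warp the clothing image.
    grid = base_grid + flow.permute(0, 2, 3, 1)
    return F.grid_sample(cloth, grid, mode='bilinear',
                         padding_mode='border', align_corners=True)
```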
We provide checkpoints where the networks infer the AFs at 256 x 192 and the image at 512 x 384 resolution.
Dataset | Network Type | Output Resolution | Google Cloud |
---|---|---|---|
VITON-HD | Try-on condition generator | Appearance flows with 256 x 192 | Download |
VITON-HD | Try-on image generator | Images with 512 x 384 | Download |
The corresponding script for inference is as follows:
python3 test_generator.py --occlusion --test_name {test_name} --tocg_checkpoint {condition generator ckpt} --gpu_ids {gpu_ids} --gen_checkpoint {image generator ckpt} --datasetting unpaired --dataroot {dataset_path} --data_list {pair_list_textfile} --fine_width 384 --fine_height 512 --num_upsampling_layers more --cond_G_ngf 48 --cond_G_input_width 384 --cond_G_input_height 512 --cond_G_num_layers 6
As with other methods, our network cannot fully remove the clothing texture behind the neck.
As a result, it remains in the generated samples.
A solution would be to mask out such regions when pre-processing the inputs.
We did not apply this additional technique, since such masks are not included in the dataset.
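For illustration, a rough sketch of the masking idea is given below (we did not use this in our pipeline; the parse label indices are hypothetical placeholders that depend on the human-parsing scheme of the dataset):

```python
import numpy as np

NECK_LABEL = 10    # hypothetical label for the neck in the parse map
CLOTH_LABEL = 5    # hypothetical label for the upper clothes

def mask_cloth_behind_neck(image, parse, margin=15):
    """image: (H, W, 3) uint8 person image, parse: (H, W) integer label map."""
    ys, xs = np.where(parse == NECK_LABEL)
    if ys.size == 0:
        return image
    # Loose bounding box around the neck, expanded by a small margin.
    top, left = max(ys.min() - margin, 0), max(xs.min() - margin, 0)
    bottom, right = ys.max() + margin, xs.max() + margin
    region = np.zeros(parse.shape, dtype=bool)
    region[top:bottom, left:right] = True
    # Blank out the original clothing pixels inside that box.
    out = image.copy()
    out[region & (parse == CLOTH_LABEL)] = 128  # gray fill; inpainting would be nicer
    return out
```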
This repository is built on top of the HR-VITON repository. Thanks for the great work.
If you find this work useful for your research, please cite our paper:
@article{shim2023towards,
title={Towards Squeezing-Averse Virtual Try-On via Sequential Deformation},
author={Shim, Sang-Heon and Chung, Jiwoo and Heo, Jae-Pil},
journal={arXiv preprint arXiv:2312.15861},
year={2023}
}