Skip to content

Latest commit

 

History

History
67 lines (49 loc) · 3.53 KB

README.md

File metadata and controls

67 lines (49 loc) · 3.53 KB

🧹CleanDIFT: Diffusion Features without Noise

CompVis Group @ LMU Munich
* Equal Contribution

Project Page Paper Weights

This repository contains the official implementation of the paper "CleanDIFT: Diffusion Features without Noise".

We propose CleanDIFT, a novel method to extract noise-free, timestep-independent features by enabling diffusion models to work directly with clean input images. Our approach is efficient, training on a single GPU in just 30 minutes.

teaser

🚀 Usage

Setup

Just clone the repo and install the requirements via pip install -r requirements.txt, then you're ready to go.

Training

In order to train a feature extractor on your own, you can run python train.py. The training script expects your data to be stored in ./data with the following format: Single level directory with images named filename.jpg and corresponding json files filename.json that contain the key caption.

Feature Extraction

For feature extraction, please refer to one of the notebooks at notebooks. We demonstrate how to extract features and use them for semantic correspondence detection and depth prediction.

Our checkpoints are fully compatible with the diffusers library. If you already have a pipeline using SD 1.5 or SD 2.1 from diffusers, you can simply replace the U-Net state dict:

from diffusers import UNet2DConditionModel
from huggingface_hub import hf_hub_download

unet = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="unet")
ckpt_pth = hf_hub_download(repo_id="CompVis/cleandift", filename="cleandift_sd21_unet.safetensors")
state_dict = load_file(ckpt_pth)
unet.load_state_dict(state_dict, strict=True)

Combination with Telling Left From Right (TLFR)

Our checkpoints can also be combined with more advanced feature extraction methods than DIFT, such as TLFR. We provide an adapted version of their codebase that can be used to reproduce our state-of-the-art zero-shot semantic correspondence results in the geoaware-sc-eval branch.

🎓 Citation

If you use this codebase or otherwise found our work valuable, please cite our paper:

@misc{stracke2024cleandiftdiffusionfeaturesnoise,
      title={CleanDIFT: Diffusion Features without Noise}, 
      author={Nick Stracke and Stefan Andreas Baumann and Kolja Bauer and Frank Fundel and Björn Ommer},
      year={2024},
      eprint={2412.03439},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.03439}, 
}