hrodruck/effnet_clip
Welcome!

Main idea:

  • start with a dataset of square images
  • make two resized copies of each image: one at roughly 768x768 px and one at 224x224 px
  • run effnet (the specific encoder from Stable Cascade) on the 768x768 copies to generate 16x24x24 latents
  • run CLIP (the original OpenAI model) on the 224x224 copies to generate 1x512 latents
  • train an MLP on the effnet latents so that its output matches the corresponding CLIP latents
  • at inference time, the MLP translates 16x24x24 effnet latents into 1x512 CLIP latents, so you can perform semantic comparisons without ever going back through pixel space or the image modality!
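The steps above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: the MLP architecture (one hidden layer, GELU), the hidden width, and the cosine-similarity loss are all assumptions, and random tensors stand in for real latents (which would come from the Stable Cascade effnet encoder and OpenAI CLIP).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EffnetToClip(nn.Module):
    """Hypothetical MLP mapping flattened 16x24x24 effnet latents to 512-d CLIP latents."""

    def __init__(self, hidden: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                      # 16 * 24 * 24 = 9216 input features
            nn.Linear(16 * 24 * 24, hidden),
            nn.GELU(),
            nn.Linear(hidden, 512),            # match CLIP's 1x512 embedding
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Random stand-ins for precomputed latent pairs (one pair per image).
effnet_latents = torch.randn(8, 16, 24, 24)
clip_latents = torch.randn(8, 512)

model = EffnetToClip()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for _ in range(3):  # toy loop; a real run would iterate over the dataset
    pred = model(effnet_latents)
    # Cosine-similarity loss: CLIP comparisons are direction-based, so matching
    # direction matters more than matching magnitude. MSE would also work.
    loss = 1.0 - F.cosine_similarity(pred, clip_latents).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(tuple(model(effnet_latents).shape))  # (8, 512)
```

At inference time you would run only the effnet encoder plus this MLP, then compare the resulting 512-d vectors with cosine similarity, skipping CLIP (and pixels) entirely.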

Download the Jupyter notebook to see some results! If you only want to view it, you can simply upload the file to Google Colab.

Quick-and-dirty setup instructions
