Train text-to-image 257x768 diffusion prior to use as pretrained starting point #19

PaulScotti opened this issue May 31, 2023 · 2 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), hard-difficulty

Comments


PaulScotti commented May 31, 2023

For MindEye we mapped to the CLIP ViT-L/14 final layer (shape 1x768) as well as to the CLIP ViT-L/14 last hidden layer (shape 257x768). For the former we found that using a pretrained starting point for fine-tuning the diffusion prior really benefited performance--we used a pretrained text-to-image diffusion prior checkpoint trained on LAION-Aesthetics. See this GitHub repo for more info: https://github.com/lucidrains/DALLE2-pytorch/tree/main
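For reference, here is a minimal sketch of how a 1x768 text-to-image diffusion prior can be instantiated with DALLE2-pytorch and initialized from a pretrained checkpoint, assuming CLIP ViT-L/14 as the embedding space. The network hyperparameters and the checkpoint filename below are illustrative only, not the exact LAION-Aesthetics configuration:

```python
import torch
from dalle2_pytorch import DiffusionPrior, DiffusionPriorNetwork, OpenAIClipAdapter

# CLIP ViT-L/14 produces the 768-dim embeddings the prior maps between
clip = OpenAIClipAdapter("ViT-L/14")

# illustrative hyperparameters -- match whatever the pretrained checkpoint used
prior_network = DiffusionPriorNetwork(
    dim=768,
    depth=6,
    dim_head=64,
    heads=12,
)

diffusion_prior = DiffusionPrior(
    net=prior_network,
    clip=clip,
    timesteps=1000,
    cond_drop_prob=0.2,
)

# hypothetical local path to the pretrained prior weights
state_dict = torch.load("prior_checkpoint.pth", map_location="cpu")
diffusion_prior.load_state_dict(state_dict, strict=False)
```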

We would have liked to likewise use a pretrained starting point for training our 257x768 diffusion prior, but no pretrained checkpoint like that exists! If someone trains a 257x768 checkpoint we can use as a starting point, this could really improve MindEye reconstructions!
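For anyone attempting this, a rough sketch of how the 257x768 target embedding can be obtained from CLIP ViT-L/14 with Hugging Face transformers is below. It follows the Versatile Diffusion convention of layer-norming and projecting every token of the last hidden layer to 768 dims; whether this exactly matches MindEye's preprocessing is an assumption to verify against the MindEye code:

```python
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

model_id = "openai/clip-vit-large-patch14"
processor = CLIPImageProcessor.from_pretrained(model_id)
image_encoder = CLIPVisionModelWithProjection.from_pretrained(model_id).eval()

image = Image.open("example.jpg")  # any RGB image
pixel_values = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    out = image_encoder(pixel_values)
    tokens = out.last_hidden_state                    # (1, 257, 1024): CLS + 16x16 patches
    tokens = image_encoder.vision_model.post_layernorm(tokens)
    target = image_encoder.visual_projection(tokens)  # (1, 257, 768): the prior's target space

print(target.shape)            # torch.Size([1, 257, 768])
print(out.image_embeds.shape)  # torch.Size([1, 768]) -- the usual final-layer embedding
```

A diffusion prior trained to predict tensors of this shape from text embeddings is what this issue is asking for.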

To tackle this issue you might benefit from joining the #dalle2-prior channel in the LAION Discord server.

PaulScotti added the enhancement, help wanted, and hard-difficulty labels on May 31, 2023
@AvancierGuo

I'm wondering why you chose to map to the CLIP embedding of shape 257x768 rather than shape 1x768 in the first place. What special information can the hidden layer express?

@AvancierGuo

In my view, the hidden layer is not a fully encoded embedding that represents the image on its own.
