Train text-to-image 257x768 diffusion prior to use as pretrained starting point #19

PaulScotti opened this issue May 31, 2023 · 2 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), hard-difficulty

Comments


PaulScotti commented May 31, 2023

For MindEye we mapped to the CLIP ViT-L/14 final layer (shape 1x768) as well as to the CLIP ViT-L/14 last hidden layer (shape 257x768). For the former we found that using a pretrained starting point for fine-tuning the diffusion prior really benefited performance--we used a pretrained text-to-image diffusion prior checkpoint trained on LAION-Aesthetics. See this GitHub repo for more info: https://github.com/lucidrains/DALLE2-pytorch/tree/main
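For reference, here is a minimal sketch of how a 1x768 text-to-image diffusion prior can be instantiated with DALLE2-pytorch and initialized from a pretrained checkpoint, assuming CLIP ViT-L/14 as the embedding space. The network hyperparameters and the checkpoint filename below are illustrative only, not the exact LAION-Aesthetics configuration:

```python
import torch
from dalle2_pytorch import DiffusionPrior, DiffusionPriorNetwork, OpenAIClipAdapter

# CLIP ViT-L/14 produces the 768-dim embeddings the prior maps between
clip = OpenAIClipAdapter("ViT-L/14")

# illustrative hyperparameters -- match whatever the pretrained checkpoint used
prior_network = DiffusionPriorNetwork(
    dim=768,
    depth=6,
    dim_head=64,
    heads=12,
)

diffusion_prior = DiffusionPrior(
    net=prior_network,
    clip=clip,
    timesteps=1000,
    cond_drop_prob=0.2,
)

# hypothetical local path to the pretrained prior weights
state_dict = torch.load("prior_checkpoint.pth", map_location="cpu")
diffusion_prior.load_state_dict(state_dict, strict=False)
```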

We would have liked to likewise use a pretrained starting point for training our 257x768 diffusion prior, but no pretrained checkpoint like that exists! If someone trains a 257x768 checkpoint we can use as a starting point, this could really improve MindEye reconstructions!
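For anyone attempting this, a rough sketch of how the 257x768 target embedding can be obtained from CLIP ViT-L/14 with Hugging Face transformers is below. It follows the Versatile Diffusion convention of layer-norming and projecting every token of the last hidden layer to 768 dims; whether this exactly matches MindEye's preprocessing is an assumption to verify against the MindEye code:

```python
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

model_id = "openai/clip-vit-large-patch14"
processor = CLIPImageProcessor.from_pretrained(model_id)
image_encoder = CLIPVisionModelWithProjection.from_pretrained(model_id).eval()

image = Image.open("example.jpg")  # any RGB image
pixel_values = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    out = image_encoder(pixel_values)
    tokens = out.last_hidden_state                    # (1, 257, 1024): CLS + 16x16 patches
    tokens = image_encoder.vision_model.post_layernorm(tokens)
    target = image_encoder.visual_projection(tokens)  # (1, 257, 768): the prior's target space

print(target.shape)            # torch.Size([1, 257, 768])
print(out.image_embeds.shape)  # torch.Size([1, 768]) -- the usual final-layer embedding
```

A diffusion prior trained to predict tensors of this shape from text embeddings is what this issue is asking for.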

To tackle this issue you might benefit from joining the #dalle2-prior channel in the LAION Discord server.

PaulScotti added the enhancement, help wanted, and hard-difficulty labels on May 31, 2023
@AvancierGuo

I'm wondering why you chose to map to the CLIP embedding of shape 257x768 rather than shape 1x768 in the first place. What special information can the hidden layer express?

@AvancierGuo

In my view, the hidden layer is not a fully encoded embedding that represents the image on its own.
