Confusion on FlexiViT #94

zilunzhang · 2024-03-11T06:42:12Z

Hi, thanks for bringing us such great work! I have two questions regarding the paper.

1. Since the PI-resize method does not introduce any learnable parameters, it should be compatible with any ViT model. Does that mean we can apply PI-resize in a zero-shot manner? If so, what is the point of training FlexiViT? I understand that, because PI-resize allows (almost) any patch size, the knowledge of ViT-8 can be transferred through distillation. But is there any difference between training a FlexiViT and applying PI-resize directly to a ViT-8 model (without any training)? In Figure 3, the authors state that "Standard ViTs (ViT-16/ViT-30) are not flexible", but for that comparison they "simply resize the patch embedding weights ω and the position embeddings π with bilinear interpolation", not with PI-resize.
2. Will the weights of FlexiCLIP be released at some point?
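For context on the PI-resize vs. bilinear distinction in the first question: PI-resize chooses new patch-embedding weights ω̂ so that ⟨Bx, ω̂⟩ = ⟨x, ω⟩ for every patch x, where B is the linear operator that resizes a patch. When upsampling, B has full column rank and ω̂ = (Bᵀ)⁺ω. Bilinearly resizing ω itself instead gives ⟨Bx, Bω⟩ = xᵀBᵀBω, which generally changes the token. The sketch below is a minimal NumPy illustration, not the paper's code. It assumes single-channel patches, a half-pixel-centred bilinear resize without antialiasing, and uses hypothetical helper names (`linear_resize_matrix`, `bilinear_resize_matrix`, `naive_resize`, `pi_resize`).

```python
import numpy as np


def linear_resize_matrix(n_in: int, n_out: int) -> np.ndarray:
    """1-D linear interpolation matrix with half-pixel centres, shape (n_out, n_in)."""
    R = np.zeros((n_out, n_in))
    for i in range(n_out):
        # Map the centre of output pixel i back to a (clamped) input coordinate.
        s = min(max((i + 0.5) * n_in / n_out - 0.5, 0.0), n_in - 1)
        lo = int(np.floor(s))
        hi = min(lo + 1, n_in - 1)
        frac = s - lo
        R[i, lo] += 1.0 - frac
        R[i, hi] += frac
    return R


def bilinear_resize_matrix(p_in: int, p_out: int) -> np.ndarray:
    """Matrix B such that B @ x.ravel() is the bilinear resize of a p_in x p_in patch x."""
    R = linear_resize_matrix(p_in, p_out)
    # Resizing rows and columns separately is R @ x @ R.T; for a row-major ravel
    # this equals kron(R, R) @ x.ravel().
    return np.kron(R, R)  # shape (p_out**2, p_in**2)


def naive_resize(w: np.ndarray, p_out: int) -> np.ndarray:
    """Bilinearly resize embedding weights w of shape (p, p, D), as if w were an image."""
    p_in, _, d = w.shape
    B = bilinear_resize_matrix(p_in, p_out)
    return (B @ w.reshape(p_in * p_in, d)).reshape(p_out, p_out, d)


def pi_resize(w: np.ndarray, p_out: int) -> np.ndarray:
    """Pseudo-inverse resize: choose w_new with <B x, w_new> = <x, w> for all patches x."""
    p_in, _, d = w.shape
    B = bilinear_resize_matrix(p_in, p_out)
    # Minimum-norm least-squares solution of B.T @ w_new = w; exact when upsampling.
    w_new = np.linalg.pinv(B.T) @ w.reshape(p_in * p_in, d)
    return w_new.reshape(p_out, p_out, d)


# Usage: embed an 8x8 patch, then embed its 16x16 bilinear upsampling.
rng = np.random.default_rng(0)
p, p_new, d = 8, 16, 4
x = rng.normal(size=(p, p))
w = rng.normal(size=(p, p, d))
x_up = (bilinear_resize_matrix(p, p_new) @ x.ravel()).reshape(p_new, p_new)

token = np.einsum("ij,ijd->d", x, w)
token_pi = np.einsum("ij,ijd->d", x_up, pi_resize(w, p_new))
token_naive = np.einsum("ij,ijd->d", x_up, naive_resize(w, p_new))
print(np.allclose(token, token_pi))     # True: PI-resize preserves the token
print(np.allclose(token, token_naive))  # False: bilinearly resized weights change it
```

The exact equality relies on B having full column rank, i.e. upsampling. When downsampling, Bᵀω̂ = ω has no exact solution in general and the pseudo-inverse returns the least-squares one.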

Thanks, I am really looking forward to the answers!

Best,

Zilun

zilunzhang changed the title ~~Confuse on FlexiViT~~ Confusion on FlexiViT Mar 11, 2024
