🎉The code repository for "Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification" in PyTorch. If you use any content of this repo for your work, please cite the following bib entry:
@inproceedings{yic2024coder,
title={Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification},
author={Yi, Chao and Ren, Lu and Zhan, De-Chuan and Ye, Han-Jia},
booktitle={CVPR},
year={2024}
}
We use the five CLIP models officially provided by OpenAI: CLIP RN50, CLIP ViT-B/32, CLIP ViT-B/16, CLIP ViT-L/14 and CLIP ViT-L/14@336px. Download links: CLIP RN50, CLIP ViT-B/32, CLIP ViT-B/16, CLIP ViT-L/14, CLIP ViT-L/14@336px.
The downloaded pre-trained models should be placed in the ./CLIP/models/ckp/clip directory.
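To confirm the checkpoints are in place before running any experiment, a small check along these lines may help. This is a minimal sketch: the file names below are the ones used in OpenAI's official download URLs, and it is an assumption that this repository expects the same names. The helper `missing_checkpoints` is hypothetical and not part of the repo.

```python
from pathlib import Path

# Assumed file names: they match OpenAI's official download URLs,
# but this repository may expect different ones.
EXPECTED_CHECKPOINTS = [
    "RN50.pt",
    "ViT-B-32.pt",
    "ViT-B-16.pt",
    "ViT-L-14.pt",
    "ViT-L-14-336px.pt",
]


def missing_checkpoints(ckpt_dir="./CLIP/models/ckp/clip", names=EXPECTED_CHECKPOINTS):
    """Return the expected checkpoint file names that are not present in ckpt_dir."""
    ckpt_dir = Path(ckpt_dir)
    return [name for name in names if not (ckpt_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All expected checkpoints found.")
```

If your files carry different names, adjust the list rather than renaming the downloads.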
Please refer to the DATASET.md file for specific details on downloading and processing the relevant datasets.
We provide external expert knowledge related to the object classes used in our experiments, including object attributes, analogous classes, synonyms, and one-to-one attributes. This knowledge is generated through calls to the ChatGPT API or WordNet and is stored in text format. Download link: link
After downloading the expert_knowledge.zip file, unzip it to obtain the ./expert_knowledge folder.
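If you prefer to extract the archive from Python rather than with a command-line tool, the standard `zipfile` module is enough. The sketch below assumes the archive sits in the repository root and contains a top-level `expert_knowledge` folder; the helper name `extract_expert_knowledge` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_expert_knowledge(zip_path="expert_knowledge.zip", dest="."):
    """Extract the archive into dest and return the path of the extracted folder."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    folder = Path(dest) / "expert_knowledge"
    # Assumption: the archive unpacks to a top-level "expert_knowledge" folder.
    if not folder.is_dir():
        raise FileNotFoundError(f"{folder} was not created; check the archive layout")
    return folder


if __name__ == "__main__" and Path("expert_knowledge.zip").is_file():
    print("Extracted to", extract_expert_knowledge())
```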
To run the zero-shot image classification experiments:
bash exp_zero_shot.sh
To run the few-shot image classification experiments:
bash exp_few_shot.sh
Before running these scripts, modify the root_path attribute in each YAML file (e.g. caltech101.yaml) under the ./configs folder, setting it to the absolute path of the root folder where your data is located.
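Editing every config by hand is error-prone, so the update can be scripted. The following is a rough sketch, not part of the repo: it assumes each YAML file holds `root_path` as a single `root_path: value` line, and the helper `set_root_path` is a hypothetical name. It rewrites only that line and leaves the rest of each file untouched.

```python
import re
from pathlib import Path

# Matches a single-line "root_path: ..." entry (assumed layout of the config files).
ROOT_PATH_LINE = re.compile(r"^([ \t]*root_path[ \t]*:[ \t]*).*$", re.MULTILINE)


def set_root_path(config_dir, data_root):
    """Point root_path at data_root in every *.yaml file; return the updated file names."""
    # Resolve to an absolute path and escape single quotes for a YAML single-quoted string.
    absolute = str(Path(data_root).resolve()).replace("'", "''")
    updated = []
    for cfg in sorted(Path(config_dir).glob("*.yaml")):
        text = cfg.read_text()
        # A function replacement avoids backslash-escape handling in the new path.
        new_text, count = ROOT_PATH_LINE.subn(
            lambda m: f"{m.group(1)}'{absolute}'", text
        )
        if count:
            cfg.write_text(new_text)
            updated.append(cfg.name)
    return updated
```

For example, `set_root_path("./configs", "/path/to/your/data")` would update every config that contains a `root_path` line; the data path here is a placeholder.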
We would like to express our gratitude to the following repositories for offering valuable components and functions that contributed to our code.