the specific value of 𝑀 #4

Open
shinever22 opened this issue Mar 10, 2024 · 4 comments

Comments

@shinever22

Hello, the article partitions each visual frame into 𝑀 patches. Could you please tell me the specific value of 𝑀? In the code (feat_script/extract_clip_feat/extract_patch-level_feat.py), the line
img_features = torch.zeros(len(img_list), patch_nums, C)
also does not show the exact value of patch_nums.

@xia-zhe

xia-zhe commented Apr 6, 2024

I'm having the same problem. What should the exact value of patch_nums be?

@shinever22
Author

shinever22 commented Apr 9, 2024 via email

@ayameyao
Collaborator

Hello, the article partitions each visual frame into 𝑀 patches. Could you please tell me the specific value of 𝑀? In the code (feat_script/extract_clip_feat/extract_patch-level_feat.py), the line img_features = torch.zeros(len(img_list), patch_nums, C) also does not show the exact value of patch_nums.

Hi,

Thank you very much for your interest in our work.

The patch-level features are extracted with CLIP-ViT-B/32, which yields 49 patches per frame (excluding the CLS token). From these 49 patches, we select the Top_m patches most relevant to the problem. In the experiments reported in the paper, Top_m is set to 20.

Thank you again for your attention to our paper. If you have any further questions, please feel free to contact me directly via email.

Best,
Guangyao
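
For readers wondering where 49 comes from: the ViT-B/32 image encoder cuts its input into 32×32 patches, and at the standard CLIP input resolution of 224×224 that gives a 7×7 grid. Below is a minimal sketch of that arithmetic. The helper name num_patches is hypothetical and not part of the repository, and the 224 input size is assumed from the usual CLIP preprocessing rather than confirmed from the extraction script.

```python
def num_patches(image_size: int = 224, patch_size: int = 32) -> int:
    """Count the non-overlapping square patches a ViT cuts from a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size  # 224 // 32 = 7
    return per_side * per_side           # 7 * 7 = 49


# Assumed value for patch_nums in torch.zeros(len(img_list), patch_nums, C)
patch_nums = num_patches()
print(patch_nums)  # 49
```

Under these assumptions, patch_nums would be 49, with the CLS token dropped before the features are stored.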

@ayameyao
Collaborator

I'm having the same problem. What should the exact value of patch_nums be?

Hi,

Thank you very much for your interest in our work.

The patch-level features are extracted with CLIP-ViT-B/32, which yields 49 patches per frame (excluding the CLS token). From these 49 patches, we select the Top_m patches most relevant to the problem. In the experiments reported in the paper, Top_m is set to 20.

Thank you again for your attention to our paper. If you have any further questions, please feel free to contact me directly via email.

Best,
Guangyao
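
The Top_m selection mentioned in the reply can be pictured as ranking the 49 patch features against a query feature and keeping the 20 best. The sketch below is only an illustration under stated assumptions: cosine similarity as the relevance score, the function names cosine and select_top_m, the feature dimension of 8, and the random data are all hypothetical, and the thread does not say how the authors actually score patches.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_top_m(patch_feats, query_feat, top_m=20):
    """Return the indices of the top_m patches most similar to query_feat."""
    scores = [cosine(p, query_feat) for p in patch_feats]
    # Rank patch indices by descending similarity (assumed relevance measure).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_m]


# Toy data: 49 patches per frame, hypothetical feature dimension of 8.
rng = random.Random(0)
dim = 8
patches = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(49)]
query = [rng.gauss(0, 1) for _ in range(dim)]

kept = select_top_m(patches, query, top_m=20)
print(len(kept))  # 20
```

On tensors, the same ranking is commonly done with torch.topk along the patch dimension; plain Python is used here only to keep the example dependency-free.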

3 participants