the specific value of 𝑀 #4

shinever22 · 2024-03-10T12:28:11Z

Hello, the article partitions each visual frame into 𝑀 patches. Could you please tell me the specific value of 𝑀? In the code, feat_script/extract_clip_feat/extract_patch-level_feat.py contains
img_features = torch.zeros(len(img_list), patch_nums, C)
which also does not show the exact size of patch_nums.

xia-zhe · 2024-04-06T12:17:10Z

I'm having the same problem. What should the exact size of patch_nums be set to?

shinever22 · 2024-04-09T06:20:56Z

Sorry, I did not get a reply from the author, so this problem has not been solved.

On Sat, Apr 6, 2024 at 20:17, xia-zhe wrote:

> I'm having the same problem, what should the exact C be set to?

ayameyao · 2024-07-19T13:58:23Z

> Hello, the article partitions each visual frame into 𝑀 patches. Could you please tell me the specific value of 𝑀? In the code, feat_script/extract_clip_feat/extract_patch-level_feat.py contains img_features = torch.zeros(len(img_list), patch_nums, C), which also does not show the exact size of patch_nums.

Hi,

Thank you very much for your interest in our work.

The patch-level features are extracted with CLIP-ViT-B/32, which yields 49 patches per frame (excluding the CLS token). From these 49 patches, we select the Top_m patches most relevant to the problem. In the experiments reported in the paper, Top_m is set to 20.
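
As an illustration only, the selection step can be sketched as follows. This is a minimal, dependency-free sketch and not the repository's code. It assumes relevance is scored by cosine similarity between each patch feature and a single query feature, which the thread does not confirm. The names `cosine`, `select_top_m`, `patch_feats`, and `query_feat` are hypothetical, and the 8-dimensional random vectors are toy data standing in for real CLIP features.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_top_m(patch_feats, query_feat, top_m=20):
    """Return the indices of the top_m most relevant patches, best first.

    patch_feats: list of per-patch feature vectors (49 per frame here).
    query_feat:  one feature vector for the query (hypothetical input).
    Scoring by cosine similarity is an assumption made for this sketch.
    """
    scores = [cosine(p, query_feat) for p in patch_feats]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_m]


random.seed(0)
dim = 8  # toy dimension, far smaller than a real CLIP feature
patches = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(49)]
query = [random.gauss(0, 1) for _ in range(dim)]

kept = select_top_m(patches, query, top_m=20)
print(len(kept), "of", len(patches), "patches kept")
```

With features held in a tensor, `torch.topk` on the score vector would do the same job as the sort-and-slice above.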

Thank you again for your attention to our paper. If you have any further questions, please feel free to contact me directly via email.

Best,
Guangyao

ayameyao · 2024-07-19T13:58:39Z

> I'm having the same problem, what should the exact size of patch_nums be set to?

Hi,

Thank you very much for your interest in our work.

The patch-level features are extracted with CLIP-ViT-B/32, which yields 49 patches per frame (excluding the CLS token). From these 49 patches, we select the Top_m patches most relevant to the problem. In the experiments reported in the paper, Top_m is set to 20.
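
For reference, the figure of 49 follows from the patch grid. The short sketch below assumes the standard 224×224 input resolution of CLIP-ViT-B/32 and its 32×32 patch size; `num_patches` is a hypothetical helper, not a function from the repository. Whether patch_nums in the script should be 49 or 50 depends on whether the CLS token is kept, which this thread does not state.

```python
def num_patches(image_size=224, patch_size=32):
    """Number of non-overlapping square patches for a square image (CLS not counted)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


patches = num_patches(224, 32)   # 7 patches per side, 7 * 7 = 49
tokens_with_cls = patches + 1    # 50 tokens if the CLS token is kept
print(patches, tokens_with_cls)  # prints: 49 50
```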

Thank you again for your attention to our paper. If you have any further questions, please feel free to contact me directly via email.

Best,
Guangyao
