Specific Settings of the ToMe Model #3

leeyf99 · 2024-10-09T03:43:04Z

Could you please clarify which pre-trained ToMe model is used to obtain the "visual_patch" features, and what value is set for ToMe's "r" parameter? Additionally, I noticed that the "audio_patch" feature is not actually being used. Thanks.

xia-zhe · 2024-11-18T08:59:25Z

I trained the model using the parameter settings specified in the code, and the results are as follows:
Audio Count Acc: 77.48 %
Audio Compt Acc: 60.44 %
Audio Averg Acc: 71.20 %

Visual Count Acc: 76.69 %
Visual Local Acc: 77.06 %
Visual Averg Acc: 76.88 %

Audio-Visual Exist Acc: 76.92 %
Audio-Visual Count Acc: 76.36 %
Audio-Visual Local Acc: 59.89 %
Audio-Visual Compt Acc: 63.67 %
Audio-Visual Templ Acc: 66.55 %
Audio-Visual Averg Acc: 69.17 %

---->Overall Accuracy: 71.57 %

Could you clarify where the issue occurred? Is it related to the "audio_patch" feature?
