updates

pritamqu · Nov 5, 2021 · 89ef1c8
1 parent 1d38600
commit 89ef1c8

Showing 3 changed files with 47 additions and 7 deletions.
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,30 @@
+BSD 3-Clause License
+
+Copyright (c) 2021, Pritam Sarkar ([email protected])
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+* Redistributions of source code must retain the above copyright notice, this
+  list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright notice,
+  this list of conditions and the following disclaimer in the documentation
+  and/or other materials provided with the distribution.
+
+* Neither the name of the copyright holder nor the names of its
+  contributors may be used to endorse or promote products derived from
+  this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
diff --git a/README.md b/README.md
@@ -32,6 +32,11 @@ We present the top-1 accuracy averaged over all the splits of each dataset. Plea
 | Kinetics400 | 240K | 91.5% | 64.7% | 86.8% | [visual](../weights/vid_crisscross_kinetics400.pth.tar); [audio](../weights/aud_crisscross_kinetics400.pth.tar)
 | AudioSet | 1.8M | 92.4% | 66.8% | 90.5% | [visual](../weights/vid_crisscross_audioset.pth.tar); [audio](../weights/aud_crisscross_audioset.pth.tar)
 
+### Qualitative Analysis
+We visualize the nearest neighbors returned by video-to-video and audio-to-audio retrieval. CrissCross is pretrained on Kinetics-400, and the pretrained backbones are then used to extract feature vectors from Kinetics-Sound. We use Kinetics-Sound for this experiment because its action classes are prominently manifested both audibly and visually. Features extracted from the validation split are then used to query the training features. Please check the links for visualization:
+<br>
+<a href="https://pritamqu.github.io/CrissCross/docs/v2v.html">video-to-video retrievals</a> | <a href="https://pritamqu.github.io/CrissCross/docs/a2a.html">audio-to-audio retrievals</a>.
+
 
 ### Environment Setup
 List of dependencies can be found [here](./docs/assets/files/requirements.txt). You can create an environment as `conda create --name crisscross --file requirements.txt`
@@ -109,25 +114,30 @@ You can directly use the given weights to evaluate the model on the following be
 
 **UCF101**
 ```python
-# 8 frame evaluation
+# full-finetuning
 cd evaluate
-python evaluate/eval_video.py --world-size 1 --rank 0 --gpu 0 --db 'ucf101' --config-file full_ft_8f_fold1 --pretext_model /path/to/model
+# 8 frame evaluation
+python eval_video.py --world-size 1 --rank 0 --gpu 0 --db 'ucf101' --config-file kinetics400/full_ft_8f_fold1 --pretext_model /path/to/model
 # 32 frame evaluation
-python eval_video.py --world-size 1 --rank 0 --gpu 0 --db 'ucf101' --config-file full_ft_32f_fold1 --pretext_model /path/to/model
+python eval_video.py --world-size 1 --rank 0 --gpu 0 --db 'ucf101' --config-file kinetics400/full_ft_32f_fold1 --pretext_model /path/to/model
 ```
 **HMDB51**
 ```python
-# 8 frame evaluation
+# full-finetuning
 cd evaluate
-python eval_video.py --world-size 1 --rank 0 --gpu 0 --db 'hmdb51' --config-file full_ft_8f_fold1 --pretext_model /path/to/model
+# 8 frame evaluation
+python eval_video.py --world-size 1 --rank 0 --gpu 0 --db 'hmdb51' --config-file kinetics400/full_ft_8f_fold1 --pretext_model /path/to/model
 # 32 frame evaluation
-python eval_video.py --world-size 1 --rank 0 --gpu 0 --db 'hmdb51' --config-file full_ft_32f_fold1 --pretext_model /path/to/model
+python eval_video.py --world-size 1 --rank 0 --gpu 0 --db 'hmdb51' --config-file kinetics400/full_ft_32f_fold1 --pretext_model /path/to/model
 ```
 **ESC50**
 ```python
 # linear evaluation using SVM
 cd evaluate
+# 2-second evaluation
 python eval_audio.py --world-size 1 --rank 0 --gpu 0 --db 'esc50' --config-file config_fold1_2s --pretext_model /path/to/model
+# 5-second evaluation
+python eval_audio.py --world-size 1 --rank 0 --gpu 0 --db 'esc50' --config-file config_fold1_5s --pretext_model /path/to/model
 ```
 
 <!-- ### Citation

diff --git a/docs/index.html b/docs/index.html
@@ -67,7 +67,7 @@ <h3 id="result">Result</h3>
 </tr>
 </tbody></table>
 
-<h3 id="Qualitative Analysis">Visualization of Retrievals</h3>
+<h3 id="Qualitative Analysis">Qualitative Analysis</h3>
 We visualize the nearest neighbors returned by video-to-video and audio-to-audio retrieval. CrissCross is pretrained on Kinetics-400, and the pretrained backbones are then used to extract feature vectors from Kinetics-Sound. We use Kinetics-Sound for this experiment because its action classes are prominently manifested both audibly and visually. Features extracted from the validation split are then used to query the training features.
 Please check the links for visualization:
 <br>
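
Note on the Qualitative Analysis text in this commit: the retrieval step it describes (validation-split features querying training-split features) might look roughly like the sketch below. This is only an illustration, not the repository's code. The function names `l2_normalize` and `retrieve_top_k`, the use of cosine similarity, the value of `k`, and the toy 2-D features are all assumptions; the commit does not state which distance metric or neighborhood size is used.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def retrieve_top_k(query_feats, gallery_feats, k=5):
    """For each query feature, return the indices of the k most similar gallery features.

    Similarity is cosine similarity, which is an assumption made for this sketch.
    """
    gallery = [l2_normalize(g) for g in gallery_feats]
    results = []
    for q in query_feats:
        qn = l2_normalize(q)
        # Dot product of unit vectors equals cosine similarity.
        sims = [sum(a * b for a, b in zip(qn, g)) for g in gallery]
        ranked = sorted(range(len(gallery)), key=lambda i: sims[i], reverse=True)
        results.append(ranked[:k])
    return results


# Hypothetical stand-ins for backbone outputs: training features form the
# gallery, validation features act as queries.
train_feats = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
val_feats = [[1.0, 0.0], [0.1, 1.0]]

neighbors = retrieve_top_k(val_feats, train_feats, k=2)
print(neighbors)
```

In practice the features would come from the pretrained visual or audio backbone rather than hand-written lists, and a vectorized implementation would be preferable for datasets of realistic size.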