| model | backbone | top1 | top5 | mAP | model | M3A configs |
|---|---|---|---|---|---|---|
| {ν} | R34 | 58.7 | 83.7 | 61.7 | link | NONE,NONE,NONE |
| G-{ν,α} | R34 | 60.4 | 85.3 | 64.7 | link | GRAPH,AUDIO,NONE |
| G-{ν,τ} | R34 | 60.5 | 85.5 | 64.4 | link | GRAPH,TEXT,NONE |
| G-{ν,α,τ} | R34 | 60.7 | 85.6 | 65.0 | link | GRAPH,AUDIO_TEXT,SUM |
| T-{ν,α} | R34 | 61.0 | 85.4 | 65.0 | link | TRSFMR,AUDIO,NONE |
| T-{ν,τ} | R34 | 61.1 | 85.8 | 65.4 | link | TRSFMR,TEXT,NONE |
| T-{ν,α,τ} | R34 | 61.6 | 85.5 | 65.9 | link | TRSFMR,AUDIOTEXT,NONE |

Notes:
- The "M3A configs" column lists, in order, the values of "M3A_MODE,M3A_MODAL_TYPE,M3A_MODAL_JOINT_TYPE" in scripts/run_mmit_r2plus1d.sh.
- The baseline "{ν}" model is trained with configs/Mmit/R2PLUS1D_GRAPH_8x2.yaml.
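
As a minimal sketch of how one "M3A configs" cell maps onto the three settings named above, the snippet below splits a comma-separated value into `M3A_MODE`, `M3A_MODAL_TYPE`, and `M3A_MODAL_JOINT_TYPE`. It assumes these are plain shell variables; the helper variable `m3a_configs` is hypothetical, and the actual layout of scripts/run_mmit_r2plus1d.sh may differ.

```shell
# Hypothetical helper variable: one "M3A configs" cell copied from the table
# (here the T-{ν,α} row).
m3a_configs="TRSFMR,AUDIO,NONE"

# Split on commas into the three settings, in the order given in the notes.
# Assumption: the run script reads them as ordinary shell variables.
IFS=',' read -r M3A_MODE M3A_MODAL_TYPE M3A_MODAL_JOINT_TYPE <<EOF
$m3a_configs
EOF

echo "M3A_MODE=$M3A_MODE"
echo "M3A_MODAL_TYPE=$M3A_MODAL_TYPE"
echo "M3A_MODAL_JOINT_TYPE=$M3A_MODAL_JOINT_TYPE"
```

The three values can then be copied by hand into the run script for the row you want to reproduce.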

| model | backbone | top1 | top5 | mAP | model | M3A configs |
|---|---|---|---|---|---|---|
| {ν} | R34 | 59.2 | 84.4 | 62.5 | link | NONE,NONE,NONE |
| G-{ν,α} | R34 | 61.2 | 85.9 | 65.2 | link | GRAPH,AUDIO,NONE |
| G-{ν,τ} | R34 | 61.2 | 85.7 | 64.8 | link | GRAPH,TEXT,NONE |
| G-{ν,α,τ} | R34 | 61.5 | 85.7 | 65.6 | link | GRAPH,AUDIO_TEXT,SUM |
| T-{ν,α} | R34 | 61.8 | 85.8 | 66.0 | link | TRSFMR,AUDIO,NONE |
| T-{ν,τ} | R34 | 61.7 | 86.2 | 66.1 | link | TRSFMR,TEXT,NONE |
| T-{ν,α,τ} | R34 | 61.6 | 86.2 | 66.4 | link | TRSFMR,AUDIOTEXT,SUM |

Notes:
- Set "MMIT_VERSION" to "v2" in scripts/run_mmit_r2plus1d.sh.
- Add "MODEL.NUM_CLASSES 292" to scripts/run_mmit_r2plus1d.sh.