Our paper uses ImageNet-21K pretrained models that are not part of Detectron2 (ResNet-50-21K from MIIL and SwinB-21K from Swin-Transformer). Before training,
please download the models, place them under DECOLA_ROOT/weights/, and follow this tool to convert the format.
DECOLA and baselines
Here we provide the configs and checkpoints of DECOLA and of Detic, our main baseline.
Please refer to Detic to learn more about it.
The baseline is trained on a detection dataset (LVIS-base or LVIS) for a 4x schedule and then further trained on a weak dataset (ImageNet-21K) for another 4x.
DECOLA is trained on the same detection dataset with language conditioning for 4x (phase 1) and finetuned on the same weak dataset for another 4x (phase 2).
For more detail, please see the training details.
For DECOLA training, we use pseudo-labels generated from phase 1 DECOLA (R50, SwinB) trained on LVIS-base. See here to learn how to generate pseudo-labels.
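Conceptually, pseudo-labeling keeps a model's high-confidence predictions and treats them as ground-truth boxes for the next training phase. The sketch below is a minimal illustration of that idea only; the field names and the threshold value are hypothetical, and the actual procedure is documented in the pseudo-label generation guide linked above.

```python
def make_pseudo_labels(predictions, score_thresh=0.5):
    """Keep high-confidence predicted boxes as pseudo ground truth.

    `predictions` is a list of dicts with "bbox", "score", and
    "category_id" keys (hypothetical schema for illustration).
    The 0.5 threshold is illustrative, not the value used in the paper.
    """
    return [
        {"bbox": p["bbox"], "category_id": p["category_id"]}
        for p in predictions
        if p["score"] >= score_thresh
    ]
```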
NOTE: baseline and Detic weights are taken directly from Detic's Model-Zoo.
DECOLA phase 1 on conditioned-mAP (c-mAP)
Here, we provide the DECOLA checkpoints from phase 1 training (language-conditioned). The main evaluation metric for these models, as well as for the standard detector (baseline), is c-mAP@k, where k is the per-image detection limit.
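The per-image detection limit k in c-mAP@k means each image contributes at most its k highest-scoring detections to the evaluation. A minimal sketch of that filtering step, assuming a flat list of detection dicts with hypothetical "image_id" and "score" keys (not the actual evaluation code):

```python
from collections import defaultdict

def limit_per_image(detections, k):
    """Keep only the top-k highest-scoring detections per image,
    mirroring the per-image detection limit k in c-mAP@k.

    `detections` is a list of dicts with "image_id" and "score"
    keys (hypothetical schema for illustration).
    """
    by_image = defaultdict(list)
    for det in detections:
        by_image[det["image_id"]].append(det)

    kept = []
    for dets in by_image.values():
        dets.sort(key=lambda d: d["score"], reverse=True)
        kept.extend(dets[:k])
    return kept
```

The surviving detections would then be scored with the usual mAP machinery, conditioned on the classes known to be present in each image.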