DECOLA Model Zoo

In all our experiments, we used 8 Quadro RTX 6000 and 8 V100 GPUs.

How to read the tables

The "config" column contains a link to the config file. To train a model, run

python train_net.py --num-gpus 8 --config-file /path/to/config/name.yaml

To evaluate a trained or pretrained model, run

python train_net.py --num-gpus 8 --config-file /path/to/config/name.yaml --eval-only MODEL.WEIGHTS /path/to/weight.pth
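The training and evaluation commands above differ only in their trailing arguments. As a minimal sketch, the helper below assembles either one as an argument list. The name `build_command` and its parameters are hypothetical and not part of this repository; it uses only the flags shown above.

```python
import shlex


def build_command(config_file, num_gpus=8, weights=None):
    """Assemble the train_net.py command shown above as an argument list.

    Hypothetical helper, not part of the repository. When `weights` is
    given, the command evaluates that checkpoint instead of training.
    """
    cmd = [
        "python", "train_net.py",
        "--num-gpus", str(num_gpus),
        "--config-file", config_file,
    ]
    if weights is not None:
        # Evaluation mode: --eval-only plus the MODEL.WEIGHTS override.
        cmd += ["--eval-only", "MODEL.WEIGHTS", weights]
    return cmd


train_cmd = build_command("/path/to/config/name.yaml")
eval_cmd = build_command("/path/to/config/name.yaml", weights="/path/to/weight.pth")
print(shlex.join(train_cmd))
print(shlex.join(eval_cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `train_net.py`.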

Third-party ImageNet-21K pre-trained models

Our paper uses ImageNet-21K pretrained models that are not part of Detectron2 (ResNet-50-21K from MIIL and SwinB-21K from Swin-Transformer). Before training, please download the models, place them under DECOLA_ROOT/weights/, and follow this tool to convert the format.

DECOLA and baselines

Here we provide the configs and checkpoints of DECOLA and of Detic, our main baseline. Please refer to Detic to learn more about it. The baseline is trained on a detection dataset (LVIS-base or LVIS) for 4x and then further trained on a weak dataset (ImageNet-21K) for another 4x. DECOLA is trained on the same detection dataset with language conditioning for 4x (phase 1) and finetuned on the same weak dataset for another 4x (phase 2). For more training details, please see training details.

Open-vocabulary LVIS with Deformable DETR

ResNet-50 backbone

name	box AP_novel	box AP_c	box AP_f	box mAP	model
baseline	9.4	33.8	40.4	32.2	weight
baseline + self-train	23.2	36.5	41.6	36.2	weight
DECOLA [Phase 2]	27.6	38.3	42.9	38.3	weight
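When comparing rows, the gain over the baseline is often the number of interest. The sketch below parses the ResNet-50 rows above (copied verbatim, with the weight column dropped) and prints each model's difference from the baseline. The function names `parse_table` and `gains_over` are illustrative only.

```python
# Rows copied from the ResNet-50 table above (tab-separated, weight column dropped).
TABLE = """\
name\tbox AP_novel\tbox AP_c\tbox AP_f\tbox mAP
baseline\t9.4\t33.8\t40.4\t32.2
baseline + self-train\t23.2\t36.5\t41.6\t36.2
DECOLA [Phase 2]\t27.6\t38.3\t42.9\t38.3
"""


def parse_table(text):
    """Turn a tab-separated table into {row name: {column: value}}."""
    lines = text.strip().splitlines()
    header = lines[0].split("\t")
    rows = {}
    for line in lines[1:]:
        cells = line.split("\t")
        rows[cells[0]] = {col: float(val) for col, val in zip(header[1:], cells[1:])}
    return rows


def gains_over(rows, reference="baseline"):
    """Per-column difference of every other row relative to the reference row."""
    ref = rows[reference]
    return {
        name: {col: round(val - ref[col], 1) for col, val in vals.items()}
        for name, vals in rows.items()
        if name != reference
    }


gains = gains_over(parse_table(TABLE))
for name, delta in gains.items():
    print(name, delta)
```

On these rows, DECOLA [Phase 2] improves box AP_novel by 18.2 points and box mAP by 6.1 points over the baseline.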

Swin-B backbone

name	box AP_novel	box AP_c	box AP_f	box mAP	model
baseline	16.2	43.8	49.1	41.1	weight
baseline + self-train	30.8	43.6	45.9	42.3	weight
DECOLA [Phase 2]	35.7	47.5	49.7	46.3	weight

Swin-L backbone (w/ O365)

name	box AP_novel	box AP_c	box AP_f	box mAP	model
baseline	21.9	53.3	57.7	49.6	weight
baseline + self-train	36.5	53.5	56.5	51.8	weight
DECOLA [Phase 2]	46.9	56.0	58.0	55.2	weight

Standard LVIS with Deformable DETR

ResNet-50 backbone

name	box AP_rare	box AP_c	box AP_f	box mAP	model
baseline	26.3	34.1	41.3	35.6	weight
baseline + self-train	30.0	35.3	41.0	36.6	weight
DECOLA [Phase 2]	34.8	38.7	42.5	39.6	weight
DECOLA [Phase 2 (offline)]	35.9	38.0	42.4	39.4	weight

Swin-B backbone

name	box AP_rare	box AP_c	box AP_f	box mAP	model
baseline	38.3	43.4	48.6	44.5	weight
baseline + self-train	42.0	44.0	48.1	45.2	weight
DECOLA [Phase 2]	46.4	46.9	49.4	47.8	weight
DECOLA [Phase 2 (offline)]	47.4	47.4	49.6	48.3	weight

Open-vocabulary LVIS with CenterNet2

For DECOLA training, we use pseudo-labels generated by Phase 1 DECOLA (R50, SwinB) trained on LVIS-base. See here to learn how to generate pseudo-labels.

ResNet-50 backbone

name	box AP_novel	box mAP	mask AP_novel	mask mAP	model
Detic-base	17.6	33.8	16.4	30.2	weight
Detic	26.7	36.3	24.6	32.4	weight
DECOLA label [config]	29.0	37.6	26.8	33.6	weight
DECOLA label [config]	29.5	37.7	27.0	33.7	weight

Swin-B backbone

name	box AP_novel	box mAP	mask AP_novel	mask mAP	model
Detic-base	24.6	43.0	21.9	38.4	weight
Detic	36.6	45.7	33.8	40.7	weight
DECOLA label [config]	38.4	46.7	35.3	42.0	weight

NOTE: baseline and Detic weights are directly from Detic's Model-Zoo.

Direct zero-shot transfer to LVIS minival

name	backbone	data	AP_r	AP_c	AP_f	mAP_fixed	model
DECOLA [Phase 1]	Swin-T	O365
DECOLA [Phase 2]	Swin-T	O365, IN21K	32.8	32.0	31.8	32.0	weight
DECOLA [Phase 1]	Swin-L	O365
DECOLA [Phase 2]	Swin-L	O365, OID, IN21K	41.5	38.0	34.9	36.8	weight

Direct zero-shot transfer to LVIS v1.0

name	backbone	data	AP_r	AP_c	AP_f	mAP_fixed	model
DECOLA [Phase 1]	Swin-T	O365	-
DECOLA [Phase 2]	Swin-T	O365, IN21K	27.2	24.9	28.0	26.6	weight
DECOLA [Phase 1]	Swin-L	O365	-
DECOLA [Phase 2]	Swin-L	O365, OID, IN21K	32.9	29.1	30.3	30.2	weight

Standard LVIS with CenterNet2

For DECOLA training, we use pseudo-labels generated by Phase 1 DECOLA (R50, SwinB) trained on LVIS.

ResNet-50 backbone

name	box AP_rare	box mAP	mask AP_rare	mask mAP	model
Detic-base	28.2	35.3	25.6	31.4	weight
Detic	31.4	36.8	29.7	33.2	weight
DECOLA label [config]	35.6	38.6	32.1	34.4	weight
DECOLA label [config]	35.4	38.3	32.1	34.2	weight

Swin-B backbone

name	box AP_rare	box mAP	mask AP_rare	mask mAP	model
Detic-base	39.9	45.4	35.9	40.7	weight
Detic	45.8	46.9	41.7	41.7	weight
DECOLA label [config]	46.6	48.3	42.3	43.4	weight

NOTE: baseline and Detic weights are directly from Detic's Model-Zoo.

DECOLA phase 1 on conditioned-mAP (c-mAP)

Here we provide the DECOLA checkpoints from phase 1 training (language-conditioned). The main evaluation metric for these models, as well as for the standard detector (baseline), is c-mAP@k, where k is the per-image detection limit.
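To make the per-image limit k concrete, here is a minimal sketch of keeping only the k highest-scoring detections of one image before AP is computed. The optional `present_classes` filter reflects an assumption that "conditioned" means restricting predictions to the class names present in the image; this page does not confirm that, and the function is illustrative rather than the repository's evaluation code.

```python
def limit_detections(detections, k, present_classes=None):
    """Keep the k highest-scoring detections of a single image.

    detections: list of (class_name, score) pairs for one image.
    present_classes: optional set of class names; if given, detections of
        other classes are dropped first (an assumed reading of "conditioned").
    """
    if present_classes is not None:
        detections = [d for d in detections if d[0] in present_classes]
    # Sort by score, highest first, then truncate to the per-image limit.
    return sorted(detections, key=lambda d: d[1], reverse=True)[:k]


dets = [("cat", 0.9), ("dog", 0.8), ("cat", 0.4), ("sofa", 0.7)]
top2 = limit_detections(dets, k=2)
cond2 = limit_detections(dets, k=2, present_classes={"cat", "sofa"})
print(top2)
print(cond2)
```

With a small k, low-ranked detections of the classes that matter are cut off, which is why results are reported at several values of k.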

To evaluate a baseline model for c-mAP, run

python train_net.py --num-gpus 8 --config-file /path/to/config/name.yaml --eval-only MODEL.WEIGHTS /path/to/weight.pth MODEL.DETR.ORACLE_EVALUATION True TEST.DETECTIONS_PER_IMAGE $k

To evaluate a Phase 1 DECOLA model for c-mAP, run

python train_net.py --num-gpus 8 --config-file /path/to/config/name.yaml --eval-only MODEL.WEIGHTS /path/to/weight.pth MODEL.DECOLA.ORACLE_EVALUATION True MODEL.DECOLA.TEST_CLASS_CONDITIONED True TEST.DETECTIONS_PER_IMAGE $k

Change k for different per-image detection limits.
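As a sketch of sweeping k, the snippet below builds the Phase 1 DECOLA evaluation command once per detection limit reported in the tables that follow (10, 20, 50, 100, 300). The helper name `cmap_command` is hypothetical; the flags are exactly those in the command above.

```python
import shlex

# Per-image detection limits reported in the c-mAP tables.
K_VALUES = [10, 20, 50, 100, 300]


def cmap_command(config_file, weights, k, num_gpus=8):
    """Build the Phase 1 DECOLA c-mAP evaluation command for one limit k.

    Hypothetical helper; it only assembles the arguments shown above.
    """
    return [
        "python", "train_net.py",
        "--num-gpus", str(num_gpus),
        "--config-file", config_file,
        "--eval-only",
        "MODEL.WEIGHTS", weights,
        "MODEL.DECOLA.ORACLE_EVALUATION", "True",
        "MODEL.DECOLA.TEST_CLASS_CONDITIONED", "True",
        "TEST.DETECTIONS_PER_IMAGE", str(k),
    ]


commands = [
    cmap_command("/path/to/config/name.yaml", "/path/to/weight.pth", k)
    for k in K_VALUES
]
for cmd in commands:
    print(shlex.join(cmd))
```

For a baseline model, the two `MODEL.DECOLA.*` overrides would be replaced by `MODEL.DETR.ORACLE_EVALUATION True`, as in the baseline command above.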

ResNet-50 backbone

name	data	AP_r@10	AP_r@20	AP_r@50	AP_r@100	AP_r@300	model
baseline	LVIS-base	6.0	11.3	19.2	26.8	31.9	weight
DECOLA [Phase 1]	LVIS-base	19.4	28.5	34.1	38.7	40.0	weight
baseline	LVIS	21.3	29.4	36.9	41.1	44.6	weight
DECOLA [Phase 1]	LVIS	26.6	39.1	45.2	47.1	48.8	weight

Swin-B backbone

name	data	AP_r@10	AP_r@20	AP_r@50	AP_r@100	AP_r@300	model
baseline	LVIS-base	7.4	16.1	27.5	33.1	41.9	weight
DECOLA [Phase 1]	LVIS-base	21.9	32.0	40.0	44.0	47.7	weight
baseline	LVIS	30.1	38.2	45.5	49.3	53.2	weight
DECOLA [Phase 1]	LVIS	33.5	43.9	51.4	53.8	55.8	weight
