Multiple Object Tracking and Segmentation Models of BDD100K

The multiple object tracking and segmentation (MOTS) task involves detecting, tracking, and segmenting objects of interest throughout each video sequence.

(Figure: example segmentation and tracking results on BDD100K)

The BDD100K dataset contains MOTS annotations for 223 videos (154/32/37 for train/val/test) with 8 categories. Each video is approximately 40 seconds and annotated at 5 fps, resulting in around 200 frames per video. For details about downloading the data and the annotation format for this task, see the official documentation.
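
As a quick sanity check after downloading, the annotations can be inspected with the Scalabel library (a minimal sketch; the path below is illustrative and assumes one Scalabel JSON file per video, matching the directory layout used by the evaluation command later in this document):

import glob
from collections import Counter
from scalabel.label.io import load

# One JSON file per video; pick the first one.
label_files = sorted(glob.glob('../data/bdd100k/labels/seg_track_20/train/*.json'))
frames = load(label_files[0]).frames  # all annotated frames of one video

# Count annotated instances per category across the video.
counts = Counter(
    label.category for frame in frames for label in (frame.labels or [])
)
print(len(frames), 'frames;', dict(counts))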

Model Zoo

PCAN

Prototypical Cross-Attention Networks (PCAN) for Multiple Object Tracking and Segmentation [NeurIPS 2021 Spotlight]

Authors: Lei Ke, Xia Li, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

Abstract: Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes. Most approaches only exploit the temporal dimension to address the association problem, while relying on single frame predictions for the segmentation mask itself. We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation. PCAN first distills a space-time memory into a set of prototypes and then employs cross-attention to retrieve rich information from the past frames. To segment each object, PCAN adopts a prototypical appearance module to learn a set of contrastive foreground and background prototypes, which are then propagated over time. Extensive experiments demonstrate that PCAN outperforms current video instance tracking and segmentation competition winners on both Youtube-VIS and BDD100K datasets, and shows efficacy to both one-stage and two-stage segmentation frameworks.
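
The two stages described in the abstract can be summarized in a conceptual sketch (an illustration of the idea, not the authors' implementation; the soft k-means distillation, tensor shapes, and hyperparameters below are simplifying assumptions):

import torch
import torch.nn.functional as F

def distill_prototypes(memory: torch.Tensor, k: int = 16, iters: int = 3) -> torch.Tensor:
    """Distill a space-time memory (N, C) into (K, C) prototypes."""
    protos = memory[torch.randperm(memory.size(0))[:k]]  # init from random memory features
    for _ in range(iters):
        # Soft assignment of each memory feature to its nearest prototype.
        assign = F.softmax(memory @ protos.t(), dim=1)  # (N, K)
        # Update each prototype as the weighted mean of its assigned features.
        protos = (assign.t() @ memory) / assign.sum(0).unsqueeze(1).clamp(min=1e-6)
    return protos

def prototypical_cross_attention(query: torch.Tensor, protos: torch.Tensor) -> torch.Tensor:
    """Retrieve past context for current-frame features query (HW, C)."""
    attn = F.softmax(query @ protos.t() / query.size(1) ** 0.5, dim=1)  # (HW, K)
    return attn @ protos  # (HW, C), features aggregated from the prototypes

Attending over a small set of prototypes rather than the full space-time memory is what keeps this tractable for online tracking: the attention cost scales with K instead of with the number of stored frames.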

Results

| Detector | mMOTSA-val | mIDF1-val | ID Sw.-val | Scores-val | mMOTSA-test | mIDF1-test | ID Sw.-test | Scores-test | Config | Weights | Preds | Visuals |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| ResNet-50 | 28.1 | 45.4 | 874 | scores | 31.9 | 50.4 | 845 | scores | config | model \| MD5 | preds \| masks | visuals |

[Code] [Usage Instructions]

Usage

Model Inference

For model inference, please refer to the usage instructions of the corresponding model.

Output Evaluation

Validation Set

To evaluate MOTS performance on the BDD100K validation set, use the official evaluation script provided by BDD100K:

python -m bdd100k.eval.run -t seg_track \
    -g ../data/bdd100k/labels/seg_track_20/${SET_NAME} \
    -r ${OUTPUT_DIR} \
    [--out-file ${RESULTS_FILE}] [--nproc ${NUM_PROCESS}]
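
For example, to evaluate predictions stored in ./preds against the validation split and write the scores to a file (the paths and process count here are illustrative):

python -m bdd100k.eval.run -t seg_track \
    -g ../data/bdd100k/labels/seg_track_20/val \
    -r ./preds \
    --out-file ./seg_track_val_scores.json --nproc 4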

Test Set

You can obtain the performance on the BDD100K test set by submitting your model predictions to our evaluation server hosted on EvalAI.

Output Visualization

For visualization, you can use the visualization tool provided by Scalabel.

Below is an example:

import os
import numpy as np
from PIL import Image
from scalabel.label.io import load
from scalabel.vis.label import LabelViewer

# Load prediction frames ($OUTPUT_FILE, $IMG_DIR, and $VIS_DIR are
# placeholders for your own paths).
frames = load('$OUTPUT_FILE').frames

viewer = LabelViewer()
for frame in frames:
    img = np.array(Image.open(os.path.join('$IMG_DIR', frame.name)))
    viewer.draw(img, frame)
    # Visualizations are saved into one sub-directory per video,
    # which must exist before saving.
    out_dir = os.path.join('$VIS_DIR', frame.videoName)
    os.makedirs(out_dir, exist_ok=True)
    viewer.save(os.path.join(out_dir, frame.name))

Contribution

You can include your models in this repo as well! Please follow the contribution instructions.