The multiple object tracking and segmentation (MOTS) task involves detecting, tracking, and segmenting objects of interest throughout each video sequence.
The BDD100K dataset contains MOTS annotations for 223 videos (154/32/37 for train/val/test) with 8 categories. Each video is approximately 40 seconds and annotated at 5 fps, resulting in around 200 frames per video. For details about downloading the data and the annotation format for this task, see the official documentation.
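If you want to inspect the annotation format programmatically, the label files can be loaded with the Scalabel API that is also used for visualization later in this document. The snippet below is only a minimal sketch: `$LABEL_FILE` is a placeholder for a single Scalabel-format JSON file, and the exact fields available on each label (e.g. RLE masks vs. polygons) depend on the label set you download.

```python
from scalabel.label.io import load

# '$LABEL_FILE' is a placeholder for one Scalabel-format JSON file from the
# seg_track_20 labels (or one of your own prediction files).
frames = load('$LABEL_FILE').frames

# Each frame records the video it belongs to, its index within that video,
# and a list of instance labels carrying a per-video track id and a category.
frame = frames[0]
print(frame.videoName, frame.frameIndex, frame.name)
for label in frame.labels or []:
    print(label.id, label.category)
```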
Prototypical Cross-Attention Networks (PCAN) for Multiple Object Tracking and Segmentation [NeurIPS 2021 Spotlight]
Authors: Lei Ke, Xia Li, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu
Abstract
Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes. Most approaches only exploit the temporal dimension to address the association problem, while relying on single-frame predictions for the segmentation mask itself. We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation. PCAN first distills a space-time memory into a set of prototypes and then employs cross-attention to retrieve rich information from the past frames. To segment each object, PCAN adopts a prototypical appearance module to learn a set of contrastive foreground and background prototypes, which are then propagated over time. Extensive experiments demonstrate that PCAN outperforms current video instance tracking and segmentation competition winners on both Youtube-VIS and BDD100K datasets, and shows efficacy to both one-stage and two-stage segmentation frameworks.
| Detector | mMOTSA-val | mIDF1-val | ID Sw.-val | Scores-val | mMOTSA-test | mIDF1-test | ID Sw.-test | Scores-test | Config | Weights | Preds | Visuals |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | 28.1 | 45.4 | 874 | scores | 31.9 | 50.4 | 845 | scores | config | model \| MD5 | preds \| masks | visuals |
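As a rough illustration of the approach described in the abstract (this is not the authors' implementation; see the official PCAN code for that), the sketch below distills a feature memory into a small set of prototypes with a soft k-means style update and then reads out information for current-frame queries via cross-attention:

```python
import numpy as np

def distill_prototypes(memory, num_prototypes=8, iters=3):
    """Soft k-means style distillation of an (N, C) feature memory into
    (num_prototypes, C) prototypes. Purely illustrative."""
    rng = np.random.default_rng(0)
    protos = memory[rng.choice(len(memory), num_prototypes, replace=False)]
    for _ in range(iters):
        # Soft assignment of every memory feature to every prototype.
        logits = memory @ protos.T                      # (N, K)
        weights = np.exp(logits - logits.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        # Update prototypes as the weighted mean of the assigned features.
        protos = (weights.T @ memory) / (weights.sum(axis=0)[:, None] + 1e-6)
    return protos

def cross_attention_readout(queries, protos):
    """Attend from current-frame queries (M, C) to the prototypes (K, C)
    and return the aggregated readout (M, C)."""
    attn = queries @ protos.T / np.sqrt(queries.shape[1])   # (M, K)
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ protos

# Toy example: a space-time memory of 1024 features and 256 current queries.
memory = np.random.randn(1024, 64)
queries = np.random.randn(256, 64)
protos = distill_prototypes(memory)
readout = cross_attention_readout(queries, protos)
print(readout.shape)  # (256, 64)
```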
For model inference, please refer to the usage instructions of the corresponding model.
To evaluate the MOTS (segmentation tracking) performance on the BDD100K validation set, you can use the official evaluation scripts provided by BDD100K:
```bash
python -m bdd100k.eval.run -t seg_track \
    -g ../data/bdd100k/labels/seg_track_20/${SET_NAME} \
    -r ${OUTPUT_DIR} \
    [--out-file ${RESULTS_FILE}] [--nproc ${NUM_PROCESS}]
```
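You can also launch the same evaluation from Python with the standard `subprocess` module; the snippet below simply wraps the command above, and the ground-truth, prediction, and output paths are placeholders for illustration only:

```python
import subprocess

# Placeholders: point these at your local labels, predictions, and output file.
cmd = [
    "python", "-m", "bdd100k.eval.run", "-t", "seg_track",
    "-g", "../data/bdd100k/labels/seg_track_20/val",
    "-r", "./outputs/pcan_preds",
    "--out-file", "./outputs/pcan_scores.json",
]
subprocess.run(cmd, check=True)
```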
You can obtain the performance on the BDD100K test set by submitting your model predictions to our evaluation server hosted on EvalAI.
For visualization, you can use the visualization tool provided by Scalabel.
Below is an example, where `$OUTPUT_FILE`, `$IMG_DIR`, and `$VIS_DIR` are placeholders for your prediction file, the input image directory, and the output directory for the visualizations:
```python
import os
import numpy as np
from PIL import Image
from scalabel.label.io import load
from scalabel.vis.label import LabelViewer

# Load prediction frames.
frames = load('$OUTPUT_FILE').frames

viewer = LabelViewer()
for frame in frames:
    img = np.array(Image.open(os.path.join('$IMG_DIR', frame.name)))
    viewer.draw(img, frame)
    # Make sure the per-video output directory exists before saving.
    os.makedirs(os.path.join('$VIS_DIR', frame.videoName), exist_ok=True)
    viewer.save(os.path.join('$VIS_DIR', frame.videoName, frame.name))
```
You can include your models in this repo as well! Please follow the contribution instructions.