Multi-source Templates Learning for Real-time Aerial Object Tracking

This is the official code for the paper "Multi-source Templates Learning for Real-time Aerial Object Tracking". In this work, we present MSTL, an efficient aerial object tracking method based on multi-source templates.

Highlights

Real-time speed on edge platforms.

Our tracker runs at ~200 fps on a GPU, ~100 fps on a CPU, and ~20 fps on the Nvidia Jetson Xavier NX. With TensorRT acceleration, the speed reaches ~60 fps on the Jetson Xavier NX and ~19 fps on the Jetson Nano.

In contrast to previous aerial trackers, which are evaluated on high-end platforms (such as the Jetson AGX/Orin series), the proposed tracker can run on extremely cheap edge platforms: the Jetson Nano and the Jetson Xavier NX.

Competitive performance.

| Tracker | Venue | Speed (fps) | UAV123 (Prec.) | UAV123@10fps (Prec.) | UAV20L (Prec.) |
| --- | --- | --- | --- | --- | --- |
| Ours | - | 209 | 82.35 | 83.50 | 83.59 |
| TCTrack | CVPR 2022 | 128 | 80.05 | 77.39 | 67.20 |
| HiFT | ICCV 2021 | 137 | 78.70 | 74.87 | 76.32 |

Demo

demo_gif

Quick Start

Environment Preparation

python 3.7.3
pytorch 1.11.0
opencv-python 4.5.5.64
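
To quickly confirm that the installed packages match these versions, a minimal sanity check (the exact patch versions on your machine may differ):

```python
# Minimal environment sanity check for the versions listed above.
import sys
import torch
import cv2

print("python:", sys.version.split()[0])    # expected ~3.7.3
print("pytorch:", torch.__version__)        # expected ~1.11.0
print("opencv-python:", cv2.__version__)    # expected ~4.5.5
print("CUDA available:", torch.cuda.is_available())
```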

Training

First, set the paths to the training datasets in lib/train/admin/local.py.
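
In PyTracking/STARK-style code bases, this file usually defines an EnvironmentSettings class whose attributes are absolute dataset paths. The sketch below only illustrates the idea; the attribute names are assumptions and should be checked against the local.py shipped with this repository.

```python
# lib/train/admin/local.py -- illustrative sketch only; the attribute names are
# assumptions based on PyTracking/STARK-style projects, not this repo's exact file.
class EnvironmentSettings:
    def __init__(self):
        self.workspace_dir = '/path/to/workspace'   # checkpoints and logs are written here
        self.lasot_dir = '/path/to/LaSOT'           # training dataset roots
        self.got10k_dir = '/path/to/GOT-10k/train'
        self.coco_dir = '/path/to/COCO'
```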

Then, run the following command for training.

python lib/train/run_training.py

Evaluation

First, set the paths for this project in lib/test/evaluation/local.py.
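
As with training, lib/test/evaluation/local.py in PyTracking/STARK-style projects returns an EnvSettings object holding the benchmark and results paths. The field names below are assumptions used for illustration only.

```python
# lib/test/evaluation/local.py -- illustrative sketch; the field names are assumptions.
from lib.test.evaluation.environment import EnvSettings

def local_env_settings():
    settings = EnvSettings()
    settings.uav_path = '/path/to/UAV123'           # UAV123 / UAV20L / UAV123@10fps sequences
    settings.network_path = '/path/to/checkpoints'  # trained MSTL checkpoints
    settings.results_path = '/path/to/results'      # raw tracking results are saved here
    return settings
```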

Then, run the following commands for evaluation on four datasets.

  • UAV123
python tracking/test.py MSTL MSTL --dataset uav
  • UAV20L
python tracking/test.py MSTL MSTL --dataset uavl
  • UAV123@10fps
python tracking/test.py MSTL MSTL --dataset uav10
  • UAVDT
python tracking/test.py MSTL MSTL --dataset uavd

Trained models and raw results

The trained models, the training logs, and the raw tracking results are provided in the model zoo.

MSTL framework for other transformer-based trackers.

To use our framework for other transformer-based trackers, we jointly train the original tracker with an additional prediction head. The head takes the outputs of the transformer encoder (or other transformer-based structure) as input and directly predicts the bounding box of the target.

As an example, we use TransT (CVPR 2021), which has 4 feature integration layers (each with 2 self-attention and 2 cross-attention modules), to demonstrate how to implement the proposed decoupling strategy; a code sketch follows the list below.

  • During training:
    • Step 1: Feed the outputs of the second layer into an additional cross-attention mechanism to fuse the search and template features.
    • Step 2: Use the outputs of the cross-attention as input for prediction heads to locate the target.
    • Step 3: Train the original model together with the additional cross-attention layer and prediction heads.
  • During inference:
    • We use the outputs of the second layer to locate the target.
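
The sketch below illustrates this decoupling strategy in PyTorch: an extra cross-attention block fuses the layer-2 search and template features, and a small MLP head regresses the box. Module names, dimensions, and the head design are assumptions for illustration, not the exact code used in the paper.

```python
# Illustrative sketch of the decoupling strategy; names and dimensions are assumptions.
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Extra cross-attention + box head attached to an intermediate fusion layer."""
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        # cross-attention: search features attend to template features
        self.cross_attn = nn.MultiheadAttention(dim, num_heads)
        self.norm = nn.LayerNorm(dim)
        # simple MLP that regresses a normalized box (cx, cy, w, h) per search token
        self.box_head = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(inplace=True),
            nn.Linear(dim, 4), nn.Sigmoid(),
        )

    def forward(self, search_feat, template_feat):
        # search_feat: (HW_s, B, C), template_feat: (HW_t, B, C) -- layer-2 outputs
        fused, _ = self.cross_attn(query=search_feat, key=template_feat, value=template_feat)
        fused = self.norm(search_feat + fused)
        return self.box_head(fused)  # (HW_s, B, 4)
```

During training this head is optimized jointly with the original tracker; at inference only the first two layers and this head are needed, which is consistent with the parameter savings in the table below.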

TransT

| Tracker | Succ. (UAV123, original) | Params (original) | Succ. (UAV123, after decoupling) | Params (after decoupling) |
| --- | --- | --- | --- | --- |
| TransT | 69.1 | 23.0M | - | 16.7M |
| STARK-S | 68.35 | 28.079M | 68.55 | 18.616M |

The corresponding code and pre-trained models for these trackers will be made available.

About the UAV platform

The platform mainly consists of four parts: a Pixhawk flight controller, an image/data transmission module, a visual camera, and a Jetson Xavier NX onboard computer. The onboard computer obtains the video stream through its USB port. The ground-station computer can remotely access the onboard computer and select the target to be tracked over the data link.
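
For illustration, reading the camera's video stream over USB on the onboard computer can be done with a plain OpenCV capture loop; the device index and the tracker call are placeholders, not project code.

```python
# Illustrative capture loop for the USB camera on the onboard computer.
import cv2

cap = cv2.VideoCapture(0)  # USB camera, typically exposed as /dev/video0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # the frame would be handed to the tracker here, e.g. box = tracker.track(frame)
    cv2.imshow("onboard view", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```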

Hardware

Acknowledgement

  • Our implementation is built on the PyTracking and STARK libraries. We would like to thank their authors for providing great frameworks and toolkits.
