This repository is the official implementation of the paper:
Track-On: Transformer-based Online Point Tracking with Memory
Görkay Aydemir, Xiongyi Cai, Weidi Xie, Fatma Güney
International Conference on Learning Representations (ICLR), 2025
Track-On is an efficient, online point tracking model that tracks points in a frame-by-frame manner using memory. It leverages a transformer-based architecture to maintain a compact yet effective memory of previously tracked points.
```bash
git clone https://github.com/gorkaydemir/track_on.git
cd track_on

conda create -n trackon python=3.8 -y
conda activate trackon

conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install mmcv==2.2.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.4/index.html
pip install -r requirements.txt
```
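After installing, a quick sanity check (not part of the official setup) that the pinned versions resolved and CUDA is visible:

```python
import torch
import torchvision

print(torch.__version__)          # expected: 2.4.1 (+cu121 build)
print(torchvision.__version__)    # expected: 0.19.1
print(torch.cuda.is_available())  # should be True on a CUDA machine
```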
To obtain the necessary datasets, follow the instructions provided in the TAP-Vid repository:

Evaluation Datasets:
- TAP-Vid Benchmark (DAVIS, RGB-Stacking, Kinetics)
- RoboTAP

Training Dataset:
- MOVi-F (refer to this GitHub issue for additional guidance)
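For reference, the TAP-Vid DAVIS split ships as a single pickle file; the sketch below shows how it can be inspected. The field names follow the TAP-Vid documentation, so verify them against your download:

```python
import pickle

with open("/path/to/tapvid_davis/tapvid_davis.pkl", "rb") as f:
    data = pickle.load(f)

# Each entry maps a video name to its frames and point annotations
name, sample = next(iter(data.items()))
print(name)
print(sample["video"].shape)     # (num_frames, H, W, 3), uint8
print(sample["points"].shape)    # (num_points, num_frames, 2), normalized (x, y)
print(sample["occluded"].shape)  # (num_points, num_frames), bool
```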
Check out the demo notebook for a quick start with the model.
Track-On provides two practical usage modes; both process frames online but differ in input format:

1. Frame-by-frame: initialize with the query points and the first frame, then feed each new frame as it arrives:

```python
from model.track_on_ff import TrackOnFF

model = TrackOnFF(args)
model.init_queries_and_memory(queries, first_frame)

while True:  # streaming loop: one frame at a time
    out = model.ff_forward(new_frame)
```

2. Full-video input: pass the entire video at once; frames are still processed online internally:

```python
from model.track_on import TrackOn

model = TrackOn(args)
out = model.inference(video, queries)
```
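Tying the pieces together, here is a hedged end-to-end sketch of the frame-by-frame mode. The query layout (one (x, y) pixel coordinate per point on the first frame) and the variables `args` and `frames` are assumptions for illustration; see the demo notebook for the exact interface:

```python
import torch
from model.track_on_ff import TrackOnFF

model = TrackOnFF(args)  # args: the repo's config object (assumed available)
model.eval()

first_frame = frames[0]                 # frames: any decoded video (assumed)
queries = torch.tensor([[128.0, 96.0],  # assumed (x, y) pixel coordinates
                        [200.0, 150.0]])
model.init_queries_and_memory(queries, first_frame)

with torch.no_grad():
    for new_frame in frames[1:]:        # online: one frame at a time
        out = model.ff_forward(new_frame)
```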
Download the pre-trained checkpoint from Hugging Face.
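To fetch the checkpoint programmatically, here is a sketch using huggingface_hub; the repo id and filename below are placeholders, not the actual values, so substitute the ones from the Hugging Face page:

```python
import torch
from huggingface_hub import hf_hub_download

# Placeholder repo id and filename: replace with the actual release values
ckpt_path = hf_hub_download(repo_id="<user>/<repo>", filename="track_on.pth")
state_dict = torch.load(ckpt_path, map_location="cpu")
```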
Given:
- `evaluation_dataset`: the dataset to evaluate on
- `tapvid_root`: path to the evaluation dataset
- `checkpoint_path`: path to the downloaded checkpoint
Run the following command:
```bash
torchrun --master_port=12345 --nproc_per_node=1 main.py \
    --eval_dataset evaluation_dataset \
    --tapvid_root /path/to/eval/data \
    --checkpoint_path /path/to/checkpoint \
    --online_validation
```
When configured correctly, this command reproduces the results reported in the paper.
For training, given:
- MOVi-F dataset located at `/root/to/movi_f`
- TAP-Vid evaluation dataset: name `eval_dataset`, located at `/root/to/tap_vid`
- Training name: `training_name`

A multi-node training script is provided in `train.sh`; default training arguments are set within the script.
If you find our work useful, please cite:
```bibtex
@InProceedings{Aydemir2025ICLR,
    author    = {Aydemir, G\"orkay and Cai, Xiongyi and Xie, Weidi and G\"uney, Fatma},
    title     = {{Track-On}: Transformer-based Online Point Tracking with Memory},
    booktitle = {The Thirteenth International Conference on Learning Representations},
    year      = {2025}
}
```
This repository incorporates code from several public works, including CoTracker, TAPNet, DINOv2, ViTAdapter, and SPINO. Special thanks to the authors of these projects for making their code available.