Xingyu Jiang¹, Jiangwei Ren¹, Zizhuo Li², Xin Zhou¹, Dingkang Liang¹ and Xiang Bai¹†
¹ Huazhong University of Science & Technology, ² Wuhan University.
(†) Corresponding author.
- [27/Dec/2024] The arXiv version is released.
- [26/Dec/2024] The code and checkpoints are released.
Image matching for both cross-view and cross-modality plays a critical role in multimodal perception. In practice, the modality gap caused by different imaging systems/styles poses great challenges to the matching task. Existing works try to extract invariant features for specific modalities and train on limited datasets, showing poor generalization. In this paper, we present MINIMA, a unified image matching framework for multiple cross-modal cases. Without pursuing fancy modules, our MINIMA aims to enhance universal performance from the perspective of data scaling up. For such purpose, we propose a simple yet effective data engine that can freely produce a large dataset containing multiple modalities, rich scenarios, and accurate matching labels. Specifically, we scale up the modalities from cheap but rich RGB-only matching data, by means of generative models. Under this setting, the matching labels and rich diversity of the RGB dataset are well inherited by the generated multimodal data. Benefiting from this, we construct MD-syn, a new comprehensive dataset that fills the data gap for general multimodal image matching. With MD-syn, we can directly train any advanced matching pipeline on randomly selected modality pairs to obtain cross-modal ability. Extensive experiments on in-domain and zero-shot matching tasks, including 19 cross-modal cases, demonstrate that our MINIMA can significantly outperform the baselines and even surpass modality-specific methods.
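The property the data engine relies on is that modality translation is pixel-aligned: ground-truth correspondences from the RGB data therefore carry over unchanged to the generated modalities. Below is a minimal conceptual sketch of this label inheritance; `fake_translate` is only a stand-in for the generative models actually used, and the images and match labels are dummy data.

```python
import numpy as np

def fake_translate(rgb: np.ndarray) -> np.ndarray:
    """Stand-in for a generative modality translator (e.g., RGB -> pseudo-infrared).
    The only property relied on here is that the output stays pixel-aligned
    with its RGB source."""
    gray = rgb.mean(axis=-1, keepdims=True)
    return 255.0 - gray  # crude inversion, for illustration only

# Two RGB views with known ground-truth correspondences (x1, y1, x2, y2).
rgb_a = np.random.randint(0, 256, (480, 640, 3)).astype(np.float32)
rgb_b = np.random.randint(0, 256, (480, 640, 3)).astype(np.float32)
matches = np.array([[100, 120, 104, 118], [300, 200, 295, 203]])  # dummy labels from the RGB dataset

# Translate one view into a new modality; the matching labels are inherited as-is,
# because the generated image shares the geometry of its RGB source.
pseudo_ir_b = fake_translate(rgb_b)
cross_modal_pair = (rgb_a, pseudo_ir_b, matches)
```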
- The MegaDepth-Syn dataset is generated from the MegaDepth dataset with our MINIMA data engine and contains 6 extra modalities: infrared, depth, event, normal, sketch, and paint.
- The full dataset is released on OpenXLab (dataset repo `lsxi7/MINIMA`).
- You can download the dataset with the following commands:
```bash
pip install openxlab --no-dependencies  # install the OpenXLab CLI
openxlab login  # log in and enter your AK/SK (see the AK/SK page in the user center)
openxlab dataset info --dataset-repo lsxi7/MINIMA  # view dataset information and the file list
openxlab dataset get --dataset-repo lsxi7/MINIMA  # download the whole dataset
openxlab dataset download --dataset-repo lsxi7/MINIMA --source-path /README.md --target-path /path/to/local/folder  # download a single file
```
More details can be found in the OpenXLab documentation.
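After the download finishes, a quick listing confirms what was fetched. This is a stdlib-only sketch; the local path is a placeholder to be replaced with the folder used above.

```python
from pathlib import Path

# Placeholder path: point this at the folder the OpenXLab CLI downloaded into.
root = Path("/path/to/local/folder")

for entry in sorted(root.rglob("*")):
    if entry.is_file():
        size_mb = entry.stat().st_size / 1e6
        print(f"{entry.relative_to(root)}  ({size_mb:.1f} MB)")
```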
- We provide our `minima_lightglue`, `minima_loftr`, and `minima_roma` model weights on Google Drive.
- We also provide GitHub links for the weights: minima_lightglue, minima_loftr, and minima_roma.
- Please download the weight files and put them in the `weights` folder.
- Or you can directly run:

```bash
bash weights/download.sh
```
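As a quick sanity check, one can verify that the checkpoint files landed in the `weights` folder. The file names below are assumptions inferred from the model names above; adjust them to match the actual downloads.

```python
from pathlib import Path

# Hypothetical checkpoint file names, inferred from the model names above;
# adjust them to match the files actually downloaded.
expected = ["minima_lightglue.pth", "minima_loftr.ckpt", "minima_roma.pth"]

weights_dir = Path("weights")
for name in expected:
    path = weights_dir / name
    print(f"{path}: {'ok' if path.is_file() else 'MISSING'}")
```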
We are grateful to the authors who contributed the testing datasets for the real multimodal scenarios.
MegaDepth-1500-Syn
We provide a bash command to download and organize the MegaDepth-1500-Syn dataset directly:

```bash
bash data/test_data_preparation.sh
```
Additionally, please download the original MegaDepth-1500 test set and run:

```bash
tar xf megadepth_test_1500.tar
ln -s /path/to/megadepth_test_1500/Undistorted_SfM /path/to/MINIMA/data/megadepth/test
```
RGB-Infrared Test Dataset
The METU-VisTIR dataset comes from XoFTR and is available at its official Google Drive.
For more information, please refer to XoFTR.
MMIM Test Dataset
The MMIM dataset is sourced from Multi-modality-image-matching-database-metrics-methods.
We provide the necessary JSON files in the Multi-modality-image-matching-database-metrics-methods.zip archive located in the data directory.
To set up the MMIM test dataset, please follow these steps:
```bash
cd data
git clone https://github.com/StaRainJ/Multi-modality-image-matching-database-metrics-methods.git
unzip -o Multi-modality-image-matching-database-metrics-methods.zip
```
RGB-Depth Test Dataset
The depth test data comes from the DIODE dataset.
You can download it directly from its official Amazon Web Services link or from Baidu Cloud Storage.
RGB-Event Test Dataset
The aligned RGB-Event test dataset is generated from DSEC.
Our test data can be downloaded from Google Drive.
Organizing the Dataset
We recommend organizing the datasets in the following folder structure:
```text
data/
├── METU-VisTIR/
│   ├── index/
│   └── ...
├── Multi-modality-image-matching-database-metrics-methods/
│   ├── Multimodal_Image_Matching_Datasets/
│   └── ...
├── megadepth/
│   └── train/[modality]/Undistorted_SfM/
├── DIODE/
│   └── val/
└── DSEC/
    ├── vent_list.txt
    ├── thun_01_a/
    └── ...
```
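Before running the benchmarks, a short script can verify that the top-level entries of this layout are in place; the paths simply mirror the tree above.

```python
from pathlib import Path

# Top-level entries expected under data/, mirroring the tree above.
EXPECTED = [
    "METU-VisTIR/index",
    "Multi-modality-image-matching-database-metrics-methods/Multimodal_Image_Matching_Datasets",
    "megadepth",
    "DIODE/val",
    "DSEC",
]

data_root = Path("data")
for rel in EXPECTED:
    path = data_root / rel
    print(f"{path}: {'found' if path.exists() else 'MISSING'}")
```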
- Clone the repository and create the environment:

```bash
git clone https://github.com/LSXI7/MINIMA.git
cd MINIMA
conda env create -f environment.yaml
conda activate minima
```
- Initialize the external submodule dependencies with:

```bash
git submodule update --init --recursive
git submodule update --recursive --remote
sed -i '1s/^/from typing import Tuple as tuple\n/' third_party/RoMa/romatch/models/model_zoo/__init__.py
```
- Run the demo code after downloading the weights:

```bash
python demo.py --method sp_lg --fig1 demo/vis_test.png --fig2 demo/depth_test.png --save_dir ./demo
```
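To match several image pairs in one go, the demo can be driven from a small script that reuses only the CLI shown above; the pair list below is hypothetical.

```python
import subprocess

# Hypothetical image pairs; replace with your own files.
pairs = [
    ("demo/vis_test.png", "demo/depth_test.png"),
    # ("my/rgb.png", "my/infrared.png"),
]

for fig1, fig2 in pairs:
    subprocess.run(
        ["python", "demo.py", "--method", "sp_lg",
         "--fig1", fig1, "--fig2", fig2, "--save_dir", "./demo"],
        check=True,
    )
```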
We provide multi-modality image matching benchmark commands for our MINIMA models.
Choose the method from `sp_lg`, `loftr`, `roma`, and `xoftr` for the multimodal evaluation.
```bash
python test_relative_pose_infrared.py --method <method> [--ckpt model_path] [--save_figs] [--save_dir save_dir]  # Infrared-RGB
python test_relative_homo_depth.py --method <method> [--ckpt model_path] [--save_figs] [--save_dir save_dir]     # Depth-RGB
python test_relative_homo_event.py --method <method> [--ckpt model_path] [--save_figs] [--save_dir save_dir]     # Event-RGB

# choose_model: 0 for the medical test, 1 for the remote sensing test
python test_relative_homo_mmim.py --method <method> [--ckpt model_path] --choose_model 0/1 [--save_figs] [--save_dir save_dir]

# MegaDepth-1500-Syn; modality: infrared/depth/event/normal/sketch/paint
python test_relative_pose_mega_1500_syn.py --method <method> [--ckpt ckpt] --multi_model <modality> [--save_figs] [--save_dir save_dir]

# MegaDepth-1500 (RGB)
python test_relative_pose_mega_1500.py --method <method> [--ckpt model_path] [--save_figs] [--save_dir save_dir]
```
Note: By default, the checkpoint is loaded from the MINIMA models in the `weights` folder; you can specify a custom checkpoint with the `--ckpt` argument.
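To sweep the synthetic benchmark over every method and modality, the per-modality command above can be wrapped in a loop. This sketch reuses only the CLI shown above and relies on the default checkpoints in the `weights` folder.

```python
import subprocess

methods = ["sp_lg", "loftr", "roma", "xoftr"]
modalities = ["infrared", "depth", "event", "normal", "sketch", "paint"]

for method in methods:
    for modality in modalities:
        # Uses the default checkpoints in weights/; add --ckpt to override.
        subprocess.run(
            ["python", "test_relative_pose_mega_1500_syn.py",
             "--method", method, "--multi_model", modality],
            check=True,
        )
```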
- MD-Syn Full Dataset
- Real Multimodal Evaluation Benchmark
- Synthetic Multimodal Evaluation Benchmark
- Training Code
- Our MINIMA Data Engine for Multimodal Data Generation
- More Modalities Addition
We sincerely thank SuperPoint, LightGlue, Glue Factory, LoFTR, and RoMa for their contributions to methodological development.
Additionally, we appreciate the support of MegaDepth, SCEPTER, Depth-Anything-V2, DSINE, PaintTransformer, and Anime2Sketch for their roles in data generation.
If you find our work useful in your research, please consider giving a star ⭐ and a citation:
```bibtex
@article{jiang2024minima,
  title={MINIMA: Modality Invariant Image Matching},
  author={Jiang, Xingyu and Ren, Jiangwei and Li, Zizhuo and Zhou, Xin and Liang, Dingkang and Bai, Xiang},
  journal={arXiv preprint arXiv:2412.19412},
  year={2024},
}
```