This is the Doppelgangers dataset, a benchmark dataset that allows for training and standardized evaluation of visual disambiguation algorithms. The dataset is described in the paper "Doppelgangers: Learning to Disambiguate Images of Similar Structures".
Doppelgangers consists of a collection of internet photos of world landmarks and cultural sites that exhibit repeated patterns and symmetric structures. The dataset includes a large number of image pairs, each labeled as either positive or negative based on whether they are true or false (illusory) matching pairs.
We provide a download.sh script for downloading, extracting, and pre-processing the complete Doppelgangers Dataset. The dataset can be placed under the folder ./data/doppelgangers_dataset/ with the following layout:
|---doppelgangers
    |---images
        |---test_set
            |---...
        |---train_set_flip
            |---...
        |---train_set_noflip
            |---...
        |---train_megadepth
            |---...
    |---loftr_matches
        |---test_set
            |---...
        |---train_set_flip
            |---...
        |---train_set_noflip
            |---...
        |---train_megadepth
            |---...
    |---pairs_metadata
        |---...
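To confirm that a download is complete, the layout above can be checked with a short script. This is a minimal sketch: the function name `missing_dirs` is hypothetical, and the root path in the usage section assumes the dataset was placed under ./data/doppelgangers_dataset/ with the top-level doppelgangers directory inside it.

```python
from pathlib import Path

# Set names and top-level folders are taken from the layout listing.
SET_NAMES = ("test_set", "train_set_flip", "train_set_noflip", "train_megadepth")
EXPECTED_DIRS = (
    [f"images/{name}" for name in SET_NAMES]
    + [f"loftr_matches/{name}" for name in SET_NAMES]
    + ["pairs_metadata"]
)


def missing_dirs(dataset_root):
    """Return the expected subdirectories that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Assumed location; adjust if the dataset lives elsewhere.
    root = Path("./data/doppelgangers_dataset/doppelgangers")
    for rel in missing_dirs(root):
        print(f"missing: {root / rel}")
```

If only part of the dataset was downloaded, some of the reported directories are expected to be missing.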
If you want to download only a portion of the dataset, you can find detailed instructions below.
This page includes downloads for:
- Train set without image flip augmentation
- Train set with image flip augmentation
- Test set
- COLMAP reconstructions
- Pretrained model checkpoints
For the train and test sets, we provide downloads for images, image pair labels, and precomputed LoFTR matches.
All compressed tar.gz files extract into a common /doppelgangers/ directory. In general, each type of content maps to the following subdirectory:
- Images → /doppelgangers/images/(set_name)
- Image pair info → /doppelgangers/pairs_metadata/(set_name)
- LoFTR matches → /doppelgangers/loftr_matches/(set_name)
- Pretrained models → /doppelgangers/checkpoints/
- COLMAP SfM reconstructions → /doppelgangers/reconstructions/
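When archives are downloaded individually, they can be unpacked into one destination so that their contents merge under the shared doppelgangers directory. The following is a minimal sketch using Python's standard tarfile module; the function name `extract_archives` and both directory arguments are hypothetical, and it is not a replacement for download.sh.

```python
import tarfile
from pathlib import Path


def extract_archives(archive_dir, dest_dir):
    """Extract every *.tar.gz in archive_dir into dest_dir.

    Returns the names of the archives that were processed, in sorted order.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(archive_dir).glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            # Use the safer "data" extraction filter where the running
            # Python version provides it; fall back to the default otherwise.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)
        extracted.append(archive.name)
    return extracted
```

For example, `extract_archives("./downloads", "./data/doppelgangers_dataset")` would unpack all archives found in a hypothetical ./downloads folder.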
The image directory structure follows the WikiScenes dataset data structure as described in Section 1, Images and Textual Descriptions.
The image pair labels are stored using the NumPy .npy file format. There is one .npy file for every train or test set. Every .npy file contains a NumPy array whose entries represent image pairs, and each entry is itself a NumPy array.
An image pair entry has the format:
array([
image_0_relative_path : str,
image_1_relative_path : str,
pos_neg_pair_label (pos=1, neg=0) : int,
number_of_SIFT_matches : int
])
Example of an image pair entry:
array(['Berlin_Cathedral/east/0/pictures/Exterior of Berlin Cathedral 18.jpg',
'Berlin_Cathedral/west/0/0/pictures/Exterior of Berlin Cathedral 14.jpg',
0, 15], dtype=object)
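The entry format above can be read into named fields as in the sketch below. The names `ImagePair`, `parse_pair`, and `load_pairs` are hypothetical. Because the entries are stored with dtype=object, `np.load` needs `allow_pickle=True`, which should only be used on files from a trusted source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImagePair:
    image_0: str           # relative path of the first image
    image_1: str           # relative path of the second image
    is_positive: bool      # True for label 1 (true match), False for label 0
    num_sift_matches: int  # number of SIFT matches recorded for the pair


def parse_pair(entry):
    """Convert one 4-element pair entry into an ImagePair."""
    image_0, image_1, label, num_matches = entry
    return ImagePair(str(image_0), str(image_1), int(label) == 1, int(num_matches))


def load_pairs(npy_path):
    """Load a pair metadata file and parse every entry."""
    import numpy as np  # third-party dependency, imported only when needed

    # Object arrays are pickled inside the .npy file.
    raw = np.load(npy_path, allow_pickle=True)
    return [parse_pair(entry) for entry in raw]
```

The position of each `ImagePair` in the returned list is the pair's index in the original array.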
The LoFTR matches are stored using the NumPy .npy file format. There are multiple .npy files per train or test set, one per image pair. The name of each .npy file is the index of the pair in the image pair NumPy array.
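Given that naming rule, the match file for a pair can be located from its index. The sketch below assumes the match files sit directly inside loftr_matches/(set_name) and that the index is written without zero padding; neither detail is stated here, so adjust the path if the extracted layout differs. The function name `loftr_match_path` is hypothetical.

```python
from pathlib import Path


def loftr_match_path(dataset_root, set_name, pair_index):
    """Build the assumed path of the LoFTR match file for one image pair."""
    if pair_index < 0:
        raise ValueError("pair_index must be non-negative")
    # Assumption: <root>/loftr_matches/<set_name>/<index>.npy
    return Path(dataset_root) / "loftr_matches" / set_name / f"{pair_index}.npy"
```

The returned path can then be passed to `numpy.load` to read the stored matches.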
- Images: train_set_noflip.tar.gz (11G)
- LoFTR matches: matches_train_noflip.tar.gz (1.2G)
- Image pair info: (jump to section)
Follow the Preparing the Dataset section.
- Images:
  - Base images: train_set_flip.tar.gz (29G)
  - MegaDepth subset, images: train_megadepth.tar.gz (41G)
  - MegaDepth subset, metadata: megadepth.json
  - Image flip augmentation script: flip_augmentation.py
- LoFTR matches:
  - Base matches: matches_train_flip.tar.gz (1.8G)
  - MegaDepth subset matches: matches_megadepth.tar.gz (1.1G)
- Image pair info: (jump to section)
We provide a Python script flip_augmentation.py to perform the image flip augmentation on the provided base images. To use this script, modify the configuration options at the beginning of the script and run python flip_augmentation.py.
This train set includes a subset of MegaDepth images. Note that the MegaDepth images also have flip augmentations. Metadata on the subset of MegaDepth images used is stored in megadepth.json. The subset of images can also be downloaded directly and is stored in train_megadepth.tar.gz.
Note that the file structure of our MegaDepth images is adjusted from the downloaded version. Let xxxx be the MegaDepth scene ID. The mapping from the downloaded version to our file paths is as follows:
- The xxxx/dense/images directory in the downloaded version maps to our xxxx/images/ directory.
- The xxxx/dense(int)/images directory in the downloaded version maps to our xxxx/images/dense(int)/images/ directory.
- The scene IDs 0147_1 and 0290_1 contain the dense1 images of 0147 and 0290, respectively. They are split into separate scenes because the dense1 images depict different landmarks from those depicted in the original dense directories.
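The first two mapping rules can be expressed as a small path-rewriting function. This is a minimal sketch: the function name `to_doppelgangers_path` is hypothetical, and it assumes scene IDs are four digits. The exact destination of the dense1 images for scenes 0147 and 0290 is not spelled out above, so the sketch rejects those paths instead of guessing.

```python
import re

# Matches "<scene>/dense/images/<file>" and "<scene>/dense<int>/images/<file>".
_DENSE_PATH = re.compile(r"^(?P<scene>\d{4})/dense(?P<idx>\d*)/images/(?P<rest>.+)$")

# Scenes whose dense1 images are stored as separate scenes (0147_1, 0290_1).
_SPLIT_SCENES = {"0147", "0290"}


def to_doppelgangers_path(megadepth_path):
    """Map a path from the downloaded MegaDepth layout to the adjusted layout."""
    match = _DENSE_PATH.match(megadepth_path)
    if match is None:
        raise ValueError(f"unrecognized MegaDepth path: {megadepth_path}")
    scene, idx, rest = match["scene"], match["idx"], match["rest"]
    if scene in _SPLIT_SCENES and idx == "1":
        # These images live under scene IDs 0147_1 / 0290_1; layout not covered here.
        raise ValueError(f"{scene}/dense1 is stored as separate scene {scene}_1")
    if idx == "":
        # Rule 1: xxxx/dense/images -> xxxx/images/
        return f"{scene}/images/{rest}"
    # Rule 2: xxxx/dense(int)/images -> xxxx/images/dense(int)/images/
    return f"{scene}/images/dense{idx}/images/{rest}"
```

For instance, a hypothetical file 0001/dense/images/photo.jpg would map to 0001/images/photo.jpg.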
- Images: test_set.tar.gz (2G)
- LoFTR matches: matches_test.tar.gz (76M)
- Image pair info: (jump to section)
No additional steps are required.
Image pair metadata for all training and test sets: pairs_metadata.tar.gz (12M)
Pretrained model checkpoint with image flip augmentation: checkpoint.tar.gz (119M)
COLMAP reconstructions of the sixteen test scenes described in the paper: reconstructions.tar.gz (3G)
Licensing information for images in the train and test sets sourced from Wikimedia Commons is available here: attributions.json
If you find Doppelgangers useful for your work, please cite:
@inproceedings{cai2023doppelgangers,
title = {Doppelgangers: Learning to Disambiguate Images of Similar Structures},
author = {Cai, Ruojin and Tung, Joseph and Wang, Qianqian and Averbuch-Elor, Hadar and Hariharan, Bharath and Snavely, Noah},
booktitle = {ICCV},
year = {2023}
}