arXiv: https://arxiv.org/abs/2305.13631
We introduce Entity-Driven Image Search (EDIS), a challenging dataset for cross-modal image search in the news domain. EDIS consists of 1 million web images from actual search engine results and curated datasets, with each image paired with a textual description.
Our experimental results show that EDIS challenges state-of-the-art methods with dense entities and a large-scale candidate set.
git clone https://github.com/emerisly/EDIS.git
cd EDIS/
conda create -n edis
conda activate edis
pip install -r requirements.txt
- Download the EDIS image archive (split into three parts), then join the parts and extract
curl -L 'https://cornell.box.com/shared/static/w6rnuk14plns7xs0po6ksxwwvxz6s76y.part00' --output edis_image.tar.gz.part00
curl -L 'https://cornell.box.com/shared/static/vi3hzcb340efh4fko8xtycjh1cn6r79g.part01' --output edis_image.tar.gz.part01
curl -L 'https://cornell.box.com/shared/static/92t2nl89q8wxf5kk0ds6reba2wp9jeqi.part02' --output edis_image.tar.gz.part02
- Download the EDIS JSON archive and unzip it
curl -L 'https://cornell.box.com/shared/static/0aln48iy3wkvzg2iklczazmqdpdf83lc' --output edis_json.zip
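The commands above only fetch the archives. A minimal sketch of the "unzip" step follows, assuming (from the file names, not confirmed by this README) that the three `.partNN` files are byte-wise pieces of a single gzip-compressed tar archive and that `edis_json.zip` is an ordinary zip file. The helper names `join_parts` and `unpack_edis` are hypothetical and not part of the EDIS repository.

```python
import shutil
import tarfile
import zipfile
from pathlib import Path


def join_parts(part_paths, joined_path):
    """Concatenate split archive parts, in the given order, into one file."""
    with open(joined_path, "wb") as joined:
        for part in part_paths:
            with open(part, "rb") as chunk:
                shutil.copyfileobj(chunk, joined)
    return Path(joined_path)


def unpack_edis(workdir=".", out_dir="."):
    """Rebuild edis_image.tar.gz from its parts, then extract both archives.

    Assumption: the parts are plain byte splits of one .tar.gz file, so
    joining them in name order (part00, part01, part02) restores the archive.
    """
    workdir = Path(workdir)
    parts = sorted(workdir.glob("edis_image.tar.gz.part*"))
    if not parts:
        raise FileNotFoundError("no edis_image.tar.gz.part* files found")
    archive = join_parts(parts, workdir / "edis_image.tar.gz")
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(out_dir)
    with zipfile.ZipFile(workdir / "edis_json.zip") as zf:
        zf.extractall(out_dir)
```

Calling `unpack_edis()` from the directory that holds the downloaded files should leave the extracted images and JSON files next to them. The directory names inside the archives are not documented here, so check the result before pointing the configs at it.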
- Fine-tune
Update `image_root` in `retrieval_edis.yaml` to the directory of the EDIS images.
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.run --nproc_per_node=4 --master_port=1234 train_edis.py \
--config ./configs/retrieval_edis.yaml \
--output_dir output/retrieval_edis_mblip_4gpus_5e-5
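The `image_root` update can also be scripted rather than done by hand. The sketch below assumes `image_root` is a top-level key on its own line in the YAML config, which this README does not confirm. It rewrites only that line, so comments and other settings stay untouched. The function name `set_image_root` and the example path are hypothetical.

```python
import re


def set_image_root(config_text: str, image_dir: str) -> str:
    """Return config_text with the top-level image_root value replaced.

    Assumption: the key appears as a line starting with "image_root:".
    The new value is single-quoted, so image_dir must not contain a
    single quote itself.
    """
    pattern = re.compile(r"^image_root:.*$", flags=re.MULTILINE)
    new_line = f"image_root: '{image_dir}'"
    # A function replacement avoids backslash escapes being interpreted.
    updated, count = pattern.subn(lambda _match: new_line, config_text)
    if count == 0:
        raise KeyError("no top-level image_root key found")
    return updated
```

To apply it, read `configs/retrieval_edis.yaml` as text, pass the contents through `set_image_root` together with the directory holding the extracted images, and write the result back. The same call works for `configs/retrieval_evaluate.yaml` if that file uses the same key layout.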
- Evaluate
Update `image_root` in `retrieval_evaluate.yaml` to the directory of the EDIS images.
python evaluate_retrieval.py --config configs/retrieval_evaluate.yaml --image_bank restricted --cuda 0
python compute_metrics.py -d output/evaluate_results
You can download the pre-trained and fine-tuned checkpoints from the table below:
checkpoints | mBLIP w/ ViT-B | mBLIP w/ ViT-L |
---|---|---|
Pre-trained | Download | Download |
Fine-tuned | - | Download |
If you find this code useful for your research, please cite our paper:
@article{liu2023edis,
title={EDIS: Entity-Driven Image Search over Multimodal Web Content},
author={Liu, Siqi and Feng, Weixi and Chen, Wenhu and Wang, William Yang},
journal={arXiv preprint arXiv:2305.13631},
year={2023}
}
We thank the authors of TARA, VisualNews, BLIP, CLIP, and Pyserini for their work and for open-sourcing their code.